accurrent · a year ago
I gave a prompt and it straight up hallucinated. My prompt asked for an article about the advantages and disadvantages of Rust in the robotics ecosystem, and it claimed that Google Cartographer was written in Rust. The annoying thing is that it was quite convincing. The citation it used turned out to be GeeksforGeeks blogspam that did not mention Cartographer anywhere, so I went and checked: it's a C++-only project. It's worrisome when you see people relying on LLMs for knowledge.
jeroenhd · a year ago
People trusting LLMs to tell the truth is the advanced version of people taking the first link on Google as indubitable facts.

This whole trend is going to get much worse before it gets better.

tikkun · a year ago
I'm optimistic that hallucination rates will go down quite a bit again with the next gen of models (gpt5 / claude 4 / gemini 2 / llama 4).

I've noticed that the hallucination rate of newer, more SOTA models is much lower.

3.5 Sonnet hallucinates less than GPT-4, which hallucinates less than GPT-3.5, which hallucinates less than Llama 70B, which hallucinates less than GPT-3.

leettools · a year ago
We are actually working on a tool that provides similar functions (although we focus more on the knowledge base curation part). Here is an article we generated from the prompt "the advantages and disadvantages of rust in the robotics ecosystem" (https://svc.leettools.com/#/share/leettools/research?id=9886...). The basic flow is to query Google using the prompt, generate the article outline from the search result summaries, and then generate each section separately. Interested to see your opinions on the differences, thanks!
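A minimal sketch of that flow, assuming hypothetical llm() and web_search() placeholders rather than leettools' actual API:

```python
# Sketch of the described flow: search -> outline from summaries ->
# per-section generation. llm() and web_search() are hypothetical
# placeholders, not leettools' actual API.

def llm(prompt: str) -> str:
    """Stand-in for any chat-completion call (OpenAI, Anthropic, etc.)."""
    raise NotImplementedError

def web_search(query: str) -> list[dict]:
    """Stand-in returning [{'title': ..., 'summary': ...}, ...]."""
    raise NotImplementedError

def generate_article(topic: str) -> str:
    # 1. Query the web with the prompt itself and collect result summaries.
    summaries = "\n".join(
        f"- {r['title']}: {r['summary']}" for r in web_search(topic)
    )
    # 2. Build the outline from the summaries alone.
    outline = llm(
        f"Using only these search result summaries:\n{summaries}\n\n"
        f"Write a section-level outline for an article on: {topic}"
    )
    # 3. Generate each section separately, grounded in the same summaries.
    sections = [
        llm(
            f"Summaries:\n{summaries}\n\nOutline:\n{outline}\n\n"
            f"Write the section titled: {title}"
        )
        for title in outline.splitlines() if title.strip()
    ]
    return "\n\n".join(sections)
```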
accurrent · a year ago
I'm impressed; it's better than the article I found written by STORM. That being said, both tend to rely on what's available on the internet, so they lack things that are more subtle. It's impressive that your article picked up on Pixi. Of course, as a practicing roboticist my arguments would be different, but at this point I'm nitpicking.


kingkongjaffa · a year ago
Very cool! I asked it to create an article on the topic of my thesis and it was very good, but it lacked nuance and second-order thinking, i.e. here's the thing, what are the consequences of it, and what are the potential mitigations. It was able to pull existing thinking on a topic but not really synthesise a novel insight.

Synthesis of Topic Outlines through Retrieval and Multi-perspective Question Asking.

From the paper it seems like this is only marginally better than the benchmark approach they used to compare against:

> Outline-driven RAG (oRAG), which is identical to RAG in outline creation, but further searches additional information with section titles to generate the article section by section

It seems like the key ingredients are:

- generating questions

- addressing the topic from multiple perspectives

- querying similar Wikipedia articles (a high-quality RAG source for facts)

- breaking the problem down by first writing an outline.

All of which we can do at home, swapping the Wikipedia articles out for our own data sets; see the sketch below.
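A rough at-home version of those ingredients might look like this. It's only a sketch: llm() and retrieve() are hypothetical placeholders, and retrieve() is the spot where you swap Wikipedia for your own data set.

```python
# Sketch of the key ingredients: perspectives -> questions -> retrieval ->
# outline -> section-by-section drafting.

def llm(prompt: str) -> str:
    raise NotImplementedError  # any chat-completion call

def retrieve(question: str) -> str:
    raise NotImplementedError  # Wikipedia search, or your own RAG index

def storm_at_home(topic: str) -> str:
    # Address the topic from multiple perspectives.
    perspectives = llm(f"List 3 distinct perspectives on: {topic}")
    # Generate questions from each perspective.
    questions = llm(
        f"For each perspective below, write 5 research questions "
        f"about {topic}:\n{perspectives}"
    ).splitlines()
    # Answer the questions against the retrieval source.
    notes = "\n".join(retrieve(q) for q in questions if q.strip())
    # Write an outline first, then each section separately.
    outline = llm(f"From these notes:\n{notes}\n\nOutline an article on: {topic}")
    return "\n\n".join(
        llm(f"Notes:\n{notes}\n\nWrite the section: {heading}")
        for heading in outline.splitlines() if heading.strip()
    )
```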

kingkongjaffa · a year ago
I was able to mimic this in GPT, without the RAG component, using the custom instruction prompt below. It does indeed write decent content, better than other writing prompts I have seen.

PROMPT: create 3 diverse personas who would know about the user prompt; generate 5 questions that each persona would ask or clarify; use the questions to create a document outline; write the document with $your_role as the intended audience.
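For example, wired up as a system prompt (a sketch using the OpenAI Python client; the model name is only an illustration):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = (
    "create 3 diverse personas who would know about the user prompt; "
    "generate 5 questions that each persona would ask or clarify; "
    "use the questions to create a document outline; "
    "write the document with $your_role as the intended audience."
)

response = client.chat.completions.create(
    model="gpt-4o",  # illustrative choice; any capable chat model works
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "the advantages and disadvantages "
                                    "of rust in the robotics ecosystem"},
    ],
)
print(response.choices[0].message.content)
```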

westurner · a year ago
PROMPT: Then, after conducting background research, generate testable and untestable hypotheses, and also suggestions for further study given market challenges and relevant, marginally advantageous new and proven technologies.
dredmorbius · a year ago
"Sign in with Google" is a show-stopper.
zackmorris · a year ago
Ya, and unfortunately this is from Stanford. It's a private university, but that's still not a good look. It's amazing that in 2024 so many demos, especially in AI, are still getting this wrong.

We're long overdue for better sources of online revenue. I understand that AI costs money to train (I don't believe it costs substantial money to run - that's a scam), but if we thought walled gardens were bad, we ain't seen nothin' yet. We're entering an exclusive era where the haves enjoy vastly more money than the have-nots, so basically the bottom half of the population will be ignored as customers. The good apps will be exclusive clubs that the plebeians gaze at from afar, like a reverse zoo.

I just want something where I can pay 1 cent to $1 to skip login. Ideally from a virtual account that's free to use but guilts me into feeding it money. So maybe after 100 logins I pay it a few dollars. And then a reward system where wealthy users can pay it forward so others can browse for free.

I would make it in my spare time, but of course there is no such thing in the 21st century climate of boom-bust cycles and mass layoffs.

anotheraccount9 · a year ago
Yes, and it's not possible to delete the account (or the association with it).
jgalt212 · a year ago
And it's a challenge not to click that modal in error.


mburns · a year ago
Reminds me of Cuil.

> Cuil worked on an automated encyclopedia called Cpedia, built by algorithmically summarizing and clustering ideas on the web to create encyclopedia-like reports. Instead of displaying search results, Cuil would show Cpedia articles matching the searched terms.

https://en.wikipedia.org/wiki/Cuil

chankstein38 · a year ago
Does anyone have more info on this? They thank Azure at the top so I'm assuming it's a flavor of GPT? How do they prevent hallucinations? I am always cautious about asking an LLM for facts because half of the time it feels like it just adds whatever it wants. So I'm curious if they addressed that here or if this is just poorly thought-out...
EMIRELADERO · a year ago
morsch · a year ago
Thanks. There's an example page (markdown) at the very end. You can pretty easily spot some weaknesses in the generated text, it's uncanny valley territory. The most interesting thing is that the article contains numbered references, but unfortunately those footnotes are missing from the example.
Sn0wCoder · a year ago
Not sure how it prevents hallucinations, but I tried inputting too much info and got a pop-up saying it was using ChatGPT 3.5. The article it generated was OK but seemed to repeat the same thing over and over with slightly different wording.
infecto · a year ago
If you ask an LLM what color the sky is, it might say purple, but if you give it a paragraph describing the atmosphere and then ask the same question, it will almost always answer correctly. I don't think hallucinations are as big of a problem as people make them out to be.
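That's the grounding trick RAG pipelines rely on: put the reference text in the prompt and ask the model to answer from it alone. A minimal sketch, with llm() as a placeholder for any chat-completion call and illustrative wording:

```python
def llm(prompt: str) -> str:
    raise NotImplementedError  # any chat-completion call

CONTEXT = (
    "Rayleigh scattering in Earth's atmosphere scatters short (blue) "
    "wavelengths of sunlight more strongly than long (red) ones."
)

# Ungrounded: the model answers from its weights alone.
answer_a = llm("What color is the sky?")

# Grounded: the model is constrained to the supplied reference text,
# which is what keeps hallucination rates down in practice.
answer_b = llm(
    "Answer using ONLY the reference text below. If it does not contain "
    f"the answer, say so.\n\nReference:\n{CONTEXT}\n\n"
    "Question: What color is the sky?"
)
```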
misnome · a year ago
So, it only works if you already know enough about the problem to not need to ask the LLM, check.
pistoriusp · a year ago
Yet remains unsolvable.
chx · a year ago
There are no hallucinations. It's just the normal bullshit people hang a more palatable name on. There is nothing else.

https://hachyderm.io/@inthehands/112006855076082650

> You might be surprised to learn that I actually think LLMs have the potential to be not only fun but genuinely useful. “Show me some bullshit that would be typical in this context” can be a genuinely helpful question to have answered, in code and in natural language — for brainstorming, for seeing common conventions in an unfamiliar context, for having something crappy to react to.

> Alas, that does not remotely resemble how people are pitching this technology.

infecto · a year ago
Why does this get downvoted so heavily? It's my experience running LLMs in production. At scale, hallucinations are not a huge problem when you have reference material.
DylanDmitri · a year ago
Seems a promising approach. Feedback at the bottom is (?) missing a submit button. The article was fine, but veered into being overly verbose, with redundant sections. A simplification pass, even on the outline, could help.
kingkongjaffa · a year ago
It auto-saves I believe.
siscia · a year ago
We have been discussing a similar idea with friends.

The topic of knowledge synthesis is fascinating, especially in big organisations.

Moving away from fragmented documents to a set of facts from which an LLM synthesizes documents tailored for the reader.

There are a few tricks that would be interesting to get working.

For instance, the agent could keep evaluating itself against a set of questions. Or users could add questions to see if the agent understands the nuances of the topic, and so whether it can be trusted.

(Not dissimilar to regression testing in classical software engineering.)

Then there are the "homework" sections, where we ask human experts to confirm that the facts stored by the agent are still relevant and up to date.

All of these can then be enhanced with actions usable by the agent.

Think about fetching the point of contact (PoC) for a particular piece of software; say it is the employee Foo.

If we write this down in a document, it will definitely get outdated when Foo moves or gets promoted.

If we put it inside a knowledge synthesis system, the system itself can ask Foo every 6 months whether they are still the PoC for the software project.

Or it could check the LDAP system daily and re-ask the question as soon as it notices that Foo's position or reporting structure has changed.

This can be expanded to processes to follow, reports to create, etc.
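As a sketch, each stored fact could carry its own re-verification policy (all names here are hypothetical):

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Fact:
    statement: str                        # e.g. "Foo is the PoC for project Bar"
    owner: str                            # who to ask when the fact goes stale
    verified_at: datetime                 # last time a human confirmed it
    ttl: timedelta = timedelta(days=180)  # re-confirm every 6 months

    def is_stale(self, now: datetime | None = None) -> bool:
        return (now or datetime.now()) - self.verified_at > self.ttl

def homework(facts: list[Fact]) -> list[Fact]:
    """Facts whose owners should be asked to re-confirm them."""
    return [f for f in facts if f.is_stale()]

# A directory watcher could shortcut the TTL: if LDAP shows that Foo's
# position or reporting line changed, mark the fact stale immediately.
```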

OutOfHere · a year ago
STORM motivated me to independently create my own similar project, https://github.com/impredicative/newssurvey, which works very differently, writing a survey article on a medical or science topic. Its generated samples are linked in the readme.