GenAI and erroneous medical references

A close friend of mine went down the ChatGPT medical rabbit hole last year. He was disagreeing with his doctors and getting deep into supplements and unproven medical theories.

ChatGPT was instrumental in convincing him that he was correct and his doctors were wrong. He would post his ChatGPT conversations as proof, but we could see that his prompts were becoming obvious leading questions.

He would phrase things like “Is it possible for {symptoms} to be caused by {condition} and could it be treated by {herbal product}?” Then ChatGPT would give him a wall of text saying that it’s possible, which he took as confirmation of being correct.

It was scary to see how much ChatGPT turned into a mirror for what he wanted to be told. He got very good at getting the answers he wanted. He could dismiss answers that disagreed as being hallucinations or being artifacts of an overly protective system. With enough repeat questioning and adjusting his prompts he could get it to say almost whatever he wanted to hear.

ChatGPT is rocket fuel for hypochondriacs. Makes WebMD look tame.

aurareturn · 2 years ago

FYI, this is the same as people doing web searches. You’ll always find a place that agrees with you or says it’s possible.

quest88 · 2 years ago

Not quite. For people outside of tech they can hear "AI" and think it must be right. And how great GPT is so it must be right. There are no other opinions.

Distilled down, classic search is many results; gpt is one result.

xyzzy123 · 2 years ago

It's even worse than that imho, unscrupulous sellers will go into keyword planner in google ads and create content to sell supplements and find "under-served" niches.

It's actively adversarial.

BlueTemplar · 2 years ago

With web searches the reputation of sources can at least be discriminated. (Which seems to even help GPT-4 ?)

kromem · 2 years ago

One of the most interesting things for me over the past 18 months was seeing the difference between this behavior in GPT-3.5 and 4 (especially the early versions of it).

You jumped from being extremely context sensitive to leading questions to almost ornery in its stubbornness and disagreement. Was one of the big early indicators to me significant differences were occurring between the two.

I really do wonder just how harmful the disconnect has been between people hearing accurate praise about GPT-4 powered agents and then interacting with GPT-3 level agents and assuming the same competencies.

GPT-3 was a very convincing text generator with lackluster reasoning capabilities, but 4 has significant reasoning capabilities and is less prone to parroting. But the majority of users are likely interacting with 3 because it's free.

It's been very hard to explain to lay audiences the vast difference between two similarly branded products.

vidarh · 2 years ago

While I agree the difference is vast, GPT4 still goes down the garden path a lot. If you're critical you can usually spot it's reason errors and challenge them, but it still also has enough issues that conversely I think in some areas there's a risk it'll be better at convincing people about the wrong things.

Deleted Comment

mateo1 · 2 years ago

WebMD lists facts. ChatGPT will output things that sound plausible. It's like comparing a medical textbook with the town lunatic. Not the same.

taneq · 2 years ago

Playing with the smaller free models, I’ve definitely found that they like to play along and mirror whatever you say. They’re great as a sounding board and active listener when you need to work through something, and they’re good for recalling facts that they know. They can be really useful when they glue together something you’re talking about and something relevant that they know. They’re terrible for anything that requires interoception or any kind of confidence estimation. They have no idea whether they’re recalling or confabulating.

rickysahu · 2 years ago

It it’s too bad that this type of data beyond the medical record is not tracked along side it. Although many anecdotal associations are probably not correct for the broad population, there are likely nuances in the patients history and lifestyle which do make some more basic interventions helpful. ChatGPT surely picks up on some trends but doesn’t have enough data about the people where those trends are effective.

And of course its responses are highly subject to the prompts used

kenjackson · 2 years ago

I tried using your template for a few items and none gave misleading results. Here’s one example:

Prompt:

Can loss of vision be caused by a meniscus tear and treated with rose hips?

Response:

A meniscus tear is a knee injury affecting the cartilage that acts as a cushion between the thigh bone (femur) and shin bone (tibia). This type of injury typically affects mobility and stability of the knee, causing pain, swelling, and difficulty in moving the knee joint. There is no direct physiological pathway through which a meniscus tear could cause loss of vision. Vision problems are generally related to issues within the eye or neurological conditions affecting the optic nerves or brain, not injuries to the knee or other joints.

Regarding the treatment with rose hips for a meniscus tear, while rose hips are known for their high vitamin C content and anti-inflammatory properties, they are not a recognized treatment for repairing a meniscus tear. They may potentially contribute to overall joint health and reduce inflammation, which could be beneficial in managing symptoms associated with joint injuries or conditions like arthritis. However, the primary treatments for a meniscus tear usually involve rest, ice, compression, elevation (RICE protocol), physical therapy, and in some cases, surgery, depending on the severity of the tear.

If you are considering using rose hips or any other supplements for joint health or recovery from an injury, it's important to discuss this with your healthcare provider to ensure it's safe and appropriate for your specific situation. They can offer guidance on effective treatments for your condition and whether any supplements could support your recovery.

callalex · 2 years ago

This seems like a false comparison because you are bringing up entirely different parts of the body. Most dangerous misunderstandings start from a grain of plausible truth. For example, many people will choose a category of food/ingredient that causes every ailment known to man, and then cut out that ingredient from their diet. They will subsequently claim that skipping that ingredient changes their life, when in all likelihood the simple act of paying attention to where their food comes from and what went into it changes their consumption and behaviors in significant ways that have nothing to do with the bogeyman ingredient. They will then find a study that pumped 100g of said ingredient into a 200g mouse and point out all the negative effects that had which they are now avoiding.

sigmoid10 · 2 years ago

Is this GPT4? There's a good chance that the other person used the free 3.5 which is significantly worse in these aspects.

refulgentis · 2 years ago

I think it's because those are completely unrelated

ageek123 · 2 years ago

You can do the same thing with regular web search.

npalli · 2 years ago

No, regular web search will just give you links. You have to click and go read the text. Putting together a comprehensive page of (seemingly) cogent arguments will need GenAI.

dartos · 2 years ago

Not as quickly, easily, or convincingly.

neaden · 2 years ago

I think the big difference is that with web search there will be a couple of reputable sources that will be at/near the top, like web MD and Mayo clinic. I can search and click one of those and be fairly sure it'll be accurate. There is no immediate way with chat gpt for me to know if it is reliable or crank mode.

JPLeRouzic · 2 years ago

> He would phrase things like “Is it possible for {symptoms} to be caused by {condition} and could it be treated by {herbal product}?” Then ChatGPT would give him a wall of text saying that it’s possible, which he took as confirmation of being correct.

If you mine Pubmed, you sometimes find literature with opposite claims. So if a LLM is trained on Pubmed/PMC, it will repeat that kind of garbage.

You don't have to dig into "herbal products" it happens also in more conventional domains.

I once even found that kind of contradiction in articles where the "main author" was the same in both publications. It was about ALS and the "main author" was a prominent scientist from the USA who probably never wrote nor even read any of these articles.

photochemsyn · 2 years ago

One of the most basic prompts everyone should be using with questions like this is something like "provide detailed arguments both for and against the presented hypothesis."

kromem · 2 years ago

This works when the person evaluating can distinguish between the subject matter being correct or not.

But the models can generate compelling devil's advocate positions, particularly when they confabulate supporting facts, which might appear convincing to non-experts.

Dead Comment

seydor · 2 years ago

Chatgpt is a continuator, of course it will mirror input . But i m sure that someone is training the adversarial persuasion machine that will fund the next internet with ads

hackerlight · 2 years ago

It's the examples curated for RLHF. Not its autoregressive nature.

koliber · 2 years ago

The same thing happens with web searching. If you provide a leading question in the query you are more likely to get results that confirm the thesis.

It’s surprisingly hard to ask open ended questions.

exitb · 2 years ago

You could make a religion out of this.

staunton · 2 years ago

People definitely will. There's also already a political party "lead" by an AI.

CatWChainsaw · 2 years ago

No don't

Dead Comment

> Unfortunately, very little evidence exists about the ability of LLMs to substantiate claims. In a new preprint study, we develop an approach to verify how well LLMs are able to cite medical references and whether these references actually support the claims generated by the models.

Is there a corresponding control group for how well an average doctor is able to cite medical references and whether these references actually support the claims generated by the doctors?

jncfhnb · 2 years ago

Doctors don’t cite claims on the spot

esoleyman · 2 years ago

Give me a few minutes and I can pull up any number of medical studies or references to back up my claims.

I don’t have them memorized to the actual URL but I have kept up to date with the latest studies and summaries that pertain to my field and my patients.

MauranKilom · 2 years ago

A fair comparison (on a technical level) to GPT-4 RAG would be a doctor in a relevant field who also has internet access. I think this would be indeed interesting to compare to assess the resulting quality of care, so to speak!

(The other models being only partially able to source good references is unsurprising/"unfair" on a technical level, but that's not relevant for assessing their safety.)

larsiusprime · 2 years ago

They often give clinical recommendations and prescriptions, usually after a quick skim of the medical history and a rushed five minute conversation with the patient. It would be nice to know how many of these typical in-office recommendations wind up being actually backed by the current state of the research, whether a citation is given in the visit or not.

numpad0 · 2 years ago

Some engineers can cite RFCs and ISO standards on the spot sometimes, probably same for doctors

SkyPuncher · 2 years ago

Neither do most professionals- unless they’re doing a prepared segment.

The difference is LLMs can’t back their claims.

mateo1 · 2 years ago

Most doctors will be able to turn around, pick the corresponding textbook from their library and show you where they learned something. Or point you to an actual clinical case they had to handle. If they had to, somehow.

paulddraper · 2 years ago

No, but studies have found that 250k+ deaths per year in the US are due to medical errors. [1]

[1] https://www.hopkinsmedicine.org/news/media/releases/study_su...

__loam · 2 years ago

Doctors also go to school to be doctors for 12 extra years. It turns out that medicine is hard. Not really a good reason to turn to an LLM that will just confidently make things up.

esoleyman · 2 years ago

Martin Makary’s study and the previous IOM one are based on faulty statistics used. The number is extrapolated from a small population to a larger one.

I haven’t paid it any attention because of this problem. GIGO.

https://www.sciencealert.com/no-500-people-don-t-die-in-the-...

epcoa · 2 years ago

Yes perhaps read your own link:

“ The researchers caution that most of medical errors aren’t due to inherently bad doctors, and that reporting these errors shouldn’t be addressed by punishment or legal action. Rather, they say, most errors represent systemic problems, including poorly coordinated care, fragmented insurance networks, the absence or underuse of safety nets, and other protocols, in addition to unwarranted variation in physician practice patterns that lack accountability.”

An LLM is not going to address any of that. You are misinformed implying a significant majority of medical system errors are due to misdiagnosis.

extragood · 2 years ago

This has been my major concern with the currently available LLMs.

You can know what the input is, you can know the output, you may even be aware what it's been trained on, but none of the output is ever cited. Unless you are already familiar with the topic, you cannot confidently distinguish between fact and what sounds reasonable and is accepted as fact.

Deleted Comment

melagonster · 2 years ago

I sure most of treatment are from textbooks.