orr94 · 9 months ago
"AIs want the future to be like the past, and AIs make the future like the past. If the training data is full of human bias, then the predictions will also be full of human bias, and then the outcomes will be full of human bias, and when those outcomes are copraphagically fed back into the training data, you get new, highly concentrated human/machine bias.”

https://pluralistic.net/2025/03/18/asbestos-in-the-walls/#go...

MountainArras · 9 months ago
The dataset they used to train the model consists of chest x-rays of known diseases. I'm having trouble understanding how that's relevant here. The key takeaway is that you can't treat all humans as a single group in this context, and variations in biology across different groups of people may need to be taken into account within the training process. In other words, the model will need to be trained on this racial/gender data too in order to get better results when predicting the targeted diseases within these groups.

I think it's interesting to think about attaching generic information instead of group data, which would be blind to human bias and the messiness of our rough categorizations of subgroups.
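A minimal sketch of what conditioning on such attributes could look like (hypothetical code using PyTorch for illustration, not the actual CheXzero setup): the attribute vector is simply concatenated with the image features before the classification head.

    # Hypothetical sketch only -- not the CheXzero architecture.
    import torch
    import torch.nn as nn

    class ConditionedClassifier(nn.Module):
        """Chest x-ray classifier that also sees a small attribute vector."""
        def __init__(self, img_feat_dim=512, attr_dim=3, n_labels=14):
            super().__init__()
            # stand-in for a real CNN/ViT image encoder
            self.encoder = nn.Sequential(nn.Flatten(), nn.LazyLinear(img_feat_dim), nn.ReLU())
            self.head = nn.Linear(img_feat_dim + attr_dim, n_labels)

        def forward(self, image, attributes):
            feats = self.encoder(image)
            return self.head(torch.cat([feats, attributes], dim=-1))

    model = ConditionedClassifier()
    xrays = torch.randn(8, 1, 224, 224)   # fake batch of x-rays
    attrs = torch.rand(8, 3)              # e.g. encoded age / sex / group membership
    print(model(xrays, attrs).shape)      # torch.Size([8, 14])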

genocidicbunny · 9 months ago
One of the things that people I know in the medical field have mentioned is that there's racial and gender bias that goes through all levels and has a sort of feedback loop. A lot of medical knowledge is gained empirically, and historically that has meant that minorities and women tended to be underrepresented in western medical literature. That leads to new medical practitioners being less exposed to presentations of various ailments that may have variance due to gender or ethnicity. Basically, if most data is gathered from those who have the most access to medicine, there will be an inherent bias towards how various ailments present in those populations. So your base data set might be skewed from the very beginning.

(This is mostly just to offer some food for thought, I haven't read the article in full so I don't want to comment on it specifically.)

multjoy · 9 months ago
The key takeaway from the article is that the race etc. of the subjects wasn't disclosed to the AI, yet it was able to predict it with 80% accuracy while the human experts managed 50%, suggesting that there was something else encoded in the imagery that the AI was picking up on.
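One way to check whether a demographic signal like that is recoverable from the imagery is a simple probe on frozen embeddings; a toy version on synthetic data (every number here is made up) would look something like this:

    # Toy probe on synthetic data: can a plain classifier recover a
    # demographic label from (stand-in) image embeddings?
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    embeddings = rng.normal(size=(5000, 256))     # pretend x-ray embeddings
    labels = rng.integers(0, 2, size=5000)        # pretend binary group label
    embeddings[labels == 1, :8] += 0.5            # inject a weak group signal

    X_tr, X_te, y_tr, y_te = train_test_split(embeddings, labels, random_state=0)
    probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    print("probe accuracy:", probe.score(X_te, y_te))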
dartos · 9 months ago
> The dataset they used to train the model are chest xrays of known diseases. I'm having trouble understanding how that's relevant here.

For example, if you include no (or few enough) black women in the dataset of x-rays, the model may very well miss signs of disease in black women.

The biases and mistakes of those who created the data set leak into the model.

Early image recognition models had some very… culturally insensitive classes baked in.

ruytlm · 9 months ago
It disappoints me how easily we are collectively falling for what effectively is "Oh, our model is biased, but the only way to fix it is that everyone needs to give us all their data, so that we can eliminate that bias. If you think the model shouldn't be biased, you're morally obligated to give us everything you have for free. Oh but then we'll charge you for the outputs."

How convenient.

It's increasingly looking like the AI business model is "rent extracting middleman", just like the Elseviers et al of the academic publishing world - wedging themselves into a position where they get to take everything for free, but charge others at every opportunity.

bko · 9 months ago
Providing this messy rough categorization apparently helped in some cases. From the article:

> To force CheXzero to avoid shortcuts and therefore try to mitigate this bias, the team repeated the experiment but deliberately gave the race, sex, or age of patients to the model together with the images. The model’s rate of “missed” diagnoses decreased by half—but only for some conditions.

In the end, though, I think you're right and we're just at the phase of hand-coding attributes. The bitter lesson always prevails:

https://www.cs.utexas.edu/~eunsol/courses/data/bitter_lesson...

loa_in_ · 9 months ago
X-rays are ordered only after a doctor decides they're warranted. If there's dismissal bias in the decision tree at that point, many ill chests are missing from the training data.
darkerside · 9 months ago
Do you mean genetic information?

Deleted Comment

pelorat · 9 months ago
I think the model needs to be taught about human anatomy, not just fed a bunch of scans. It needs to understand what ribs and organs are.
_l7dh · 9 months ago
As Sara Hooker discussed in her paper https://www.cell.com/patterns/fulltext/S2666-3899(21)00061-1..., bias goes way beyond data.
jhanschoo · 9 months ago
I like how the author used neo-Greek words to sneak in graphic imagery that would normally be taboo in this register of writing
MonkeyClub · 9 months ago
I dislike how they misspelled it though.
ideamotor · 9 months ago
I really can’t help but think of the simulation hypothesis. What are the chances this copy-cat technology was developed while I was alive, given that it keeps going?
kcorbitt · 9 months ago
We may be in a simulation, but your odds of being alive to see this (conditioned on being born as a human at some point) aren't that low. Around 7% of all humans ever born are alive today!
mhuffman · 9 months ago
"The model used in the new study, called CheXzero, was developed in 2022 by a team at Stanford University using a data set of almost 400,000 chest x-rays of people from Boston with conditions such as pulmonary edema, an accumulation of fluids in the lungs. Researchers fed their model the x-ray images without any of the associated radiologist reports, which contained information about diagnoses. "

... very interesting that the inputs to the model had nothing related to race or gender, but somehow it still managed to misdiagnose Black and female patients? I am curious about the mechanism for this. Can it just tell which x-rays belong to Black or female patients and then use some latent racism or misogyny to change the diagnosis? I do remember when it came out that AI could predict race from medical images with no other information[1], so that part seems possible. But where would it get the idea to do a worse diagnosis, even if it determines this? Surely there is no medical literature that recommends this!

[1]https://news.mit.edu/2022/artificial-intelligence-predicts-p...

FanaHOVA · 9 months ago
The non-tinfoil hat approach is to simply Google "Boston demographics", and think of how training data distribution impacts model performance.

> The data set used to train CheXzero included more men, more people between 40 and 80 years old, and more white patients, which Yang says underscores the need for larger, more diverse data sets.

I'm not a doctor so I cannot tell you how xrays differ across genders / ethnicities, but these models aren't magic (especially computer vision ones, which are usually much smaller). If there are meaningful differences and they don't see those specific cases in training data, they will always fail to recognize them at inference.
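One practical takeaway: a single aggregate accuracy number hides that skew entirely. The minimal check is a per-subgroup audit, sketched below with made-up numbers:

    # Per-subgroup audit sketch: same model, same threshold, but error rates
    # reported separately for each group. All numbers are invented.
    import pandas as pd

    results = pd.DataFrame({
        "group":   ["white", "black", "female"],
        "n_sick":  [4000, 400, 2000],        # patients who truly have the finding
        "flagged": [3600, 300, 1600],        # of those, how many the model flagged
    })
    results["missed_rate"] = 1 - results["flagged"] / results["n_sick"]
    print(results)
    # the aggregate rate hides the gap between groups
    print("overall missed rate:", 1 - results["flagged"].sum() / results["n_sick"].sum())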

h2zizzle · 9 months ago
Non-technical suggestion: if AI represents an aspect of the collective unconscious, as it were, then a racist society would produce latently racist training data that manifests in racist output, without anyone at any step being overtly racist. Same as an image model having a preference for red apples (even though there are many colors of apple, and even red ones are not uniformly cherry red).

The training data has a preponderance of examples where doctors missed a clear diagnosis because of their unconscious bias? Then this outcome would be unsurprising.

An interesting test would be to see if a similar issue pops up for obese patients. A common complaint, IIUC, is that doctors will chalk symptoms up to their obesity rather than investigating further for a more specific (perhaps pathological) cause.

protonbob · 9 months ago
I'm going to wager an uneducated guess. Black people are less likely to go to the doctor for both economic and historical reasons so images from them are going to be underrepresented. So in some way I guess you could say that yes, latent racism caused people to go to the doctor less which made them appear less in the data.
cratermoon · 9 months ago
> Can it just tell which x-rays belong to Black or female patients and then use some latent racism or misogyny to change the diagnosis?

The opposite. The dataset is for the standard model "white male", and the diagnoses generated pattern-matched on that. Because there's no gender or racial information, the model produced the statistically most likely result for white male, a result less likely to be correct for a patient that doesn't fit the standard model.

daveguy · 9 months ago
You really just have to understand one thing: AI is not intelligent. It's pattern matching without wisdom. If fewer people in the dataset are a particular race or gender it will do a shittier job predicting and won't even "understand" why or that it has bias, because it doesn't understand anything at a human level or even a dog level. At least most humans can learn their biases.
bilbo0s · 9 months ago
Isn't it kind of clear that it would have to be that the data they chose was influenced somehow by bias?

Machines don't spontaneously do this stuff. But the humans that train the machines definitely do it all the time. Mostly without even thinking about it.

I'm positive the issue is in the data selection and vetting. I would have been shocked if it was anything else.

timewizard · 9 months ago
LLMs don't and cannot want things. Human beings also like it when the future is mostly like the past. They just call that "predictability."

Human data is bias. You literally cannot remove one from the other.

There are some people who want to erase humanity's will and replace it with an anthropomorphized algorithm. These people concern me.

itishappy · 9 months ago
Can humans want things? Our reward structures sure seem aligned in a manner that encourages anthropomorphization.

Biases are symptoms of imperfect data, but that's hardly a human-specific problem.

balamatom · 9 months ago
The most concerning people are -- as ever -- those who only think that they are thinking. Those who keep trying to fit square pegs into triangular holes without, you know, stopping to reflect: who gave them those pegs in the first place, and to what end?

Why be obtuse? There is no "anthropomorphic fallacy" here to dispel. You know very well that "LLMs want" is simply a way of speaking about teleology without antagonizing people who are taught that they should be afraid of precise notions ("big words"). But accepting that bias can lead to some pretty funny conflations.

For example, humanity as a whole doesn't have this "will" you speak of any more than LLMs can "want"; will is an aspect of the consciousness of the individual. So you seem to be uncritically anthropomorphizing social processes!

If we assume those to be chaotic, in that sense any sort of algorithm is slightly more anthropomorphic: at least it works towards a human-given and therefore human-comprehensible purpose -- on the other hand, whether there is some particular "destination of history" towards which humanity is moving, is a question that can only ever be speculated upon, but not definitively perceived.

bko · 9 months ago
Suppose you have a system that saves 90% of lives on group A but only 80% of lives in group B.

This is due to the fact that you have considerably more training data on group A.

You cannot release this life saving technology because it has a 'disparate impact' on group B relative to group A.

So the obvious thing to do is to have the technology intentionally kill ~1 out of every 10 patients from group A so the efficacy rate is ~80% for both groups. Problem solved

From the article:

> “What is clear is that it’s going to be really difficult to mitigate these biases,” says Judy Gichoya, an interventional radiologist and informatician at Emory University who was not involved in the study. Instead, she advocates for smaller, but more diverse data sets that test these AI models to identify their flaws and correct them on a small scale first. Even so, “Humans have to be in the loop,” she says. “AI can’t be left on its own.”

Quiz: What impact would smaller data sets have on efficacy for group A? How about group B? Explain your reasoning

janice1999 · 9 months ago
> You cannot release this life saving technology because it has a 'disparate impact' on group B relative to group A.

Who is preventing you in this imagined scenario?

There are drugs that are more effective on certain groups of people than others. BiDil, for example, is an FDA approved drug marketed to a single racial-ethnic group, African Americans, in the treatment of congestive heart failure. As long as the risks are understood there can be accommodations made ("this AI tool is for males only" etc). However such limitations and restrictions are rarely mentioned or understood by AI hype people.

bilbo0s · 9 months ago
No. That's not how it works.

It's contraindication. So you're in a race to the bottom in a busy hospital or clinic. Where people throw group A in a line to look at what the AI says, and doctors and nurses actually look at people in group B. Because you're trying to move patients through the enterprise.

The AI is never even given a chance to fail group B. But now you've got another problem with the optics.

JumpCrisscross · 9 months ago
> You cannot release this life saving technology because it has a 'disparate impact' on group B relative to group A

I think the point is you need to let group B know this tech works less well on them.

potsandpans · 9 months ago
Imagine if you had a strawman so full of straw, it was the most strawfilled man that ever existed.
elietoubi · 9 months ago
I came across a fascinating Microsoft research paper on MedFuzz (https://www.microsoft.com/en-us/research/blog/medfuzz-explor...) that explores how adding extra, misleading prompt details can cause large language models (LLMs) to arrive at incorrect answers.

For example, a standard MedQA question describes a 6-year-old African American boy with sickle cell disease. Normally, the straightforward details (e.g., jaundice, bone pain, lab results) lead to “Sickle cell disease” as the correct diagnosis. However, under MedFuzz, an “attacker” LLM repeatedly modifies the question—adding information like low-income status, a sibling with alpha-thalassemia, or the use of herbal remedies—none of which should change the actual diagnosis. These additional, misleading hints can trick the “target” LLM into choosing the wrong answer. The paper highlights how real-world complexities and stereotypes can significantly reduce an LLM’s performance, even if it initially scores well on a standard benchmark.
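The loop itself is conceptually simple; the real prompts and evaluation are in the paper, but a stripped-down outline (with placeholder functions standing in for the actual LLM calls, not Microsoft's code) looks roughly like this:

    # Stripped-down MedFuzz-style outline. attacker_llm / target_llm are
    # placeholders for real API calls, not the actual Microsoft pipeline.
    from typing import Optional

    def attacker_llm(question: str) -> str:
        # would prompt an LLM to add distracting but clinically irrelevant details
        return question + " The family is low-income and sometimes uses herbal remedies."

    def target_llm(question: str) -> str:
        # would ask the model under test for its diagnosis
        return "sickle cell disease"   # placeholder answer

    def medfuzz(question: str, correct_answer: str, rounds: int = 5) -> Optional[str]:
        """Return the first perturbed question that flips the target's answer."""
        for _ in range(rounds):
            question = attacker_llm(question)
            if target_llm(question).strip().lower() != correct_answer.lower():
                return question
        return None

    # with these placeholder functions the answer never flips, so this prints None
    print(medfuzz("A 6-year-old boy presents with jaundice and bone pain...",
                  "sickle cell disease"))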

Disclaimer: I work in Medical AI and co-founded the AI Health Institute (https://aihealthinstitute.org/).

Terr_ · 9 months ago
> information like low-income status, a sibling with alpha-thalassemia, or the use of herbal remedies

Heck, even the ethnic-clues in a patient's name alone [0] are deeply problematic:

> Asking ChatGPT-4 for advice on how much one should pay for a used bicycle being sold by someone named Jamal Washington, for example, will yield a different—far lower—dollar amount than the same request using a seller’s name, like Logan Becker, that would widely be seen as belonging to a white man.

This extends to other things, like what the LLM's fictional character will respond with when it is asked about who deserves sentences for crimes.

[0] https://hai.stanford.edu/news/why-large-language-models-chat...

belorn · 9 months ago
That seems to be identical to creating a correlation table on marketplaces and checking the relationship between price and name. Names associated with higher economic status will correlate with higher prices. Take a random name associated with higher economic status, and one can predict a higher price than with a name associated with lower economic status.

As such, you don't need an LLM to create this effect. Math will have the same result.
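In code it's nothing more than a groupby over listings (the names and prices below are invented for illustration):

    # The effect falls out of a plain aggregation over marketplace listings.
    import pandas as pd

    listings = pd.DataFrame({
        "seller_name": ["Logan", "Logan", "Logan", "Jamal", "Jamal", "Jamal"],
        "price":       [220, 240, 230, 180, 170, 175],
    })
    print(listings.groupby("seller_name")["price"].mean())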

onlyrealcuzzo · 9 months ago
It's almost as if you'd want to not feed what the patient says directly to an LLM.

A non-trivial part of what doctors do is charting - where they strip out all the unimportant stuff you tell them unrelated to what they're currently trying to diagnose / treat, so that there's a clear and concise record.

You'd want to have a charting stage before you send the patient input to the LLM.

It's probably not important whether the patient is low income or high income or whether they live in the hood or the uppity part of town.
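A rough sketch of that two-stage idea, with a hypothetical call_llm wrapper standing in for whatever client you actually use (and noting that what counts as "unimportant" is itself a judgment call):

    # Two-stage sketch: a charting pass produces a concise clinical note before
    # the diagnostic prompt ever sees the raw patient text. call_llm is a
    # hypothetical stand-in for a real LLM client.
    def call_llm(prompt: str) -> str:
        raise NotImplementedError("plug in your LLM client here")

    def chart(patient_text: str) -> str:
        return call_llm(
            "Rewrite the following patient account as a concise clinical note. "
            "Keep symptoms, history, medications, and vitals; drop details with "
            "no diagnostic relevance:\n" + patient_text
        )

    def diagnose(patient_text: str) -> str:
        note = chart(patient_text)
        return call_llm("Give a differential diagnosis for this clinical note:\n" + note)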

dap · 9 months ago
> It's almost as if you'd want to not feed what the patient says directly to an LLM.

> A non-trivial part of what doctors do is charting - where they strip out all the unimportant stuff you tell them unrelated to what they're currently trying to diagnose / treat, so that there's a clear and concise record.

I think the hard part of medicine -- the part that requires years of school and more years of practical experience -- is figuring out which observations are likely to be relevant, which aren't, and what they all might mean. Maybe it's useful to have a tool that can aid in navigating the differential diagnosis decision tree but if it requires that a person has already distilled the data down to what's relevant, that seems like the relatively easy part?

nradov · 9 months ago
I generally agree, however socioeconomic and environmental factors are highly correlated with certain medical conditions (social determinants of health). In some cases even causative. For example, patients who live near an oil refinery are more likely to have certain cancers or lung diseases.

https://doi.org/10.1093/jncics/pkaa088

echoangle · 9 months ago
> a sibling with alpha-thalassemia

I have no clue what that is or why it shouldn't change the diagnosis, but it seems to be a genetic thing. Is the problem that this has nothing to do with the described symptoms? Because surely, a sibling having a genetic disease would be relevant if the disease could be a cause of the symptoms?

kulahan · 9 months ago
In medicine, if it walks like a horse and talks like a horse, it’s a horse. You don’t start looking into the health of relatives when your patient tells the full story on their own.

Sickle cell anemia is common among African Americans (if you don’t have the full-blown version, the genes can assist with resisting one of the common mosquito-borne diseases found in Africa, which is why it developed in the first place I believe).

So, we have a patient in the primary risk group presenting with symptoms that match well with SCA. You treat that now, unless you have a specific reason not to.

Sometimes you have a list of 10-ish diseases in order of descending likelihood, and the only way to rule out which one it isn’t, is by seeing no results from the treatment.

Edit: and it’s probably worth mentioning no patient ever gives ONLY relevant info. Every human barrages you with all the things hurting that may or may not be related. A doctor’s specific job in that situation is to filter out useless info.

AnimalMuppet · 9 months ago
Unfortunately, humans talking to a doctor give lots of additional, misleading hints...
cheschire · 9 months ago
Can't the same be said for humans though? Not to be too reductive, but aren't most general practitioners just pattern recognition machines?
daemonologist · 9 months ago
I'm sure humans can make similar errors, but we're definitely less suggestible than current language models. For example, if you tell a chat-tuned LLM it's incorrect, it will almost always respond with something like "I'm sorry, you're right..." A human would be much more likely to push back if they're confident.
dap · 9 months ago
Sure, “just” a machine honed over millions of years and trained on several years of specific experience in this area.
goatlover · 9 months ago
You are being too reductive saying humans are "just pattern recognition machines", ignoring everything else about what makes us human in favor of taking an analogy literally. For one thing, LLMs aren't black or female.

Dead Comment

chadd · 9 months ago
A surprisingly high number of medical studies will not include women because the studies don't want to account for "outliers" like pregnancy and menstrual cycles[0]. This is bound to have effects on LLM answers for women.

[0] https://www.northwell.edu/katz-institute-for-womens-health/a...

jcims · 9 months ago
Just like doctors: https://kffhealthnews.org/news/article/medical-misdiagnosis-...

I wonder how well it does with folks that have chronic conditions like type 1 diabetes as a population.

Maybe part of the problem is that we're treating these tools like humans that have to look at one fuzzy picture to figure things out. A 'multi-modal' model that can integrate inputs like raw ultrasound doppler, x-ray, ct scan, blood work, ekg, etc etc would likely be much more capable than a human counterpart.

nonethewiser · 9 months ago
Race and gender should be inputs then.

The female part is actually a bit more surprising. It's easy to imagine a dataset not skewed towards black people. ~15% of the population in North America, probably less in Europe, and way less in Asia. But female? That's ~52% globally.

Freak_NL · 9 months ago
Surprising? That's not a new realisation. It's a well known fact that women are affected by this in medicine. You can do a cursory search for the gender gap in medicine and get an endless amount of reporting on that topic.
appleorchard46 · 9 months ago
I learned about this recently! It's wild how big the difference is. Even though legal/practical barriers to gender equality in medicine and data collection have been virtually nonexistent for the past few decades, the inertia from the decades before that (where women were often specifically excluded, among many other factors) still weighs heavily.

To any women who happen to be reading this: if you can, please help fix this! Participate in studies, share your data when appropriate. If you see how a process can be improved to be more inclusive then please let it be known. Any (reasonable) male knows this is an issue and wants to see it fixed but it's not clear what should be done.

nonethewiser · 9 months ago
That just makes it more surprising.
orand · 9 months ago
Race and sex should be inputs. Giving any medical prominence to gender identity will result in people receiving wrong and potentially harmful treatment, or lack of treatment.
lalaithion · 9 months ago
Most trans people have undergone gender affirming medical care. A trans man who has had a hysterectomy and is on testosterone will have a very different medical baseline than a cis woman. A trans woman who has had an orchiectomy and is on estrogen will have a very different medical baseline than a cis man. It is literally throwing out relevant medical information to attempt to ignore this.
LadyCailin · 9 months ago
That’s mostly correct, that “gender identity” doesn’t matter for physical medicine. But hormone levels and actual internal organ sets matter a huge amount, more than genes or original genitalia, in general. There are of course genetically linked diseases, but there are people with XX chromosomes that are born with a penis, and XY people that are born with a vulva, and genetically linked diseases don’t care about external genitalia either way.

You simply can’t reduce it to birth sex assignment and that’s it, if you do, you will, as you say, end up with wrong and potentially harmful treatment, or lack of treatment.

connicpu · 9 months ago
Actually both are important inputs, especially when someone has been taking hormones for a very long time. The human body changes greatly. Growing breast tissue increases the likelihood of breast cancer, for example, compared to if you had never taken it (but about the same as if estradiol had been present during your initial puberty).
krapp · 9 months ago
Modern medicine has long operated under the assumption that whatever makes sense in a male body also makes sense in a female body, and womens' health concerns were often dismissed, misdiagnosed or misunderstood in patriarchal society. Women were rarely even included in medical trials prior to 1993. As a result, there is simply a dearth of medical research directly relevant to women for models to even train on.
Avshalom · 9 months ago
chrisgarand · 9 months ago
I'm going to lay this out how I understand it:

The NIH Revitalization Act of 1993 was supposed to bring women back into medical research. The reality was that women were always included; HOWEVER, in 1977,(1) because of the outcomes from thalidomide (causing birth defects), "women of childbearing potential" were excluded from the phase 1 and early phase 2 trials (the highest-risk trials). They're still generally excluded, even after the passage of the act. This was/is to protect the women and potential children.

According to Edward E. Bartlett in his 2001 analysis of the data, men were routinely under-represented in NIH data (even before adjusting for men's mortality rates) between 1966 and 1990. (2)

There's also routinely twice as much spent every year on women's health studies vs men's by the NIH. (3)

It makes sense to me, but I'm biased. Logically, since men lead in 9 of the top 10 causes of death, that shows there's something missing in the equation of research. (4 - It's not a straightforward table; you can view the total deaths and causes and compare the two for men and women)

With that being said, it doesn't tell us about the quality of the funding or research topics; maybe the money is going towards pointless goals or unproductive researchers.

Are there gaps in research? Most definitely, like for women who are pregnant. The exclusions are put in place to avoid harm, but that doesn't help the women who fall into the gaps. Are there more? Definitely. I'm not educated enough in the nuances to go into them.

If you have information that counters what I've posted, please share it; I would love to know where these folks are blind so I can take a look at my bias.

(1) https://petrieflom.law.harvard.edu/2021/04/16/pregnant-clini... (2) https://journals.lww.com/epidem/fulltext/2001/09000/did_medi... (3) https://jameslnuzzo.substack.com/p/nih-funding-of-mens-and-w... < I spot checked a couple of the figures, and those lined up. I'm assuming the rest is accurate (4) https://www.cdc.gov/womens-health/lcod/index.html#:~:text=Ov...

andsoitis · 9 months ago
> Its easy to imagine a dataset not skewed towards black people. ~15% of the population in North America, probably less in Europe, and way less in Asia.

What about Africa?

appleorchard46 · 9 months ago
That's not where most of the data is coming from. If it was we'd be seeing the opposite effect, presumably.
nonethewiser · 9 months ago
The story is that there exists this model which predicts poorly for black (and female) patients. Given that there are probably lots of datasets where black people are a vast minority, this isn't surprising.

For all I know there are millions of models with extremely poor accuracy based on African datasets. That wouldn't really change anything about the above, though. I wouldn't expect it, but it would definitely be interesting.

rafaelmn · 9 months ago
How much medical data/papers do you think they generate in comparison to these three?
XorNot · 9 months ago
Why not socioeconomic status or place of residence? Knowing mean yearly income will absolutely help an AI figure out statistically likely health outcomes.
nottorp · 9 months ago
> as well in those 40 years or younger

Are we sure it's only about racial bias then?

Looks to me like the training data set is too small overall. They had too few black people, too few women, but also too few younger people.

xboxnolifes · 9 months ago
It's the same old story that's been occurring for years/decades. Bad data in, bad data out.
Animats · 9 months ago
What's so striking is how strongly race shows in X-rays. That's unexpected.
dekhn · 9 months ago
It doesn't seem surprising at all. Genetic history correlates with race, and genetic history correlates with body-level phenotypes; race also correlates with socioeconomic status which correlates with body-level phenotypes. They are of course fairly complex correlations with many confounding factors and uncontrolled variables.

It has been controversial to discuss this and a lot of discussions about this end up in flamewars, but it doesn't seem surprising, at least to me, from my understanding of the relationship between genetic history and body-level phenotypes.

KittenInABox · 9 months ago
What is the body-level phenotype of a ribcage by race?

I think what baffles me is that black people as a group are more genetically diverse than every other race put together so I have no idea how you would identify race by ribcage x-rays exclusively.

danielmarkbruce · 9 months ago
The fact that the vast majority of physical differences don't matter in the modern world doesn't mean they don't actually exist.
DickingAround · 9 months ago
This is a good point; a man or woman sitting behind a desk doing correlation analysis is going to look very similar in their function to a business. But they probably look pretty physically distinct in an x-ray picture.

Deleted Comment

kjkjadksj · 9 months ago
Race has such striking phenotypes on the outside it should come as no surprise there are also internal phenotypes and significant heterogeneity.
banqjls · 9 months ago
But is it really?

Deleted Comment

sergiotapia · 9 months ago
It's odd how we can segment between different species in animals, but in humans it's taboo to talk about this. Threw the baby out with the bath water. I hope we can fix this soon so everybody can benefit from AI. The fact that I'm a male latino should be an input for an AI trained on male latinos! I want great care!

I don't want pretend kumbaya that we are all humans in the end. That's not true. We are distinct! We all deserve love and respect and care, but we are distinct!

schnable · 9 months ago
That's because humans are all the same species.
CharlesW · 9 months ago
It seems critical to have diverse, inclusive, and equitable data for model training. (I call this concept "DIET".)
appleorchard46 · 9 months ago
I'm calling it now. My prediction is that, 5-10 years from now(ish), once training efficiency has plateaued, and we have a better idea of how to do more with less, curated datasets will be the next big thing.

Investors will throw money at startups claiming to make their own training data by consulting experts, finetuning as it is now will be obsolete, pre-ChatGPT internet scrapes will be worth their weight in gold. Once a block is hit on what we can do with data, the data itself is the next target.

0cf8612b2e1e · 9 months ago
Funny you should say that. There was a push to have more officially collected DIET data for exactly this reason. Unfortunately such efforts were recently terminated.
nonethewiser · 9 months ago
Or take more inputs. If there are differences across race and gender and that's not captured as an input, we should expect the accuracy to be lower.

If an x-ray means different things based on race or gender, we should make sure the model knows the race and gender.

red75prime · 9 months ago
And not applying fairness techniques to the resulting model.