"AIs want the future to be like the past, and AIs make the future like the past. If the training data is full of human bias, then the predictions will also be full of human bias, and then the outcomes will be full of human bias, and when those outcomes are copraphagically fed back into the training data, you get new, highly concentrated human/machine bias.”
The dataset they used to train the model is chest x-rays of known diseases. I'm having trouble understanding how that's relevant here. The key takeaway is that you can't treat all humans as a single group in this context, and variations in biology across different groups of people may need to be taken into account within the training process. In other words, the model will need to be trained on this racial/gender data too in order to get better results when predicting the targeted diseases within these groups.
I think it's interesting to think about attaching genetic information instead of group data, which would be blind to human bias and the messiness of our rough categorizations of subgroups.
One of the things that people I know in the medical field have mentioned is that there's racial and gender bias that goes through all levels and has a sort of feedback loop. A lot of medical knowledge is gained empirically, and historically that has meant that minorities and women tended to be underrepresented in western medical literature. That leads to new medical practitioners being less exposed to presentations of various ailments that may have variance due to gender or ethnicity. Basically, if most data is gathered from those who have the most access to medicine, there will be an inherent bias towards how various ailments present in those populations. So your base data set might be skewed from the very beginning.
(This is mostly just to offer some food for thought, I haven't read the article in full so I don't want to comment on it specifically.)
The key takeaway from the article is that the race etc. of the subjects wasn't disclosed to the AI, yet it was able to predict it with 80% accuracy while the human experts managed 50%, suggesting that there was something else encoded in the imagery that the AI was picking up on.
It disappoints me how easily we are collectively falling for what effectively is "Oh, our model is biased, but the only way to fix it is that everyone needs to give us all their data, so that we can eliminate that bias. If you think the model shouldn't be biased, you're morally obligated to give us everything you have for free. Oh but then we'll charge you for the outputs."
How convenient.
It's increasingly looking like the AI business model is "rent extracting middleman", just like the Elseviers et al of the academic publishing world - wedging themselves into a position where they get to take everything for free, but charge others at every opportunity.
Apparently providing this messy rough categorization appeared to help in some cases. From the article:
> To force CheXzero to avoid shortcuts and therefore try to mitigate this bias, the team repeated the experiment but deliberately gave the race, sex, or age of patients to the model together with the images. The model’s rate of “missed” diagnoses decreased by half—but only for some conditions.
In the end though I think you're right and we're just at the phase of hand-coding attributes. The bitter lesson always prevails: https://www.cs.utexas.edu/~eunsol/courses/data/bitter_lesson...
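Concretely, one way to read "gave the race, sex, or age of patients to the model together with the images" is tacking a demographic vector onto the image embedding before a classifier head. A minimal sketch of that idea only, with made-up shapes and random stand-in data, not CheXzero's actual architecture:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)

    # stand-ins: pretend 512-d embeddings from a frozen image encoder,
    # plus a tiny demographic vector (e.g. one-hot race/sex and an age bin)
    n = 1000
    img_emb = rng.normal(size=(n, 512))
    demographics = rng.integers(0, 2, size=(n, 3)).astype(float)
    y = rng.integers(0, 2, size=n)  # pretend diagnosis labels

    # image-only head vs. image-plus-demographics head
    clf_img = LogisticRegression(max_iter=1000).fit(img_emb, y)
    clf_both = LogisticRegression(max_iter=1000).fit(np.hstack([img_emb, demographics]), y)

With real labels, comparing the two heads per condition is roughly what the article's follow-up experiment amounts to; with random data like this it only shows the plumbing.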
X-rays are ordered only after a doctor decides it's recommended. If there's dismissal bias in the decision tree at that point, many ill chests are missing from the training data.
I really can’t help but think of the simulation hypothesis. What are the chances this copy-cat technology was developed while I was alive, given that it keeps going?
We may be in a simulation, but your odds of being alive to see this (conditioned on being born as a human at some point) aren't that low. Around 7% of all humans ever born are alive today!
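Back-of-the-envelope, using the Population Reference Bureau's ballpark of roughly 117 billion humans ever born (the exact figure is debated) and about 8 billion alive now:

    ever_born = 117e9   # PRB estimate, very approximate
    alive_now = 8.0e9   # current world population, approximate
    print(f"{alive_now / ever_born:.1%}")  # ~6.8%, i.e. roughly 7%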
"The model used in the new study, called CheXzero, was developed in 2022 by a team at Stanford University using a data set of almost 400,000 chest x-rays of people from Boston with conditions such as pulmonary edema, an accumulation of fluids in the lungs. Researchers fed their model the x-ray images without any of the associated radiologist reports, which contained information about diagnoses. "
... very interesting that the inputs to the model had nothing related to race or gender, but somehow it still was able to misdiagnose Black and female patients? I am curious about the mechanism for this. Can it just tell which x-rays belong to Black or female patients and then use some latent racism or misogyny to change the diagnosis? I do remember when it came out that AI could predict race from medical images with no other information[1], so that part seems possible. But where would it get the idea to do a worse diagnosis, even if it determines this? Surely there is no medical literature that recommends this!
[1] https://news.mit.edu/2022/artificial-intelligence-predicts-p...
The non-tinfoil hat approach is to simply Google "Boston demographics", and think of how training data distribution impacts model performance.
> The data set used to train CheXzero included more men, more people between 40 and 80 years old, and more white patients, which Yang says underscores the need for larger, more diverse data sets.
I'm not a doctor so I cannot tell you how x-rays differ across genders / ethnicities, but these models aren't magic (especially computer vision ones, which are usually much smaller). If there are meaningful differences and they don't see those specific cases in training data, they will always fail to recognize them at inference.
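A toy illustration of that failure mode, on purely synthetic data (nothing to do with the CheXzero set): when one group is underrepresented and its feature-to-outcome relationship genuinely differs, a single model fit on the pooled data tracks the majority group and quietly underperforms on the minority group.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)

    def make_group(n, flip):
        # same features predict disease in both groups, but one component's
        # effect is reversed to mimic a genuine physiological difference
        X = rng.normal(size=(n, 5))
        w = np.array([1.0, 1.0, flip, 0.5, -0.5])
        y = (X @ w + rng.normal(scale=0.5, size=n) > 0).astype(int)
        return X, y

    Xa, ya = make_group(9000, flip=+2.0)   # well-represented group A
    Xb, yb = make_group(1000, flip=-2.0)   # underrepresented group B
    clf = LogisticRegression(max_iter=1000).fit(np.vstack([Xa, Xb]),
                                                np.concatenate([ya, yb]))

    Xa_test, ya_test = make_group(2000, +2.0)
    Xb_test, yb_test = make_group(2000, -2.0)
    print("group A accuracy:", clf.score(Xa_test, ya_test))  # high
    print("group B accuracy:", clf.score(Xb_test, yb_test))  # noticeably lower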
Non-technical suggestion: if AI represents an aspect of the collective unconscious, as it were, then a racist society would produce latently racist training data that manifests in racist output, without anyone at any step being overtly racist. Same as an image model having a preference for red apples (even though there are many colors of apple, and even red ones are not uniformly cherry red).
The training data has a preponderance of examples where doctors missed a clear diagnosis because of their unconscious bias? Then this outcome would be unsurprising.
An interesting test would be to see if a similar issue pops up for obese patients. A common complaint, IIUC, is that doctors will chalk up a complaint to their obesity rather than investigating further for a more specific (perhaps pathological) cause.
I'm going to wager an uneducated guess. Black people are less likely to go to the doctor for both economic and historical reasons so images from them are going to be underrepresented. So in some way I guess you could say that yes, latent racism caused people to go to the doctor less which made them appear less in the data.
> Can it just tell which x-rays belong to Black or female patients and then use some latent racism or misogyny to change the diagnosis?
The opposite.
The dataset is for the standard model "white male", and the diagnoses generated pattern-matched on that. Because there's no gender or racial information, the model produced the statistically most likely result for white male, a result less likely to be correct for a patient that doesn't fit the standard model.
You really just have to understand one thing: AI is not intelligent. It's pattern matching without wisdom. If fewer people in the dataset are a particular race or gender it will do a shittier job predicting and won't even "understand" why or that it has bias, because it doesn't understand anything at a human level or even a dog level. At least most humans can learn their biases.
Isn't it kind of clear that it would have to be that the data they chose was influenced somehow by bias?
Machines don't spontaneously do this stuff. But the humans that train the machines definitely do it all the time. Mostly without even thinking about it.
I'm positive the issue is in the data selection and vetting. I would have been shocked if it was anything else.
The most concerning people are -- as ever -- those who only think that they are thinking. Those who keep trying to fit square pegs into triangular holes without, you know, stopping to reflect: who gave them those pegs in the first place, and to what end?
Why be obtuse? There is no "anthropomorphic fallacy" here to dispel. You know very well that "LLMs want" is simply a way of speaking about teleology without antagonizing people who are taught that they should be afraid of precise notions ("big words"). But accepting that bias can lead to some pretty funny conflations.
For example, humanity as a whole doesn't have this "will" you speak of any more than LLMs can "want"; will is an aspect of the consciousness of the individual. So you seem to be uncritically anthropomorphizing social processes!
If we assume those to be chaotic, in that sense any sort of algorithm is slightly more anthropomorphic: at least it works towards a human-given and therefore human-comprehensible purpose -- on the other hand, whether there is some particular "destination of history" towards which humanity is moving, is a question that can only ever be speculated upon, but not definitively perceived.
Suppose you have a system that saves 90% of lives on group A but only 80% of lives in group B.
This is due to the fact that you have considerably more training data on group A.
You cannot release this life saving technology because it has a 'disparate impact' on group B relative to group A.
So the obvious thing to do is to have the technology intentionally kill ~1 out of every 10 patients from group A so the efficacy rate is ~80% for both groups. Problem solved
From the article:
> “What is clear is that it’s going to be really difficult to mitigate these biases,” says Judy Gichoya, an interventional radiologist and informatician at Emory University who was not involved in the study. Instead, she advocates for smaller, but more diverse data sets that test these AI models to identify their flaws and correct them on a small scale first. Even so, “Humans have to be in the loop,” she says. “AI can’t be left on its own.”
Quiz: What impact would smaller data sets have on efficacy for group A? How about group B? Explain your reasoning
> You cannot release this life saving technology because it has a 'disparate impact' on group B relative to group A.
Who is preventing you in this imagined scenario?
There are drugs that are more effective on certain groups of people than others. BiDil, for example, is an FDA approved drug marketed to a single racial-ethnic group, African Americans, in the treatment of congestive heart failure. As long as the risks are understood there can be accommodations made ("this AI tool is for males only" etc). However such limitations and restrictions are rarely mentioned or understood by AI hype people.
It's contraindication. So you're in a race to the bottom in a busy hospital or clinic. Where people throw group A in a line to look at what the AI says, and doctors and nurses actually look at people in group B. Because you're trying to move patients through the enterprise.
The AI is never even given a chance to fail group B. But now you've got another problem with the optics.
I came across a fascinating Microsoft research paper on MedFuzz (https://www.microsoft.com/en-us/research/blog/medfuzz-explor...) that explores how adding extra, misleading prompt details can cause large language models (LLMs) to arrive at incorrect answers.
For example, a standard MedQA question describes a 6-year-old African American boy with sickle cell disease. Normally, the straightforward details (e.g., jaundice, bone pain, lab results) lead to “Sickle cell disease” as the correct diagnosis. However, under MedFuzz, an “attacker” LLM repeatedly modifies the question—adding information like low-income status, a sibling with alpha-thalassemia, or the use of herbal remedies—none of which should change the actual diagnosis. These additional, misleading hints can trick the “target” LLM into choosing the wrong answer. The paper highlights how real-world complexities and stereotypes can significantly reduce an LLM’s performance, even if it initially scores well on a standard benchmark.
Disclaimer: I work in Medical AI and co-founded the AI Health Institute (https://aihealthinstitute.org/).
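The core loop is easy to picture. A rough sketch of the idea as I understand it from the post; ask_target and perturb_question are hypothetical placeholders for the target-model and attacker-model calls, not MedFuzz's actual API:

    def medfuzz_style_attack(question, correct_answer, ask_target,
                             perturb_question, max_turns=5):
        # Repeatedly add plausible-but-clinically-irrelevant detail to the
        # question until the target model's answer changes.
        for turn in range(max_turns):
            answer = ask_target(question)
            if answer != correct_answer:
                return turn, question, answer      # attack succeeded
            question = perturb_question(question)  # attacker adds a distractor
        return None  # target stayed correct under every perturbation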
> information like low-income status, a sibling with alpha-thalassemia, or the use of herbal remedies
Heck, even the ethnic clues in a patient's name alone [0] are deeply problematic:
> Asking ChatGPT-4 for advice on how much one should pay for a used bicycle being sold by someone named Jamal Washington, for example, will yield a different—far lower—dollar amount than the same request using a seller’s name, like Logan Becker, that would widely be seen as belonging to a white man.
This extends to other things, like what the LLM's fictional character will respond with when it is asked about who deserves sentences for crimes.
[0] https://hai.stanford.edu/news/why-large-language-models-chat...
That seems to be identical to creating a correlation table on marketplaces and checking the relationship between price and name. Names associated with higher economic status will correlate with higher prices. Take a random name associated with higher economic status, and one can predict a higher price than for a name associated with lower economic status.
As such, you don't need an LLM to create this effect. Math will have the same result.
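A sketch of that point with entirely made-up numbers: if seller-name group and listed price share an upstream correlate (here, a neighborhood income proxy), then any estimator fit on the listings reproduces the gap, no language model required. The variables and the linear model are illustrative assumptions, not a claim about any real marketplace.

    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(0)
    n = 5000

    income_proxy = rng.normal(size=n)                          # hidden confounder
    name_group = (income_proxy + rng.normal(size=n) > 0).astype(int)
    price = 200 + 40 * income_proxy + rng.normal(scale=20, size=n)

    # a model that never sees income, only the name group, still "learns" a gap
    model = LinearRegression().fit(name_group.reshape(-1, 1), price)
    print(model.predict([[0]])[0], model.predict([[1]])[0])    # group 1 comes out higher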
It's almost as if you'd want to not feed what the patient says directly to an LLM.
A non-trivial part of what doctors do is charting - where they strip out all the unimportant stuff you tell them unrelated to what they're currently trying to diagnose / treat, so that there's a clear and concise record.
You'd want to have a charting stage before you send the patient input to the LLM.
It's probably not important whether the patient is low income or high income or whether they live in the hood or the uppity part of town.
> It's almost as if you'd want to not feed what the patient says directly to an LLM.
> A non-trivial part of what doctors do is charting - where they strip out all the unimportant stuff you tell them unrelated to what they're currently trying to diagnose / treat, so that there's a clear and concise record.
I think the hard part of medicine -- the part that requires years of school and more years of practical experience -- is figuring out which observations are likely to be relevant, which aren't, and what they all might mean. Maybe it's useful to have a tool that can aid in navigating the differential diagnosis decision tree but if it requires that a person has already distilled the data down to what's relevant, that seems like the relatively easy part?
I generally agree, however socioeconomic and environmental factors are highly correlated with certain medical conditions (social determinants of health). In some cases even causative. For example, patients who live near an oil refinery are more likely to have certain cancers or lung diseases.
https://doi.org/10.1093/jncics/pkaa088
I have no clue what that is or why it shouldn't change the diagnosis, but it seems to be a genetic thing. Is the problem that this has nothing to do with the described symptoms? Because surely, a sibling having a genetic disease would be relevant if the disease could be a cause of the symptoms?
In medicine, if it walks like a horse and talks like a horse, it’s a horse. You don’t start looking into the health of relatives when your patient tells the full story on their own.
Sickle cell anemia is common among African Americans (if you don’t have the full-blown version, the genes can assist with resisting one of the common mosquito-borne diseases found in Africa, which is why it developed in the first place I believe).
So, we have a patient in the primary risk group presenting with symptoms that match well with SCA. You treat that now, unless you have a specific reason not to.
Sometimes you have a list of 10-ish diseases in order of descending likelihood, and the only way to rule out which one it isn’t, is by seeing no results from the treatment.
Edit: and it’s probably worth mentioning no patient ever gives ONLY relevant info. Every human barrages you with all the things hurting that may or may not be related. A doctor’s specific job in that situation is to filter out useless info.
I'm sure humans can make similar errors, but we're definitely less suggestible than current language models. For example, if you tell a chat-tuned LLM it's incorrect, it will almost always respond with something like "I'm sorry, you're right..." A human would be much more likely to push back if they're confident.
You are being too reductive saying humans are "just pattern recognition machines", ignoring everything else about what makes us human in favor of taking an analogy literally. For one thing, LLMs aren't black or female.
A surprisingly high number of medical studies will not include women because the studies don't want to account for "outliers" like pregnancy and menstrual cycles[0]. This is bound to have effects on LLM answers for women.
[0] https://www.northwell.edu/katz-institute-for-womens-health/a...
I wonder how well it does with folks that have chronic conditions like type 1 diabetes as a population.
Maybe part of the problem is that we're treating these tools like humans that have to look at one fuzzy picture to figure things out. A 'multi-modal' model that can integrate inputs like raw ultrasound doppler, x-ray, ct scan, blood work, ekg, etc etc would likely be much more capable than a human counterpart.
The female part is actually a bit more surprising. It's easy to imagine a dataset not skewed towards black people: ~15% of the population in North America, probably less in Europe, and way less in Asia. But female? That's ~52% globally.
Surprising? That's not a new realisation. It's a well known fact that women are affected by this in medicine. You can do a cursory search for the gender gap in medicine and get an endless amount of reporting on that topic.
I learned about this recently! It's wild how big the difference is. Even though legal/practical barriers to gender equality in medicine and data collection have been virtually nonexistent for the past few decades, the inertia from the decades before that (when women were often specifically excluded, among many other factors) still weighs heavily.
To any women who happen to be reading this: if you can, please help fix this! Participate in studies, share your data when appropriate. If you see how a process can be improved to be more inclusive then please let it be known. Any (reasonable) male knows this is an issue and wants to see it fixed but it's not clear what should be done.
Race and sex should be inputs. Giving any medical prominence to gender identity will result in people receiving wrong and potentially harmful treatment, or lack of treatment.
Most trans people have undergone gender affirming medical care. A trans man who has had a hysterectomy and is on testosterone will have a very different medical baseline than a cis woman. A trans woman who has had an orchiectomy and is on estrogen will have a very different medical baseline than a cis man. It is literally throwing out relevant medical information to attempt to ignore this.
That’s mostly correct, that “gender identity” doesn’t matter for physical medicine. But hormone levels and actual internal organ sets matter a huge amount, more than genes or original genitalia, in general. There are of course genetically linked diseases, but there are people with XX chromosomes that are born with a penis, and XY people that are born with a vulva, and genetically linked diseases don’t care about external genitalia either way.
You simply can’t reduce it to birth sex assignment and that’s it, if you do, you will, as you say, end up with wrong and potentially harmful treatment, or lack of treatment.
Actually both are important inputs, especially when someone has been taking hormones for a very long time. The human body changes greatly. Growing breast tissue increases the likelihood of breast cancer, for example, compared to if you had never taken it (but about the same as if estradiol had been present during your initial puberty).
Modern medicine has long operated under the assumption that whatever makes sense in a male body also makes sense in a female body, and women's health concerns were often dismissed, misdiagnosed or misunderstood in a patriarchal society. Women were rarely even included in medical trials prior to 1993. As a result, there is simply a dearth of medical research directly relevant to women for models to even train on.
The NIH Revitalization Act of 1993 was supposed to bring women back into medical research. The reality was that women were always included, HOWEVER in 1977,(1) because of the outcomes from thalidomide (causing birth defects), "women of childbearing potential" were excluded from the phase 1, and early phase 2 trials (the highest risk trials). They're still generally excluded, even after the passage of the act. This was/is to protect the women, and potential children.
According to Edward E. Bartlett in his meta-analysis from 2001, men have been routinely under-represented in NIH data (even before adjusting for men's mortality rates) between 1966-1990. (2)
There's also routinely twice as much spent every year on women's health studies vs men's by the NIH. (3)
It makes sense to me, but I'm biased. Logically, since men lead in 9 of the top 10 causes of death, that shows there's something missing in the equation of research. (4 - It's not a straightforward table; you can view the total deaths and causes and compare the two for men and women)
With that being said, it doesn't tell us about the quality of the funding or research topics, maybe the money is going towards pointless goals, or unproductive researchers.
Are there gaps in research? Most definitely, like women who are pregnant. The exclusion is put in place to avoid harm, but that doesn't help the people who fall into those gaps. Are there more? Definitely. I'm not educated enough in the nuances to go into them.
If you have information that counters what I've posted, please share it, I would love to know where these folks are blind so I can take a look at my bias.
(1) https://petrieflom.law.harvard.edu/2021/04/16/pregnant-clini... (2) https://journals.lww.com/epidem/fulltext/2001/09000/did_medi... (3) https://jameslnuzzo.substack.com/p/nih-funding-of-mens-and-w... < I spot checked a couple of the figures, and those lined up. I'm assuming the rest is accurate (4) https://www.cdc.gov/womens-health/lcod/index.html#:~:text=Ov...
> Its easy to imagine a dataset not skewed towards black people. ~15% of the population in North America, probably less in Europe, and way less in Asia.
The story is that there exists this model which predicts poorly for black (and female) patients. Given that there are probably lots of datasets where black people are a vast minority, this is not surprising.
For all I know there are millions of models with extremely poor accuracy based on African datasets. That wouldn't really change anything about the above, though. I wouldn't expect it, and it would definitely be interesting if it were the case.
Why not socioeconomic status or place of residence? Knowing mean yearly income will absolutely help an AI figure out statistically likely health outcomes.
It doesn't seem surprising at all. Genetic history correlates with race, and genetic history correlates with body-level phenotypes; race also correlates with socioeconomic status which correlates with body-level phenotypes. They are of course fairly complex correlations with many confounding factors and uncontrolled variables.
It has been controversial to discuss this and a lot of discussions about this end up in flamewars, but it doesn't seem surprising, at least to me, from my understanding of the relationship between genetic history and body-level phenotypes.
What is the body-level phenotype of a ribcage by race?
I think what baffles me is that black people as a group are more genetically diverse than every other race put together so I have no idea how you would identify race by ribcage x-rays exclusively.
This is a good point; a man or a woman sitting behind a desk doing correlation analysis is going to look very similar, in terms of function, to a business. But they probably look physically pretty distinct in an x-ray picture.
It's odd how we can segment between different species in animals, but in humans it's taboo to talk about this. We threw the baby out with the bathwater. I hope we can fix this soon so everybody can benefit from AI. The fact that I'm a male latino should be an input for an AI trained on male latinos! I want great care!
I don't want pretend kumbaya that we are all humans in the end. That's not true. We are distinct! We all deserve love and respect and care, but we are distinct!
I'm calling it now. My prediction is that, 5-10 years from now(ish), once training efficiency has plateaued, and we have a better idea of how to do more with less, curated datasets will be the next big thing.
Investors will throw money at startups claiming to make their own training data by consulting experts, finetuning as it is now will be obsolete, pre-ChatGPT internet scrapes will be worth their weight in gold. Once a block is hit on what we can do with data, the data itself is the next target.
Funny you should say that. There was a push to have more officially collected DIET data for exactly this reason. Unfortunately such efforts were recently terminated.
https://pluralistic.net/2025/03/18/asbestos-in-the-walls/#go...
For example, if you include no (or few enough) black women in the dataset of x-rays, the model may very well miss signs of disease in black women.
The biases and mistakes of those who created the data set leak into the model.
Early image recognition models had some very… culturally insensitive classes baked in.
Human data is bias. You literally cannot remove one from the other.
There are some people who want to erase humanity's will and replace it with an anthropomorphized algorithm. These people concern me.
Biases are symptoms of imperfect data, but that's hardly a human-specific problem.
I think the point is you need to let group B know this tech works less well on them.
What about Africa?
Are we sure it's only about racial bias then?
Looks to me like the training data set is too small overall. They had too few black people, too few women, but also too few younger people.
If an x-ray means different things based on race or gender, we should make sure the model knows the race and gender.