While there are prior associations between retinal photos and autism, I'm by default very skeptical of any AI algorithm purporting "100% accuracy". It smells like data leakage.
I would bet that even physicians aren't 100% consistent in their diagnosis of autism. If that's the case, then it should be more or less impossible for any other diagnostic approach to be 100% consistent with the physician diagnoses.
Edit: After reading the study closer, this criticism might be a bit harsh. In their autism subjects, they excluded those with mild/moderate autism. Limiting to severe cases should mean there's a higher degree of confidence/consistency in the diagnoses.
I'm not actually sure where the Petapixel article authors are getting the phrase "100% accuracy", as it does not show up in the article they are writing about, nor does it appear in the only other article they link to. Putting it in quotes makes it seem like a claim the model creators are making about their model, but I don't see them making that claim in general. They say the model matched all the sample data in this case, not that it's 100% accurate—presumably for the same reasons you are hesitant to do so. Unless I'm missing something, Petapixel should correct their headline.
> I would bet that even physicians aren't 100% consistent in their diagnosis of autism.
That's because autism is diagnosed by using the DSM. You can take an x-ray of an arm and see the fracture, but in order to diagnose autism you have to determine 'persistent deficits in social communication and social interaction across multiple contexts'.
It is all dependent on how society defines things, and is fluid (and IMO, somewhat dubious).
I try not to immediately call BS on these types of studies…but in this case there are some concerns.
“The data sets were randomly divided into training (85%) and test (15%) sets. We used 10-fold cross-validation to obtain generalized results of model performance. Data splitting was performed at the participant level and stratified based on the outcome variables. Because the data classes were imbalanced for symptom severity (ADOS-2 and SRS-2), we performed a random undersampling of the data at the participant level before conducting data splitting. Moreover, we examined different split ratios (80:20 and 90:10) to assess the robustness and consistency of the predictive performances across diverse splitting proportions.”
* undersampling is problematic here and probably introduced some bias. These imbalanced class problems are just plain hard. Claiming one hundred percent on an imbalanced class problem should probably cause some concern.
* data split at the participant level has to be done really carefully or you’ll overfit
* multiple comparisons bias by testing multiple split ratios on the same test data. Same with the 10-fold cross-validation.
* not sure if they validated results on any external test data
* outcome variable stratification also has to be done really carefully or it will introduce bias; seems particularly sensitive in this case
* using severity of symptoms as class labels is problematic. These have to really have been diagnosed the same way / consistently to be meaningful.
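To make the participant-level point concrete, here's a minimal sketch (toy data, not from the paper) of a split that stratifies by outcome while keeping both of a participant's eye images on the same side:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: 100 participants, two retinal images (one per eye) each,
# with one diagnosis label per participant.
participants = np.repeat(np.arange(100), 2)
labels_per_participant = rng.integers(0, 2, 100)
y = np.repeat(labels_per_participant, 2)

# Split at the PARTICIPANT level, stratified by outcome, so both images
# from one person land on the same side of the split.
test_ids = []
for cls in (0, 1):
    ids = np.flatnonzero(labels_per_participant == cls)
    rng.shuffle(ids)
    test_ids.extend(ids[: int(0.15 * len(ids))])  # ~15% test, per class

test_mask = np.isin(participants, test_ids)
train_mask = ~test_mask

# No participant may contribute images to both sets.
assert set(participants[train_mask]).isdisjoint(participants[test_mask])
```

If you split at the image level instead, the model can memorize a person from their left eye and "predict" them from their right eye, which looks like accuracy but is leakage.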
I also note a long history in the collection of these images (15 years, iirc). Hard to believe such a diverse set of images (collection, equipment, etc.) led to perfect results.
ML issues aside, super interested in the basic medical concept. I wasn’t aware retinal abnormalities could be indicative of issues like ASD.
> The photography sessions for patients with ASD took place in a space dedicated to their needs, distinct from a general ophthalmology examination room. This space was designed to be warm and welcoming, thus creating a familiar environment for patients. Retinal photographs of typically developing (TD) individuals were obtained in a general ophthalmology examination room. Each eye required an average of 10–30 s for photography, although some cases involved longer periods to help the patient calm down, sometimes exceeding 5–10 min. All images were captured in a dark room to optimize their quality. Retinal photographs of both patients with ASD and TD were obtained using non-mydriatic fundus cameras, including EIDON (iCare), Nonmyd 7 (Kowa), TRC-NW8 (Topcon), and Visucam NM/FA (Carl Zeiss Meditec).
So three questions:
1. Are we positive that the difference in rooms does not affect these images?
2. If we are in a dark room, and ASD patients are in it for 5-10 minutes longer, are we sure this doesn't affect the retina?
3. Were all cameras used for both ASD and TD images?
Want to make sure the AI is being trained to detect autism, and wasn't accidentally trained to identify camera models, length-in-dark-room or room-welcomingness.
Hopefully not, but I assume you have to be so careful with these sort of things when the model is entirely black-box and you can't actually validate what it's actually doing inside.
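One cheap sanity check along these lines (a hypothetical sketch; the confound here is simulated, not taken from the study): see whether acquisition metadata alone, e.g. the camera model, already predicts the label. If it does, any image model can shortcut through it.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical metadata: camera model per image, with a simulated confound
# (one camera used mostly for one group) plus 5% label noise.
cameras = rng.choice(["EIDON", "Nonmyd 7", "TRC-NW8", "Visucam"],
                     size=500, p=[0.7, 0.1, 0.1, 0.1])
label = (cameras == "EIDON").astype(int)
flip = rng.random(500) < 0.05
label = np.where(flip, 1 - label, label)

# "Classifier" that only sees the camera: majority label per model,
# evaluated in-sample, which is all a leak needs to show up.
correct = 0
for cam in np.unique(cameras):
    mask = cameras == cam
    majority = int(label[mask].mean() >= 0.5)
    correct += int((label[mask] == majority).sum())
acc = correct / len(label)
print(f"camera-only accuracy: {acc:.2f}")  # far above chance => leakage suspect
```

The same check works for room, time-in-dark-room bins, or acquisition date: if metadata alone beats chance, the image model's score is suspect.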
This is definitely worthy of concern. There's an infamous case where an AI was trained to detect cancer from imaging, but all the positive examples included a ruler (to measure the tumor), so it turned out it was just good at detecting rulers.
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9674813/#:~:tex....
Reminds me of the classic apocryphal early ML story of the enemy tank detector that was 100% accurate at identifying camouflaged tanks… so long as tanks and sunny weather were perfectly correlated in the input data, just as they were in the training data.
It appears they also report good results for predicting symptom severity. It's less obvious how the cameras, etc., would leak into severity. Unless it actually works (it does seem a bit too good to be true), I'm thinking the test set was in the base model or something.
Came here to say this. 100% is too good to be true, and it's almost certain the AI has figured out a signal leak from the camera, image format, room, etc.
Also concussions, according to the article, which is news to this retired neurosurgical anesthesiologist. (38 years in practice; stopped in 2015 at age 67 because I believed [still do] it's better to retire [from my profession, at least] too early than too late.)
There's a famous story (probably apocryphal) about the military of a powerful nation training an early AI to find pictures of submarines beneath the sea.
There was great excitement as it was near 100%.
It later transpired the pictures with submarines in had a white border.
Ha, I've heard a similar story about diagnosing skin cancer from pictures of moles. They were really excited about the performance of the model but it turned out if the dermatologist was concerned about the size of a mole they would include a ruler in the picture to document the size. The NN wasn't trained to "diagnose skin cancer" it was trained to recognize rulers in pictures.
Haven't modern model architectures gotten better at avoiding this kind of overfitting? Like, obviously data quality is still very important, but my understanding is that dropout mitigates this by randomly cutting out these unwanted feature channels: the models learn to distinguish all differences, rather than just one, or fixed combinations of several.
It really doesn't have to do with most ML architectures. It has to do with experiment design. If some data used in testing is part of the training process, there will be overfitting. That's why a final test set is required for unbiased evaluation.
> haven't modern model architectures gotten better at avoiding this kind of overfitting?
Overfitting is, AIUI, a training method and data issue, not a model issue alone. I doubt any model is resistant to overfitting if you give it data where the answer is reliably encoded some aspect it can use but outside of what you want it to look at.
Now, you can notice suspicious results and investigate (or you can just publish a 100% success rate and call it a day.)
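A toy illustration of that failure mode, with a made-up "white border" style leak column appended to weak genuine features (simulated data, purely for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
n = 400
y = rng.integers(0, 2, n)

# Genuine signal: weak and noisy.
X_signal = 0.3 * y[:, None] + rng.normal(size=(n, 5))
# Leaked artifact: a flag perfectly correlated with the label
# (the "white border" / ruler of the anecdotes above).
X = np.hstack([X_signal, y[:, None].astype(float)])

Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(Xtr, ytr)
print(clf.score(Xte, yte))  # near-perfect: the model keys on the leaked column
```

No held-out split catches this, because the leak is present on both sides; only inspecting what the model uses (or removing the artifact) does.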
Am I reading this wrong? They only had validation sets with no final test set, which makes the results kinda worthless, because we don't know how overfit they were to these validation sets (which can easily happen with any sort of parameter tuning). There's a reason a proper study needs three splits: train and validation (possibly multiple of these if you use k-fold), plus a final 'test' set to be used as sparingly as possible.
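For what it's worth, that three-way protocol can be sketched like this (toy data; the split proportions are illustrative, not the paper's):

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
X, y = rng.normal(size=(1000, 8)), rng.integers(0, 2, 1000)

# Carve off a held-out test set first, to be touched exactly once at the end.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.15, stratify=y, random_state=0)

# Split the remainder into train and validation for model/parameter tuning.
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.15, stratify=y_rest, random_state=0)

# Tune against (X_val, y_val); report a single final number on (X_test, y_test).
```

Every tuning decision made while looking at the validation score leaks a little information into it; the test set stays honest only if it plays no part in tuning.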
See the paper, "On estimating model accuracy with repeated cross-validation"
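For reference, repeated cross-validation along those lines is a one-liner in scikit-learn (toy data via `make_classification`); the point is the spread of scores, not any single number:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=5, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)

# 50 fold-scores; the spread matters as much as the mean.
print(f"accuracy: {scores.mean():.3f} +/- {scores.std():.3f} "
      f"over {len(scores)} folds")
```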
I wonder if physiognomy will come back as a field, if AI scans like these have any validity.
I remember stumbling upon multiple esoteric accounts on both Twitter and TikTok with communities seemingly obsessed with characterising various psychological traits purely from facial features, importantly without racial undertones.
While this sounds ridiculous on the surface and has various horrible historical echoes, I've always had a hunch there was actually something to it, from a purely intuitive perspective and from knowing lots of people. Again, very importantly, this disregards anything about race and instead focuses on the myriad of hormone-linked features, neurotypicality, alcohol, environmental factors: whatever traits seemingly go "across races".
https://youtu.be/6JPgpasgueQ?si=dn3muYeOe-cSSKM2
Just being in a dark room longer is sufficient to make changes that an AI could pick up on.
Ideally, they should capture the images from children before diagnosis, then see if they can predict the diagnosis.
Even the sklearn docs for cross validation show this split: https://scikit-learn.org/stable/_images/grid_search_cross_va...
Or maybe there's nothing there.