Having worked with data from EMR systems, having worked at a large EMR software development shop myself, and having used deep learning at work quite a bit for the past few years, I'm inclined to agree.
This title is somewhat click bait though, because the fault is really with EMR systems and (esp) the American Healthcare system, not deep learning.
The entire system is designed around billing, and decisions are made by hospital and insurance executives who are generally not technical. There is no incentive to clean up the system or work on a well-structured open protocol for interop the same way there is in, say, banking. Plus, the author gives some good examples like pulse ox %: doctors and nurses are not at all concerned with or trained to record data in a way that makes sense to use programmatically. They're typically thinking only as if they're recording it for another human to read.
Deep learning could probably be quite useful in the medical field, but we won't know until someone comes along and disrupts the system top to bottom, similar to how Tesla has done with not only manufacturing but also the sales process, sidestepping the dealership model. This would probably look something like Forward [1], but with a crazy amount of funding, so that insurance companies and billing codes could be ignored entirely.
[1]: https://goforward.com/p/home
Yes, exactly. Hospital data is atrocious, on every single level. Duplicate data are everywhere across multiple systems. Hospitals move excruciatingly slowly on anything technical. And very few people seem to ever have any real understanding of what is going on with their systems; they tend to only know the user interface and have to rely on their vendor support (Epic, Cerner, etc.) for anything beyond that.
I work for a company that I was hoping would be such a disruption point for hospitals (at least in some small way), but instead they decided that it's just too difficult to get hospitals to do much of anything, so we effectively knuckle under -- creating numerous integration points, making more and more copies of data.
The only way this will change is if a large enough player creates direct competition in the EHR/EMR business, with these kinds of data-oriented models in mind, all the while creating a system that, top to bottom, is better for the user, the administrator, and the technical staff alike. Current players in this space have very little incentive to make their products better. And it reminds me of a quote from Tron Legacy: "Given the prices we charge to students and schools, what sort of improvements have been made in Flynn... I mean, um, ENCOM OS-12?"..."This year we put a '12' on the box."
It would take billions of dollars and many years to build a new general purpose EHR from scratch. This isn't an industry where a start-up can launch an MVP and get sales to early adopters looking for a competitive advantage. Every major provider organization already has an EHR, and usually an ONC certified one. A new offering has to be at least as good as existing products in every way in order to get any sales. It has to be a 100% solution out of the gate.
Outside of general purpose comprehensive EHRs there is still opportunity for new niche market offerings targeting limited medical specialties or types of facilities. For example, if you wanted to target, let's say, just dialysis centers or just psychiatrists, then the hurdle would be much lower.
Part of the challenge might be the limited number of individuals with expertise in both medicine and technology. Each of those fields is quite specialized and requires time to build an understanding of the subject matter.
"This title is somewhat click bait though, because the fault is really with EMR systems".
The title is entirely accurate: it specifies "on electronic medical records" and effectively demonstrates the thesis in the title. And yes, you're right that the problem is in the data. No matter, it's still true.
Everything with an even mildly provocative title is accused of being "clickbait" on HN. I wish people would reserve that criticism for actively deceptive titles, or titles that promise something that is not delivered.
> Plus, the author gives some good examples like pulse ox %: doctors and nurses are not at all concerned with or trained to record data in a way that makes sense to use programmatically. They're typically thinking only as if they're recording it for another human to read.
The unstructured doctor's notes may be where the best signal lies. They should be mostly uncorrupted by the billing/process-related trash. Deep learning does not need data to be formatted to be read "programmatically"; in fact, it shines most on data that isn't.
Or where the raw data is corrupted ...
Deep learning requires very well curated and balanced datasets; at best, if your test data is also poorly curated and similarly unbalanced, that might hide from you the fact that it isn't working.
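A concrete version of that failure mode, as a toy sketch (all numbers made up): with an unbalanced test set, a model that has learned nothing can still post a great-looking accuracy figure.

    # Toy illustration: 95% of test cases are negative, and the "model" just says
    # "negative" every time. It has learned nothing, yet accuracy looks impressive.
    test_labels = [0] * 950 + [1] * 50

    predictions = [0] * len(test_labels)          # degenerate model

    accuracy = sum(p == y for p, y in zip(predictions, test_labels)) / len(test_labels)
    true_positives = sum(p == 1 and y == 1 for p, y in zip(predictions, test_labels))
    recall = true_positives / sum(test_labels)

    print(f"accuracy: {accuracy:.2%}")   # 95.00% -- looks fine
    print(f"recall:   {recall:.2%}")     # 0.00%  -- it never finds a positive case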
>>This title is somewhat click bait though, because the fault is really with EMR systems and (esp) the American Healthcare system, not deep learning.
Right
Data-mush is the cause
That's my official term for it. Data that looks fantastic in format, but just the slightest peek under the covers reveals that every data-entry person is either freelancing and entering whatever is easiest/sorta-makes-sense for them, or under pressure to skew things, or the definitions of each code (and the cases where it does/doesn't apply) are ill-defined, etc., etc., etc. And on top of this, we have the medical-system insurer-provider adversarial relationship covered so well in the article.
The result is a toxic brew of definition drift, unintentional errors, and intentional errors. It is not just the fringes and sub-one-percent of the edge cases, it is rotten to the core.
Basically the entire data set is a complete lie.
You may have the most perfect AI, but it literally has no chance against that toxic data swamp.
Once again, the only winning move is to not play.
Wouldn’t health systems like Kaiser Permanente that are fully integrated (both insurer and provider) have the right incentives to disrupt their own processes in favor of something better?
What sort of incentive were you thinking of? KP is one of the largest Epic customers. They don't have claims as such, but still have similar internal processes to prevent waste and determine how much to charge their customers.
Having worked with otherwise decent Swedish public healthcare systems, I can't tell you if it's as bad, just that it is absolutely awful. I would never think to try and use anything they provided as the basis for a dataset.
> doctors and nurses are not at all concerned with or trained to record data in a way that makes sense to use programmatically. They're typically thinking only as if they're recording it for another human to read.
The flip side of this is that data entered for programmatic reading often isn't very useful for future humans to review. Automatically-generated doctor's notes obscure information in a way that "HEART ATTACK" circled in red ink on a paper chart does not.
Implementation guides built on HL7 FHIR are well structured open protocols for interoperability. Several of the largest EHR vendors actively participate in defining those standards, and have built them into products.
Of course that doesn't solve the garbage in / garbage out problem. If users configure the software incorrectly or don't enter the right data then you don't have much to work with.
In some cases NLP technology can be used to convert unstructured chart notes into coded clinical concepts. That can work well enough for research and analytics, but isn't accurate enough to use for direct patient care without human review.
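To make the notes-to-codes idea concrete, here is a deliberately tiny sketch, not any vendor's pipeline; real clinical NLP uses trained models plus terminology services, and the concept codes below are from memory and purely illustrative.

    # Toy sketch: map free-text chart phrases to coded clinical concepts.
    # Real clinical NLP handles negation ("denies chest pain"), abbreviations,
    # misspellings, and context -- this keyword lookup does none of that.

    NOTE = "Pt c/o shortness of breath. Denies chest pain. SpO2 88% on room air."

    # Illustrative concept dictionary (codes are examples, not authoritative).
    CONCEPTS = {
        "shortness of breath": ("SNOMED CT", "267036007", "Dyspnea"),
        "chest pain":          ("SNOMED CT", "29857009",  "Chest pain"),
        "spo2":                ("LOINC",     "59408-5",   "Oxygen saturation by pulse oximetry"),
    }

    def code_note(note: str):
        text = note.lower()
        hits = []
        for phrase, (system, code, display) in CONCEPTS.items():
            if phrase in text:
                hits.append({"system": system, "code": code,
                             "display": display, "phrase": phrase})
        return hits

    if __name__ == "__main__":
        for hit in code_note(NOTE):
            print(hit)

Note that this toy happily "codes" the negated chest pain ("Denies chest pain"), which is exactly the kind of error that makes human review necessary before anything patient-facing.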
HL7. Well-structured. Pick one. The HL7 spec might be a standard, with a specification, but it is not a thing for well-structured data.
For example, FHIR allows you to use a standardised shorthand, with a formal grammar. [0] That grammar is well defined... It also happens not to be a regular grammar, and it is entirely possible to construct something that will be both valid and undecidable, à la Perl. Infinite parsing and expansion on finite data.
[0] https://hl7.org/fhir/uv/shorthand/STU1/reference.html#append...
I designed, implemented, and supported 5 regional exchanges. Interop is always case-by-case.
Protocols like HL7 (2.x, 3.x, FHIR) give you shared syntax (lexing) and about 98% agreement on structure and semantics (parsing). Then you'll spend most of your time on the last 2% of excruciating details: data quality on field values, matching units, mapping one taxonomy to another (and back).
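A hedged sketch of what that last 2% looks like in code: unit normalization plus a local-code-to-LOINC mapping. The local code and mapping table are hypothetical; the glucose conversion factor (~18 mg/dL per mmol/L) is standard, but every sending system needs its own table.

    # Sketch: normalize a lab result from one site's conventions to a common form.
    # The local code ("GLU_SER") and these tables are hypothetical; real exchanges
    # maintain (and endlessly patch) mapping tables like this per sending system.

    LOCAL_TO_LOINC = {
        "GLU_SER": "2345-7",   # Glucose [Mass/volume] in Serum or Plasma
    }

    # mmol/L -> mg/dL for glucose (molar mass ~180 g/mol => factor ~18.02)
    UNIT_CONVERSIONS = {
        ("glucose", "mmol/L", "mg/dL"): lambda v: v * 18.02,
        ("glucose", "mg/dL", "mg/dL"): lambda v: v,
    }

    def normalize(local_code: str, analyte: str, value: float, unit: str):
        loinc = LOCAL_TO_LOINC.get(local_code)
        if loinc is None:
            raise ValueError(f"unmapped local code: {local_code}")
        convert = UNIT_CONVERSIONS.get((analyte, unit, "mg/dL"))
        if convert is None:
            raise ValueError(f"no conversion for {analyte} from {unit}")
        return {"loinc": loinc, "value": round(convert(value), 1), "unit": "mg/dL"}

    print(normalize("GLU_SER", "glucose", 5.5, "mmol/L"))   # ~99.1 mg/dL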
> If users configure the software incorrectly...
Based on my experience, incompatibility is a given, regardless of the cause: misconfiguration, mismatched versions, ignorance, artistic license, etc.
Plan for the worst, hope for the best.
The first point he raises is the most critical by far. The silverbacks of the industry deliberately stymie efforts for true interoperability because it goes directly against their primary goal, which is forcing everyone into their platform. Epic in particular has zero intention of allowing anyone else to take their market share by enabling easy sharing of data across platforms. It's far better for them from a business perspective to make interfacing so unreasonably difficult that you are forced to implement their full suite of applications, at which point they hold your organization's data hostage to induce other orgs to do the same. The larger their ecosystem grows, the less they need to worry about interoperability - improving patients' outcomes is not even an afterthought. Their vision of population health reporting is one in which every major healthcare org has been trapped inside their walled garden.
Some free advice from an ex-Epic:
This is true when it's other vendors doing the data fetching. When it's a health system customer of Epic, they bend over backwards to help them extract the data properly and build cool clinical tools on top of the Epic platform. Health systems with big innovation arms like Atrium and Providence could be a good place to seek VC if your product idea relies on deep EMR access. Sometimes the left hand doesn't talk to the right in these health systems though - you'll need to get that innovation arm talking to the EMR analysts. Use the shibboleth "I want to talk to our Epic TS" for whatever speciality you work on.
As for things like the App Orchard and Epic on FHIR https://fhir.epic.com/ :
Epic is smart enough to realize that their future lies as the platform of the health system IT stack, in the Ben Thompson sense of a platform / aggregator.
The hospitals are scared of open access, and Epic always does what's in the best interest of their customers, so they push against open access.
Epic on FHIR is missing a ton of data that is present in Epic. For example, if I want to react to a dispensing event, that has to be custom built as far as I know. Even then, you're long polling an endpoint hoping you find work you need to react to.
At times it feels like it would be better to just get data straight from the database, as custom Epic implementation times are insanely long and costly.
The P in HIPAA stands for portability, of course.
Yeah, ex-Epic here as well. Epic's data surfaces are byzantine almost by design, as a form of security or safety. It can only be construed as gatekeeping in that Epic has never prioritized making it better for non-customers.
Epic controls 55% of the EMR market, and that number is only growing. It won't be long until this isn't a problem because the majority of the population has all their records with Epic.
Yes, EHR vendors are really opponents/gatekeepers here. They don't benefit from you getting your data out of the EHR and in my experience they weren't really open to it.
Interoperability is really driven by either state/federal reporting requirements, or billing. And there is no incentive for EHR vendors or their client hospital systems to go beyond the exact minimum to get paid.
Epic has an extensive set of interoperability APIs, mostly based on open industry standards, which enable easy sharing of data across platforms. Is there something specific missing?
https://open.epic.com/
Pardon my bluntness but this is a bullshit brochure website.
I've worked with non-health system healthcare companies that have tried to work with Epic on interoperability. They are outright hostile to anyone being able to access any data in their systems unless it's the hospital customer (and even then they're not exactly helpful).
> Life would be simpler if only these hospitals could set aside their arrogance and just go with the recommended workflow!
This would be like asking programmers to standardize on the recommended programming language.
We would love to just use the recommended workflow, if it worked for our hospital. There are differences in the patients, doctors, local regulations, existing systems, etc. between hospitals.
Patients: A top cancer hospital does a lot of clinical trials, so some of the forms require you to fill out clinical trial information for every patient. In a maternity ward, it would not be appropriate to ask about clinical trials for every patient.
Doctors: Hospitals are staffed differently. If the hospital has residents, some of the work can be delegated to residents. If not, someone else has to do it. The workflow needs to account for who is actually available to do the work.
Local regulations: Medicine is highly regulated, and each state and hospital has its own rules.
Existing systems: Hospital computer systems have been around for decades, and usually it's not possible to migrate everything to a new system, so the new system needs to integrate with the old systems that couldn't be upgraded.
I happen to know a lot of doctors, including, as an example, an OBGYN. As it was explained to me, for vaginal births:
- at some point someone, without evidence, speculated that cervix dilation should proceed along some curve
- cervix dilation is actually measured by hand - literally inserting fingers and having the doctor practice "so many fingers = so many centimeters". There's plastic sheets with holes in them so they can practice measuring the size of holes with fingers.
- the OBGYN knows what the cervix dilation curve should look like, and kinda sorta maps their hand readings to what it should look like
- the OBGYN has a general sense as to how labor is going, and will game the cervix dilation stats to match their expectation, e.g. if labor is going well but the cervix hasn't dilated then they'll kinda sorta report progress anyway
Anyway, given the above it seems like the data around cervix dilation is suspect - the measurements are fitted to what the curve should look like, and then the data matches the curve, and that makes people more confident in the curve, and so on.
The point is, can you really apply ML to the EMR of cervix dilation? Does it make sense, could you really draw conclusions from this?
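One way to see the circularity is a small simulation (illustrative numbers only, nothing obstetric about them): if charted values are nudged toward the curve the clinician expects, the chart will "confirm" the curve even when the underlying process doesn't follow it.

    import numpy as np

    # Illustrative numbers only -- this is a sketch of the feedback loop, not obstetrics.
    rng = np.random.default_rng(0)

    hours = np.linspace(0, 12, 25)
    expected = 10 * (hours / 12) ** 1.5                 # the curve everyone "knows"
    true = 10 / (1 + np.exp(-(hours - 8)))              # suppose reality stalls, then moves fast
    measured = true + rng.normal(0, 0.3, hours.shape)   # noisy by-hand measurement

    # Charted value = measurement nudged toward the expected curve.
    anchoring = 0.6
    charted = anchoring * expected + (1 - anchoring) * measured

    def frac_on_curve(series, tol=1.0):
        return float(np.mean(np.abs(series - expected) < tol))

    print("measurements within 1 cm of the expected curve:", frac_on_curve(measured))
    print("charted values within 1 cm of the expected curve:", frac_on_curve(charted))
    # The chart agrees with the curve far better than the measurements do,
    # so a model fit to the chart mostly relearns the clinicians' expectation.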
>> The point is, can you really apply ML to the EMR of cervix dilation?
Yes, it's a perfect fit. If I may.
Sorry, to clarify, the way most people do machine learning is what you describe: tweak a model until it fits the dataset. If the ability of the model to fit the dataset translates to anything beyond that, it's anybody's guess.
You just put me off being a parent for life, btw.
In cases of subjective data, you will always see variances in reporting that may be construed as misrepresentation. It's not at all easy. As "objective" as the example might seem, I'd argue the clinician has too much leeway to truly be objective. There's a whole ton of subjective data involved in every patient course of care, much more than objective in many cases.
>EMR software is widely hated by the nurses and doctors who have to use it. It’s slow, bloated, nonintuitive, requires workarounds, etc. etc. etc.. The root of this evil is that every hospital brings its own conceited and byzantine patchwork of procedures, checks, and rituals to the table.
You just described every piece of admin software for every industry I've worked in.
The problem is individual orgs dictating software architecture. When each purchase is in the millions, you accommodate every whim no matter how absurd... and then you end up with these bloated, messy systems.
For software systems to REALLY, shockingly improve efficiency in an organization, all the processes in the org need to change to accommodate a new overarching system design. Tailoring software to mirror legacy processes defeats the purpose almost entirely.
I think there is a truly absurd competitive advantage in doing this right but you seldom see enough leverage to completely overhaul every department in order to implement software admin systems.
People don't hate Jira because of Jira, they hate the experience of using Jira with the rules that their organization has put into Jira. They hate the culture of how their org uses Jira.
- Some information is displayed in unexpected places
- The application can be slow to load
- Customizations are confusing (Do I need to edit the screen, the screen scheme, or the issue type screen scheme?)
- Confusing options (Field configurations have both a "Configure" action and an "Edit" action.)
- Basic features are often left out and require plugins to bridge the gap.
And much more!
I know this because I worked as an ML engineer at an extremely successful company that automated medical coding using deep learning.
The confusion stems from conflating a "perfect solution" with a "human augmented" one.
90% of coding cases are trivial, have low value and can be done by a model. 10% are really subtle and need human expertise.
That's fine. You can make a billion dollar company on low hanging fruit. I think it's best not to conflate the perfect solution with a very good solution.
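For what it's worth, that 90/10 split usually comes down to a confidence threshold on the model's output; here is a minimal sketch of the routing, with made-up codes, scores, and threshold:

    # Minimal sketch of human-in-the-loop routing for automated medical coding.
    # The model, its scores, and the 0.95 threshold are hypothetical; in practice
    # the threshold is tuned so auto-accepted codes meet an agreed error rate.

    from typing import NamedTuple

    class Prediction(NamedTuple):
        chart_id: str
        code: str        # e.g. an ICD-10-CM code
        confidence: float

    AUTO_ACCEPT_THRESHOLD = 0.95

    def route(preds):
        auto, review = [], []
        for p in preds:
            (auto if p.confidence >= AUTO_ACCEPT_THRESHOLD else review).append(p)
        return auto, review

    preds = [
        Prediction("chart-001", "E11.9", 0.99),   # easy, common case
        Prediction("chart-002", "I10",   0.97),
        Prediction("chart-003", "C79.9", 0.62),   # subtle -- send to a human coder
    ]

    auto, review = route(preds)
    print(f"auto-coded: {len(auto)}, sent to human coders: {len(review)}")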
You've not refuted the article so much as pointed out a corner case the author didn't address in which ML is a good fit. Your example, using ML to perform the medical coding function, is using a data source (in this case the EMR) for one of the purposes for which it was explicitly designed and for which it is (arguably) non-deficient. That is a realm not doomed to failure.
The realm doomed to failure is using a data source for a completely oblique purpose for which it is horribly distorted. Namely, the purpose of optimizing individual and public health by discovering guidelines and treatments, diagnosing illness, and delivering optimal care.
(Of course medical billing as an enterprise shouldn't even exist, but that is another topic.)
Medical coding is mainly billing with ICD-10 codes for diagnoses and CPT + HCPCS codes for procedures. However, there is also non-billing clinical coding for things like LOINC, SNOMED CT, and RxNorm.
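As a rough illustration of how those vocabularies sit side by side on a single encounter (the specific codes below are recalled from memory and should be verified against the official terminologies):

    # One hypothetical diabetes follow-up visit, annotated with the different
    # code systems it might touch. Billing and clinical vocabularies coexist.
    encounter = {
        "diagnosis_billing":  {"system": "ICD-10-CM", "code": "E11.9",
                               "display": "Type 2 diabetes mellitus without complications"},
        "visit_billing":      {"system": "CPT",       "code": "99213",
                               "display": "Office visit, established patient"},
        "diagnosis_clinical": {"system": "SNOMED CT", "code": "44054006",
                               "display": "Diabetes mellitus type 2"},
        "lab_order":          {"system": "LOINC",     "code": "4548-4",
                               "display": "Hemoglobin A1c"},
        "medication":         {"system": "RxNorm",    "code": "6809",
                               "display": "metformin"},
    }

    for role, coding in encounter.items():
        print(f"{role:20s} {coding['system']:10s} {coding['code']:10s} {coding['display']}")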
What was the criterion that you were optimizing? Currently hospitals try to assign codes in a way that maximizes payouts from insurance companies while avoiding straight-up lying in a way that could cause them problems. So they'll handle that 10% by choosing the codes with the bigger payout.
I agree with the conclusion. This is totally unsurprising to me as a ML engineer. If you put garbage data into the model, you get garbage predictions. That doesn’t strike me as particularly novel. The same is true for cooking, after all.
However, this has been truly shocking to all of the non-technical stakeholders I’ve worked with. They take the stance that any large amount of data can be used to do ML on, presumably because they don’t know too much about what doing ML is like.
So I’m convinced the author is right, and I’m also convinced that there will be many attempts to use ML on EMRs.
There is the old quote that we've all seen: "On two occasions I have been asked, 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?' I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question." - Charles Babbage
I will say, I have some good news for the late great Charles Babbage. For the most part, people now do indeed understand that if you put small amounts of wrong figures into a machine, the wrong answers will come out. If nothing else, pocket calculators and math class have given them the direct experience of this.
However, it seems that people still expect that if you put gigabytes or petabytes worth of wrong figures into a machine that somehow the right answer will pop out.
Ah well. The road never ends, you know.
> However, it seems that people still expect that if you put gigabytes or petabytes worth of wrong figures into a machine that somehow the right answer will pop out.
The interesting fact is that if you put in lots of correct figures, and only one, slightly wrong figure, then the answer may be correct to an acceptable degree.
For example, take 100 values, all exactly one, and find the mean. You'll find 1.
If you take 99 values of exactly one and one value of two, the mean of your sample will be 1.01, which is close enough to still be useful. In some interpretations, it may even be rounded to 1, meaning that in some circumstances, incorrect figures can indeed sometimes lead to correct answers.
Or if you're trying to find out what adding 1 and 4 gives you, but accidentally you add 2 and 3, you get the correct answer despite incorrect inputs.
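Spelling out the arithmetic from the example above:

    values = [1] * 99 + [2]
    mean = sum(values) / len(values)
    print(mean)            # 1.01 -- one wrong figure barely moves the aggregate
    print(round(mean))     # 1    -- and rounding hides it entirely

    # Whereas if most of the figures are wrong, the aggregate is wrong too:
    mostly_wrong = [2] * 99 + [1]
    print(sum(mostly_wrong) / len(mostly_wrong))   # 1.99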
I think people are assuming that if you put gigabytes or petabytes worth of data into a machine, the number of 'wrong figures' will be lost as noise.
That's not entirely true. Neural networks are fairly robust to noisy training data (a.k.a. garbage).[0] Well, stochastic gradient descent has the noise in its name. More training data can compensate for noisy data to some extent.[1] I'm not sure if model size can also compensate for noisy data, but would not be surprised if it did.
[0] https://arxiv.org/abs/1705.10694
[1] https://arxiv.org/abs/2202.01994
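A quick sketch of the "more data can compensate for label noise" point, using scikit-learn on a toy linear problem with 30% of the training labels flipped. The exact accuracies depend on the seed, but the trend is the usual one:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    def accuracy_with_noisy_labels(n_train, noise_rate=0.3, seed=0):
        rng = np.random.default_rng(seed)
        X, y = make_classification(n_samples=n_train + 5000, n_features=20,
                                   n_informative=5, random_state=seed)
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=5000,
                                                  random_state=seed)
        # Flip a fraction of the *training* labels; the test labels stay clean.
        flip = rng.random(len(y_tr)) < noise_rate
        y_noisy = np.where(flip, 1 - y_tr, y_tr)
        model = LogisticRegression(max_iter=1000).fit(X_tr, y_noisy)
        return model.score(X_te, y_te)

    for n in (100, 1000, 20000):
        print(n, round(accuracy_with_noisy_labels(n), 3))
    # Accuracy on clean test data generally improves with more (noisy) training
    # data, up to the ceiling set by the problem itself.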
While a lot of the post is good info, there are some upsides, if we can get the EHR situation worked out.
Many years ago, prior to anything like ML, Canada figured out that cystic fibrosis patients whose weight is above the 50th percentile had significantly better lung function. Nobody really understood why, but the correlation was so strong (.85 or something like that) it could not be ignored. Treatment protocols for CF changed to encourage weight gain, and lifespan outcomes have steadily improved over the years.
What other oddball correlations are hiding in the depths of bloodwork, weight/height, etc. for patients? We've teased out all the easy ones; the ones that are left are combinations nobody thought to even measure.
Regarding the EHR debacle, I'm optimistic that something could be worked out as a standard and implemented across the board. Expensive? Sure, but it's an investment that pays off pretty quickly, I think.
We're never going to get every provider organization using the same EHR, nor would that even be desirable. But almost all of them have passed ONC Health IT certification so they have similar functionality, including exposing at least some data in open industry standard formats.
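To make "open industry standard formats" concrete: most ONC-certified EHRs now expose a FHIR REST endpoint, and a basic query looks roughly like the sketch below. The base URL and patient ID are placeholders, and real servers require SMART-on-FHIR / OAuth2 authorization, which is omitted here.

    import requests

    # Hypothetical endpoint and patient ID -- real servers require OAuth2 (SMART on FHIR).
    FHIR_BASE = "https://ehr.example.com/fhir/R4"
    PATIENT_ID = "12345"

    # Search for pulse-oximetry observations (LOINC 59408-5 is commonly used for SpO2,
    # but verify against your server's data).
    resp = requests.get(
        f"{FHIR_BASE}/Observation",
        params={"patient": PATIENT_ID, "code": "http://loinc.org|59408-5"},
        headers={"Accept": "application/fhir+json"},
        timeout=30,
    )
    resp.raise_for_status()
    bundle = resp.json()

    for entry in bundle.get("entry", []):
        obs = entry["resource"]
        value = obs.get("valueQuantity", {})
        print(obs.get("effectiveDateTime"), value.get("value"), value.get("unit"))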