Microsoft's new Dragon Copilot is an AI assistant for healthcare

I’ve been beta testing this for several months. It’s OK. The notes it generates are too verbose for most medical notes even with all the customization enabled. Most medical interviews jump around chronologically and Dragon Copilot does a poor job of organizing that, which means I had to go back and edit my note which kind of defeated the purpose of the app in the first place.

It does a really good job of recognizing medications though, whose names most patients butcher.

Hallucinations are present, but usually they’re pretty minor (screwing up gender, years).

It doesn’t really seem to understand what the most important part of the conversation is; it treats all the information equally as important when that’s not really the case. So you end up with a long text full of information that the patient thought was useful but that is not at all relevant to their current presentation. That’s where having an actual physician is useful to parse through what is important or not.

At baseline it doesn’t take me long to write a note so it really wasn’t saving me that much more time.

What I do use it for is recording the conversation and then referencing back to it when I’m writing the note. Useful to “jog my memory” in a structured format.

I have to put a disclaimer in my note saying that I was using it. I also have to let the patient know upfront that the conversation is getting recorded and I’m testing something for Microsoft, etc. etc. You can tell who the programmer patients are because they immediately ask if it’s “copilot” lol

amoxichillin · 6 months ago

I've been helping test it as well - your experience sounds identical to mine. I was initially very excited for it, but nowadays I don't really bother turning it on unless I feel the conversation will be a long one. Although I am very much looking forward to them rolling out the automated pending of orders based on what was said during the conversation.

LLMs have so much potential in medicine, and I think one of the most important applications they will have is the ability to ingest a patient's medical chart within their context and present key information to clinicians that would've otherwise been overlooked in the bloated mess that most EMRs are nowadays (including Epic).

There's been so many times where I've found critically important details hidden away as a sidenote in some lab/path note overlooked for years that very likely could've been picked up by an LLM. Just a recent example - a patient with repeated admissions over the years due to severe anemia, would usually be scoped and/or given a transfusion without much further workup and discharged once Hgb >7. Blood bank path note from 10 years ago mentions presence of warm autoantibodies as a sidenote; for some reason the diagnosis of AIHA is never mentioned nor carried forward in their chart. A few missed words which would've saved millions of dollars in prolonged admissions and diagnostic costs over the years.

skywhopper · 6 months ago

Given everything I hear about LLMs for similar summary purposes including your description and that given above, it seems unlikely that the LLM would be all that likely to “notice” a side note in a huge chart. I agree that’d be great but I’m curious why you think it would necessarily pick up on that sort of thing.

marcellus23 · 6 months ago

> A few missed words which would've saved millions of dollars in prolonged admissions and diagnostic costs over the years.

I don't mean to come off antagonistic here. But surely the more important benefit is the patient who would've avoided years of sickness and repeated hospital visits?

stuartjohnson12 · 6 months ago

I just wanted to jump in and say - don't give them too much credit on transcribing medication, I'm guessing this is Deepgram behind the scenes and their medication transcription works pretty well out of the box in my experience.

voidUpdate · 6 months ago

Screwing up gender and years sounds pretty serious to me?

beng-nl · 6 months ago

Maybe they mean that it either doesn’t matter in context or it’s easy to catch and correct. Either way it seems reasonable to trust the judgement of the professional reporting on their experience with a new tool.

dumbmrblah · 6 months ago

It's more in scenarios where I enter the room and I ask the patient whether this is their wife/husband etc. It's not like I'm going into the room and saying "hello patient you appear to be a human female". The model is having difficulty figuring out who the actors are if there are multiple different people talking. Not a big issue if all you're doing is rewriting information. But if multi-modal context is required, it's not the best.

userbinator · 6 months ago

> The notes it generates are too verbose for most medical notes even with all the customization enabled.

I've noticed that seems to be a common trend for any AI-generated text in general.

TeMPOraL · 6 months ago

I think this might be because of what GP said later:

> it treats all the information equally as important when that’s not really the case

In the general case (and I imagine, in the specific case of GP), the model doesn't have any prior to weigh the content - people usually just prompt it with "summarize this for me please <pasted link or text>"[0], without telling it what to focus on. And, more importantly, you probably have some extra preferences that aren't consciously expressed - the overall situational context, your particular ideas, etc. translate to a different weighing than the model has, and you can't communicate that via the prompt.

Without a more specific prior, the model has to treat all information equally, and this also means erring on the side of verbosity, so as not to omit anything the user may care about.

[0] - Or such prompt is hidden in the "AI summarizer" feature of some tool.
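A minimal sketch of what "giving the model a prior" could look like in practice, assuming the tool exposed the prompt at all. Everything here (`NotePreferences`, `build_summary_prompt`, the field names) is hypothetical and only builds a prompt string; it is not how Dragon Copilot or any other product actually works.

```python
from dataclasses import dataclass, field


@dataclass
class NotePreferences:
    """Hypothetical per-physician preferences (an assumption, not a real API)."""
    specialty: str
    focus: list[str] = field(default_factory=list)    # what to weigh heavily
    exclude: list[str] = field(default_factory=list)  # what to leave out
    max_words: int = 150


def build_summary_prompt(transcript: str, prefs: NotePreferences) -> str:
    """Turn explicit preferences into instructions placed before the transcript."""
    lines = [
        f"Summarize this visit transcript as a {prefs.specialty} note.",
        f"Keep it under {prefs.max_words} words.",
    ]
    if prefs.focus:
        lines.append("Prioritize: " + "; ".join(prefs.focus) + ".")
    if prefs.exclude:
        lines.append("Leave out: " + "; ".join(prefs.exclude) + ".")
    lines.append("Transcript:")
    lines.append(transcript)
    return "\n".join(lines)


if __name__ == "__main__":
    prefs = NotePreferences(
        specialty="neurology",
        focus=["symptoms relevant to the current presentation"],
        exclude=["lay explanations of imaging reports"],
        max_words=120,
    )
    print(build_summary_prompt("Patient: ...", prefs))
```

With an empty `focus` and `exclude`, the prompt collapses back to the bare "summarize this" case described above, which is roughly the situation where the model has nothing to weigh the content against.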

gmerc · 6 months ago

Are they charging per token?

hakaneskici · 6 months ago

Same for AI coding assistants, most tools generate way too much unnecessary code. Scary part is that the code seems to be running OK.

ksaxena · 6 months ago

Yes, the biggest problem with Healthcare AI assistants right now is that there is no way to "prompt" the AI on what a physician needs in a given scenario - eg. "only include medically relevant information in HPI", "don't give me a layman explanation of radiographic reports", "include direct patient quotes when a neurological symptom is being described" etc.

And the prompt landscape in the field is vast. And fascinating. Every specialist has their own preference for what is important to include in a note vs what should be excluded; and this preference changes by disease - what a neurologist wants in an epilepsy note is very different from what they need in a dementia note, for example.

Note preferences also change widely between physicians, even in the same practice and same specialty! I'm the founder of Marvix AI (www.marvixapp.ai), an AI assistant for specialty care, we work with several small specialty care practices where every physician has their own preferences on which details they want to retain in their note.

But if you can get the prompts to really align with a physician's preferences, this tech is magical - physicians regularly confess to us that this tech saves them ~2 hours every day. We have now had half a dozen physicians tell us in their feedback calls that their wives asked them to communicate their 'thanks' to us for getting their husbands back home for dinner on an important occasion!

[Edit: typo and phrasing]

visarga · 6 months ago

> there is no way to "prompt" the AI on what a physician needs in a given scenario - eg. "only include medically relevant information in HPI", "don't give me a layman explanation of radiographic reports", "include direct patient quotes when a neurological symptom is being described" etc.

There is, it is called RLHF.

burnte · 6 months ago

We tried it at my job, I got us in the beta. Go try Nudge AI and tell me what you think. Our providers found Nudge to be a far better product at a fifth of the price.

monkeydreams · 6 months ago

> Hallucinations are present, but usually they’re pretty minor (screwing up gender, years).

And if all hospitals were doing was having doctors treat patients, this would be ok. But healthcare is fueled by these "minor" details and this will result in delays in payment and reimbursement, trouble with patient identification, corruption of clinical coding, etc.

pintxo · 6 months ago

Did you encounter any instances of hallucinations or omissions?

One would imagine those to be the biggest dangers.

dumbmrblah · 6 months ago

Hallucinations are pretty minimal but present. Some lazy physicians are gonna get burned by thinking they can just zone out during the interview and let this do all the work.

I edited my original post. Omissions are less worrisome, it’s more about too much information being captured which isn’t relevant. So you get these super long notes and it’s hard to separate the “wheat from the chaff”.

knowitnone · 6 months ago

it's not minor when they screw up dosage

Dead Comment

As a medical student, I used the dragon dictation software (no AI) to write notes in the ED and more recently I used a pilot of this ai version to write clinic notes.

Overall, I was quite impressed. It definitely made writing notes much faster, which all doctors hate to do. While it had some problems with where to put key pieces of information (like putting details from the physical exam back in the history), it only took 5 mins of rearrangement after the visit to complete the note.

For simple diagnoses, it does a decent job coming up with the assessment and plan, probably because all the simple diagnoses were in the training set. For more complex ones though, it needs to be exactly dictated by the doctor. I can see this being used very well in primary care.

Edit: When I said “coming up with an assessment and plan” I mean documenting the assessment and plan based on the ai’s recorded conversation with the patient. The conversation with the patient is meant to be understandable. The “assessment and plan” documentation on the other hand is jargony and meant to be read by other physicians.

conartist6 · 6 months ago

This still sounds bad. 5 mins to rework your notes after each patient visit? I didn't assume doctors had that kind of time.

And let me make this clear. I, as your patient, I never NEVER want the AI's treatment plan. If you aren't capable of thinking with your own brain, I have no desire to trust you with my health, just like I would never "trust" an AI to do any technical job I was personally responsible for due to the fact that it doesn't care at all if it causes a disaster. It's just a stochastic word picker. YOU are a doctor.

diggan · 6 months ago

> This still sounds bad. 5 mins to rework your notes after each patient visit? I didn't assume doctors had that kind of time.

Compared to what though? It reads not as additional work, but as less work than having to do all that manually, which seems likely to need more than 5 minutes.

> And let me make this clear. I, as your patient, I never NEVER want the AI's treatment plan.

Where are you getting this from? Neither the parent's comment nor the article talks about the AI assistant coming up with a treatment plan, and it seems to be all about voice-dictating and "ambient listening" with the goal of "free clinicians from much of the administrative burden of healthcare", so seems a bit needlessly antagonistic.

ilikecakeandpie · 6 months ago

> 5 mins to rework your notes after each patient visit? I didn't assume doctors had that kind of time.

I worked in healthcare for over a decade (actually for a company that Nuance acquired previous to their acquisition) and the previous workflow was they'd pick up a phone, call a number, say all their notes, and then have to revisit their transcription to make sure it was accurate. Surgeons in particular have to spend a ton of time on documentation.

eig · 6 months ago

I think you may be misunderstanding how the tool is used (at least the version I used).

The doctor talks to the patient, does an exam, then formulates and discusses the plan with the patient. The whole conversation is recorded and converted to a note after the patient has left the room.

The diagnosis and plan were already worked out while talking to the patient. The ai has to convert that conversation into a note. The ai can't influence the plan because the plan was already discussed and the patient is gone.

zeagle · 6 months ago

AI is an assistive tool at best but it can probably speed up by reflowing text. I use dragon dictation with one of the Philips microphones and it makes enough mistakes that I would probably spend the same time editing/proofing. Had a good example yesterday where it missed a key NOT in an impression.

As an aside, the after-work is what burns out physicians. There is time after the visit to do a note; 5 min for a very simple one is reasonable to create, dictate, fax, do the workflow for billing, and request a follow-up within a given system. A new consult might take 10 min between visits if you have time.

For after hours, ER is in my opinion a bad example because when you are done, you are done.

Take a chronic disease speciality or GP and it is hours of paperwork after clinic to finish notes (worse if teaching students), triage referrals, deal with patient phone calls that came in, deal with results and act on them, read faxes etc. I saw my last patient at ~4:30 yesterday and left for home at 7 after dealing with notes and stuff that came in since Thursday night.

bpodgursky · 6 months ago

> I, as your patient, I never NEVER want the AI's treatment plan.

You as a patient are going to get an AI treatment plan. Come to peace with it.

You may have some mild input as to whether it's laundered through a doctor, packaged software, a SaaS, or LLM generated clinical guidelines... but you're not escaping an AI guiding the show. Sorry.

_qua · 6 months ago

You'd be horrified to learn how many doctors spend hours at the end of their day finishing notes on patients. It's a nightmare.

Ukv · 6 months ago

> And let me make this clear. I, as your patient, I never NEVER want the AI's treatment plan. If you aren't capable of thinking with your own brain, I have no desire to trust you with my health,

To my understanding this tool is for transcription/summarization, replacing administrative work rather than any critical decision making.

> just like I would never "trust" an AI to do any technical job

I'd trust a model (whether machine-learning or traditional) to the degree of its measured accuracy on the given task. If some deep neural network for tumor detection/classification has been independently verified as having higher recall/precision than the human baseline, then I have no real issue with it. I don't see the sense in having a seemingly absolute rejection ("never NEVER").
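For anyone unfamiliar with the terms, here is a rough sketch of how recall and precision would be compared between a model and a human baseline. The labels below are made-up toy data, and `precision_recall` is a helper written for this illustration, not taken from any particular library or study.

```python
def precision_recall(predicted, actual):
    """Return (precision, recall) for binary labels, where 1 means 'tumor present'."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)        # flagged and really present
    fp = sum(1 for p, a in pairs if p and not a)    # flagged but not present
    fn = sum(1 for p, a in pairs if not p and a)    # missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy ground truth and two sets of readings (all invented for illustration).
    actual = [1, 0, 0, 1, 1, 0]
    model = [1, 1, 0, 1, 1, 0]   # catches every case, one false alarm
    human = [1, 0, 0, 0, 1, 0]   # no false alarms, misses one case

    for name, preds in (("model", model), ("human", human)):
        p, r = precision_recall(preds, actual)
        print(f"{name}: precision={p:.2f} recall={r:.2f}")
```

Recall is the share of real cases that were caught and precision is the share of flagged cases that were real, so "better than the human baseline" only means something once both numbers are measured on the same independently labeled set.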