Without exception, every answer an LLM has given me to a technical question I already know the answer to has been substantially wrong in some fashion. This makes it just... absolutely useless for research. In some cases I've spotted it straight up plagiarising from the original sources, with random capitalisation giving it away.
The issue is that once you get even slightly into a niche, they fall apart because the training data just doesn't exist. But they don't say "sorry, there's insufficient training data to give you an answer", they just make shit up and state it confidently, even when it's wrong.
I have been impressed by its results.
I think this stems more from its initial search phase than from its pure LLM processing power, but to me the approach seems to work really well.
On a tangent, nice to see Plasmidsaurus using Emu [1], which has been shown to work great for 16S ribosomal RNA analysis on ONT by basically everyone I've heard from who has tried it. It has a nice algorithm, based on expectation-maximization, for predicting whether variants are due to ONT sequencing errors or are true variants, thus working around the somewhat limited accuracy of ONT reads. Pretty clever stuff.
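For anyone curious what the expectation-maximization idea looks like in practice, here is a minimal toy sketch. To be clear, this is not Emu's actual code: the function name `em_abundances` and the input format (a per-read list of likelihoods, one per candidate taxon) are assumptions made up for illustration. The gist is that ambiguous or error-prone reads get split fractionally between taxa, and the estimated abundances then feed back into how the reads get split.

```python
def em_abundances(likelihoods, n_iter=200, tol=1e-9):
    """Toy EM for mixture proportions (illustrative only, not Emu's implementation).

    likelihoods[r][t] is an assumed, precomputed P(read r | taxon t).
    Returns the estimated relative abundance of each taxon.
    """
    n_taxa = len(likelihoods[0])
    # Start from a uniform guess over taxa.
    abundances = [1.0 / n_taxa] * n_taxa

    for _ in range(n_iter):
        totals = [0.0] * n_taxa

        # E-step: split each read across taxa in proportion to
        # (current abundance) * (likelihood of the read under that taxon).
        for row in likelihoods:
            weighted = [a * l for a, l in zip(abundances, row)]
            norm = sum(weighted)
            if norm == 0.0:
                continue  # read is incompatible with every taxon; skip it
            for t, w in enumerate(weighted):
                totals[t] += w / norm

        assigned = sum(totals)
        if assigned == 0.0:
            break  # nothing could be assigned; keep the previous estimate

        # M-step: new abundances are the normalized fractional read counts.
        updated = [x / assigned for x in totals]
        delta = max(abs(u - a) for u, a in zip(updated, abundances))
        abundances = updated
        if delta < tol:
            break

    return abundances


if __name__ == "__main__":
    # Three reads that look like taxon 0, one like taxon 1, one ambiguous.
    reads = [[0.9, 0.1], [0.9, 0.1], [0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]
    print(em_abundances(reads))
```

The ambiguous read ends up being attributed mostly to whichever taxon the rest of the data already supports, which is roughly why this kind of approach can tolerate noisy reads.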
And if you want to run your own analysis on the raw data using Emu, you might want to try out our Trana pipeline built around Emu in Nextflow [2]. Apart from running Emu, it does some of the preprocessing like filtering, as well as exporting as Krona diagrams etc.
We're just putting it through validation at the clinical microbiology lab at Karolinska here in Stockholm right now.
The main caveat worth mentioning is that the choice of database can affect results quite a lot in some cases.
What kind of projects is this software used for?
I got much better results, though still not perfect, with the voice isolator in ElevenLabs.
In the end though, it mostly just feels like enough of a separate universe from any other language or ecosystem I'm using for projects that there's a clear threshold for bringing it in.
If there were a really strong Prolog implementation with a great community and ecosystem around it, in say Python or Go, that would be killer. I know there are some implementations, but the ones I've looked into seem to either not be very full-blown in their Prolog support, or have close to non-existent usage.
(Slams the door angrily)
(stomps out angrily)
(touches the grass angrily)