My monthly spend on AI models is < $1
I'm not cheap, just ahead of the curve. With the collapse in inference costs, everything will be this cheap eventually
I'll basically do
$ man tool | <how do I do this with the tool>
or even $ cat source | <find the flags and give me some documentation on how to use this>
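Concretely, with something like Simon Willison's llm CLI (anything that takes stdin plus a prompt works the same way):
$ man rsync | llm "how do I mirror a directory but delete files that no longer exist on the source?"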
Things I used to do intensively I now do lazily. I've even built an IEITYuan/Yuan-embedding-2.0-en database of my manpages with Chroma, so I can ask my local documentation how to do something conceptually: it fetches the relevant man pages, injects them into a local Qwen context window using my mansnip LLM preprocessor, forwards the prompt, and gives back usable, real results.
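Roughly, the indexing step looks like this. A simplified sketch, not my exact code: it assumes chromadb and sentence-transformers are installed, and the fixed-size chunking stands in for what mansnip actually does.

    import os
    import subprocess
    import chromadb
    from chromadb.utils import embedding_functions

    ef = embedding_functions.SentenceTransformerEmbeddingFunction(
        model_name="IEITYuan/Yuan-embedding-2.0-en")
    client = chromadb.PersistentClient(path="manpage-db")
    col = client.get_or_create_collection("manpages", embedding_function=ef)

    # `man -k .` lists every installed page as "name (section) - description".
    listing = subprocess.run(["man", "-k", "."],
                             capture_output=True, text=True).stdout
    seen = set()
    for line in listing.splitlines():
        name = line.split("(")[0].split(",")[0].strip()
        if not name or name in seen:
            continue
        seen.add(name)
        page = subprocess.run(["man", name], capture_output=True, text=True,
                              env=dict(os.environ, MANWIDTH="80")).stdout
        if not page:
            continue
        # Fixed-size chunks so retrieval returns focused passages.
        chunks = [page[i:i + 2000] for i in range(0, len(page), 2000)]
        col.add(ids=[f"{name}-{i}" for i in range(len(chunks))],
                documents=chunks,
                metadatas=[{"page": name}] * len(chunks))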
In practice it's this:
$ what-man "some obscure question about nfs"
...chug chug chug (about 5 seconds)...
<answer with citations back to the doc pages>
Essentially I'm not asking the models to think, just to do NLP and process text. They can do that really reliably. It also helps combat a frequent tendency of documentation authors to bury the most common and useful flags deep in the documentation and lead instead with the ones that were most challenging or interesting to program.
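The query side is just retrieval plus prompt stuffing. A stripped-down what-man, using Ollama's HTTP API as a stand-in for my local Qwen endpoint (the mansnip preprocessing isn't shown):

    import json
    import urllib.request
    import chromadb
    from chromadb.utils import embedding_functions

    ef = embedding_functions.SentenceTransformerEmbeddingFunction(
        model_name="IEITYuan/Yuan-embedding-2.0-en")
    col = chromadb.PersistentClient(path="manpage-db").get_collection(
        "manpages", embedding_function=ef)

    def what_man(question):
        hits = col.query(query_texts=[question], n_results=5)
        # Label each excerpt with its page name so the model can cite it.
        context = "\n\n".join(
            f"[{meta['page']}]\n{doc}"
            for doc, meta in zip(hits["documents"][0], hits["metadatas"][0]))
        prompt = ("Answer using only these man page excerpts, citing the "
                  "page name for each claim.\n\n"
                  f"{context}\n\nQuestion: {question}")
        req = urllib.request.Request(
            "http://localhost:11434/api/generate",
            data=json.dumps({"model": "qwen2.5", "prompt": prompt,
                             "stream": False}).encode(),
            headers={"Content-Type": "application/json"})
        return json.loads(urllib.request.urlopen(req).read())["response"]

    print(what_man("some obscure question about nfs"))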
I understand the inclination; it's just not all that helpful for me
Extremely sluggish on non-Chrome browsers. Starts with a blank black page. Fans spinning. Takes way too long to load for just some text and some videos. Clicking a link does some SPA magic that takes me to another blank black page and again takes ages to load. Clicking back no longer works, so I have to reload the entire page and sit through another blank wait. Once it's done loading, scrolling is extremely sluggish.
Yes, there are probably some interactive widgets in there, but all that and much more has been done before without bogging down the browser as if it were running a 3D game in WebGL.
Oh, and of course reader mode doesn't work.
It does make sense, if you imagine pressing through in 5 seconds vs. 30 seconds, that the paper filtration would work better in the slower press. But I'm not sure anyone has scientifically measured this.
Actually wait, it's coffee. Someone has definitely scientifically measured it and probably published a two hour YouTube video with their results.
I know how to make an SD LoRA and use it. I've known how to do that for two years. So what's the big secret about LLM LoRAs?
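As far as I can tell, mechanically it's the same trick as in SD: freeze the base weights and train low-rank adapters on the attention projections. A minimal sketch with Hugging Face peft; the base model and hyperparameters here are placeholders, not a recipe:

    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model

    base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B")
    config = LoraConfig(
        r=8,                                  # rank of the low-rank update
        lora_alpha=16,                        # scaling applied to the update
        target_modules=["q_proj", "v_proj"],  # adapt attention, as in SD
        lora_dropout=0.05,
        task_type="CAUSAL_LM")
    model = get_peft_model(base, config)
    model.print_trainable_parameters()  # a fraction of a percent is trainable
    # Train with a normal Trainer loop, then model.save_pretrained("adapter/")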
For every problem you can't solve, there's a simpler problem that you also can't solve.
However, IMO there's still a large gap for businesses in going from raw OCR outputs -> document processing deployed in prod for mission-critical use cases. LLMs and VLMs aren't magic, and anyone who goes in expecting 100% automation is in for a surprise.
You still need to build and label datasets, orchestrate pipelines (classify -> split -> extract), detect uncertainty and correct with human-in-the-loop, fine-tune, and a lot more. You can certainly get close to full automation over time, but it's going to take time and effort. But the future is on the horizon!
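To make that concrete, here's a toy version of the orchestration with an uncertainty gate in front of human review. Every stage function is a stub standing in for a real model call; this isn't our actual API:

    from dataclasses import dataclass, field

    @dataclass
    class Extraction:
        fields: dict = field(default_factory=dict)
        confidence: float = 0.0
        reviewed: bool = False

    def classify(doc):
        return "invoice"                    # stub: a document classifier

    def split(doc, doc_type):
        return doc.split("\f")              # stub: page/section splitter

    def extract(pages, doc_type):
        # stub: a VLM/LLM extraction call returning fields + confidence
        return Extraction({"total": "42.00"}, confidence=0.71)

    def human_review(result):
        result.reviewed = True              # stub: enqueue for a reviewer
        return result

    def process(doc):
        doc_type = classify(doc)
        result = extract(split(doc, doc_type), doc_type)
        if result.confidence < 0.9:         # threshold tuned per use case
            result = human_review(result)
        return result

    print(process("PAGE ONE\fPAGE TWO"))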
Disclaimer: I started an LLM doc processing company to help companies solve problems in this space (https://extend.app/)
If I'm correct that that's part of the thrust of the article (and I may not be), then I definitely agree with the author. The first time I tried Obsidian, I burned out because I went all-in on the bidirectional linking, tagging, knowledge graph, etc., and it quickly killed my motivation. Now I just dump text in and rely on search to find what I need, adding links only in retrospect once they're needed, and now I actually use it and get value from it.