Readit News
dvorka commented on Ask HN: How are you doing RAG locally?    · Posted by u/tmaly
dvorka · 2 months ago
Any suggestions on what to use as an embeddings-model runtime and for semantic search in C++?
dvorka commented on I'm a laptop weirdo and that's why I like my new Framework 13   blog.matthewbrunelle.com/... · Posted by u/todsacerdoti
astra1701 · 3 months ago
The really special thing about Frameworks is that you can quickly buy and replace basically any part, not just the usual RAM and SSD. Case in point: when I managed to damage my FW13's keyboard so badly that it was no longer usable, I could just... go straight to Framework's website and buy a new one for $40. I even had the option of a slightly improved one that shed the Windows key and lacked the god-awful Copilot key.

This approach even allows the manufacturer to correct design flaws after the fact -- and let's face it, there will always be design flaws. For instance, my FW13 originally came with a very weak hinge for the screen. It was perfectly usable for most daily usage and most people probably wouldn't care, but it meant I couldn't hold it up without the screen tilting back. Well, FW corrected this for those customers who really did care by just selling a new hinge for $24, and so $24 + 10 minutes with a screwdriver later, I had a substantially more refined device! (And to clarify -- there was a defective hinge version in the early batches, and those were replaced free of charge. Mine was a slightly later version that, beyond lacking the level of stiffness I preferred, was not defective.)

dvorka · 3 months ago
It really depends on the vendor. I have a Lenovo T480 and replaced its keyboard earlier this year (there are various options: with or without backlight, and different layouts; I'm Czech). I have two batteries, one for "normal" use and an extended one (in both size and capacity) for traveling. Swapping multiple SSDs and the RAM is also possible (nothing is soldered). It's not a Framework, but it is easily fixable, Linux-friendly hardware.
dvorka commented on Getting a Gemini API key is an exercise in frustration   ankursethi.com/blog/gemin... · Posted by u/speckx
dvorka · 3 months ago
This is so true! But the adventure doesn't end there. I have 2 billing accounts from back when I was building projects on App Engine. The annual exercise required to keep them alive (even if no action is needed in the end) is of similar complexity. Why do I need these accounts? Because I want to use Google services that I don't pay for.
dvorka commented on A critical look at MCP   raz.sh/blog/2025-05-02_a_... · Posted by u/ablekh
dvorka · 10 months ago
"In the good old days, it was good practice to run a new protocol proposal through standards bodies like the W3C or OASIS, which was mostly a useful exercise. Is the world somewhere else already, or would it be a waste of time?"
dvorka commented on Show HN: Reor – An AI note-taking app that runs models locally   github.com/reorproject/re... · Posted by u/samlhuillier
dvorka · 2 years ago
Reor is a really interesting project with admirable goals. I believe this is just the beginning, but you have already done a great job!

I have been working on my own note-taking application (https://github.com/dvorka/mindforger) for some time and wanted to go in the same direction. However, I gave up (for now). I used ggerganov/llama.cpp to host LLMs locally on a CPU-only machine with 32 GB of RAM and used them for both RAG and note-taking use cases (like https://www.mindforger.com/index-200.html#llm). However, it did not work well for me: the performance was poor (high hardware utilization, long response times, failures, and crashes) and the actual responses were rarely useful (off-topic and impractical answers, hallucinations). I tried Llama 2 7B with 4-bit quantization and a couple of similar models. Although I'm not happy about it, I switched to an online commercial LLM because it performs really well in terms of response quality, speed, and affordability. I now use the LLM integrated into my note-taking app frequently, as it is useful for many things.

Anyway, Reor "only" uses the locally hosted LLM in the generation phase of RAG, which is a nicely constrained use case. I believe that a really lightweight LLM - I'm thinking of a tiny base model fine-tuned for summarization - could be the way to go (fast, non-hallucinating). I'm really curious whether you have any suggestions now, or will have some in the future!
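The retrieve-then-generate split described above can be sketched roughly like this. It's a toy, dependency-free illustration, not Reor's actual code: `embed()` is a bag-of-words stand-in for a real embeddings model, and the `generate` callback is where the small, summarization-tuned local LLM would plug in.

```python
import re

def embed(text):
    # Placeholder for a real embeddings model: maps text to a
    # bag-of-words frequency dict (purely illustrative).
    freq = {}
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        freq[word] = freq.get(word, 0) + 1
    return freq

def similarity(a, b):
    # Overlap score between two bag-of-words "vectors".
    return sum(min(a.get(w, 0), n) for w, n in b.items())

def retrieve(query, notes, k=2):
    # Retrieval phase: no LLM involved, just similarity ranking.
    q = embed(query)
    ranked = sorted(notes, key=lambda n: similarity(embed(n), q), reverse=True)
    return ranked[:k]

def answer(query, notes, generate):
    # Generation phase: the ONLY step that touches the local LLM,
    # which just has to summarize the retrieved context.
    context = "\n".join(retrieve(query, notes))
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer briefly."
    return generate(prompt)

notes = [
    "llama.cpp runs quantized models on CPU.",
    "FAISS provides fast similarity search over dense vectors.",
    "MindForger is a markdown note-taking app.",
]
print(retrieve("which tool searches vectors?", notes, k=1)[0])
# -> "FAISS provides fast similarity search over dense vectors."
```

Because the model never has to recall facts on its own, only compress the retrieved context, a much smaller model can plausibly get away with it.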

As for the vector DB, considering the resource-related problems I mentioned earlier, I was thinking about something similar to facebookresearch/faiss, which, unlike LanceDB, is not a fully-fledged vector DB. Have you run any experiments with similarity-search projects or vector DBs? I would be interested in trade-offs similar to those between small, large, and hosted LLMs.
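A flat (exhaustive) index of the sort faiss provides is conceptually very simple, which is part of its lightweight appeal. A dependency-free sketch in the spirit of faiss's IndexFlatIP (this is an illustrative class, not the real faiss API):

```python
import math

class FlatIndex:
    """Minimal exhaustive inner-product index, in the spirit of
    faiss.IndexFlatIP. A sketch, not the real faiss API."""

    def __init__(self):
        self.vectors = []

    def add(self, vec):
        # Normalize on insert so inner product == cosine similarity.
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        self.vectors.append([x / norm for x in vec])

    def search(self, query, k=3):
        # Brute-force scan: fine for personal-notes-sized corpora.
        norm = math.sqrt(sum(x * x for x in query)) or 1.0
        q = [x / norm for x in query]
        scores = [
            (sum(a * b for a, b in zip(v, q)), i)
            for i, v in enumerate(self.vectors)
        ]
        scores.sort(reverse=True)
        return scores[:k]  # list of (similarity, vector_id) pairs

index = FlatIndex()
index.add([1.0, 0.0])
index.add([0.0, 1.0])
index.add([0.7, 0.7])
print(index.search([1.0, 0.1], k=2))
```

An exhaustive scan like this stays fast up to tens of thousands of note embeddings; the approximate structures (IVF, HNSW) that full vector DBs add only start to matter well beyond a personal corpus.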

Overall, I think that both RAG with my personal notes as the corpus and a locally hosted general-purpose LLM for the use cases mentioned above can take personal note-taking apps to a new level. This is the way! ;)

Good luck with your project!

u/dvorka

Karma: 92 · Cake day: June 8, 2018
About
Consistent hard work, day after day, week after week, year after year. No magic bullets, no shortcuts. — JoshCox