gillesjacobs commented on Belgian CVD is deeply broken   devae.re/posts/belgian-cv... · Posted by u/piecrumpled
gillesjacobs · a month ago
I had many friends in the Belgian hacker scene who were threatened with legal action after responsible disclosure. To my knowledge, these threats always remained empty: if there is one thing more expensive than engineering a fix, it is starting a lawsuit in Belgium.

It is a sad state of affairs that the culture is like this. Ultimately it results in a less secure society, where vulns are disclosed and shared anonymously instead of being reported to the vendor.

gillesjacobs commented on MCP: An (Accidentally) Universal Plugin System   worksonmymachine.substack... · Posted by u/Stwerner
troupo · 2 months ago
So... How do MCPs magically unlock data behind proprietary databases and interfaces?
gillesjacobs · 2 months ago
They don't do it magically. The "tools" an LLM agent calls to create responses are typically REST APIs for these services.

Previously, many companies gated these APIs, but with the MCP AI hype they are incentivized to expose what you can achieve with those APIs through an agent service.

Incentives align here: the user wants automations on data and actions on a service they are already using; the company wants AI marketing and a USP in automation features, and still gets to control the output of the agent.
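
To make the "tools are typically REST APIs" point concrete, here is a minimal sketch of how a tool call emitted by a model could be translated into an ordinary HTTP request. It does not use any MCP SDK; the tool name `search_tickets`, the schema, and the `api.example.com` endpoint are all hypothetical placeholders, and the request is only built, never sent.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical tool description, loosely shaped like what an agent is shown.
TOOL = {
    "name": "search_tickets",
    "description": "Search support tickets by keyword.",
    "input_schema": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}

BASE_URL = "https://api.example.com/v1"  # placeholder, not a real service


def build_request(tool_name, arguments):
    """Translate a tool call emitted by the model into a plain REST request."""
    if tool_name != TOOL["name"]:
        raise ValueError(f"unknown tool: {tool_name}")
    missing = [k for k in TOOL["input_schema"]["required"] if k not in arguments]
    if missing:
        raise ValueError(f"missing arguments: {missing}")
    query_string = urllib.parse.urlencode({"q": arguments["query"]})
    url = f"{BASE_URL}/tickets?{query_string}"
    return urllib.request.Request(
        url, method="GET", headers={"Accept": "application/json"}
    )


# The model's output is just structured data naming a tool and its arguments.
tool_call = json.loads(
    '{"name": "search_tickets", "arguments": {"query": "login error"}}'
)
request = build_request(tool_call["name"], tool_call["arguments"])
```

The service keeps control because it decides which tools exist and what each one is allowed to call; the model only ever fills in the arguments.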

gillesjacobs commented on Show HN: ChatToSTL – AI text-to-CAD for 3D printing   huggingface.co/spaces/flo... · Posted by u/flowful
gillesjacobs · 2 months ago
Looks very cool!

I prototyped something like this with build123d for Python and Cursor + OCP VSCode plugin.

Build123d is too new, with too few examples out there, unlike OpenSCAD. I can only get good generated code from large reasoning models that have access to the latest docs. No fast iteration for build123d yet.

gillesjacobs commented on OpenEuroLLM   openeurollm.eu/... · Posted by u/richardfontana
transcriptase · 6 months ago
Will it only respond with a vacation message from June to September?
gillesjacobs · 6 months ago
The team is currently skiing for two weeks so we'll have to get back to you on that.
gillesjacobs commented on Open-Sourcing R1 1776   perplexity.ai/hub/blog/op... · Posted by u/dtquad
gillesjacobs · 6 months ago
The naming might be somewhat politically coloured, but post-training with quality data is the best approach to uncensoring models: abliteration usually causes a substantial drop in performance.

Too bad the created dataset is not open source, as that would allow others to verify the objectivity of the answers and make sure it is not just a different flavour of propaganda.

That dataset is strategically useful for Perplexity as many more CCP-censored Chinese models are sure to be released.

gillesjacobs commented on If you believe in "Artificial Intelligence", take five minutes to ask it   svpow.com/2025/02/14/if-y... · Posted by u/lycopodiopsida
gillesjacobs · 6 months ago
Using o3-mini-high + Search I get the right answer he was looking for:

  The species was first split at the subgeneric level by Gregory S. Paul in 1988—he proposed the name Brachiosaurus (Giraffatitan) brancai. Then in 1991 George Olshevsky raised the subgenus Giraffatitan to full generic status, so that B. brancai became Giraffatitan brancai. Later, a 2009 study by Michael P. Taylor provided detailed evidence supporting this separation.
I guess Mike Taylor will gracefully cede his point now?

It is very funny to me that someone would feel the need to complain about a niche factual error in pretrained LLMs without even enabling RAG. If you know even the basics of this field, you shouldn't be surprised.

Of course this was probably more about ego stroking his paleontological achievement than a thoughtful evaluation of the current state of LLMs.

gillesjacobs commented on Show HN: FastGraphRAG – Better RAG using good old PageRank   github.com/circlemind-ai/... · Posted by u/liukidar
gillesjacobs · 9 months ago
Do you have any retrieval and generation metric scores (e.g., on the KILT or NQ datasets)?

I know benchmark datasets are not the be-all and end-all, but a halfway decent score and inference time would really help sell your framework (or help engineers make the choice).

In any case, very cool work. I have built a lot of RAG pipelines as a freelance NLP engineer and I will try this out.

gillesjacobs commented on Show HN: FastGraphRAG – Better RAG using good old PageRank   github.com/circlemind-ai/... · Posted by u/liukidar
yaj54 · 9 months ago
> 3. Hypothetical answer generation from a query using an LLM, and then using that hypothetical answer to query for embeddings works really well.

I've been wondering about that and am glad to hear it's working in the wild.

I'm now wondering if using a fine-tuned LLM (on the corpus) to gen the hypothetical answers and then use those for the rag flow would work even better.

gillesjacobs · 9 months ago
The technique of generating hypothetical answers (or documents) from the query was first described in the HyDE (Hypothetical Document Embeddings) paper. [1]

Interestingly, it works in both directions: generating hypothetical answers for the query and generating hypothetical questions for each text chunk at ingestion both increase RAG performance in my experience.

Though LLM-based query processing is not always suitable for chat applications if inference time is a concern (like near-real-time customer support RAG), so ingestion-time hypothetical question generation is more apt there.
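
A rough sketch of both directions, to show where the LLM call sits in each. Everything here is a toy assumption: the bag-of-words `embed` stands in for a dense encoder, and the two `generate_*` functions are hypothetical stubs returning canned text where a real pipeline would call an LLM.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words 'embedding'; a real pipeline would use a dense encoder."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


CHUNKS = [
    "Invoices are emailed on the first day of each month.",
    "To change credentials, open account settings and click the reset link.",
]


def generate_hypothetical_answer(query):
    """Hypothetical stand-in for a query-time LLM call (canned output)."""
    return "Open account settings and click the reset link sent by email."


def generate_hypothetical_questions(chunk):
    """Hypothetical stand-in for an ingestion-time LLM call (canned output)."""
    canned = {
        CHUNKS[0]: ["When do I get my invoice?"],
        CHUNKS[1]: ["What do I do if I forgot my password?"],
    }
    return canned[chunk]


def hyde_search(query, chunks):
    """Query time: embed the hypothetical answer instead of the raw query."""
    probe = embed(generate_hypothetical_answer(query))
    return max(chunks, key=lambda chunk: cosine(probe, embed(chunk)))


def build_question_index(chunks):
    """Ingestion time: index generated questions, each pointing to its chunk."""
    return [
        (embed(question), chunk)
        for chunk in chunks
        for question in generate_hypothetical_questions(chunk)
    ]


def question_index_search(query, index):
    """Query time: no LLM call, only an embedding lookup against the index."""
    probe = embed(query)
    return max(index, key=lambda pair: cosine(probe, pair[0]))[1]


QUERY = "I forgot my password, what now?"
hyde_hit = hyde_search(QUERY, CHUNKS)
index_hit = question_index_search(QUERY, build_question_index(CHUNKS))
```

In this toy setup the raw query shares no words with either chunk, so both variants only find the right chunk through the generated text. The second variant pays the LLM cost once per chunk at ingestion, which is why it suits latency-sensitive chat.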

1. https://aclanthology.org/2023.acl-long.99/

gillesjacobs commented on The Battle Line at Louvain   privatdozent.co/p/the-bat... · Posted by u/chmaynard
thiscatis · 9 months ago
The correct name of this city is Leuven.
gillesjacobs · 9 months ago
Tis Leive, sis

u/gillesjacobs

Karma: 1882 · Cake day: January 29, 2018
About
[ my public key: https://keybase.io/gillesjacobs; my proof: https://keybase.io/gillesjacobs/sigs/naKVxhthsAZCj70Vsuk8ooWuF4gl1Ugl39KGf77zvrU ]