Show HN: FastGraphRAG – Better RAG using good old PageRank

from fast_graphrag import GraphRAG

DOMAIN = "Analyze this story and identify the characters. Focus on how they interact with each other, the locations they explore, and their relationships."

EXAMPLE_QUERIES = [
    "What is the significance of Christmas Eve in A Christmas Carol?",
    "How does the setting of Victorian London contribute to the story's themes?",
    "Describe the chain of events that leads to Scrooge's transformation.",
    "How does Dickens use the different spirits (Past, Present, and Future) to guide Scrooge?",
    "Why does Dickens choose to divide the story into \"staves\" rather than chapters?"
]

ENTITY_TYPES = ["Character", "Animal", "Place", "Object", "Activity", "Event"]

grag = GraphRAG(
    working_dir="./book_example",
    domain=DOMAIN,
    example_queries="\n".join(EXAMPLE_QUERIES),
    entity_types=ENTITY_TYPES
)

with open("./book.txt") as f:
    grag.insert(f.read())

print(grag.query("Who is Scrooge?").response)

PageRank is one of several interesting centrality metrics that could be applied to a graph to influence RAG on structural data. Another is Triangle Centrality, which counts the triangles around each node to determine its centrality. The idea is that triangles close relationships into a strong bond, whereas open bonds dilute centrality by drawing weight away from the center:

https://arxiv.org/abs/2105.00110

The paper reports high efficiency compared to other centralities like PageRank. However, in some research using GraphBLAS, my coauthors and I found that TC was slower than our sparse formulation of PR on a variety of sparse graphs with up to 1.8 billion edges, but that TC appears to scale better as graphs get larger and is likely more efficient in the trillion-edge realm.

https://fossies.org/linux/SuiteSparse/GraphBLAS/Doc/The_Grap...

liukidar · a year ago

This is super interesting! Thanks for sharing. Here we are talking about graphs with millions of nodes/edges, so efficiency is not that big of a deal, since things are going to be parsed by an LLM to craft an answer anyway, and that will always be the bottleneck. Indeed, PageRank is the first step, but we would be happy to test more accurate alternatives. Importantly, we are using personalized PageRank here, meaning we give specific initial weights to a (potentially quite large) set of nodes. Would TC support that, as well as giving weights to edges, since we are also looking into that?
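For readers unfamiliar with the term, here is a minimal sketch of personalized PageRank as plain power iteration. It is not FastGraphRAG's implementation; the function name, the toy graph, and the damping value are assumptions chosen for illustration. The only difference from classic PageRank is that the teleport step jumps back to the weighted seed nodes instead of to a uniform distribution.

```python
def personalized_pagerank(edges, seeds, damping=0.85, iterations=50):
    """Power iteration for personalized PageRank on a directed graph.

    edges: list of (source, target) pairs.
    seeds: dict of node -> non-negative weight (must not sum to zero).
    """
    nodes = sorted({n for edge in edges for n in edge} | set(seeds))
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    # Normalize the seed weights into a teleport distribution.
    total = sum(seeds.values())
    teleport = {n: seeds.get(n, 0.0) / total for n in nodes}
    rank = dict(teleport)

    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) * teleport[n] for n in nodes}
        for n in nodes:
            if out_links[n]:
                share = damping * rank[n] / len(out_links[n])
                for dst in out_links[n]:
                    new_rank[dst] += share
            else:
                # Dangling node: hand its mass back to the seed distribution.
                for m in nodes:
                    new_rank[m] += damping * rank[n] * teleport[m]
        rank = new_rank
    return rank


# Hypothetical toy graph: an undirected path, stored as edges in both directions.
PAIRS = [("marley", "scrooge"), ("scrooge", "cratchit"), ("cratchit", "tiny_tim")]
EDGES = PAIRS + [(b, a) for a, b in PAIRS]
RANKS = personalized_pagerank(EDGES, seeds={"scrooge": 1.0})
```

With all the seed weight on one node, scores decay with distance from that node, which is what makes the ranking query-specific.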

michelpp · a year ago

> Here we are talking about graphs with millions of nodes/edges,

That ought to be enough for anybody.

> would TC support that

TC is a purely structural algorithm: it counts triangles, so it doesn't take any weights into consideration. It does, however, return a vector of normalized rankings from 0.0 to 1.0, which you could combine with an existing biasing strategy to boost results that have strong centrality.
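A rough sketch of that combination, with loud caveats: the structural score below is a plain per-node triangle count scaled to [0.0, 1.0], which is a simplification and not the Triangle Centrality formula from the paper. The `blend` helper and its `weight` parameter are hypothetical names for whatever biasing strategy is already in place.

```python
from itertools import combinations


def triangle_scores(edges):
    """Per-node triangle counts on an undirected graph, scaled to [0.0, 1.0]."""
    neighbors = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    counts = {}
    for node, adjacent in neighbors.items():
        # A triangle exists for every pair of neighbors that are also linked.
        counts[node] = sum(
            1 for u, v in combinations(sorted(adjacent), 2) if v in neighbors[u]
        )

    top = max(counts.values(), default=0)
    return {n: (c / top if top else 0.0) for n, c in counts.items()}


def blend(relevance, structure, weight=0.3):
    """Linear mix of a query-dependent score with a structural score."""
    nodes = set(relevance) | set(structure)
    return {
        n: (1.0 - weight) * relevance.get(n, 0.0) + weight * structure.get(n, 0.0)
        for n in nodes
    }


# Hypothetical example: a, b, c form a triangle; d hangs off c.
STRUCTURE = triangle_scores([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])
BLENDED = blend({"d": 1.0, "a": 0.5}, STRUCTURE)
```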

arkokoley · a year ago

Have you tried Authority Rank as a substitute for PageRank? https://link.springer.com/content/pdf/10.1007/978-3-030-6097...

So I've done a ton of work in this area.

Few learnings I've collected:

1. Lexical search with BM25 alone gives you very relevant results if you can do some work during ingestion time with an LLM.

2. Embeddings work well only when the size of the query is roughly on the same order of what you're actually storing in the embedding store.

3. Hypothetical answer generation from a query using an LLM, and then using that hypothetical answer to query for embeddings works really well.

So combining all 3 learnings, we landed on a knowledge decomposition and extraction step very similar to yours. But we stick a metaprompter to essentially auto-generate the domain / entity types.

LLMs are naively bad at identifying the correct level of granularity for the decomposed knowledge. One trick we found is to ask the LLM to output a mermaid.js mindmap to hierarchically break down the input into a tree. At the end of that output, ask the LLM to state which level is the appropriate root for a knowledge node.

Then the node is used to generate questions that could be answered from the knowledge contained in this node. We then index the text of these questions and also embed them.

You can directly match the user's query from these questions using purely BM25 and get good outputs. But a hybrid approach works even better, though not by that much.
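A minimal sketch of the BM25-only variant, assuming the generated questions are already stored next to the id of the knowledge node they came from. The scoring is a small hand-written Okapi BM25 rather than any particular search engine, and the node ids and questions are made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25Index:
    """Tiny in-memory Okapi BM25 index over a list of strings."""

    def __init__(self, documents, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        self.docs = [Counter(tokenize(d)) for d in documents]
        self.lengths = [sum(d.values()) for d in self.docs]
        self.avg_len = sum(self.lengths) / len(self.lengths)
        # Number of documents that contain each term.
        self.doc_freq = Counter(term for d in self.docs for term in d)

    def score(self, query, i):
        doc, length, n = self.docs[i], self.lengths[i], len(self.docs)
        total = 0.0
        for term in tokenize(query):
            tf = doc.get(term, 0)
            if tf == 0:
                continue
            df = self.doc_freq[term]
            idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
            norm = tf + self.k1 * (1 - self.b + self.b * length / self.avg_len)
            total += idf * tf * (self.k1 + 1) / norm
        return total

    def search(self, query, top_k=3):
        scored = [(self.score(query, i), i) for i in range(len(self.docs))]
        hits = [(s, i) for s, i in scored if s > 0]
        return sorted(hits, key=lambda pair: -pair[0])[:top_k]


# Hypothetical (node id, generated question) pairs produced at ingestion time.
QUESTIONS = [
    ("node-support", "What is the email address for product support?"),
    ("node-billing", "How do I update my billing details?"),
    ("node-reset", "How can I reset my account password?"),
]
INDEX = BM25Index([question for _, question in QUESTIONS])


def match_nodes(user_query, top_k=3):
    """Map a user query to knowledge node ids via the indexed questions."""
    return [QUESTIONS[i][0] for _, i in INDEX.search(user_query, top_k)]
```

A hybrid version would merge this ranking with an embedding-based one; that part is left out here.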

Not using LLMs at query time also means we can hierarchically walk down from the root into deeper and deeper nodes, using the embedding similarity as a cost function for the traversal.
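One way such a walk could look, as a sketch only: a greedy descent that always follows the child most similar to the query and stops when no child is at least as similar as the current node. The stopping rule, the `Node` class, and the hand-written 3-dimensional vectors are assumptions; real embeddings would come from a model.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    embedding: list
    children: list = field(default_factory=list)


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def walk(root, query_embedding):
    """Greedy descent: follow the most similar child while similarity holds up."""
    path = [root]
    current = root
    while current.children:
        best = max(current.children, key=lambda c: cosine(c.embedding, query_embedding))
        if cosine(best.embedding, query_embedding) < cosine(current.embedding, query_embedding):
            break  # going deeper would move away from the query
        path.append(best)
        current = best
    return path


# Hypothetical tree with toy embeddings.
ROOT = Node("root", [1.0, 1.0, 1.0], [
    Node("billing", [1.0, 0.0, 0.0]),
    Node("support", [0.0, 1.0, 1.0], [
        Node("email", [0.0, 1.0, 0.0]),
        Node("phone", [0.0, 0.0, 1.0]),
    ]),
])
PATH = [node.name for node in walk(ROOT, [0.0, 1.0, 0.1])]
```

A beam search over the top few children per level would be a natural extension if greedy descent turns out to be too brittle.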

isoprophlex · a year ago

> LLMs are naively bad at identifying the correct level of granularity for the decomposed knowledge. One trick we found is to ask the LLM to output a mermaid.js mindmap to hierarchically break down the input into a tree. At the end of that output, ask the LLM to state which level is the appropriate root for a knowledge node.

> Then the node is used to generate questions that could be answered from the knowledge contained in this node. We then index the text of these questions and also embed them.

Ha, that's brilliant. Thanks for sharing this!

antves · a year ago

Thanks for sharing this! It sounds very interesting. We experimented with a similar tree setup some time ago and it was giving good results. We eventually decided to move towards graphs as a general case of trees. I think the notion of using embeddings similarity for "walking" the graph is key, and we're actively integrating it in FastGraphRAG too by weighting the edges by the query. It's very nice to see so many solutions landing on similar designs!

siquick · a year ago

> 1. Lexical search with BM25 alone gives you very relevant results if you can do some work during ingestion time with an LLM

Can you expand on what the LLM work here is and its purpose?

> 3. Hypothetical answer generation from a query using an LLM, and then using that hypothetical answer to query for embeddings works really well.

Interesting idea, going to add to our experiments. Thanks.

andai · a year ago

It seems to come down to keyword expansion, though I'd be curious if there's more to it than just asking "please generate relevant keywords".

yaj54 · a year ago

> 3. Hypothetical answer generation from a query using an LLM, and then using that hypothetical answer to query for embeddings works really well.

I've been wondering about that and am glad to hear it's working in the wild.

I'm now wondering if using an LLM fine-tuned on the corpus to generate the hypothetical answers, and then using those for the RAG flow, would work even better.

gillesjacobs · a year ago

The technique of generating hypothetical answers (or documents) from the query was first described in the HyDE (Hypothetical Document Embeddings) paper. [1]

Interestingly, going both ways increases RAG performance in my experience: generate hypothetical answers for the query, and also generate hypothetical questions for each text chunk at ingestion.

That said, LLM-based query processing is not always suitable for chat applications if inference time is a concern (like near-real-time customer support RAG), so ingestion-time hypothetical answer generation is more apt there.

1. https://aclanthology.org/2023.acl-long.99/
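A minimal sketch of the query-side flow, with stand-ins for both models: `hypothetical_answer` is a placeholder for an LLM call (here a canned lookup), and `embed` is a toy bag-of-words vector rather than a real embedding model. All names and the sample chunks are assumptions for illustration.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words 'embedding'; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def hypothetical_answer(query):
    """Placeholder for an LLM call that drafts a plausible answer to the query."""
    canned = {
        "Who is Scrooge?": "Scrooge is a miserly old moneylender in London who despises Christmas.",
    }
    return canned.get(query, query)  # fall back to the raw query


def hyde_search(query, chunks, top_k=1):
    """Embed the hypothetical answer instead of the query, then rank the chunks."""
    probe = embed(hypothetical_answer(query))
    ranked = sorted(chunks, key=lambda chunk: cosine(probe, embed(chunk)), reverse=True)
    return ranked[:top_k]


# Toy chunks for illustration only.
CHUNKS = [
    "Ebenezer Scrooge was a tight-fisted, miserly old man who kept a counting-house in London.",
    "The Cratchit family shared a small goose for their Christmas dinner.",
    "Marley was dead, to begin with.",
]
TOP = hyde_search("Who is Scrooge?", CHUNKS)
```

The point of the detour is that the drafted answer resembles the stored chunks in length and vocabulary much more than the short question does.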

tweezy · a year ago

We do this as well with a lot of success. It’s cool to see others kinda independently coalescing around this solution.

What we find really effective is at content ingestion time, we prepend “decorator text” to the document or chunk. This incorporates various metadata about the document (title, author(s), publication date, etc).

Then at query time, we generate a contextual hypothetical document that matches the format of the decorator text.

We add hybrid search (BM25 and rerank) to that, and also add filters (documents published between these dates, by this author, this type of content, etc). We have an LLM parameterize those filters and use them as part of our retrieval step.

This process works incredibly well for end users.
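A small sketch of the decorator-plus-filter idea under stated assumptions: the `Document` fields, the header layout, and the filter parameters are all hypothetical, and in the setup described above the filter values would be produced by an LLM rather than passed in by hand.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Document:
    title: str
    author: str
    published: date
    body: str


def decorate(doc):
    """Prepend metadata as plain text so it is indexed together with the body."""
    header = (
        f"Title: {doc.title}\n"
        f"Author: {doc.author}\n"
        f"Published: {doc.published.isoformat()}\n\n"
    )
    return header + doc.body


def apply_filters(docs, author=None, published_after=None, published_before=None):
    """Keep only documents that satisfy every filter that was provided."""
    kept = []
    for doc in docs:
        if author is not None and doc.author != author:
            continue
        if published_after is not None and doc.published < published_after:
            continue
        if published_before is not None and doc.published > published_before:
            continue
        kept.append(doc)
    return kept


# Hypothetical corpus.
DOCS = [
    Document("Release notes 1.0", "Ada", date(2023, 1, 10), "Initial release."),
    Document("Release notes 2.0", "Ada", date(2024, 3, 5), "Adds graph search."),
    Document("Support handbook", "Grace", date(2024, 6, 1), "How to reach support."),
]
RECENT_BY_ADA = apply_filters(DOCS, author="Ada", published_after=date(2024, 1, 1))
DECORATED = decorate(DOCS[0])
```

At query time, the hypothetical document would be written in the same "Title / Author / Published" layout so that it lines up with the decorated chunks.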

oedemis · a year ago

But what about the chunk size? If we have small chunks, like 1 sentence, and the HyDE-generated answers are larger most of the time, the results are not so good.

sramam · a year ago

Very interesting. Thank you for getting into the details. Do you chunk the text that goes into the BM25 index? For the hypothetical answer, do you also prompt for "chunk size" responses?

itissid · a year ago

Very cool and relatable. I faced a similar issue with my content categorization engine for local events: http://drophere.co/presence/where (code: https://github.com/itissid/drop_webdemo). Finding the right category for a local event is difficult: an event could be "Outdoorsy" but also "Family Fun" and "Urban Exploration".

Initially I generated categories by asking an LLM with a long prompt (https://github.com/itissid/Drop-PoT/blob/main/src/drop_backe...). But I like your idea better!

My next iteration to solve this problem – I never got to it – was gonna be to generate the most appropriate categories based on the user's personal interests, weather, time of day and non-PII data, and to fine-tune a retrieval and ranking engine to generate categories for each content piece, personalized to them.

katelatte · a year ago

I organize community calls for the Memgraph community, and recently a community member presented how he uses hypothetical answer generation as a crucial component for enhancing the effectiveness and reliability of the system, allowing for more accurate and contextually appropriate responses to user queries. Here's more about it: https://memgraph.com/blog/precina-health-memgraph-graphrag-t...

mhuffman · a year ago

My experience matches yours, but related to

> 3. Hypothetical answer generation from a query using an LLM, and then using that hypothetical answer to query for embeddings works really well.

What sort of performance are you getting in production with this one? The other two are basically solved for performance, and for RAG in general, if it is related to a known and pre-processed corpus, but I am having trouble thinking of how you don't take a performance hit with #3.

LASR · a year ago

It's slow. So we use hypothetical mostly for async experiences.

For live experiences like chat, we solved it with UX. As soon as you start typing the words of a question into the chat box, it does the FTS search and retrieves a set of documents that have word matches, scored just using ES heuristics (e.g., counting matching words).

These are presented as cards that expand when clicked. The user can see it's doing something.

While that's happening, we also issue a full HyDE flow in the background, with a placeholder loading shimmer that loads in the full answer.

So there is some dead-time of about 10 seconds or so while it generates the hypothetical answers. After that, a short ~1 sec interval to load up the knowledge nodes, and then it starts streaming the answer.

This approach tested well with UXR participants and maintains acceptable accuracy.

A lot of the time, when looking for specific facts from a knowledge base, just the card UX gets an answer immediately. E.g.: "What's the email for product support?"

sdesol · a year ago

> Hypothetical answer generation from a query using an LLM, and then using that hypothetical answer to query for embeddings works really well.

This is honestly where I think LLMs really shine. This also gives you a very good idea of whether your documentation is deficient or not.

liukidar · a year ago

Thanks for sharing! These are all very helpful insights! We'll keep this in mind :)
