Like you, I've seen a lot of bloated projects out there. Mine is a 90 MB container.
I want to do what your project does, but with extensions for everyday apps that index into a DB: a private database for all your AI interactions.
I also have a cloud version built on the MCP auth spec, but it's all for fun and probably not worth releasing.
Do you have any plans to pursue further use cases like this?
What's the story for chunking PDFs?
We've been using Marker for PDF-to-markdown conversion and handling the markdown-to-chunks step manually, roughly like the sketch below.
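For anyone curious what that manual step can look like, here's a minimal sketch of heading-based markdown chunking. The function name, character budget, and splitting rules are illustrative assumptions, not a description of our actual pipeline:

```python
import re

def chunk_markdown(md: str, max_chars: int = 4000) -> list[str]:
    """Split markdown on top- and second-level headings, then fall
    back to paragraph splits for oversized sections."""
    # Split before lines starting with "# " or "## ", keeping the
    # heading attached to the section that follows it.
    sections = re.split(r"\n(?=#{1,2} )", md)
    chunks: list[str] = []
    for section in sections:
        if len(section) <= max_chars:
            chunks.append(section.strip())
            continue
        # Oversized section: split on blank lines and re-pack greedily.
        buf = ""
        for para in section.split("\n\n"):
            if len(buf) + len(para) > max_chars and buf:
                chunks.append(buf.strip())
                buf = ""
            buf += para + "\n\n"
        if buf.strip():
            chunks.append(buf.strip())
    return [c for c in chunks if c]
```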
When a user works with our agent, they may end up with a large conversation thread (e.g. 200k+ tokens) containing many SQL snippets, query results, and database metadata (e.g. table and column info).
For example, if they ask "show me any companies that were heavily engaged at one point, but I haven't talked to in the last 90 days". This will pull in their schema (e.g. Hubspot), run a bunch of SQL, show them results, etc.
I want to allow the agent to search previous threads for answers so they don't need to have the conversation again, but chunking up an existing thread is non-trivial (e.g. you don't want to separate a question from its answer, and you may want to remove errors while retaining the correction).
Do you have any plans to support "auto chunking" for AI message[0] threads?
0 - e.g. https://platform.openai.com/docs/api-reference/messages/crea...
Double-clicking on this: are these messages you'd want to drop from memory because they're not part of the actual content (e.g. execution errors or warnings)? That kind of cleanup is something Chonkie can help with as a pre-processing step.
If you can share an example structure of your message threads, I can give more specific guidance. We've seen folks use Chonkie to chunk and embed AI chat threads — treating the resulting vector store as long-term memory. That way, you can RAG over past threads to recover context without redoing the conversation.
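As a rough illustration of that pattern, here's a sketch that pairs each user question with its answer before chunking, so retrieval never splits them apart. It assumes OpenAI-style role/content message dicts, Chonkie's TokenChunker, and Chroma as the vector store; the pairing logic and names are our own assumptions, not a built-in Chonkie feature:

```python
import chromadb
from chonkie import TokenChunker

def thread_to_memory(thread_id: str, messages: list[dict], collection) -> None:
    """Store one document per user/assistant exchange so a question is
    never separated from its answer."""
    chunker = TokenChunker(chunk_size=512)
    for i in range(0, len(messages) - 1, 2):
        user, assistant = messages[i], messages[i + 1]
        if user["role"] != "user":
            continue  # this simple sketch skips system/tool messages
        exchange = f"Q: {user['content']}\nA: {assistant['content']}"
        for j, chunk in enumerate(chunker.chunk(exchange)):
            collection.add(
                documents=[chunk.text],
                ids=[f"{thread_id}-{i}-{j}"],
                metadatas=[{"thread_id": thread_id, "turn": i}],
            )

client = chromadb.Client()
memory = client.get_or_create_collection("thread_memory")
# thread_to_memory("t-123", messages, memory)
# Later, RAG over past threads:
# memory.query(query_texts=["companies gone quiet in 90 days"], n_results=5)
```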
P.S. If HN isn’t ideal for going back and forth, feel free to send me an email at shreyash@chonkie.ai.
Chunking fundamentals remain the same whether you're doing traditional semantic search or agentic retrieval. The key difference lies in the retrieval strategy, not the chunking approach itself.
For quality agentic retrieval, you still need to create a knowledge base by chunking documents, generating embeddings, and storing them in a vector database. You can add organizational structure here—like creating separate collections for different document categories (Physics papers, Biology papers, etc.)—though the importance of this organization depends on the size and diversity of your source data.
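For concreteness, here's one way that ingestion step could look, using Chonkie's TokenChunker with one Chroma collection per document category. The corpus layout, IDs, and texts are made-up placeholders for illustration:

```python
import chromadb
from chonkie import TokenChunker

client = chromadb.Client()
chunker = TokenChunker(chunk_size=512)

# Hypothetical corpus keyed by category: {category: [(doc_id, full_text), ...]}
corpus = {
    "physics_papers": [("arxiv-0001", "Full text of a physics paper...")],
    "biology_papers": [("pubmed-0002", "Full text of a biology paper...")],
}

for category, docs in corpus.items():
    # One collection per category, as described above; Chroma embeds
    # the documents with its default embedding function.
    collection = client.get_or_create_collection(category)
    for doc_id, text in docs:
        chunks = chunker.chunk(text)
        collection.add(
            documents=[c.text for c in chunks],
            ids=[f"{doc_id}-{i}" for i in range(len(chunks))],
            metadatas=[{"source": doc_id}] * len(chunks),
        )
```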
The agent then operates exactly as you described: it queries the vector database, retrieves relevant chunks, and synthesizes them into a coherent response. The chunking strategy should still optimize for semantic coherence and appropriate context window usage.
Regarding your concern about large DB records: you're absolutely right. Even individual research papers often exceed context windows, so you'd still need to chunk them into smaller, semantically meaningful pieces (perhaps by section, abstract, methodology, etc.). The agent can then retrieve and combine multiple chunks from the same paper or across papers as needed.
The main advantage of agentic retrieval is that the agent can make multiple queries, refine its search strategy, and iteratively build context—but it still relies on well-chunked, embedded content in the underlying vector database.
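A minimal sketch of that loop, assuming a Chroma collection like the one above and a hypothetical `llm(prompt) -> str` callable; the ANSWER/QUERY stopping convention is illustrative, not a fixed API:

```python
def agentic_answer(question: str, collection, llm, max_steps: int = 3) -> str:
    """Let the agent query the vector store repeatedly, refining its
    search terms until it decides it has enough context to answer."""
    context: list[str] = []
    query = question
    for _ in range(max_steps):
        results = collection.query(query_texts=[query], n_results=4)
        context.extend(results["documents"][0])
        decision = llm(
            f"Question: {question}\n"
            "Context so far:\n" + "\n---\n".join(context) + "\n"
            "Reply 'ANSWER: <answer>' if the context is sufficient, "
            "or 'QUERY: <refined search query>' to retrieve more."
        )
        if decision.startswith("ANSWER:"):
            return decision.removeprefix("ANSWER:").strip()
        query = decision.removeprefix("QUERY:").strip()
    # Out of steps: answer with whatever context was gathered.
    return llm("Answer using this context:\n" + "\n---\n".join(context) +
               f"\nQuestion: {question}")
```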
It looks like size and speed are your major advantages. In our RAG pipeline we run the chunking process async as an onboarding-type step. Is Chonkie primarily for people looking to process documents in some sort of real-time scenario?
Typically, our current users fall into one of two categories:
- People who are running async chunking but need a strategy not supported in LangChain/LlamaIndex. Speed matters here too, especially for users with a high volume of documents.
- People who need real-time chunking. Super useful for apps like codegen/code-review tools.