Posted by u/NadavBenItzhak a month ago
Show HN: ChunkHound, a local-first tool for understanding large codebases (github.com/chunkhound/chu...)
ChunkHound’s goal is simple: local-first codebase intelligence that helps you pull deep, core-dev-level insights on demand, generate always-up-to-date docs, and scale from small repos to enterprise monorepos — while staying free + open source and provider-agnostic (VoyageAI / OpenAI / Qwen3, Anthropic / OpenAI / Gemini / Grok, and more).

I’d love your feedback, and if you have any, thank you for being part of the journey!

goda90 · a month ago
A few years ago I set out to refactor some of my team's code that I wasn't particularly familiar with, but that we wanted to modularize and re-use in more places. The primary file alone was 18k+ lines of TypeScript that was a terrible mess of spaghetti. Most of it had been written in JavaScript but later converted haphazardly. I ended up writing myself a little app that used the TypeScript compiler APIs to help me just explore all the many branches of the code and annotate how I would refactor different parts. It helped a bit, but I never got time to add some of the more intelligent features I wanted, like finding every execution path between two points.
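The core of it was just walking the AST with the compiler API, roughly like this (a simplified sketch from memory, not the actual code; the entry file path is a placeholder):

```typescript
import * as ts from "typescript";

// Parse the (huge) entry file plus everything it pulls in.
const program = ts.createProgram(["src/legacy.ts"], { allowJs: true });

for (const sourceFile of program.getSourceFiles()) {
  if (sourceFile.isDeclarationFile) continue;

  const visit = (node: ts.Node): void => {
    // Log every branch point so each one can be reviewed and annotated.
    if (
      ts.isIfStatement(node) ||
      ts.isSwitchStatement(node) ||
      ts.isConditionalExpression(node)
    ) {
      const { line } = sourceFile.getLineAndCharacterOfPosition(node.getStart());
      console.log(`${sourceFile.fileName}:${line + 1} ${ts.SyntaxKind[node.kind]}`);
    }
    ts.forEachChild(node, visit);
  };

  visit(sourceFile);
}
```

Finding every execution path between two points would have meant building a real control-flow graph on top of that, which is the part I never got to.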
henryhale · a month ago
Give depgraph a try - https://github.com/henryhale/depgraph - I'd like to learn how I could improve it.
flowerbreeze · a month ago
I gave it a try on my current codebase out of curiosity. Definitely useful. It worked well and fast, but it renders a lot of duplicate exports in my Node.js-modules-based codebase. I think that can sometimes be caused by me just being haphazard about re-exporting them, but other times I'm not sure.

E.g. authenticatedMenu() appears 4 times in authenticatedMenu.js; only one of them is imported by 2 different files and the other 3 are just there alone. There's a single export in the file, and a number of other files import it through an index.js that re-exports several other files too.
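For context, the layout is roughly this (simplified; everything except authenticatedMenu is a made-up placeholder name):

```typescript
// menus/authenticatedMenu.js: the only real definition, and the file's single export.
export function authenticatedMenu() {
  /* ...build the menu for logged-in users... */
}

// menus/index.js: barrel file that re-exports this module alongside several others.
export { authenticatedMenu } from "./authenticatedMenu";
export * from "./guestMenu"; // placeholder for the other re-exported files

// app.js: consumers import through the barrel, not the defining file.
import { authenticatedMenu } from "./menus";
```

My guess is the definition, the re-export, and the imports each end up as their own node, which would explain the extra copies, but I haven't dug into it.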

In my case I think it'd help if I could disable the duplicates, as they don't really provide any useful information when exploring the codebase.

Also, if there were an option to ignore the files that just re-export functions/classes and collapse those paths, it'd make the graph a lot smaller and easier to understand. Maybe depgraph already does that, but the duplicates confuse things, so I'm not sure.

dcreater · a month ago
You say "local-first" but have made the Voyage API the default for embeddings (I had to go to the website and dig to find that you can in fact use local embedding models). Please fix.
ofriw · a month ago
Thank you, yes the docs are overdue for a refresh. It's in the works
wiml · a month ago
Presumably it could update its own docs
esafak · a month ago
It would be convenient if it could load local SLMs itself; otherwise I'll have to manually start the LLM server before I can use it, and it's not something I leave running all the time.
henryhale · a month ago
I have been working on depgraph (https://github.com/henryhale/depgraph) for a while now. It is truly local, with several output options (json, mermaid, jsoncanvas). Multiple languages are supported (js, go, c) - expanding the list slowly but surely.
romperstomper · a month ago
I don't understand how/why all of this is local-first if all these providers are supported and used - could you elaborate on what is sent to them?
ofriw · a month ago
The DB is stored locally, and any embedding model, reranker, and LLM will work. It's up to you whether you self-host these or bring them in externally from one SaaS or another.
Neywiny · a month ago
Might give this a try to experiment, if it's really free to use (I'll have to read up on that, I guess). The QEMU codebase is huge and every contributor seems to solve problems in slightly different ways. Would be nice if this tool could help distill it.
ofriw · a month ago
Completely free, MIT licensed. You can fully self-host it if you have the hardware to run the Qwen3 embedding and reranker models.
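To give a feel for what self-hosting the embedding side means: a locally served Qwen3-Embedding model behind an OpenAI-compatible server (vLLM, llama.cpp server, etc.) answers the same kind of request a hosted provider does. Minimal sketch, with the port and model name as placeholders for whatever your local server exposes:

```typescript
// Query a self-hosted Qwen3-Embedding model through an OpenAI-compatible
// endpoint. The URL and model name below are placeholders; point them at
// whatever your local server (vLLM, llama.cpp server, etc.) actually serves.
const res = await fetch("http://localhost:8000/v1/embeddings", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "Qwen/Qwen3-Embedding-0.6B",
    input: "function authenticatedMenu() { /* ... */ }",
  }),
});

const { data } = await res.json();
console.log(`embedding with ${data[0].embedding.length} dimensions`);
```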
dogman123 · a month ago
Is there a way to have the model inside of Codex make use of ChunkHound instead of its “built-in” search/explore functionality with rg? Whenever I spin up a new agent using xhigh thinking, it spins its wheels for a while to get up to speed; wondering if ChunkHound can make this process faster.
esafak · a month ago
That's what the MCP is for, if you can get the LLM to use it. Sometimes they just like to do it their own way :)
strainer_spoon · 15 days ago
The ChunkHound docs are a bit confusing about making it available as an MCP server for Codex. How exactly do you do it? I got up to the indexing step and now need to let Codex use it.
conception · a month ago
I have ChunkHound in a few projects and it's noted in both the agent md file and the MCP config, and Claude never uses it. Ever. Never once.

Is there a prompt special sauce y’all use to get it to use it?

ofriw · a month ago
Just add to your prompt something like "use code research", but yes there's a PR in the works that fixes that and optimizes the MCP tools interface - https://github.com/chunkhound/chunkhound/pull/150
potamic · 24 days ago
I followed the docs for Ollama configuration, but it says "unknown LLM provider" when I try running the research command.