jeffchuber · 7 months ago
This is still retrieval and RAG, just not vector search and indexing. it’s incredibly important to be clear about terms - and this article misses the mark.
nick-baumann · 7 months ago
Fair point, Jeff -- you're right that we're still doing retrieval. The key distinction is how we retrieve.

Traditional RAG for code uses vector embeddings and similarity search. We use filesystem traversal and AST parsing - following imports, tracing dependencies, reading files in logical order. It's retrieval guided by code structure rather than semantic similarity.
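
To make that concrete, here's a rough sketch of the idea in Python -- not Cline's actual implementation, and the module-to-path resolution is deliberately naive:

  import ast
  from pathlib import Path

  def imported_modules(path: Path) -> set[str]:
      """Collect the modules a file imports, via the AST."""
      tree = ast.parse(path.read_text())
      mods = set()
      for node in ast.walk(tree):
          if isinstance(node, ast.Import):
              mods.update(alias.name for alias in node.names)
          elif isinstance(node, ast.ImportFrom) and node.module:
              mods.add(node.module)
      return mods

  def collect_context(entry: Path, root: Path, limit: int = 20) -> list[Path]:
      """Breadth-first walk of the import graph: read files in the
      order a developer chasing a bug would."""
      queue, ordered = [entry], []
      while queue and len(ordered) < limit:
          current = queue.pop(0)
          if current in ordered or not current.exists():
              continue
          ordered.append(current)
          for mod in imported_modules(current):
              # naive module -> path resolution; a real tool would
              # use the language's actual resolver here
              queue.append(root / (mod.replace(".", "/") + ".py"))
      return ordered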

I highly recommend checking out what the Claude Code team discovered (48:00 https://youtu.be/zDmW5hJPsvQ?si=wdGyiBGqmo4YHjrn&t=2880). They initially experimented with RAG using embeddings but found that giving the agent filesystem tools to explore code naturally delivered significantly better results.

From our experience, vector similarity often retrieves fragments that mention the right keywords but miss the actual implementation logic. Following code structure retrieves the files a developer would actually need to understand the problem.

So yes -- I should have been clearer about the terminology. It's not "no retrieval" -- it's structured retrieval vs similarity-based retrieval. And with today's frontier models having massive context windows and sophisticated reasoning capabilities, they're perfectly designed to build understanding by exploring code the way developers do, rather than needing pre-digested embeddings.

phillipcarter · 7 months ago
Probably good to add a disclaimer at the top that clarifies the definition, since RAG is ultimately just a pattern, and vector indexes are just one way to implement the pattern.

Indeed, the industry at large sees RAG as equivalent to "vector indexes and cosine similarity w.r.t. the input query", and the rest of the article explains thoroughly why that's not the right approach.

kohlerm · 7 months ago
Following dependencies is the way to go IMHO. Saying "Code Doesn't Think in Chunks" is IMHO not correct. Developers do think in chunks of code. E.g., this function calls that function, which uses that type and is used here and there. It is not really a file-based model like Cline uses. The file-based model is "just" simpler to implement :-) . We use a more sophisticated code chunking approach in https://help.sap.com/docs/build_code/d0d8f5bfc3d640478854e6f... Let's see, maybe we should open source that ...
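
FWIW the symbol-level view is easy to sketch with Python's ast module (a hypothetical illustration, not our actual implementation):

  import ast

  def function_chunks(source: str) -> dict[str, dict]:
      """Chunk by symbol, not by file: each function plus the names
      it calls, so retrieval can walk the call graph."""
      tree = ast.parse(source)
      chunks = {}
      for node in ast.walk(tree):
          if isinstance(node, ast.FunctionDef):
              calls = {n.func.id for n in ast.walk(node)
                       if isinstance(n, ast.Call) and isinstance(n.func, ast.Name)}
              chunks[node.name] = {
                  "code": ast.get_source_segment(source, node),
                  "calls": calls,
              }
      return chunks
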
aryamaan · 7 months ago
Hi Nick, given that this product is open-sourced, I have a request/wish:

It would be wonderful if some of the tools the project uses were exposed to build on, like the tools related to ASTs, finding definitions, and many more.

dcreater · 7 months ago
If you're putting everything in the context window, is it still considered "retrieval"? Did we have a preexisting robust definition of what constitutes retrieval?
Tycho · 7 months ago
Don’t take this the wrong way, but did you use an LLM to generate this reply? The reply is good, but the writing style just piqued my curiosity.
swyx · 7 months ago
btw that podcast was recorded in jeff's office :)
WhitneyLand · 7 months ago
The article reads fine to me.

Yes, technically RAG could mean any retrieval, but in practice when people use the term it's almost always referring to some sort of vector embedding and similarity search.

jamesblonde · 7 months ago
Wittgenstein would concur
paxys · 7 months ago
> it’s incredibly important to be clear about terms

Is it? None of these terms even existed a couple of years ago, and their meaning is changing day by day.

colordrops · 7 months ago
I guess that's technically true, but RAG has colloquially taken on the meaning of vector database retrieval. Perhaps there's a paper out there that defines RAG as any kind of data retrieval, but at this point that's so general a term that it's bordering on useless. It's like saying "network connected application". No one has said that for decades, now that it's the status quo. Also, there are many types of networks, but "network connected app" generally meant TCP, despite it not being in the name.

throwaway314155 · 7 months ago
Fairly pedantic take.
cdelsolar · 7 months ago
Cline is the most impressive agentic coding tool I’ve used, and it seems to be getting better. I’ve learned to work with it to the point where I can plan with it for 10-15 minutes, set it loose on my codebase, go get lunch, and then its diff is almost always completely on the money. You should commit often for those rare cases where it goes off the rails (which seems to happen less frequently now).

Using Gemini 2.5 Pro is also pretty cheap; I think they figured out prompt caching, because it definitely was not cheap when it came out.

atonse · 7 months ago
I've always wondered... for making agentic edits (like vibe coding), all the tools I've tried (Cursor, Zed, VSCode) are pretty equal, since most of the brains are in the underlying models themselves.

But the killer app that keeps me using Cursor is Cursor Tab, which helps you WHILE you code.

Whatever model they have for that works beautifully for me, whereas Zed's autocomplete model is the last thing that keeps me away from it.

What do Cline users use for the inline autocomplete model?

nlh · 7 months ago
I use Cline within Cursor — best of both worlds!
thebigspacefuck · 7 months ago
Copilot Next Edit
bradfox2 · 7 months ago
Copilot
silverlake · 7 months ago
Have you guys at Cline considered using LLMs to create summaries of files and complex functions? Rather than read a 500-line function, feed it a short comment on what the function is doing. I'd like to use a local LLM to create summaries at every level: function, file, directory. Then let the LLM use those to find the right code to read. This is basically how I navigate a large code base.
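
A rough sketch of what I mean, where summarize() is a stand-in for whatever local LLM call you'd use:

  from pathlib import Path

  def summarize(text: str) -> str:
      """Stand-in for a local LLM call returning a 1-2 line summary."""
      return text[:80]  # replace with a real model call

  def summarize_tree(path: Path, index: dict[str, str]) -> str:
      """Bottom-up: summarize files first, then summarize each
      directory from its children's summaries. `index` is what the
      LLM searches before reading any real code."""
      if path.is_file():
          note = summarize(path.read_text())
      else:
          children = [summarize_tree(child, index)
                      for child in sorted(path.iterdir())
                      if child.suffix == ".py" or child.is_dir()]
          note = summarize("\n".join(children))
      index[str(path)] = note
      return note
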
dpe82 · 7 months ago
I've just used Cline to produce files like that, and then later when starting a task in plan mode I tell it to read those files to get a sense of the project's structure. I also tell it to update them as necessary after whatever task we're doing is finished.
noman-land · 7 months ago
So you're effectively keeping two copies of the codebase, but the second copy is written in prose?
zentara_code · 7 months ago
Would it double your codebase? Do you think it would work for a large codebase?
WhitneyLand · 7 months ago
I generally agree with the article and the approach given practical constraints; however, it's all a stopgap anyway.

Using Gemini 2.5's 1M-token context window to work with large systems of code at once immediately feels far superior to any other approach. It allows using an LLM for things that are not possible otherwise.

Of course it's damn expensive, and so hard to do in a high-quality way that it's a rare luxury, for now…

orbital-decay · 7 months ago
It's always a tradeoff, and most of the time chunking and keeping the context short performs better.

I feed long-context tasks to each new model and snapshot just to test the performance improvements, and every time it's immediately obvious that no current model can handle its own max context. I don't believe any benchmarks, because contrary to the results of many of them, no matter what the (coding) task is, the results start getting worse after just a couple dozen thousand tokens, and after a hundred thousand the accuracy becomes unacceptable. Lost-in-the-middle is still a big issue as well, at least for reasoning if not for direct recall, despite benchmarks showing it's not. LLMs are still pretty unreliable at one-shotting big things, and everything around them is still alchemy.

loandbehold · 7 months ago
1 million tokens is still not enough for real-life codebases (100Ks to millions of LOC)
simonklee · 7 months ago
And it's obviously expensive to use this approach.
ramoz · 7 months ago
I kept wondering why Cursor was indexing my codebase; it was never clear.

Anyway, context to me enables a lot more assurance and guarantees. RAG never did.

My favorite workflow right now is:

  - Create context with https://github.com/backnotprop/prompt-tower
  - Feed it to Gemini
  - Gemini Plans
  - I pass the plan into my local PM framework
  - Claude Code picks it up and executes
  - repeat

anukin · 7 months ago
How does this work?

It's not clear how the context is used by Gemini to plan, and then how the plan is fed to the local framework. Do I have to replan every time the context changes?

woah · 7 months ago
They use a tool like the one they linked to put all their files into one file that they give to Gemini.

Then they put the plan into their "PM framework" (some markdown files?) to have Claude Code pick tasks out of.

greymalik · 7 months ago
Can you give an example of a local PM framework? What happens in this step - ticket creation?
mindcrash · 7 months ago
Warning: I don't know what this post does in the background, but it definitely slows down Firefox 138 to the point that it is barely usable.
sbarre · 7 months ago
Yeah same here, the whole site is just unresponsive... So it's not just you.
lsaferite · 7 months ago
I almost had to force-quit FF as well.
crop_rotation · 7 months ago
After trying Cline, Aider, Codex, and whatnot, I feel Claude Code is just so, so much better than all of them. E.g., it takes far fewer prompts to do the same thing compared to Cline. TBH I am not sure how Cline will compete against something like Claude Code given the resource/capability imbalance. Does anyone else have a different experience?
XenophileJKO · 7 months ago
I really felt like Claude Code would benefit greatly from a similar structural map. The little map it made in the Claude.md is insufficient. When the code base grows or you add a more componentized approach, Claude Code starts favoring a locality bias, which increases the architectural entropy a LOT.
loandbehold · 7 months ago
Same experience. Claude Code is much better than all other tools. I suspect Claude Code uses some private features of the Claude model that aren't available to other tools. It only makes sense that Anthropic would develop their model in conjunction with the tool to produce the best results.
behnamoh · 7 months ago
I never had a good experience with RAG anyway, and it felt "hacky". Not to mention most of it basically died when most models started supporting 1M+ context.

LLMs are already stochastic. I don't want yet another layer of randomness on top.

heyhuy · 7 months ago
> and it felt "hacky"

I think the pattern that coined "RAG" is outdated, that pattern being relying on cosine similarity against an index. It was a stopgap for the 4K-token-window era. For AI copilots, I love Claude Code and Cline's approach of just following imports and dependencies naturally. Land on a file and let it traverse.

No more crossing your fingers with cosine matching and hoping your reranker didn't drop a critical piece.
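
For contrast, that whole outdated pattern fits in a few lines -- embed() here is a stand-in for any embedding model:

  import numpy as np

  def embed(text: str) -> np.ndarray:
      """Stand-in for any embedding model; returns a fixed-size vector."""
      rng = np.random.default_rng(abs(hash(text)))
      return rng.standard_normal(384)

  def top_k(query: str, chunks: list[str], k: int = 5) -> list[str]:
      """Classic RAG: rank pre-chunked text by cosine similarity."""
      q = embed(query)
      def cos(v: np.ndarray) -> float:
          return float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v)))
      return sorted(chunks, key=lambda c: cos(embed(c)), reverse=True)[:k]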

WaltPurvis · 7 months ago
>Not to mention most of it basically died when most models started supporting 1M+ context.

Do most models support that much context? I don't think anything close to "most" models support 1M+ context. I'm only aware of Gemini, but I'd love to learn about others.

fkyoureadthedoc · 7 months ago
GPT 4.1 / mini / nano