Large language models and small language models are very strong problem solvers, provided the problem is narrow enough.
Given unlimited time they are above the human average on almost any narrow problem, and when time is a factor, say less than a minute, they are better than experts.
An OS kernel is exactly the kind of problem everyone prefers to be solved as correctly as possible, even if arriving at the solution takes longer.
The author mentions the stability and correctness of CCC, but these are properties of Rust, not of vibe coding. Still an impressive feat by Claude Code, though.
Ironically, if they had first populated the repo with objects, functions, and methods whose bodies were just todo!(), made sure the architecture compiled and was sane, and only then let the agent fill in the bodies with implementations, most features would work correctly.
I am writing a program to do exactly that for Rust, but even then, how would the user/programmer know beforehand how many architectural details to specify using todo!, to be sure the problem the agent tries to solve is narrow enough? That's impossible to know! If the problem is not narrow enough, the implementation is going to be a mess.
The hard part is likely when someone proves that some “fact” the model knows, and has had reinforced by its training, is no longer true. The model will take time to “come around” to this new situation. But this isn’t unlike the general populace. At scale, humans accept new things slowly.
An interesting question is: if pre-trained specialized models were available for the thousand or ten thousand most common tasks humans do every day, of what use would a general model be?
Go around and ask people if they want a free secretary; see if anyone turns one down.
They will not ask a computer program to do stuff. They will talk to the secretary, and the secretary will do any computer work necessary. For the moment, the secretary cannot click buttons and menus as well as it can manipulate language. The secretary will have realistic skin and lips, and a dress as short as the job description requires.
Language AI is the king of AIs, and the gap will only get bigger. Everything will be tied to language going forward: driving, surgeries, and so on.
The program is going to do what I am currently doing by hand: opening files and copying function/method signatures, usually from files all over the place.
The key here is to fetch into the context window only what is needed for one question/one answer and no more, hence the name Context Minimization. The context fetched is specified by the programmer: for example, sig(function) fetches only the signature, while @function captures the whole body. sig(Struct) fetches the fields and the signatures of all of its methods; sig(Trait) works similarly.
In my view, giving the A.I. more information than needed only confuses it, and accuracy degrades. It is also slower and more expensive, but that's a side effect.
The project is in its early stages; for the moment it calls ast-grep under the hood, but eventually, if it works as it is supposed to, I plan to move to tree-sitter queries.
If there is a similar project somewhere, I would appreciate a pointer to it, but I am not interested in implementations of agents. My program does not give the A.I. a whole view of the codebase, only the necessary points specified by the programmer.
I think anyone technically savvy enough to follow the article is already aware that Linux is a viable primary OS; the question is whether you can manage it without having to become a Linux nerd. I want to be able to tell normal people they can use Linux.
Nowadays, every time I want to run a non-trivial command, configure a file somewhere, or customize Emacs or anything else using code, I put the LLMs to work. I do almost nothing myself, except check that the file is indeed there, open it and paste in the new configuration, restart the program, copy-paste code here and there, and so on.
No need to be a nerd to use Linux; that's so 2021. LLMs are the ultimate nerds when it comes to digging into manuals, scouring the internet and GitHub for workarounds, tips and tricks, and so on.
If I could get away with carrying a tiny device again instead of lugging around a brick, I would, but the world has made it as inconvenient as possible not to.
A BlackBerry from 15 years ago weighed just over 100g and did 80% of what your modern-day pocket computer can.
Then they might move somewhere else, with different banks and different hardware requirements, and end up carrying 5 phones.
All difficult problems are solved by solving simple problems first and then combining the simple solutions to tackle more difficult ones, and so on.
Claude can do that, but you seriously overestimate its capabilities by a factor of a thousand or a million.
Code that works but is buggy is not what Linux is.
I stumbled upon it in late 2023 when investigating ways to give OpenHands [2] better context dynamically.
[0] https://aider.chat/
[1] https://aider.chat/2023/10/22/repomap.html
[2] https://openhands.dev/
The unfortunate thing for Python, and for untyped/duck-typed languages in general, which the repomap post mentions, is that function signatures do not convey much.
When it comes to Rust, it's a totally different story: function and method signatures convey a lot of important information. As a general rule, in every LLM query I include at most one function/method implementation; everything else is function/method signatures.
By not mindlessly giving LLMs whole files and implementations, I have never used more than 200,000 tokens/day, counting input and output. That works out to about 30 queries for a whole day of programming, and costs less than a dollar per day no matter which model I use.
Anyway, putting the agent in charge of building the repomap doesn't sound like such a great idea. Agents are horribly inefficient. It is better to build the repomap deterministically using something like ast-grep, and then let the agent read the resulting repomap.