People interested in Aider (which is an awesome tool) might also be interested in checking out my project Plandex[1]. It's terminal-based like Aider and has a somewhat comparable set of features, but is more focused on using LLMs to work on larger and more complex tasks that span many files and model responses. It also uses a git-style CLI approach with independent commands for each action vs. Aider's interactive shell.
I studied Aider's code and prompts quite a bit in the early stages of building Plandex. I'm grateful to Paul for building it and making it open source.
1 - https://github.com/plandex-ai/plandex
I _cannot_ wait for you to get local models working with this (I know, they need function calling/streaming first). It's amazing! I burned through $10 like it was nothing, and bigger context + local is going to make this killer IMHO. It needs additional guidance, and with more context, maybe loading lint rules into the context would get back code matching my coding style/guide; but even as-is there is a ton of value here.
It was able to rewrite 10 files from Vue 2 Class Component syntax to Vue 3 Composition API (partially; some didn't get fully done) before I hit my budget limits. It would have needed another iteration or so to iron out the issues (plus some manual cleanup/checking from me), but that's within spitting distance of being worth it. For now I'll use ChatGPT/Claude (which I pay for) to do this work, but I will keep a close eye on this project; it's super cool!
Do you have any plans to build IDE plugins for this? I understand it's open source and anyone could add that; I was just wondering if it was even on the roadmap. Having this run in my IDE would be so awesome with the diff tool I'm used to, along with all the other plugins/hotkeys/etc. I use.
Yes, VSCode and JetBrains plugins are on the roadmap. Here's the current roadmap by the way: https://github.com/plandex-ai/plandex#roadmap-%EF%B8%8F (it's not exhaustive, but can give you a sense of where I'd like to take Plandex in the future).
I haven't yet tried incorporating tree-sitter as Aider does to load in all definitions in the repo. In Plandex, the idea is more to load in just the files that are relevant to what you're building before giving a prompt. You can also load directory layouts (with file names only) with `plandex load some-dir --tree`.
I like the idea of something like `plandex load some-dir --defs` to load definitions with tree-sitter. I don't think I'd load the whole repo's defs by default like Aider does (I believe?), because that could potentially use a lot of tokens in a large repo and include a lot of irrelevant definitions. One of Plandex's goals is to give the user granular control over what's in context.
But for now if you wanted to do something where definitions across the whole repo would be helpful (vs. loading in specific files or directories) then Aider is better at that.
This looked cool and I was excited to try it until I realized that I either need a subscription, or I need to set up a server. Why does this need a server, when Aider just works via the cli?
First I should note that while cloud will have a subscription eventually, it's free for now. There's an anonymous trial (with no email required) for up to 10 plans or 10 model responses, and then just name and email is required to continue.
I did start out with just the CLI running locally, but it reached a point where I needed a database and thus a client-server model. Plandex is designed for working on many 'plans' at different levels of the project hierarchy (some users on cloud have 50+ after using it for a week), and there's also a fair amount of concurrency, so it got to be too much for a local filesystem or even something like a local SQLite db.
Plandex also has the ability to send tasks to the background, which I think will start to play a more and more important role as models get better and more capable of running autonomously for longer periods, and I want to add sharing and collaboration features in the future as well, so all-in-all I thought a client-server model was the best base to build from.
I understand where you're coming from though. That local-only simplicity is definitely a nice aspect of Aider.
I apologize if I'm not posting this in the correct place, but I've been trying to test this out (looks like it'll be fantastic, btw) and I keep running into a 429 error saying I've exceeded my current quota for ChatGPT. I really don't think I have, which leads me to believe that maybe it's not really taking my API key when I run the export command. Is there a way to check, or is there another reason I could be getting this error?
You'd be getting a different error if your key wasn't getting through at all. Can you double-check that you're using the right OpenAI API key, have enough API credits (as distinct from ChatGPT quota), and haven't hit any max spend limits? You can check here: https://platform.openai.com/account/api-keys
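For what it's worth, a quick local sanity check can rule out the most common mix-ups before digging deeper. This is just a sketch; the `sk-` prefix check is an assumption based on the usual OpenAI key format:

```shell
# Sketch of a local sanity check for the key (the "sk-" prefix is an
# assumption based on the usual OpenAI key format; project keys start
# with "sk-proj-" and still match).
check_key() {
  if [ -z "$OPENAI_API_KEY" ]; then
    echo "OPENAI_API_KEY is not exported in this shell"
    return 1
  fi
  case "$OPENAI_API_KEY" in
    sk-*) echo "key looks plausible" ;;
    *)    echo "value does not look like an OpenAI key"; return 1 ;;
  esac
}

check_key || echo "fix the key, then re-run the export command"

# For a live check (uses the network and real credentials):
#   curl -s https://api.openai.com/v1/models \
#     -H "Authorization: Bearer $OPENAI_API_KEY" | head -n 5
```

If the live check returns a 401 the key itself is wrong; a 429 there would point at credits or rate limits rather than the tool.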
It does use line numbers, which definitely aren't infallible. That's why a `plandex changes` TUI is included to review changes before applying. Unfortunately no one has figured out a file update strategy yet that doesn't make occasional mistakes; we'll probably need either next-gen models or fine-tuning to get there.
That said, counting isn't necessarily required to use line numbers. If line numbers are included in the file when it's sent to the model, it becomes a text analysis task rather than a counting task. Here are the relevant prompts: https://github.com/plandex-ai/plandex/blob/main/app/server/m...
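As a toy illustration of the idea (this isn't Plandex's actual prompt format, just the general shape): once the file arrives pre-numbered, "what's on line 2" is a lookup, not a counting exercise.

```shell
# Prepend line numbers before including a file in a prompt, e.g. with nl.
# -ba numbers every line, blanks included, so the numbering stays stable.
cat > demo.py <<'EOF'
def add(a, b):
    return a + b
EOF

nl -ba demo.py > demo_numbered.txt
cat demo_numbered.txt
```

With the numbers inline, the model can reference an edit location by line without ever counting lines itself.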
You could do this with Plandex (or Aider... or ChatGPT) by having it output a shell script then `chmod +x` it and run it. I experimented early on with doing script execution like this in Plandex, but decided to just focus on writing and updating files, as it seemed questionable whether execution could be made reliable enough to be worthwhile without significant model advances. That said, I'd like to revisit it eventually, and some more constrained tasks like copying and moving files around are likely doable without full-on shell script execution, though some scary failure cases are possible here if the model gets the paths wrong in a really bad way.
OpenInterpreter is another project you could check out that is more focused on code/script execution: https://github.com/OpenInterpreter/open-interpreter
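A minimal sketch of that "write a script, then `chmod +x` and run it" loop, with a heredoc standing in for whatever the model actually produced (the script itself is hypothetical; the important step is reading it before running it):

```shell
# Stand-in for model output: a script that lowercases .png filenames.
cat > rename_pngs.sh <<'EOF'
#!/bin/sh
for f in *.png; do
  [ -e "$f" ] || continue
  lower=$(printf '%s' "$f" | tr '[:upper:]' '[:lower:]')
  [ "$f" = "$lower" ] || mv -- "$f" "$lower"
done
echo "done"
EOF

cat rename_pngs.sh       # review it before trusting it with your files
chmod +x rename_pngs.sh
./rename_pngs.sh
```

This is also where the scary failure mode lives: a model that gets a path wrong in a `mv` or `rm` can do real damage, which is a good argument for the review step plus version control.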
I'm interested in this and will probably set it up but I wish more AI tools were better integrated to my IDE. I know GH Copilot is and other big AI tools have plugins with chat/edit features but most of the cool open source doesn't seem to support IDEA/JetBrains.
I see the power of LLMs. I use GH Copilot, I use ChatGPT, but I crave deeper integration with my existing toolset. I need to force myself to try in-IDE Copilot Chat. My habit is to go to ChatGPT for anything of that nature, and I'm not sure why that is. Sometimes it's the same way I break my searches down into things "I know I can find" and then put together the results. In the same way, I break the problem down into small pieces and have ChatGPT write them individually, or sometimes additively.
Folks have developed VSCode and NeoVim integrations for aider. They're based on forks of aider, so I'm not sure how carefully their authors are keeping them up to date with aider releases.
The aider install instructions have more info:
https://aider.chat/docs/install.html#add-aider-to-your-edito...
In my experience, Supermaven makes Copilot look like a joke, and they’ve just released a Jetbrains plugin. YMMV. It’s just code suggestions though, no chat box.
I've been using https://cursor.sh/ heavily for about 2 months and I'm pretty happy.
Cursor is a fork of VSCode focused on AI. I'd prefer to use something totally open-source, but Cursor is free, gets regular updates, and I can use my OpenAI API key.
The diff view works well with AI coding assistants. I end up parallelizing more. I let cursor do its thing while I'm already looking at the next file.
I love aider too! Have used it to automate things such as maintaining a translated version of the page in a git pre-commit hook.
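I haven't seen the parent's exact setup, but the general shape is presumably something like this: a hook that re-runs aider on the translated page whenever the source page is staged. The file paths here are made up; `--message` and `--yes` are real aider flags.

```shell
# Sketch of such a hook (it belongs in .git/hooks/pre-commit). The doc
# paths are hypothetical; aider's --yes skips confirmations and
# --message runs a single non-interactive instruction.
cat > pre-commit <<'EOF'
#!/bin/sh
if git diff --cached --name-only | grep -q '^docs/index\.md$'; then
  aider --yes \
    --message "Update docs/index.fr.md to match the latest docs/index.md" \
    docs/index.md docs/index.fr.md
  git add docs/index.fr.md
fi
EOF
chmod +x pre-commit   # then move it into .git/hooks/
```

Re-staging the regenerated translation inside the hook means the commit picks it up in the same pass.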
I didn't get time to test it beyond installing it on VSCode today, but take a look at https://GitHub.com/continuedev/continue, Apache 2.0 license, and they have an IDEA/Jetbrains plugin. Plus codebase context, full configurability to use local or remote LLMs.
I probably need to give it another try but I tried that before with my own GPT-4 key, a local model, and their models and just got errors last time I tried it. I hope that was just a temp issue but because of that I moved on. Also I've tried Cody Pro (again, weird errors and when it did work I felt like Copilot would have done better).
What would be nice is a single plugin that focuses only on UX and allows plug-and-play AI models. I think we would benefit immensely from such a concept.
I have this 300-line Go application which manages git tags for me. I asked it to implement a -dry-run flag. It failed twice. The first time it just mangled the file. The second time it made code that didn't do anything.
I asked it to rename a global variable. It broke the application and failed to understand scoping rules.
Perhaps it is bad luck, or perhaps my Go code is weird, but I don't understand how y'all wanna trust this.
It must be your app/lang/prompt/grandma/dog/... lol. LLMs are the future, and they will replaces Allllllll the coders in the woooorld (TM), and did you know "it" can create websites??? Wooo, let's go, baby!
Nah these things are all stupid as hell. Any back and forth between a human and an LLM in terms of problem solving coding tasks is an absolute disaster.
People here and certainly in the mainstream population see some knowledge and just naturally expect intelligence to go with it. But it doesn't. Wikipedia has knowledge. Books have knowledge. LLMs are just the latest iteration of how humans store knowledge. That's about it, everything else is a hyped up bubble. There's nothing in physics that stops us from creating an artificial, generally intelligent being, but it's NEVER going to be with auto-regressive next-token prediction.
> Nah these things are all stupid as hell. Any back and forth between a human and an LLM in terms of problem solving coding tasks is an absolute disaster.
I actually agree in the general case, but for specific applications these tools can be seriously awesome. Case in point: this repo of mine, which I think it's fair to say was 80% written by GPT-4 via Aider.
https://github.com/epiccoleman/scrapio
Now of course this is a very simple project, which is obviously going to have better results. And if you read through the commit history [1], you can see that I had to have a pretty good idea of what had to be done to get useful output from the LLM. There are places where I had to figure out something that the LLM was never going to get on its own, places where I made manual changes because directing the AI to do it would have been more trouble than it was worth, etc.
But to me, the cool thing about this project was that I just wouldn't have bothered to do it if I had to do all the work myself. Realistically I just wanted to download and process a list of like 15 urls, and I don't think the time invested in writing a scraper would have made sense for the level of time I would have saved if I had to figure it all out myself. But because I knew specifically what needed to happen, and was able to provide detailed requirements, I saved a ton of time and labor and wound up with something useful.
I've tried to use these sorts of tools for tasks in bigger and more complicated repos, and I agree that in those cases they really tend to swing and miss more often than not. But if you're smart enough to use it as the tool it is and recognize the limitations, LLM-aided dev can be seriously great.
[1]: https://github.com/epiccoleman/scrapio/commits/master/?befor...
Thanks for trying aider, and sorry to hear you had trouble working with it. It might be worth looking through some of the tips on the aider GitHub page [0].
In particular, this is one of the most important tips: Large changes are best performed as a sequence of thoughtful bite sized steps, where you plan out the approach and overall design. Walk GPT through changes like you might with a junior dev. Ask for a refactor to prepare, then ask for the actual change. Spend the time to ask for code quality/structure improvements.
Not sure if this was a factor in your attempts? I'd be happy to help if you'd like to open a GitHub issue [1] or jump into our Discord [2].
[0] https://github.com/paul-gauthier/aider#tips
[1] https://github.com/paul-gauthier/aider/issues/new/choose
[2] https://discord.gg/Tv2uQnR88V
How is "Walk GPT through changes like you might with a junior dev" not a horrific waste of time?
Usually you do this with a human as an investment in their future performance, with the understanding that this is the least efficient way to get the job done in the short term.
Having to take a product that is already supposed to "grok code" and make a similar investment doesn't make any sense to me.
That’s why I’m not fully jumping in yet, as I think even GPT 4 is borderline. I’m grateful for those investing their energy into building things like this (and no doubt many will be successful) but I’m happy to remain an interested observer until the next generation when I think the value proposition may be much more evident.
Like another commenter I also use it for initial exploration in uncharted territory.
For coding it's only helpful with autocompleting error strings. Even then, it messes with normal auto-complete; I might get rid of it.
I appreciate @anotherpaulg's continual benchmarking of LLM performance with aider, for example:
> OpenAI just released GPT-4 Turbo with Vision and it performs worse on aider’s benchmark suites than all the previous GPT-4 models. In particular, it seems much more prone to “lazy coding” than the GPT-4 Turbo preview models.
https://aider.chat/2024/04/09/gpt-4-turbo.html
I just tried it and it's amazingly cool, but the quality of the output just isn't there for me yet. It makes too many subtle errors to be as useful as the screenshots and GIFs make it look.
I agree with you. It's okay at really simple code changes for really simple repos, but it falls apart for anything outside the ordinary.
I'm sure I'll have to eat these words, but: This just doesn't feel like the right interface to me. LLMs are incredible at generating "inroads" to a problem, but terrible at execution. Worse yet at anticipating future problems.
All this might very well change. But until it does, I just want my LLMs to help me brainstorm and help me with syntax. I think there's a sweet spot somewhere between this tool and Copilot, but I'm not sure where.
Hi, for somewhere between GitHub Copilot and aider, you can try the desktop app 16x Prompt. I have been using it daily for the past few months and it suits my working style nicely.
It is capable of handling complex tasks like feature development and refactoring across multiple files, but it doesn't try to generate diffs and apply them automatically.
Instead, you get a response from the LLM that is easy to read and that allows you as a developer to quickly apply it to your existing codebase.
You can check it out here: https://prompt.16x.engineer/
PS. I've gathered a list of LLM agents (for coding and general purpose): https://docs.google.com/spreadsheets/d/1M3cQmuwhpJ4X0jOw5XWT...
I revisited Aider a couple of days ago, after going in circles with AutoGPT, which seemed to either forget or go lazy after a few prompts, to the point that it refused to do something it had done a few prompts before. Then Aider delivered from the first prompt.
To be fair, in a world of good LSP implementations, grep/find are really primitive tools to be using. Not saying this isn't better than a more sophisticated editor setup, just that grep and find are a _really_ low bar.
Not sure if that's making things "fair". Grep & find are insanely powerful when you're a CLI power user.
Nonetheless, I'm particularly curious in which cases the AI tool can find things that are not easy to find via find & grep (e.g., finding URLs that are created via string concatenation and so never appear as a full string literal in the source code).
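The concatenation case is exactly where plain grep hits its ceiling. A tiny demo with a hypothetical Go file: the v1 endpoint is a literal, while the v2 endpoint only ever exists at runtime.

```shell
# Create a file where one URL is a literal and one is assembled.
cat > fetch.go <<'EOF'
package main

const direct = "https://api.example.com/v1/items"

func buildURL(id string) string {
    scheme := "https"
    host := "api.example.com"
    return scheme + "://" + host + "/v2/items/" + id
}
EOF

# The literal is easy:
grep -n 'https://api.example.com/v1' fetch.go
# The assembled URL never appears as one string, so this finds nothing:
grep -n 'https://api.example.com/v2' fetch.go || echo "no match"
```

An LLM (or a data-flow-aware static analyzer) can connect those pieces; grep can't, which is roughly where the false-negative question starts.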
Perhaps a larger question there, what's the overall false negative rate of a tool like this? Are there places where it is particularly good and/or particularly poor?
When we reach that world, let me know. I'm still tripping over a "python-lsp-server was simply not implemented async so sometimes when you combine it with emacs lsp-mode it eats 100% CPU and locks your console" issue.
Personally I don't like the fragility/IDE-specificity of a lot of LSP setups.
I wish every language just came with a good ctags solution that worked with all IDEs. When this is set up properly I rarely need more power than a shortcut to look up tags.
I hear you on the API costs. You should see my OpenAI bills from building Plandex :-/
Aider has a few blog posts speaking to it.
I actually like completions more, it feels more natural. I’m fine to go to ChatGPT/Opus to chat if needed.
Language is a tool to convey information. LLMs are only about the language, not the information.
https://github.com/OpenDevin/OpenDevin/issues/120