It's kind of hard to write a test for a value that is null-checked when that value may never actually be returned.
For example, say you have a C function that reads in a file and returns you a string. You can check the string to verify that malloc actually succeeded, but how do you check that the file actually opened?
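A minimal sketch of the situation (read_file is a made-up example with simplified error handling): both failure modes collapse into the same NULL return, so a test that only null-checks the result can't tell them apart.

    #include <stdio.h>
    #include <stdlib.h>

    /* Reads a whole file into a heap-allocated, NUL-terminated string.
     * Returns NULL on any failure, so the caller cannot tell whether
     * fopen() or malloc() was what failed. */
    char *read_file(const char *path) {
        FILE *f = fopen(path, "rb");
        if (!f)
            return NULL;                /* file didn't open */

        fseek(f, 0, SEEK_END);
        long size = ftell(f);
        rewind(f);

        char *buf = malloc((size_t)size + 1);
        if (!buf) {                     /* allocation failed */
            fclose(f);
            return NULL;
        }

        fread(buf, 1, (size_t)size, f);
        buf[size] = '\0';
        fclose(f);
        return buf;
    }

    int main(void) {
        char *s = read_file("does_not_exist.txt");
        if (!s)        /* missing file or out of memory? no way to know */
            return 1;
        puts(s);
        free(s);
        return 0;
    }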
At least in my experience, possibly due to context limitations or just architecture, SOTA LLMs aren't particularly good at iterating: they tend to loop back around to similar results with the same bad logic and errors.
I've tried these LLM "code from test" things (and vice versa) dozens of times over the last couple of years... they're not even close to being practical.
Why? It will evolve into a slightly higher level language where the compiler is an ML model. Was it a tragedy when developers mostly didn’t have to write assembly any more?
I think it's different... I like high-level languages, but this is not a programming language; it's a technique for writing tests in an existing language and leaving the implementation to the AI.
I like programming for problem solving; I don't really like writing tests, but that's personal taste. A lot of people like to just use PowerPoint and Jira and tell others what they need to implement, but those people are not software developers.
> Was it a tragedy when developers mostly didn’t have to write assembly any more?
It wasn't, but for starters compilers have always been generally deterministic.
I'm not saying that this is completely useless (I personally think code completion tools such as GitHub Copilot are fantastic), but it's still too early to compare it to a compiler.
I appreciate that your workflow is so linear.
I often write tests, then the implementation; then I realize that the tests need to be corrected, then I change the implementation, then I change the tests, then I add other tests, and so on.
I don't really like maintaining tests; it's often a lot of code that needs to be understood and changed carefully.
Really, it's just validator code instead of feature code. I think this is the only realistic way forward for production-level code written by AI: don't ask it to write code, ask it to pass your validation tests.
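As a rough sketch of what that could look like in C (slugify is a hypothetical function; the assertions are the spec you hand to the AI, and the implementation here just stands in for whatever the model produces):

    #include <assert.h>
    #include <ctype.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Stand-in implementation: in the proposed workflow, this is the
     * part the AI would write and rewrite until main() passes. */
    char *slugify(const char *title) {
        size_t n = strlen(title);
        char *out = malloc(n + 1);   /* output never exceeds input length */
        if (!out)
            return NULL;
        size_t j = 0;
        int pending_dash = 0;
        for (size_t i = 0; i < n; i++) {
            if (isalnum((unsigned char)title[i])) {
                if (pending_dash && j > 0)
                    out[j++] = '-';
                pending_dash = 0;
                out[j++] = (char)tolower((unsigned char)title[i]);
            } else {
                pending_dash = 1;    /* collapse runs of separators */
            }
        }
        out[j] = '\0';
        return out;
    }

    int main(void) {
        /* The validator code: these assertions are the deliverable. */
        assert(strcmp(slugify("Hello World"), "hello-world") == 0);
        assert(strcmp(slugify("  Trim Me  "), "trim-me") == 0);
        assert(strcmp(slugify(""), "") == 0);
        puts("all checks passed");
        return 0;
    }

The point is that the assertions, not the implementation, become the artifact you write and maintain.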
Essentially, everyone becomes a red team member trying to think of clever ways to outwit the AI's code, which I for one think is going to be a lot of fun in the future - though we're still quite a way from there yet!
I wrote a similar tool the other day: https://github.com/joseferben/makeitpass
It can make all kinds of commands pass by checking stdout/stderr, and it's language-agnostic (you need npx to run makeitpass).
> A lot of people like to just use PowerPoint and Jira and tell others what they need to implement, but those people are not software developers.
And when things don't work, they call people like me to try to understand the performance problems of something poorly defined and worse written.