stephbook · 8 hours ago
> It’s one of those things that crackpots keep trying to do, no matter how much you tell them it could never work. If the spec defines precisely what a program will do, with enough detail that it can be used to generate the program itself, this just begs the question: how do you write the spec? Such a complete spec is just as hard to write as the underlying computer program, because just as many details have to be answered by spec writer as the programmer.

Joel Spolsky, stackoverflow.com founder, Talk at Yale: Part 1 of 3 https://www.joelonsoftware.com/2007/12/03/talk-at-yale-part-...

sethev · 8 hours ago
Program generation from a spec meant something vastly different in 2007 than it does now. People can and do generate programs from underspecified prompts. Trying to be systematic about how prompts work is a worthwhile area to explore.
ModernMech · an hour ago
Isn't that how it always goes? First it's just the crackpots. Then it's a fringe. Soon it's the way things have always been done.
slcjordan · 8 hours ago
It actually makes sense that code is becoming amorphous and we will no longer scale in terms of building out new features (which has become cheap), but by defining stricter and stricter behavior constraints and structural invariants.
embedding-shape · 7 hours ago
Yeah, "what you're able to build" is no longer one of the most important things, "what you won't build" just became a lot more important.
bitwize · 2 hours ago
People literally specifying software into existence in 2026 gives this quote a vibe of "aerodynamically speaking, bumblebees cannot fly".
James_K · 7 hours ago

  import Mathlib
  def Goldbach : Prop := ∀ x : ℕ, Even x → x > 2 → ∃ y z : ℕ, Nat.Prime y ∧ Nat.Prime z ∧ x = y + z
A short specification of the Goldbach conjecture in Lean; the proof is much harder to implement. Implementation details are always hidden by the interface, which makes a program easier to specify than to produce. By the Curry-Howard correspondence, Joel's position here amounts to claiming that any question is as hard to ask as to answer, and any statement as hard to formulate as to prove, which is really just saying that all describable statements are true.

TiredOfLife · 4 hours ago
> The horse is here to stay, but the automobile is only a novelty — a fad.

Advice given to Henry Ford’s lawyer, Horace Rackham, by an unnamed president of the Michigan Savings Bank in 1903.

CamperBob2 · 8 hours ago
What he misses is that it's much easier to change the spec than the code. And if the cost of regenerating the code is low enough, then the code is not worth talking about.
mwarkentin · 8 hours ago
Is it? If the spec is as detailed as the code would be? If you make a change to one part of the spec, do you now have inconsistencies that the LLM is going to have to resolve in some way? Are we going to have compiler- or type-checker-style tools for the spec to catch these errors sooner?

lifis · 13 hours ago
As far as I can tell it's not a new language, but rather an alternative workflow for LLM-based development along with a tool that implements it.

The idea, IIUC, seems to be that instead of directly telling an LLM agent how to change the code, you keep markdown "spec" files describing what the code does and then the "codespeak" tool runs a diff on the spec files and tells the agent to make those changes; then you check the code and commit both updated specs and code.
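
As a rough illustration of that loop (the helper names and prompt wording below are invented; this is not CodeSpeak's actual API), the spec diff can be computed with ordinary tools and wrapped into an instruction for the agent:

```python
# Hypothetical sketch of the spec-diff workflow described above.
import difflib

def spec_diff(old_spec: str, new_spec: str) -> str:
    """Unified diff between the committed and the edited spec file."""
    return "".join(difflib.unified_diff(
        old_spec.splitlines(keepends=True),
        new_spec.splitlines(keepends=True),
        fromfile="specs/login.md (committed)",
        tofile="specs/login.md (edited)",
    ))

def build_prompt(diff: str) -> str:
    """Wrap the spec diff in an instruction for the coding agent."""
    return ("The specification changed as follows. "
            "Update the code to match, and change nothing else:\n\n" + diff)

old = "# Login\nUsers log in with email and password.\n"
new = ("# Login\nUsers log in with email and password.\n"
       "Lock the account after 5 failed attempts.\n")

prompt = build_prompt(spec_diff(old, new))
# `prompt` would then be handed to an agent (e.g. a headless CLI session),
# and the updated spec and resulting code get committed together.
```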

It has the advantage that the prompts are all saved along with the source rather than lost, and in a format that lets you also look at the whole current specification.

The limitation seems to be that you can't modify the code yourself if you want the spec to reflect it (and you also can't do LLM-driven changes that refer to the actual code). More generally, there's no guarantee that the spec captures everything important about the program, so the code can still carry "source" information of its own: maybe you want the background of a GUI to be white, and it is white only because the LLM happened to choose that, without it ever being written in the spec.

The latter can maybe be mitigated by doing multiple generations and checking them all, but that multiplies LLM and verification costs.

Also it seems that the tool severely limits the configurability of the agentic generation process, although that's just a limitation of the specific tool.

souvlakee · 13 hours ago
As far as I can tell C is not a new language, but rather an alternative workflow for assembly development along with a tool that implements it.
abreslav · 13 hours ago
I second that :)
abreslav · 13 hours ago
> The limitation seems to be that you can't modify the code yourself if you want the spec to reflect it

Eventually, we'll end up in a world where humans don't need to touch code, but we are not there yet. We are looking into ways to "catch up" the specs with whatever changes happen in the code outside of CodeSpeak (agent edits, manual changes, whatever). It's an interesting exercise. In the case of agents, it's very helpful to look at the prompts users gave them (we are experimenting with inspecting the sessions from ~/.claude).

More generally, `codespeak takeover` [1] is a tool to convert code into specs, and we are teaching it to take prompts from agent sessions into account. Seems very helpful, actually.

I think it's a valid use case to start something in vibe coding mode and then switch to CodeSpeak if you want long-term maintainability. From "sprint mode" to "marathon mode", so to speak.

[1] https://codespeak.dev/blog/codespeak-takeover-20260223

newsoftheday · 13 hours ago
> Eventually, we'll end up in a world where humans don't need to touch code, but we are not there yet.

Will we though? Wouldn't AI need to reach a stage where it is a tool, like a compiler, which is 100% deterministic?

ferguess_k · 9 hours ago
Why are we eliminating our own job and maybe hobby so eagerly? Whatever. It is done.
WASDx · 8 hours ago
I think these limitations could be addressed by allowing trivial manual adjustments to the generated code before committing. And/or allowing for trivial code changes without a spec change. The judgement of "trivial" being that it still follows the spec and does not add functionality mandating a spec change. I haven't checked if they support any of this but I would be frustrated not being allowed to make such a small code change, say to fix an off-by-one error that I recently got from LLM output. The code change would be smaller than the spec change.

Cool idea overall, an incremental pseudocode compiler. Interesting to see how well it scales.

I can also see a hybrid solution with non-specced code files for things where the size of code and spec would be the same, like for enums or mapping tables.

lifis · 13 hours ago
Also, they seem to want to run this as a business, which seems absurd to me: I don't see how they can possibly charge money, and anyway the idea is simple enough to reimplement in less than a week (less than a day for a basic version), and those alternative implementations may turn out to be better.

It also seems to be closed-source, which means that unless they open the source very soon it will very likely be immediately replaced in popularity by an open source version if it turns out to gain traction.

boznz · 10 hours ago
Also a bit formal. Maybe something like this will be the output of the prompt to let me know what the AI is going to generate in the binary, but I doubt I will be writing code like this in 5 years, English will probably be fine at my level.
abreslav · 13 hours ago
> Also it seems that the tool severely limits the configurability of the agentic generation process, although that's just a limitation of the specific tool.

Working on that as well. We need to be a lot more flexible and configurable

the_duke · 14 hours ago
This doesn't make too much sense to me.

* This isn't a language, it's some tooling to map specs to code and re-generate

* Models aren't deterministic - every time you would try to re-apply you'd likely get different output (without feeding the current code into the re-apply and letting it just recommend changes)

* Models are evolving rapidly - this month's flavour of Codex/Sonnet/etc would very likely generate different code from last month's

* Text specifications are always under-specified, lossy and tend to gloss over a huge amount of details that the code has to make concrete - this is fine in a small example, but in a larger code base?

* Every non-trivial codebase would be made up of hundreds of specs that interact and influence each other - very hard (and context-heavy) to read all the specs that impact a piece of functionality and keep it coherent

I do think there are opportunities in this space, but what I'd like to see is:

* write text specifications

* model transforms text into a *formal* specification

* then the formal spec is translated into code which can be verified against the spec

Steps 2 and 3 could be merged into one if there were practical/popular languages that also support verification, in the vein of Ada/SPARK.

But you can also get there by generating tests from the formal specification that validate the implementation.
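
As a toy illustration of that last step (everything here is invented for the example), a one-line spec clause such as "the output is sorted and is a permutation of the input" translates directly into an executable property that the generated implementation can be checked against over many random inputs:

```python
# Toy sketch: a spec clause becomes an executable property that validates
# the implementation (hand-written here; LLM-generated in practice).
import random
from collections import Counter

def implementation(xs):
    """Stand-in for the generated code under test."""
    return sorted(xs)

def satisfies_spec(inp, out):
    """Spec: the output is sorted and is a permutation of the input."""
    is_sorted = all(a <= b for a, b in zip(out, out[1:]))
    is_permutation = Counter(inp) == Counter(out)
    return is_sorted and is_permutation

rng = random.Random(0)
for _ in range(1000):
    xs = [rng.randint(-100, 100) for _ in range(rng.randint(0, 20))]
    assert satisfies_spec(xs, implementation(xs))
```

Property-based testing libraries (e.g. Hypothesis or QuickCheck) automate the input generation and failure shrinking that this sketch does by hand.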

onion2k · 13 hours ago
> Models aren't deterministic - every time you would try to re-apply you'd likely get different output (without feeding the current code into the re-apply and letting it just recommend changes)

If the result is always provably correct, it doesn't matter whether or not it's different at the code level. People interested in systems like this believe that the outcome of what the code does is infinitely more important than the code itself.

sensanaty · 11 hours ago
That "if" at the beginning of your sentence is doing a whole lot of work. Indeed, if we could formally and provably (another extremely loaded word) generate good code, that'd be one thing, but proving correctness is one of those basically impossible tasks.
dsr_ · 13 hours ago
Let's rephrase:

Since nobody involved actually cares whether the code works or not, it doesn't matter whether it's a different wrong thing each time.

tomtomtom777 · 12 hours ago
> If the result is always provably correct it doesn't matter whether or not it's different at the code level. People interested in systems like this believe that the outcome of what the code does is infinity more important than the code itself.

If the spec is so complete that it covers everything, you might as well write the code.

The benefit of writing a spec and having the LLM code it, is that the LLM will fill in a lot of blanks. And it is this filling in of blanks that is non-deterministic.

SpaceNoodled · 13 hours ago
That's a huge "if."
FrankRay78 · 12 hours ago
Sure, but where are the formal acceptance tests to validate against?
0-_-0 · 11 hours ago
Besides, you can deterministically generate bad code, and non-deterministically generate good code.
__loam · 13 hours ago
The code is what the code does.

jrm4 · 13 hours ago
I would be very comfortable with this: re-run 100 times with different seeds. If the outcome is the same every time, you're reliably good to go.
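
A toy sketch of that idea (the generator here is a stub standing in for an LLM call; in practice you'd compare observable behavior, such as test results, rather than the code text itself):

```python
# Regenerate N times with different seeds and compare behavioral outcomes.
import random

def generate_code(spec: str, seed: int) -> str:
    """Stub: pretend the model emits one of several equivalent programs."""
    rng = random.Random(seed)
    name = rng.choice(["result", "total", "acc"])  # incidental variation
    return f"def add(a, b):\n    {name} = a + b\n    return {name}"

def behavior(source: str):
    """Run the generated program and record its observable outputs."""
    ns = {}
    exec(source, ns)
    return [ns["add"](a, b) for a, b in [(1, 2), (-5, 5), (10, 32)]]

outcomes = {tuple(behavior(generate_code("add two numbers", seed)))
            for seed in range(100)}
# A single distinct outcome across all seeds means the incidental code-level
# variation doesn't matter: the behavior is stable.
assert len(outcomes) == 1
```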
pron · 12 hours ago
If what you're after is determinism, then your solution doesn't offer it. Both the formal specification and the code generated from it would be different each time. Formal specifications are useful when they're succinct, which is possible when they specify at a higher level of abstraction than code, which admits many different implementations.
vidarh · 11 hours ago
The point would presumably be to formalise it, then verify that the formal version matches what you actually meant. At which point you can't/shouldn't regenerate it, but you can request changes (which you'd need to verify and approve).
davedx · 13 hours ago
My process has organically evolved towards something similar but less strictly defined:

- I bootstrap AGENTS.md with my basic way of working and occasionally one or two project specific pieces

- I then write a DESIGN.md. How detailed or well specified it is varies from project to project: the other day I wrote a very complete DESIGN.md for a time tracking, invoice management and accounting system I wanted for my freelance biz. Because it was quite complete, the agent almost one-shot the whole thing

- I often also write a TECHNICAL-SPEC.md of some kind. Again how detailed varies.

- Finally I link to those two from the AGENTS. I also usually put in AGENTS that the agent should maintain the docs and keep them in sync with newer decisions I make along the way.

This system works well for me, but it's still very ad hoc and definitely doesn't follow any kind of formally defined spec standard. And I don't think it should, really? IMO, technically strict specs should be in your automated tests not your design docs.

the_duke · 13 hours ago
I think many have adopted "spec driven development" in the way you describe.

I found it works very well in one-off scenarios, but the specs often drift from the implementation. Even if you let the model update the spec at the end, the next few work items will make parts of it obsolete.

Maybe that's exactly the goal that "codespeak" is trying to solve, but I'm skeptical this will work well without more formal specifications in the mix.

jbonatakis · 13 hours ago
I have been building this in my free time and it might be relevant to you: https://github.com/jbonatakis/blackbird

I have the same basic workflow as you outlined, then I feed the docs into blackbird, which generates a structured plan with tasks and subtasks. Then you can have it execute tasks in dependency order, with options to pause for review after each task, or an automated review when all child tasks for a given parent are complete.

It’s definitely still got some rough edges but it has been working pretty well for me.

rebolek · 13 hours ago
AGENTS.md is nice but I still need to remind models that it exists and they should read it and not reinvent the wheel every time.
DrJokepu · 13 hours ago
> Models aren't deterministic

Is that really true? I haven’t tried to do my own inference since the first Llama models came out years ago, but I am pretty sure it was deterministic: if you fixed the seed and the input was the same, the output of the inference was always exactly the same.

bigwheels · 13 hours ago
LLMs are not deterministic:

1.) There is typically a temperature setting (though most major providers have stopped exposing it, especially in the TUIs).

2.) Then, even with the temperature set to 0, it will be almost deterministic, but you'll still observe small variations due to the limited precision of floating-point arithmetic (operations like summation are not associative, so changes in evaluation order change results).

Edit: thanks for the corrections
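
The floating-point point can be seen without any model at all: addition of doubles is not associative, so anything that changes evaluation order (such as different batching of requests on the server) can change the result of the same reduction:

```python
# Doubles have a 53-bit significand, so at 2**53 the spacing between
# representable values is 2.0 and adding 1.0 can be lost to rounding.
big = 9007199254740992.0  # 2**53

a = (big + 1.0) - big   # the 1.0 is absorbed by rounding: 0.0
b = big + (1.0 - big)   # 1.0 - big is exactly representable: 1.0
assert a != b

# The same effect makes large parallel reductions (matrix multiplies,
# softmax sums) depend on summation order, which is why temperature-0
# sampling still isn't bit-for-bit reproducible across runs and batches.
```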

dworks · 8 hours ago
>I do think there are opportunities in this space, but what I'd like to see is:

>* write text specifications

>* model transforms text into a formal specification

>* then the formal spec is translated into code which can be verified against the spec

This skill does just that: https://github.com/doubleuuser/rlm-workflow

Each stage produces its own output artifact (analysis, implementation plan, implementation summary, etc) and takes the previous phases' outputs as input. The artifact is locked after the stage is done, so there is no drift.

wenc · 10 hours ago
Rehashing my comment from before:

I use Kiro IDE (≠ Kiro CLI) primarily as a spec generator. In my experience, it's high-quality for creating and iterating on specs. Tools like Cursor are optimized for human-driven vibing -- they have great autocomplete, etc. Kiro, by contrast, is optimized around spec, which ironically has been the most effective approach I've found for driving agents.

I'd argue that Cursor, Antigravity, and similar tools are optimized for human steering, which explains their popularity, while Kiro is optimized for agent harnesses. That's also why it’s underused: it's quite opinionated, but very effective. Vibe-coding culture isn't sold on spec driven development (they think it's waterfall and summarily dismiss it -- even Yegge has this bias), so people tend to underrate it.

Kiro writes specs using structured formats like EARS and INCOSE (the spec format used in places like Boeing for engineering requirements). It performs automated reasoning to check for consistency, then generates a design document and task list from the spec -- similar to what Beads does. I usually spend a significant amount of time pressure-testing the spec before implementing (often hours to days), and it pays off. Writing a good, consistent spec is essentially the computer equivalent of "writing as a tool of thought" in practice.

Once the spec is tight, implementation tends to follow it closely. Kiro also generates property-based tests (PBTs) using Hypothesis in Python, inspired by Haskell's QuickCheck. These tests sweep the input domain and, when combined with traditional scenario-based unit tests, tend to produce code that adheres closely to the spec. I also add a small instruction, "do red/green TDD" (I learned this from Simon Willison), and that one line alone improved the quality of all my tests. Kiro can technically implement the task list itself, but this is where agents come in. With the spec in hand, I use multiple headless CLI agents in tmux (e.g., Kiro CLI, Claude Code) for implementation. The results have been very good. With a solid Kiro spec and task list, agents usually implement everything end-to-end without stopping -- I haven’t found a need for Ralph loops. (Agents sometimes tend to stop midway on Claude plans, but I've never had that happen with Kiro; not sure why, maybe it's the checklist, which includes PBT tests as gates.)

Kiro didn't have the strongest start, but the IDE is one of the best spec generators I've used, and it integrates extremely well with agent-driven workflows.

abreslav · 8 hours ago
> * model transforms text into a formal specification

A formal specification is no different from code: it will have bugs :)

There's no free lunch here: the informal-to-formal transition (be it words-to-code or words-to-formal-spec) comes through the non-deterministic models, period.

If we want to use the immense power of LLMs, we need to figure out a way to make this transition good enough

rco8786 · 12 hours ago
How is your 2 step process not susceptible to all the exact same pitfalls you listed above?
jnpnj · 11 hours ago
Maybe we're entering the era of non-deterministic applications too. No more mechanical, predictable thing... more like 90% regular, and then weird.

Slightly sarcastic but not sure this couldn't become a thing.

dist-epoch · 12 hours ago
> Models aren't deterministic - every time you would try to re-apply you'd likely get different output

So like when you give the same spec to 2 different programmers.

rco8786 · 12 hours ago
Yes, if you had each programmer rewrite the code from scratch each time you updated the spec.
kennywinker · 12 hours ago
Except each time you compile your spec you’re re-writing it from scratch with a different programmer.

pessimizer · 13 hours ago
I think your objections miss the point. My informal specs to a program are user-focused. I want to dictate what benefits the program will give to the person who is using it, which may include requirements for a transport layer, a philosophy of user interaction, or any number of things. When I know what I want out of a program, I go through the agony of translating that into a spec with database schemas, menu options, specific encryption schemes, etc., then finally I turn that into a formal spec within which whether I use an underscore or a dash somewhere becomes a thing that has to be consistent throughout the document.

You're telling me that I should be doing the agonizing parts in order for the LLM to do the routine part (transforming a description of a program into a formal description of a program.) Your list of things that "make no sense" are exactly the things that I want the LLMs to do. I want to be able to run the same spec again and see the LLM add a feature that I never expected (and wasn't in the last version run from the same spec) or modify tactics to accomplish user goals based on changes in technology or availability of new standards/vendors.

I want to see specs that move away from describing the specific functionality of programs altogether, and more into describing a usefulness or the convenience of a program that doesn't exist. I want to be able to feed the LLM requirements of what I want a program to be able to accomplish, and let the LLM research and implement the how. I only want to have to describe constraints i.e. it must enable me to be able to do A, B, and C, it must prevent X,Y, and Z; I want it to feel free to solve those constraints in the way it sees fit; and when I find myself unsatisfied with the output, I'll deliver it more constraints and ask it to regenerate.

darkwater · 13 hours ago
> I want to be able to run the same spec again and see the LLM add a feature that I never expected (and wasn't in the last version run from the same spec) or modify tactics to accomplish user goals based on changes in technology or availability of new standards/vendors.

Be careful what you wish for. This sounds great in theory but in practice it will probably mean a migration path for the users (UX changes, small details changed, cost dynamics and a large etc.)

jbritton · 9 hours ago
I tried this recently with what I thought was a simple layout, but probably uncommon for CSS. It took an extremely long back and forth to nail it down. It seemingly had no understanding how to achieve what I wanted. A couple sentences would have been clear to a person. Sometimes LLMs are fantastic and sometimes they are brain dead.

fnord77 · 13 hours ago
[delete]
koolala · 12 hours ago
It isn't a formal language; look at the goose example:

https://codespeak.dev/blog/greenfield-project-tutorial-20260...

It's a formal "way" of working, like using JSON or XML, which tons of people are already doing.

dist-epoch · 12 hours ago
Software product specifications are written in natural language, not in first-order logic.

oofbaroomf · 9 hours ago
Ugh, I just wish there was a deterministic and formal way to tell a computer what I want...
tonipotato · 14 hours ago
The problem with formal prompting languages is they assume the bottleneck is ambiguity in the prompt. In my experience building agents, the bottleneck is actually the model's context understanding. Same precise prompt, wildly different results depending on what else is in the context window. Formalizing the prompt doesn't help if the model builds the wrong internal representation of your codebase. That said curious to see where this goes.
slfnflctd · 13 hours ago
Two pieces of advice I keep seeing over & over in these discussions: 1) start with a fresh/baseline context regularly, and 2) give agents Unix-like tools and files that can be interacted with via simple pseudo-English commands in a shell such as bash, where they can invoke e.g. "--help" to learn how to use them.

I'm not sure adding a more formal language interface makes sense, as these models are optimized for conversational fluency. It makes more sense to me for them to be given instructions for using more formal interfaces as needed.

le-mark · 13 hours ago
This concept assumes a formalized language would somehow make things easier for an LLM. That's making some big assumptions about the neuroanatomy of LLMs. This [1] from the other day suggests surprising things about how LLMs are internally structured, specifically that encoding and decoding are distinct phases with other stuff in between, which suggests that the surface language, once the model is trained, isn't that important.

[1] https://news.ycombinator.com/item?id=47322887

abreslav · 13 hours ago
We are not trying to make things easier for LLMs. LLMs will be fine. CodeSpeak is built for humans, because we benefit from some structure, knowing how to express what we want, etc.
etothet · 12 hours ago
Under "Prerequisites"[0] I see: "Get an Anthropic API key".

I presume this is temporary since the project is still in alpha, but I'm curious why this requires use of an API at all and what's special about it that it can't leverage injecting the prompt into a Claude Code or other LLM coding tool session.

[0]: https://codespeak.dev/blog/greenfield-project-tutorial-20260...

cube2222 · 9 hours ago
This is actually... pretty cool?

Definitely won't use it for prod ofc but may try it out for a side-project.

It seems that this is more or less:

  - instead of modules, write specs for your modules
  - on the first go it generates the code (which you review)
  - later, diffs in the spec are translated into diffs in the code (the code is *not* fully regenerated)
this actually sounds pretty usable, esp. if someone likes writing. And wherever you want to go deep, you can dive down into the code and do "micro-optimizations" by rolling something on your own (with what seems to be called here "mixed projects").

That said, not sure if I need a separate tool for this, tbh, instead of just having markdown files and telling Claude to see the md diff and adjust the code accordingly.

abreslav · 9 hours ago
We'd love to hear your feedback! Feel free to come to our discord to ask questions/share experience: https://l.codespeak.dev/discord