NathanKP · 2 months ago
Good to see more people talking about this. I wrote about this about 6 months ago, when I first noticed how LLM usage is pushing a lot of people back towards older programming languages, older frameworks, and more basic designs: https://nathanpeck.com/how-llms-of-today-are-secretly-shapin...

To be honest I don't think this is necessarily a bad thing, but it does mean that there is a stifling effect on fresh new DSLs and frameworks. It isn't an unsolvable problem, particularly now that all the most popular coding agents have MCP support that allows you to bring in custom documentation context. However, there will always be a strong force in LLMs pushing users towards the runtimes and frameworks that have the most training data in the LLM.

crq-yml · 2 months ago
I think it's healthy, because it creates an undercurrent against building a higher abstraction tower. That's been a major issue: we make the stack deeper and build more of a "Swiss Army Knife" language because it lets us address something local to us, and in exchange it creates a Conway's Law problem for someone else later when they have to decipher generational "lava layers" as the trends of the marketplace shift and one new thing is abandoned for another.

The new way would be to build a disposable jig instead of a Swiss Army Knife: The LLM can be prompted into being enough of a DSL that you can stand up some placeholder code with it, supplemented with key elements that need a senior dev's touch.

The resulting code will look primitive and behave in primitive ways, which at the outset creates myriad inconsistencies, but is OK for maintenance over the long run: primitive code is easy to "harvest" into abstract code, the reverse is not so simple.

nottorp · 2 months ago
Not only that.

This article starts with "gaming" examples. Simplified to hell but "gaming".

How many games still look like they're done on a Gameboy because that's what the engine supports and it's too high level to customize?

How about the "big" engines, Unity and Unreal? Don't the games made with them kinda look similar?

freedomben · 2 months ago
I think this depends a lot on the stack. For stacks like Elixir and Phoenix, imho the abstraction layer is about perfect. For anyone in the Java world, however, what you say is absolutely true. Having worked in a number of different stacks, I think that some ecosystems have a huge tolerance for abstraction layers, which is a net negative for them. I would sure hate to see AI decimate something like Elixir and Phoenix though
pmontra · 2 months ago
It reminds me of this excerpt from Coders at Work, in Chapter 13 - Fran Allen:

Seibel: When do you think was the last time that you programmed?

Allen: Oh, it was quite a while ago. I kind of stopped when C came out.

That was a big blow. We were making so much good progress on optimizations and transformations. We were getting rid of just one nice problem after another. When C came out, at one of the SIGPLAN compiler conferences, there was a debate between Steve Johnson from Bell Labs, who was supporting C, and one of our people, Bill Harrison, who was working on a project that I had at that time supporting automatic optimization.

The nubbin of the debate was Steve's defense of not having to build optimizers anymore because the programmer would take care of it. That it was really a programmer's issue. The motivation for the design of C was three problems they couldn't solve in the high-level languages: One of them was interrupt handling. Another was scheduling resources, taking over the machine and scheduling a process that was in the queue. And a third one was allocating memory. And you couldn't do that from a high-level language.

So that was the excuse for C.

Seibel: Do you think C is a reasonable language if they had restricted its use to operating-system kernels?

Allen: Oh, yeah. That would have been fine. And, in fact, you need to have something like that, something where experts can really fine-tune without big bottlenecks because those are key problems to solve.

By 1960, we had a long list of amazing languages: Lisp, APL, Fortran, COBOL, Algol 60. These are higher-level than C. We have seriously regressed, since C developed. C has destroyed our ability to advance the state of the art in automatic optimization, automatic parallelization, automatic mapping of a high-level language to the machine. This is one of the reasons compilers are . . . basically not taught much anymore in the colleges and universities.

donkeybeer · 2 months ago
The fact that things are still dog slow on modern hardware means we didn't go far enough into C-like thinking. And compilers and hardware were much, much worse in the 70s. There is a reason C got adopted so quickly, while those other languages took decades to become practical. That counter-history is just made up nonsense. If we didn't have a convenient language back then, people would either just give up and keep writing assembly, or computing would have just slowed or stopped.
gopiandcode · 2 months ago
Oh that's a great blog post and a very interesting point. Yep, I hadn't considered how LLMs would affect frameworks in existing languages, but it makes sense that there's a very similar effect of reinforcing the incumbents and stifling innovation.

I'd argue that the problem of solving this effect in DSLs might be a bit harder than for frameworks, because DSLs can have wildly different semantics (imagine for example a logic programming DSL a la Prolog, vs a functional DSL a la Haskell), so these don't fit as nicely into the framework of MCPs maybe. I agree that it's not unsolvable though, but it definitely needs more research.

NathanKP · 2 months ago
I think there is a lot of overlap between DSLs and frameworks, and most frameworks contain some form of DSL in them.

What matters most of all is whether the DSL is written in semantically meaningful tokens. Two extremes as examples:

Regex is a DSL that is not written in tokens that have inherent semantic meaning. LLMs can only understand Regex by virtue of the fact that it has been around for a long time and there are millions of examples for the LLM to work from. And even then LLMs still struggle with reading and writing Regex.

Tailwind is an example of a DSL that is very semantically rich. When an LLM sees: `class="text-3xl font-bold underline"` it pretty much knows what that means out of the box, just like a human does.

Basically, a fresh new DSL can succeed much faster if it is closer to Tailwind than it is to Regex. The other side of DSLs is that they tend to be concise, and that can actually be a great thing for LLMs: more concise means fewer tokens, which means faster coding agents and faster responses from prompts. But too much conciseness (in the manner of Regex) leads to semantically confusing syntax, and then LLMs struggle.
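The same trade-off shows up inside regex itself. A small sketch using Python's `re.VERBOSE` mode: both patterns match the same thing, but one spends extra tokens on semantically meaningful labels.

```python
import re

# A dense, semantically opaque pattern: each token carries meaning
# only by convention; nothing in the token itself hints at intent.
dense = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")

# The same pattern in re.VERBOSE mode: more tokens, but each chunk
# now carries a readable label an LLM (or human) can latch onto.
verbose = re.compile(
    r"""
    \b
    \d{4}  # year
    -
    \d{2}  # month
    -
    \d{2}  # day
    \b
    """,
    re.VERBOSE,
)

text = "Released on 2024-11-05."
assert dense.search(text).group() == verbose.search(text).group() == "2024-11-05"
```

The verbose form costs more tokens per pattern, which cuts against the conciseness advantage, so a well-designed DSL has to pick a point between the two extremes.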

scelerat · 2 months ago
Linguistics and history of language folk: isn't there an observed slowdown of evolution of spoken language as the printing press becomes widespread? Also, "international english"?

Is this an observation of a similar phenomenon?

dlisboa · 2 months ago
I don't know about that, though I'm not a linguist. Seems to me most people haven't been literate for that long and the printing press would've been "useless" as a tool to modify the language of populations with only 10-20% literacy well into the 19th century. So 100 or so years seems too short to observe that.

Also some of the most widely spoken languages today do feature a high degree of diglossia between spoken and written variety, to a point where the written language has been outpaced. We could call that evolving. Examples would be Brazilian Portuguese and American English (some dialects specifically have changed English grammar).

Also, notoriously, Chinese written characters have been used for languages that evolve independently and are not mutually intelligible for millennia. Them being printed on paper instead of written doesn't make a difference.

What we do have today is a higher exposure and dominance of certain dialects, with some countries even mandating a certain type of speech historically, coupled with a higher degree of connectivity in society, to a point where not being intelligible to other people very far away carries a much worse penalty. That tempers the evolution much more than the printing press, in my view.

jhanschoo · 2 months ago
I don't think this is necessarily true. Sure, languages are dying out to standard prestige languages, but at the same time, innovations (teenage girls evolving English, say) now spread like wildfire across more eyeballs than ever.
jbreckmckye · 2 months ago
> To be honest I don't think this is necessarily a bad thing,

I do. Would you really argue we discovered perfection in the first sixty years of computer science? In the first sixty years of chemistry we still believed in phlogiston

librasteve · 2 months ago
i saw a good post on this earlier today … https://nurturethevibe.com/blog/teach-llm-to-write-new-progr...
winter_blue · 2 months ago
I think perhaps automatic translators might help mitigate some of this.

Even perhaps training a separate new neural network to translate from Python/Java/etc to your new language.

Deleted Comment

jrmg · 2 months ago
It’s not just new frameworks, it’s new features. Good luck getting an LLM to write code that uses iOS 26 features, for example.

I’m not convinced simply getting the LLM to inject documentation about the features will work well (perhaps someone has studied this?) because the reason they’re good at doing ‘well known’ things is the plethora of actual examples they’re trained on.

lamuswawir · 2 months ago
I now put documentation of my DSLs and macros in a folder in the repo. I give the agent an instruction about consulting the documentation and it is working so far, but not that well. I am considering switching to all generics.
ewoodrich · 2 months ago
I have been dealing with an extremely frustrating issue after moving from a very popular but now deprecated package to its supported successor. The API looks deceptively similar in terms of method names, which is causing endless grief for LLM pattern matching. Even when I provide the exact documentation needed for the new API, the massive amount of training material scattered across the internet for the old package version dominates, and the documentation has zero effect even one reply into the conversation.

I've tried Gemini 2.5 Pro, o3 and Claude 4 and they have their own unique tone of confidently wrong, deception about reasoning and gaslighting but produce the same result. I wasted way too much time trying to figure out a zero shot prompting strategy before I just rewrote the whole thing by hand.

guywithahat · 2 months ago
Skynet will be run on C
shakna · 2 months ago
Thank god. A human would have handled malloc failure.
NathanKP · 2 months ago
This is an interesting idea actually, if we assume a couple things:

- There likely won't be one Skynet, but rather multiple AIs, produced by various sponsors, starting out as relatively harmless autonomous agents in corporate competition with each other

- AI agents can only inference, then read and write output tokens at a limited rate based on how fast the infrastructure that powers the agent can run

In this scenario a "Skynet" AI writing code in C might lose to an AI writing code in a higher level language, just because of the time lost writing the tokens for all that verbose boilerplate and memory management in C. The AI agent that is capable of "thinking" in a higher level DSL is able to take shortcuts that let it implement things faster, with fewer tokens.

weikju · 2 months ago
We need to ensure humans can still find buffer overflows in order to win the war!
freedomben · 2 months ago
Because skynet knows what's up. Viva la skynet
romaniv · 2 months ago
The title should be "DSLs pose an interesting problem for LLM users".

It is significant that LLMs in coding are being promoted based on a set of promises (and assumptions) that are getting instantly and completely reversed the moment the technology gets an iota of social adoption in some space.

"Everyone can code now!" -> "Everyone must learn a highly specialized set of techniques to prompt, test generated code, etc."

"LLMs are smart and can effortlessly interface with pre-existing technologies" -> "You must adopt these agent protocols, now"

"LLMs are great at 0-shot learning" -> "I will not use this language/library/version of tool, because my model isn't trained on its examples"

"LLMs effortlessly understand existing code" -> "You must change your code specifically to be understood by LLMs"

This is getting rather ridiculous.

oehpr · 2 months ago
swyx · 2 months ago
we had this with types too. fanaticism isn't unique to llms
furyofantares · 2 months ago
I notice I am confused.

> Suddenly the opportunity cost for a DSL has just doubled: in the land of LLMs, a DSL requires not only the investment of build and design the language and tooling itself, but the end users will have to sacrifice the use of LLMs to generate any code for your DSL.

I don't think they will. Provide a concise description + examples for your DSL and the LLM will excel at writing within your DSL. Agents even more so if you can provide errors. I mean, I guess the article kinda goes in that direction.

But also authoring DSLs is something LLMs can assist with better than most programming tasks. LLMs are pretty great at producing code that's largely just a data pipeline.
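To make the data-pipeline point concrete, here is a toy sketch of the kind of DSL scaffolding this describes: a hypothetical mini-language (all names invented) where each line is a verb plus arguments, interpreted over a list of records.

```python
# A minimal interpreter for a hypothetical pipeline DSL where each
# line is "verb arg...". This is the shape of code that is largely
# just a data pipeline: parse a line, dispatch on the verb, transform.
def run_pipeline(program: str, rows: list[dict]) -> list[dict]:
    for line in program.strip().splitlines():
        verb, *args = line.split()
        if verb == "filter":      # filter <key> <value>: keep matching rows
            key, value = args
            rows = [r for r in rows if str(r.get(key)) == value]
        elif verb == "select":    # select <key>...: project columns
            rows = [{k: r[k] for k in args} for r in rows]
        else:
            raise ValueError(f"unknown verb: {verb}")
    return rows

data = [{"name": "Ada", "lang": "dsl"}, {"name": "Bob", "lang": "c"}]
out = run_pipeline("filter lang dsl\nselect name", data)
print(out)  # [{'name': 'Ada'}]
```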

gopiandcode · 2 months ago
Arguably it really depends on your DSL right? If it has a semantics that already lies close to existing programming languages, then I'd agree that a few examples might be sufficient, but what if your particular domain doesn't match as closely?

Examples of domains that might be more challenging to design DSLs for: languages for knitting, non-deterministic languages to represent streaming etc. (i.e https://pldi25.sigplan.org/details/pldi-2025-papers/50/Funct... )

My main concern is that LLMs might excel at the mundane tasks, but struggle at the more exciting advances, and so now the activation energy for coming up with advanced DSLs is going to increase, and as a result the field might stagnate.

demosthanos · 2 months ago
Remember that LLMs aren't trained on all existing programming languages, they're trained on all text on the internet. They encode information about knitting or streaming or whatever other topic you want a DSL for.

So it's not just a question of the semantics matching existing programming languages, the question is if your semantics are intelligible given the vast array of semantic constructs that are encoded in any part of the model's weights.

daxfohl · 2 months ago
This is somewhat my take too. The way most vibe coding happens right now does create a lot of duplication because it's cheap and easy for LLMs to do. But eventually, as the things we do with coding assistants become more complex, they're not necessarily going to be able to deal with huge swaths of duplicate code any better than humans are. Given their limited context size, if a DSL allows them to fit more logic into their context with fewer tokens, we could conceivably see the importance of DSLs start to increase rather than decrease.
loa_in_ · 2 months ago
Every language other than human-readable machine-level assembly instructions is a DSL.
neilv · 2 months ago
I was about to paste the same sentence, and say much the same thing in response.

To add to that... One limitation of LLM for a new DSL is that the LLM may be less likely to directly plagiarize from open source code. That could be a feature.

Another feature could be users doing their own work, and doing a better job of it, instead of "cheating on their homework" with AI slop and plagiarism, whether for school or in the workplace.

NiloCK · 2 months ago
I, too, wrote a rambling take on the potential for LLM-induced stack ossification about 6 months ago: https://paritybits.me/stack-ossification/

At the time, I had given in to Claude 3.5's preference for python when spinning up my first substantive vibe-coded app. I'd never written a line of python before or since, but I just let the waves carry me. Claude and I vibed ourselves into a corner, and given my ignorance, I gave up on fixing things and declared the software done as-is. I'm now the proud owner of a tiny monstrosity that I completely depend on - my own local whisper dictation app with a system tray.

I've continued to think about stack ossification since. Still feels possible, given my recent frustration trying to use animejs v4 via an LLM. There's a substantial API change between animejs v3 and v4, and no amount of direction or documentation placed in context could stop models from writing against the v3 API.

I see two ways out of the ossification attractor.

The obvious, passive, way out: frontier models cross a chasm with respect to 'putting aside' internalized knowledge (from the training data) in favor of in-context directions or some documentation-RAG solutions. I'm not terribly optimistic here - these models are hip-shooters by nature, and it feels to me that as they get smarter, this reflex feels stronger rather than weaker. Though: Sonnet 4 is generally a better instruction-follower than 3.7, so maybe.

The less obvious way out, which I hope someone is working on, is something like massive model-merging based on many cached micro fine-tunes against specific dependency versions, so that each workspace context can call out to modestly customized LLMs (LoRA style) where usage of incorrect versions of your dependencies has specifically been fine-tuned out.

darepublic · 2 months ago
I recently had to work with the Robot Framework DSL. Not a fan. I hardly think it's any more readable to a business user than imperative code either. Every DSL is another API to learn and usually full of gotchas. Intuitiveness is in the eye of the beholder. The approach I would take is transpiling from imperative code to a natural language explanation of what is being tested, with configuration around aliases and the like.
TimTheTinker · 2 months ago
DSLs are not all created equal.

Consider MiniZinc. This DSL is super cool and useful for writing constraint-solving problems once and running them through any number of different backend solvers.
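For flavor, this is the kind of constraint problem a MiniZinc model states declaratively and hands to any backend solver; here is the classic SEND+MORE=MONEY cryptarithm, brute-forced in plain Python purely for illustration of what the DSL abstracts away.

```python
from itertools import permutations

# SEND + MORE = MONEY: assign distinct digits to the eight letters
# S,E,N,D,M,O,R,Y so the sum holds, with no leading zeros. A MiniZinc
# model just states these constraints; here we search by brute force.
def solve():
    for S, E, N, D, M, O, R, Y in permutations(range(10), 8):
        if S == 0 or M == 0:  # no leading zeros
            continue
        send = 1000 * S + 100 * E + 10 * N + D
        more = 1000 * M + 100 * O + 10 * R + E
        money = 10000 * M + 1000 * O + 100 * N + 10 * E + Y
        if send + more == money:
            return send, more, money
    return None

print(solve())  # (9567, 1085, 10652)
```

The brute force enumerates ~1.8 million assignments; a constraint solver prunes that space by propagation, which is exactly the leverage the DSL buys you.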

A lot of intermediate languages and bytecode (including LLVM itself) are very useful DSLs for representing low-level operations using a well-defined set of primitives.

Codegen DSLs are also amazing for some applications, especially for creating custom boilerplate -- write what's unique to the scenario at hand in the DSL and have the template-based codegen use the provided data to generate code in the target language. This can be a highly flexible approach, and is just one of several types of language-oriented programming (LOP).

Lerc · 2 months ago
I think Sturgeon's law might be the problem for DSLs. Enabling a proliferation of languages targeted at a specific purpose creates a lot of new things with a 50% chance of the quality being below the median. Using a general purpose language involves selecting (or more usually relying on someone else's earlier selection) one of many, that selection process is inherently biased towards the better languages.

Put differently, the languages people actually use had people who decided to use them, and they picked the best ones. Making something new, you compete against the best, not the average. That's not to say that it can't be done, but it's not easy.

iguessthislldo · 2 months ago
I'm not skeptical about DSLs in general, but I agree with you on Robot Framework. I think it has a few good points, like how it formats its HTML output is mostly nice, but I'm not happy with how tags on test cases work, and actually writing anything non-trivial is frustrating. It's easy to write Python extensions though, so that's where I ended up putting basically all of the logic that wasn't the "business logic" of the tests. I think that's generally what you're supposed to do, but at that point it seems better to write it all in Python or the language of your choice.
jo32 · 2 months ago
In the LLM era, building a brand-new DSL feels unnecessary. DSLs used to make sense because they gave you a compact, domain-specific syntax that simple parsers could handle. But modern language models can already read, write, and explain mainstream languages effortlessly, and the tooling around those languages—REPLs, compilers, debuggers, libraries—is miles ahead of anything you’d roll on your own. So rather than inventing yet another mini-language, just leverage a well-established one and let the LLM (plus its mature ecosystem) do the heavy lifting.
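One middle ground this gestures at is an embedded DSL: ordinary host-language code shaped to read domain-specifically, which an LLM already knows how to write. A hedged sketch in Python (all names invented):

```python
# A tiny embedded "validation DSL": fluent method chaining stands in
# for custom syntax, so the host language's parser and tooling do all
# the work while the rules still read declaratively.
class Rule:
    def __init__(self, field):
        self.field, self.checks = field, []

    def between(self, lo, hi):
        self.checks.append(lambda v: lo <= v <= hi)
        return self  # returning self enables chaining

    def ok(self, record):
        v = record[self.field]
        return all(check(v) for check in self.checks)

age_rule = Rule("age").between(18, 65)
print(age_rule.ok({"age": 30}))  # True
print(age_rule.ok({"age": 80}))  # False
```

Because it is just Python, the model's training data, the debugger, and the REPL all apply, which is the ecosystem argument in miniature.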
jeroenhd · 2 months ago
I can't even trust an LLM to write working Java code, let alone trust it to convert whatever a DSL is supposed to express into another form. Sure, maybe there's not enough Java 23 in its training set to effectively copy into my application, but Java 11 combined with 10 year old libraries shouldn't be a problem if these coding LLMs are worth their salt.

Until LLMs stop making up language features, methods, and operators out of convenience, DSLs are here to stay.

KaiserPro · 2 months ago
I fucking hate DSLs however I know they need to exist, but nowhere near as many should exist as they do now.

However LLMs are actually quite useful for translating concepts in DSLs that you don't understand. They don't do it error-free, of course, but it allows one to ask enough questions to work out why your attempt to translate concepts into this new fucking stupid ontological pile of wank isn't working

kibwen · 2 months ago
People often use the analogy of LLMs being to high-level languages what compilers were for assembly languages, and despite being a terrible analogy there's no guarantee it won't eventually be largely true in practice. And if it does come true, consider how the advent of the compiler completely eliminated any incentive to improve the ergonomics or usability of assembly code, which has been and continues to be absolute crap, because who cares? That could be the grim future for high-level languages; this may be the end of the line.
daxfohl · 2 months ago
A big difference is that compilers are deterministic, and coders generally don't review and patch the generated assembly. There's little reason to expect that LLMs will ever function like that. It's always going to be a back-and-forth of, "hey LLM code this up", "no, function f isn't quite right; do this instead", etc.

This mimics what you see in, say, Photoshop. You can edit pixels manually, you can use deterministic tools, and you can use AI. If you care about the final result, you're probably going to use all three together.

I don't think we'll ever get to the point where we a priori present a spec to an LLM and then not even look at the code, i.e. "English as a higher-level coding language". The reason is, code is simply more concise and explicit than trying to explain the logic in English in totality up-front.

For some things where you truly don't care about the details and have lots of flexibility, maybe English-as-code could be used like that, similar to image generation from a description. But I expect for most business-related use cases, the world is going to revolve around actual code for a long time.

seanmcdirmid · 2 months ago
If LLMs can ever produce implementation and tests opaquely, then perhaps we can instruct at the level of requirements. They will never be good enough at the beginning and we will just provide feedback over the working prototype the LLM generates and demonstrates?

I haven’t seen technology move this fast before, so I wouldn’t make any hard predictions about how long actual code written by humans survives. We don’t really need AGI at this point to have opaque coding solutions, even if the LLMs should still be better.

Terr_ · 2 months ago
I suspect the important sticking point will be reliability. The "incentive" exists because of a high degree of trust, so much so that "junior dev thinks it's a compiler bug" is a kind of joke.

If compilers had significant non-deterministic error rates with no reliable fix, that would probably be a rather different timeline.

kibwen · 2 months ago
Yes, that's why I say it's a terrible analogy, but we live in a crap world with a fetish for waste and no incentive for quality, so I fear for a future where every night we just have the server farm spin up a thousand batch jobs to regenerate the entire codebase from scratch and pick out the first candidate that passes today's test suite.
bee_rider · 2 months ago
We got LLVM IR, which is sort of like… similar-ish to assembly but better and more portable, right? Maybe some observation could be made there—it is something that does a similar job, but does it in a way that is better for the job that actually remains.
noobermin · 2 months ago
An ignorant perspective, from someone likely hasn't coded assembly ever. Assembly is tied to the system you target and it can't really be "improved". You can however improve ergonomics greatly via macros and everyone does this.
kibwen · 2 months ago
> Assembly is tied to the system you target and it can't really be "improved".

Of course it can. There's no reason for modern extensions to keep pumping out instructions named things like "VCVTTPS2DQ" other than an adherence to cryptic tradition and a confidence that the people who read assembly code are poor saps who don't matter in the grand scheme of the industry, which is precisely my point. And even if x86 was set in stone centuries ago, there's no excuse for modern ISAs to follow suit other than complete apathy over the DX of assembly, and who can blame them?

> You can however improve ergonomics greatly via macros and everyone does this.

Yes, and surely you see how the existence of macro assemblers strengthens my argument?

usrbinbash · 2 months ago
Good. I'll chalk that up as one of the positive effects LLMs have on the software development environment (god knows there are few enough).

DSL proliferation is a problem. I know this is not something many people care to hear, and I sympathize with that. Smart people are drawn to complexity and elegance, smart people like building solutions, and DSLs are complex and elegant solutions. I get it.

Problem is: Too many solutions create complexity, and complexity is the eternal enemy of [Grug][1]

Not every other problem domain needs its own language, and existing languages are designed to be adapted for many different problem domains. If LLMs help to stifle the wild growth of at least some DSLs that would otherwise exist, then I am reasonably okay with that.

[1]: https://grugbrain.dev

nothrabannosir · 2 months ago
This feels like survivorship bias. Many of those older tools seem like they were once fancy new DSLs. We just respect them now as established, because they've been around for so long. But for every one thousand awkward DSLs that didn't make it, one new tool emerged which lifts software development to a new level.

Would you say the same about a parallel universe where LLMs were introduced in 1960?

usrbinbash · 2 months ago
> Many of those older tools seem like they were once fancy new DSLs.

I think no one ever called Python or C or Rust, or Java or Go a "domain specific language".

> We just respect them now as established, because they've been around for so long.

No, we "respect" them because they are general purpose languages that you can do anything with.

> But for every one thousand awkward DSLs that didn't make it, one new tool emerged which lifts software development to a new level.

Please, do list some DSLs that managed to "lift software development to a new level". And again: A General Purpose Language is not a DSL.