Readit News
noone_youknow · 2 months ago
I’ve been trying out various LLMs for working on assembly code in my toy OS kernel for a few months now. It’s mostly low-level device setup and bootstrap code, and I’ve found they’re pretty terrible at it generally. They’ll often generate code that won’t quite assemble, they’ll hallucinate details like hardware registers, and very often they’ll come up with inefficient code. The LLM attempt at an AP bootstrap (real mode to long mode) was almost comical.

All that said, I’ve recently started a RISC-V port, and I’ve found they’re actually quite good at porting bits of low-level init code from x86 (NASM) to RISC-V (GAS) - I guess because it’s largely a simple translation job and they already have the logic to work from.

simonw · 2 months ago
> They’ll often generate code that won’t quite assemble

Have you tried using a coding agent that can run the compiler itself and fix any errors in a loop?

The first version I got here didn't compile. Firing up Claude Code and letting it debug in a loop fixed that.
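
A minimal sketch of that kind of assemble-and-fix loop, assuming the model call is wrapped in a hypothetical `propose_fix(source, errors)` function (not part of any real agent API) and that NASM is on the PATH. The attempt budget and the repeated-source check are illustrative guards against the "stuck in a loop" case, not how Claude Code works internally.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import Callable, Tuple


def nasm_build(source: str) -> Tuple[bool, str]:
    """Assemble with NASM (assumed to be installed); return (ok, error text)."""
    with tempfile.TemporaryDirectory() as tmp:
        asm = Path(tmp) / "candidate.asm"
        asm.write_text(source)
        result = subprocess.run(
            ["nasm", "-f", "elf64", str(asm), "-o", str(Path(tmp) / "candidate.o")],
            capture_output=True,
            text=True,
        )
    return result.returncode == 0, result.stderr


def fix_until_builds(
    source: str,
    build: Callable[[str], Tuple[bool, str]],
    propose_fix: Callable[[str, str], str],  # hypothetical wrapper around the model
    max_attempts: int = 5,
) -> Tuple[bool, str, int]:
    """Rebuild until the source assembles or the attempt budget runs out.

    Returns (ok, final_source, builds_run). When ok is False, final_source is
    the last source the loop held, which may be an untested candidate.
    """
    seen = {source}
    for attempt in range(1, max_attempts + 1):
        ok, errors = build(source)
        if ok:
            return True, source, attempt
        candidate = propose_fix(source, errors)
        if candidate in seen:
            # The fixer is cycling through sources it already tried; stop early.
            return False, source, attempt
        seen.add(candidate)
        source = candidate
    return False, source, max_attempts
```

Passing `nasm_build` as `build` gives the real loop. Note that "it assembles" is the only thing this checks, so it says nothing about whether the code is correct.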

noone_youknow · 2 months ago
I have, and to be fair that has solved the “basically incorrect code” issue with reasonable regularity. Occasionally the error messages don’t seem helpful enough for it, which is understandable, and I’ve had a few occurrences of it getting “stuck” in a loop trying to e.g. use an invalid addressing mode (it may have gotten itself out of those situations if I were more patient). But generally, with one of the Claude 4 models in agent mode in Cursor or Claude Code, I’ve found it’s possible to get reasonably good results in terms of “does it assemble”.

I’m still working on a good way to integrate more feedback for this kind of workflow, e.g. for the attempt it made at AP bootstrap - debugging that is just hard, and giving an agent enough control over the running code and the ability to extract the information it would need to debug the resulting triple fault is an interesting challenge (even if probably not all that generally useful).

I have a bunch of pretty ad-hoc test harnesses and the like that I use for general hosted testing, but that can only get you so far in this kind of low-level code.

vidarh · 2 months ago
Similar experience - they seem to generally have a lot more problems with ASM than structured languages. I don't know if this reflects less training data, or difficulty.
73kl4453dz · 2 months ago
As far as I can tell they have trouble with sustained satisfaction of multiple constraints, and asm has more of that than higher-level languages. (An old boss once said his record for bug density was in asm: he'd written 3 bugs in a single opcode.)
msgodel · 2 months ago
The few times I've messed with it I've noticed they're pretty bad at keeping track of registers as they move between subroutines. They're just not great at coming up with a consistent "sub language" the way human assembly programmers tend to.
LtdJorge · 2 months ago
A bit tangential, but I've found 4 Sonnet to be much, much better at SIMD intrinsics (in my case, in Rust) than Sonnet 3.5 and 3.7, which were kind of atrocious. For example, 3.7 would write a scalar for loop and tell you "I've vectorized...", when I explicitly asked to do the operations with x86 intrinsics and gave it the capabilities of the hardware. Also, telling it to use AVX2 as supported would not make it use SSE or it would make conditionals to use them, which makes no sense. Seems Claude 4 solves most of that.

Edit: that -> than

noone_youknow · 2 months ago
This fits my experience. I’m definitely getting considerably better results with 4 than previous Claudes. I’d essentially dropped Sonnet from my rotation before 4 became available, but now it’s a go-to for this sort of thing.
userbinator · 2 months ago
I wonder how many demoscene productions it was trained on. Probably not many, because stuff like this sticks out like a sore thumb:

    xor eax, eax            ; z_real = 0 
    xor ebx, ebx            ; z_imag = 0 
    mov ecx, 0              ; iteration counter

01HNNWZ0MV43FF · 2 months ago
I haven't done much asm - I'm guessing that a human would do `xor ecx, ecx` as well?
iamflimflam1 · 2 months ago
Yeah - this is pretty standard code.
varispeed · 2 months ago
It's supposed to be faster than mov ecx, 0.
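
For what it's worth, the usual argument is about size as much as speed: in 32-bit and 64-bit mode `xor ecx, ecx` encodes in 2 bytes, while `mov ecx, 0` carries a full 32-bit immediate. A small sketch that just spells out the two encodings (the helper name is made up for illustration):

```python
# Machine-code bytes for the two ways of zeroing ECX in 32/64-bit mode.
XOR_ECX_ECX = bytes([0x31, 0xC9])                   # xor ecx, ecx: opcode 31 /r, ModRM C9
MOV_ECX_0 = bytes([0xB9, 0x00, 0x00, 0x00, 0x00])   # mov ecx, 0: opcode B9 + imm32


def encoded_sizes() -> dict:
    """Return the encoded length in bytes of each instruction."""
    return {"xor ecx, ecx": len(XOR_ECX_ECX), "mov ecx, 0": len(MOV_ECX_0)}


print(encoded_sizes())
```

One caveat: `xor` writes the flags and `mov` does not, so the two are only interchangeable where the flags are dead.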
6mian · 2 months ago
Yes.
johnisgood · 2 months ago
Have you found anything else? If so, could you provide rationale?
rep_lodsb · 2 months ago
Could have done worse:

    loop $

sipsi · 2 months ago
It can also do a pretty good 3d star field: https://godbolt.org/z/a7v4xnbef

First try worked but didn't use correct terminal size.

aargh_aargh · 2 months ago
Tangent: godbolt.org greeted me with a popup but boy, I have never seen a clearer privacy notice, minimal possible data retention, including a diff with the last version. Great job, Matt!
pixelpoet · 2 months ago
Not to be confused with the excellent Mandelbook[0] and related work on the Mandelbrot[1] by Claude Heiland-Allen :)

[0]: https://mathr.co.uk/mandelbrot/book-draft-2017-11-10.pdf

[1]: https://mathr.co.uk/web/mandelbrot.html

0x000xca0xfe · 2 months ago
This is great, thank you!
pixelpoet · 2 months ago
Yeah it's an excellent resource, ditto all of mathr.co.uk :)

If you'd like to join our great little fractal community, here's a Discord invite link: https://discord.gg/beKyJ8HSk5

suddenlybananas · 2 months ago
Googling "Mandelbrot set in assembly" returns a bunch of examples of this.
djaychela · 2 months ago
It does.... I was just surprised that it turned up as terminal output - for some reason I was expecting something in some form of GUI window for some OS or other but I guess that's orders of magnitude more complex and more likely to not work. But he did actually ask for ASCII output, so that does make sense - unlike my assumption!
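
For reference, the ASCII version is just the escape-time loop plus a character palette. A rough Python sketch of the same idea, not the generated assembly itself: `z_real`, `z_imag` and the iteration counter mirror the comments in the snippet quoted earlier in the thread, while the grid size, plane bounds and palette are assumptions picked for illustration.

```python
def escape_iterations(c_real: float, c_imag: float, max_iter: int = 50) -> int:
    """Count iterations of z = z*z + c before |z| exceeds 2 (capped at max_iter)."""
    z_real = 0.0
    z_imag = 0.0
    for i in range(max_iter):
        if z_real * z_real + z_imag * z_imag > 4.0:
            return i
        z_real, z_imag = (
            z_real * z_real - z_imag * z_imag + c_real,
            2.0 * z_real * z_imag + c_imag,
        )
    return max_iter


def render(width: int = 60, height: int = 20, max_iter: int = 50,
           palette: str = " .:-=+*#%@") -> str:
    """Map each character cell to a point in the plane and pick a palette glyph."""
    rows = []
    for y in range(height):
        c_imag = -1.2 + 2.4 * y / (height - 1)
        row = []
        for x in range(width):
            c_real = -2.0 + 3.0 * x / (width - 1)
            n = escape_iterations(c_real, c_imag, max_iter)
            # Points that never escape get the last (densest) glyph.
            row.append(palette[(n * (len(palette) - 1)) // max_iter])
        rows.append("".join(row))
    return "\n".join(rows)


if __name__ == "__main__":
    print(render())
```

A terminal version only needs a write call per row, which is presumably why it is so much easier to get working from assembly than anything that opens a window.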
ale42 · 2 months ago
I think that opening a window and rendering something inside it using the native Win32 API from assembly code on Windows would not be so terrifyingly complex. It's just more code as it needs to call the appropriate GUI APIs (not just syscalls), and it's OS-specific... but such code is always OS-specific anyway (the one mentioned here seems to be for Linux, given the syscalls used). No idea how complex it would be with X or on Mac, as I don't know their low-level GUI APIs.

horsellama · 2 months ago
OP may want to test this setup here [0]. This is a bit more challenging than replacing a Google query with an LLM pipeline.

[0] https://code.golf/mandelbrot#assembly

Jare · 2 months ago
Might be interesting to try this in ARM assembly, where there's a lot less likely to be existing code in the training set.
ur-whale · 2 months ago
Mmmmyeah, well, one thing LLMs are very decent at is translating, be it from human language to human language or from code to code, so I'm not sure your point stands.
sitkack · 2 months ago
It does fine on Arm assembly (and Neon).
revskill · 2 months ago
LLMs are useless in real-world codebases. Tons of hallucination and nonsense. Garbage everywhere. The dangerous thing is they mess things up randomly, with no consistency at all.

It is fine to treat it as a better autocompletion tool.