Not sure what your analogy is trying to imply.
The smoltcp crate typically uses runtime checks to ensure that slice accesses made by the library do not cause a panic. It's not exactly equivalent to GP's assertion, since it doesn't cover "every single slice access", but it at least covers slice accesses triggered by the library's public API (i.e. none of the public API functions should panic, assuming the runtime validation performed after the most recent mutation succeeded).
Example: https://docs.rs/smoltcp/latest/src/smoltcp/wire/ipv4.rs.html...
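For anyone curious what that looks like in practice, here is a minimal sketch of the validate-once-then-index pattern. The struct, header length, and field offset are made up for illustration (they are not smoltcp's actual definitions), but the shape mirrors the linked source: a checked constructor performs the length validation up front so that later field accessors can index the slice without panicking.

    // Illustrative sketch only; offsets and names are invented, not smoltcp's.
    struct Packet<T: AsRef<[u8]>> {
        buffer: T,
    }

    const HEADER_LEN: usize = 20; // assumed fixed header size for this sketch

    #[derive(Debug)]
    struct Truncated;

    impl<T: AsRef<[u8]>> Packet<T> {
        /// Runtime check at the public boundary: reject buffers that are too
        /// short, so the accessors below can index without panicking.
        fn new_checked(buffer: T) -> Result<Self, Truncated> {
            let packet = Packet { buffer };
            packet.check_len()?;
            Ok(packet)
        }

        fn check_len(&self) -> Result<(), Truncated> {
            if self.buffer.as_ref().len() < HEADER_LEN {
                Err(Truncated)
            } else {
                Ok(())
            }
        }

        /// Safe to index: `new_checked` already guaranteed the slice is long enough.
        fn version(&self) -> u8 {
            self.buffer.as_ref()[0] >> 4
        }
    }

    fn main() {
        let raw = [0x45u8; HEADER_LEN]; // pretend header bytes
        let packet = Packet::new_checked(&raw[..]).expect("long enough");
        assert_eq!(packet.version(), 4);
    }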
There are also other ways to convey 12-hour time. e.g. 朝6時に起きる (wake up at 6 A.M. / wake up at 6 in the morning).
% curl https://xslt.rip/
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="/index.xsl" type="text/xsl"?>
<html>
<head>
<title>XSLT.RIP</title>
</head>
<body>
<h1>If you're reading this, XSLT was killed by Google.</h1>
<p>Thoughts and prayers.</p>
<p>Rest in peace.</p>
</body>
</html>
I think it doesn't have to be like this, and we can do better. If LLMs keep this up, good testing infrastructure might become more important.
Overall, I agree with your point. LLMs feel a lot more reliable when a codebase has thorough, easy-to-run tests. For a similar reason, I have been drifting towards strong, statically-typed languages. Both Rust and TypeScript have rich type systems that can express many kinds of runtime behavior with just types. When a compiler can make strong guarantees about a program's behavior, I assume that helps nudge the quality of LLM output a bit higher. Tests then help prevent silly regressions from occurring. I have no evidence for this besides my anecdotal experience using LLMs across several programming languages.
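As one hedged illustration of what "expressing runtime behavior with just types" can buy you (a toy example of my own, not taken from any particular library): a typestate-style wrapper where an unvalidated request simply has no send method, so the compiler rules out a whole class of bugs that would otherwise need runtime checks and regression tests.

    // Toy example: a request that has not been validated cannot be sent,
    // because `send` only exists for Request<Valid>.
    struct Unvalidated;
    struct Valid;

    struct Request<State> {
        body: String,
        _state: std::marker::PhantomData<State>,
    }

    impl Request<Unvalidated> {
        fn new(body: String) -> Self {
            Request { body, _state: std::marker::PhantomData }
        }

        // Validation is the only way to obtain a Request<Valid>.
        fn validate(self) -> Result<Request<Valid>, String> {
            if self.body.is_empty() {
                Err("empty body".to_string())
            } else {
                Ok(Request { body: self.body, _state: std::marker::PhantomData })
            }
        }
    }

    impl Request<Valid> {
        fn send(&self) {
            // Reaching this point implies validation succeeded; no runtime
            // re-check is needed here.
            println!("sending: {}", self.body);
        }
    }

    fn main() {
        let req = Request::new("hello".to_string());
        // req.send();               // does not compile: not validated yet
        if let Ok(valid) = req.validate() {
            valid.send();
        }
    }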
In general, I've had the best experience with LLMs when there's plenty of static analysis (and tests) on the codebase. When a codebase can't be easily tested, I get far smaller productivity gains from LLMs. So yeah, I'm all for improving testing infrastructure.
Are you doing embedded development, or anything else less mainstream than web dev? LLMs are still useful, but no longer mind-blowing, and they often hallucinate. You need to read every line of their output. 1000 t/s is crazy, but no longer always in a good way.
Are you doing something the LLMs haven't seen yet? You're on your own. There is quite a bit of irony in the fact that the devs of llama.cpp barely use AI - just have a look at the development of support for Qwen3-Next-80B [1].
I experienced this with Claude 4 Sonnet and, to some extent, gpt-5-mini-high.
When able to run tests against its output, Claude produces pretty good Rust backend and TypeScript frontend code. However, Claude became borderline unproductive once I started experimenting with uefi-rs. Other LLMs, like gpt-5-mini-high, did not fare much better, but they were at least capable of admitting a lack of knowledge. In particular, GPT-5 would provide output akin to "here is some pseudocode that you may be able to adapt to your choice of UEFI bindings".
Testing in a UEFI environment is quite difficult; the LLM can't just run `cargo test` and verify the output. Things get worse in embedded, because crates like embedded_hal made massive API changes between 0.2 and 1.0 (the latest version), and every LLM I've tried seems to only know the 0.2 releases. For embedded, forget about testing harnesses entirely (they at least exist in some form for UEFI; it's just difficult to automate running them and capturing the output for an LLM). In this case, you can't really trust the LLM's output. To minimize the risk of hallucination, I tried keeping data sheets and library code in context, but at that point it took more time to prompt the LLM than to write the code by hand.
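To make the embedded_hal point concrete, here is a rough sketch of what a 1.0-style driver looks like, written from memory against the `embedded_hal::i2c::I2c` trait; the sensor, register address, and method body are hypothetical, so treat it as an assumption to check against the docs rather than a verified example.

    // Requires embedded-hal 1.0 as a dependency. Sketch from memory; the
    // sensor and register 0x00 are made up for illustration.
    use embedded_hal::i2c::I2c;

    struct TempSensor<BUS> {
        bus: BUS,
        address: u8,
    }

    impl<BUS: I2c> TempSensor<BUS> {
        fn new(bus: BUS, address: u8) -> Self {
            Self { bus, address }
        }

        fn read_raw(&mut self) -> Result<u16, BUS::Error> {
            let mut buf = [0u8; 2];
            // Hypothetical register address for illustration.
            self.bus.write_read(self.address, &[0x00], &mut buf)?;
            Ok(u16::from_be_bytes(buf))
        }
    }

    // Under embedded-hal 0.2, the same driver would instead bound on the
    // separate blocking i2c traits (Write/WriteRead), which is the style
    // most LLM output still seems to produce.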
I've been writing a lot of embedded Rust over the past two weeks, and my usage of LLMs in general decreased because of that. Currently planning to resume development on some of my "easier" projects, since I have about 300 Claude prompts remaining in my Zed subscription, and I don't want them to go to waste.
Not defending the design, but this website is sometimes useful for disambiguating OSStatus error codes: https://www.osstatus.com/