matklad (u/matklad) - Readit News

matklad commented on Make.ts matklad.github.io/2026/01... · Posted by u/ingve

mcapodici · 16 days ago

If you want it to be an alternative to shell history then ~/make.ts is better, since that'll be the same wherever you are.

matklad · 16 days ago

Thanks, I haven't considered this! My history is usually naturally project-scoped, but I bet I'll find ~/make.ts useful now that I have it!

matklad commented on Parsing Advances matklad.github.io/2025/12... · Posted by u/birdculture

dcrazy · a month ago

I’m curious why the author chose to model this as an assertion stack. The developer must still remember to consume the assertion within the loop. Could the original example not be rewritten more simply as:

    const result: ast.Expression[] = [];
    p.expect("(");
    while (!p.eof() && !p.at(")")) {
     subexpr = expression(p);
     assert(p !== undefined); // << here
     result.push(subexpr);
     if (!p.at(")")) p.expect(",");
    }
    p.expect(")");
    return result;

matklad · a month ago

I assume you meant to write `assert(subexpr !== undefined)`?

This is resilient parsing --- we are parsing source code with syntax errors, but still want to produce a best-effort syntax tree. Although an expression is required by the grammar, the `expression` function might still return nothing if the user typed some garbage there instead of a valid expression.

However, even if we return nothing due to garbage, there are two possible behaviors:

* We can consume no tokens, guessing that what looks like "garbage" from the perspective of the expression parser is actually the start of the next larger syntax construct:

```
function f() {
  let x = foo(1,
  let not_garbage = 92;
}
```

In this example, it would be smart to _not_ consume `let` when parsing `foo(`'s arglist.

* Alternatively, we can consume some tokens, guessing that the user _meant_ to write an expression there:

```
function f() {
  let x = foo(1, /);
}
```

In the above example, it would be smart to skip over `/`.
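A minimal sketch of the two recovery behaviors, assuming a toy parser over a list of string tokens. Everything here (`Parser`, `STATEMENT_START`, `expression`, `argList`) is hypothetical and only illustrates the idea; it is not the API from the article.

```typescript
// Toy resilient parser. All names are hypothetical, for illustration only.
type Expr = { kind: "number" | "name"; text: string };

class Parser {
  pos = 0;
  errors: string[] = [];
  tokens: string[];
  constructor(tokens: string[]) {
    this.tokens = tokens;
  }
  eof(): boolean {
    return this.pos >= this.tokens.length;
  }
  peek(): string | undefined {
    return this.tokens[this.pos];
  }
  at(token: string): boolean {
    return this.peek() === token;
  }
  bump(): void {
    if (!this.eof()) this.pos += 1;
  }
  expect(token: string): void {
    if (this.at(token)) this.bump();
    else this.errors.push(`expected '${token}' at token ${this.pos}`);
  }
}

// Tokens that most likely begin the next, larger construct (an assumption).
const STATEMENT_START = new Set(["let", "function", "return", "}"]);

function expression(p: Parser): Expr | undefined {
  const token = p.peek();
  if (token === undefined) return undefined;
  if (/^[0-9]+$/.test(token)) {
    p.bump();
    return { kind: "number", text: token };
  }
  if (/^[A-Za-z_]\w*$/.test(token) && !STATEMENT_START.has(token)) {
    p.bump();
    return { kind: "name", text: token };
  }
  p.errors.push(`expected expression, got '${token}'`);
  // Behavior 1: leave statement starters and list delimiters for the caller.
  // Behavior 2: anything else is treated as garbage and skipped.
  const keep = STATEMENT_START.has(token) || token === ")" || token === ",";
  if (!keep) p.bump();
  return undefined;
}

function argList(p: Parser): Expr[] {
  const result: Expr[] = [];
  p.expect("(");
  while (!p.eof() && !p.at(")")) {
    const before = p.pos;
    const arg = expression(p);
    if (arg !== undefined) result.push(arg);
    else if (p.pos === before) break; // nothing consumed: stop the list here
    if (!p.at(")")) p.expect(",");
  }
  p.expect(")");
  return result;
}
```

With the tokens of the first example, `argList` stops in front of `let` and reports the missing `)`; with the second, it skips `/` and closes the list normally.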

matklad commented on Static Allocation with Zig nickmonad.blog/2025/stati... · Posted by u/todsacerdoti

nickmonad · a month ago

Hey matklad! Thanks for hanging out here and commenting on the post. I was hoping you guys would see this and give some feedback based on your work in TigerBeetle.

You mentioned, "E.g., in OP, memory is leaked on allocation failures." - Can you clarify a bit more about what you mean there?

matklad · a month ago

In

    const recv_buffers = try ByteArrayPool.init(gpa, config.connections_max, recv_size);
    const send_buffers = try ByteArrayPool.init(gpa, config.connections_max, send_size);

if the second `try` fails, then the memory allocated by the first `try` is leaked. Possible fixes:

A) clean up individual allocations on failure:

    const recv_buffers = try ByteArrayPool.init(gpa, config.connections_max, recv_size);
    errdefer recv_buffers.deinit(gpa);

    const send_buffers = try ByteArrayPool.init(gpa, config.connections_max, send_size);
    errdefer send_buffers.deinit(gpa);

B) ask the caller to pass in an arena instead of gpa to do bulk cleanup (types & code stay the same, but naming & contract change):

    const recv_buffers = try ByteArrayPool.init(arena, config.connections_max, recv_size);
    const send_buffers = try ByteArrayPool.init(arena, config.connections_max, send_size);

C) declare OOMs to be fatal errors

    const recv_buffers = ByteArrayPool.init(gpa, config.connections_max, recv_size) catch |err| oom(err);
    const send_buffers = ByteArrayPool.init(gpa, config.connections_max, send_size) catch |err| oom(err);

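    // The parameter type is the error set `error{OutOfMemory}`; `error.OutOfMemory` is a value, not a type.
    fn oom(_: error{OutOfMemory}) noreturn { @panic("oom"); }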

You might also be interested in https://matklad.github.io/2025/12/23/static-allocation-compi..., it's essentially a complementary article to what @MatthiasPortzel says here https://news.ycombinator.com/item?id=46423691
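To make the leak observable, here is a rough TypeScript model of the same control flow (not Zig, and not the OP's code). `CountingAllocator` and this `ByteArrayPool` are hypothetical stand-ins that only count live bytes; `try`/`catch` plays the role of `errdefer` from fix A.

```typescript
// Hypothetical allocator that only counts live bytes, so leaks are visible.
class CountingAllocator {
  live = 0;
  budget: number;
  constructor(budget: number) {
    this.budget = budget;
  }
  alloc(bytes: number): void {
    if (this.live + bytes > this.budget) throw new Error("OutOfMemory");
    this.live += bytes;
  }
  free(bytes: number): void {
    this.live -= bytes;
  }
}

// Stand-in for the pool type: it remembers how many bytes it owns.
class ByteArrayPool {
  bytes: number;
  private constructor(bytes: number) {
    this.bytes = bytes;
  }
  static init(gpa: CountingAllocator, count: number, size: number): ByteArrayPool {
    gpa.alloc(count * size);
    return new ByteArrayPool(count * size);
  }
  deinit(gpa: CountingAllocator): void {
    gpa.free(this.bytes);
  }
}

type Buffers = { recv: ByteArrayPool; send: ByteArrayPool };

// Leaky: if the second init throws, nobody releases the first pool.
function initLeaky(gpa: CountingAllocator, max: number, recvSize: number, sendSize: number): Buffers {
  const recv = ByteArrayPool.init(gpa, max, recvSize);
  const send = ByteArrayPool.init(gpa, max, sendSize);
  return { recv, send };
}

// Fix A: release the first pool on the failure path, then re-throw.
function initWithCleanup(gpa: CountingAllocator, max: number, recvSize: number, sendSize: number): Buffers {
  const recv = ByteArrayPool.init(gpa, max, recvSize);
  try {
    const send = ByteArrayPool.init(gpa, max, sendSize);
    return { recv, send };
  } catch (err) {
    recv.deinit(gpa); // the analogue of `errdefer recv_buffers.deinit(gpa)`
    throw err;
  }
}
```

With a budget that fits only one of the two pools, `initLeaky` leaves the first pool's bytes counted as live after the failure, while `initWithCleanup` returns the allocator to zero.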

matklad commented on Static Allocation with Zig nickmonad.blog/2025/stati... · Posted by u/todsacerdoti

pron · a month ago

That what happens can be reasoned about in the semantics of the source language as opposed to being UB doesn't necessarily make the problem "a ton more benign". After all, a program written in Assembly has no UB and all of its behaviours can be reasoned about in the source language, but I'd hardly trust Assembly programs to be more secure than C programs [1]. What makes the difference isn't that it's UB but, as you pointed out, the type safety. But while the less deterministic nature of a "malloc-level" UAF does make it more "explosive", it can also make it harder to exploit reliably. It's hard to compare the danger of a less likely RCE with a more likely data leak.

On the other hand, the more empirical, though qualitative, claim made by matklad in the sibling comment may have something to it.

[1]: In fact, take any C program with UB, compile it, and get a dangerous executable. Now disassemble the executable, and you get an equally dangerous program, yet it doesn't have any UB. UB is problematic, of course, partly because at least in C and C++ it can be hard to spot, but it doesn't, in itself, necessarily make a bug more dangerous. If you look at MITRE's top 25 most dangerous software weaknesses, the top four (in the 2025 list) aren't related to UB in any language (by the way, UAF is #7).

matklad · a month ago

>If you look at MITRE's top 25 most dangerous software weaknesses, the top four (in the 2025 list) aren't related to UB in any language (by the way, UAF is #7).

FWIW, I don't find this argument logically sound, in context. This is data aggregated across programming languages, so it could simultaneously be true that, conditioned on using a memory-unsafe language, you should worry mostly about UB, while, at the same time, UB doesn't matter much in the grand scheme of things, because hardly anyone is using memory-unsafe programming languages.

There were reports from Apple, Google, Microsoft and Mozilla about vulnerabilities in browsers/OS (so, C++ stuff), and I think there UB hovered between 50% and 80% of all security issues?

And the present discussion does seem overall conditioned on using a manually-memory-managed language :0)
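The conditioning argument can be made concrete with made-up numbers. Every figure below is hypothetical and chosen only to show the arithmetic. Let U be "the vulnerability is caused by UB" and M be "the code is written in a memory-unsafe language":

```latex
P(U) = P(U \mid M)\,P(M) + P(U \mid \neg M)\,P(\neg M)
     = 0.7 \cdot 0.1 + 0 \cdot 0.9
     = 0.07
```

Under these assumed numbers, UB accounts for 70% of issues in memory-unsafe code and still only 7% of issues in the aggregate, so a low rank in a cross-language list says little about the conditional case.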

matklad commented on Static Allocation with Zig nickmonad.blog/2025/stati... · Posted by u/todsacerdoti

kibwen · a month ago

Those guidelines are quite clear that they're written specifically in the context of the C programming language, and may not make sense in other contexts:

"For fairly pragmatic reasons, then, our coding rules primarily target C and attempt to optimize our ability to more thoroughly check the reliability of critical applications written in C."

A version of this document targeting, say, Ada would look quite different.

matklad · a month ago

They do make a lot of sense in other contexts :-) From the actual rules, only #2 (minimize preprocessor) and #10 (compiler warnings) are C-specific. Everything else is more-or-less universally applicable.

matklad commented on Static Allocation with Zig nickmonad.blog/2025/stati... · Posted by u/todsacerdoti

pron · a month ago

> Forcing function to avoid use-after-free

Doesn't reusing memory effectively allow for use-after-free, only at the program level (even with a borrow checker)?

matklad · a month ago

There's some reshuffling of bugs for sure, but, from my experience, there's also a very noticeable reduction! It seems there's no law of conservation of bugs.

I would say the main effect here is that a global allocator often leads to ad-hoc, "shotgun" resource management all over the place, and that's hard to get right in a manually memory managed language. Most Zig code that deals with allocators has resource management bugs (including TigerBeetle's own code at times! Shoutout to https://github.com/radarroark/xit as the only code base I've seen so far where finding such a bug wasn't trivial). E.g., in OP, memory is leaked on allocation failures.

But if you manage resources manually, you just can't do that, you are forced to centralize the codepaths that deal with resource acquisition and release, and that drastically reduces the amount of bug prone code. You _could_ apply the same philosophy to allocating code, but static allocation _forces_ you to do that.

The secondary effect is that you tend to just more explicitly think about resources, and more proactively assert application-level invariants. A good example here would be compaction code, which juggles a bunch of blocks, and each block's lifetime is tracked both externally:

* https://github.com/tigerbeetle/tigerbeetle/blob/0baa07d3bee7...

and internally:

* https://github.com/tigerbeetle/tigerbeetle/blob/0baa07d3bee7...

with a bunch of assertions all over the place to triple check that each block is accounted for and is where it is expected to be:

https://github.com/tigerbeetle/tigerbeetle/blob/0baa07d3bee7...

I see a weak connection with proofs here. When you are coding with static resources, you generally have to make informal "proofs" that you actually have the resource you are planning to use, and these proofs are materialized as a web of interlocking asserts, and the web works only when it is correct in whole. With global allocation, you can always materialize fresh resources out of thin air, so nothing forces you to do such web-of-proofs.

To more explicitly set the context here: the fact that this works for TigerBeetle of course doesn't mean that this generalizes, _but_ the fact that we had a disproportionate number of bugs in the small amount of gpa-using code we have makes me think that there's something more here than just TB's house style.
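A rough sketch of the "web of interlocking asserts" idea, assuming a fixed pool of blocks allocated once at startup. `BlockPool` and its methods are hypothetical and far simpler than the TigerBeetle compaction code linked above; the point is only that each block's state is tracked twice and cross-checked.

```typescript
// Hypothetical fixed-size block pool; names and layout are illustrative.
function assert(condition: boolean, message: string): asserts condition {
  if (!condition) throw new Error(`assertion failed: ${message}`);
}

type Purpose = "reading" | "writing";
type BlockState = "free" | Purpose;

class BlockPool {
  blocks: Uint8Array[] = [];
  private state: BlockState[] = [];
  private freeCount: number;

  constructor(blockCount: number, blockSize: number) {
    // One allocation up front; the pool never grows afterwards.
    const backing = new ArrayBuffer(blockCount * blockSize);
    for (let i = 0; i < blockCount; i++) {
      this.blocks.push(new Uint8Array(backing, i * blockSize, blockSize));
      this.state.push("free");
    }
    this.freeCount = blockCount;
  }

  available(): number {
    return this.freeCount;
  }

  acquire(purpose: Purpose): number {
    // The caller has to "prove" a block exists; there is no fallback allocation.
    assert(this.freeCount > 0, "no free block available");
    const index = this.state.indexOf("free");
    assert(index >= 0, "freeCount and per-block state disagree");
    this.state[index] = purpose;
    this.freeCount -= 1;
    return index;
  }

  release(index: number, expected: Purpose): void {
    // Catches double release and releasing a block held for another purpose.
    assert(this.state[index] === expected, `block ${index} is not ${expected}`);
    this.state[index] = "free";
    this.freeCount += 1;
    this.checkInvariants();
  }

  checkInvariants(): void {
    // The counter and the per-block states are two views of the same fact.
    const free = this.state.filter((s) => s === "free").length;
    assert(free === this.freeCount, "every block must be accounted for");
  }
}
```

Because `acquire` cannot conjure a new block, a caller that miscounts hits an assertion immediately instead of silently allocating more memory.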

matklad commented on Static Allocation with Zig nickmonad.blog/2025/stati... · Posted by u/todsacerdoti

nromiun · a month ago

> All memory must be statically allocated at startup.

But why? If you do that you are just taking memory away from other processes. Is there any significant speed improvement over just dynamic allocation?

matklad · a month ago

See https://github.com/tigerbeetle/tigerbeetle/blob/main/docs/TI... for motivation.

- Operational predictability --- latencies stay put, the risk of thrashing is reduced (_other_ applications on the box can still misbehave, but you are probably using a dedicated box for a key database)

- Forcing function to avoid use-after-free. Zig doesn't have a borrow checker, so you need something else in its place. Static allocation is a large part of TigerBeetle's something else.

- Forcing function to ensure existence of application-level limits. This is tricky to explain, but static allocation is a _consequence_ of everything else being limited. And having everything limited helps ensure smooth operations when the load approaches deployment limit.

- Code simplification. Surprisingly, static allocation is just easier than dynamic. It has the same "anti-soup-of-pointers" property as Rust's borrow checker.

matklad commented on Static Allocation with Zig nickmonad.blog/2025/stati... · Posted by u/todsacerdoti

leumassuehtam · a month ago

> All memory must be statically allocated at startup. No memory may be dynamically allocated (or freed and reallocated) after initialization. This avoids unpredictable behavior that can significantly affect performance, and avoids use-after-free. As a second-order effect, it is our experience that this also makes for more efficient, simpler designs that are more performant and easier to maintain and reason about, compared to designs that do not consider all possible memory usage patterns upfront as part of the design. > TigerStyle

It's baffling that a technique known for 30+ years in the industry has been repackaged into "tiger style" or whatever this guru-esque thing is.

matklad · a month ago

To add more context, TigerStyle is quite a bit more than just static allocation, and it indeed explicitly attributes earlier work:

> NASA's Power of Ten — Rules for Developing Safety Critical Code will change the way you code forever.

To expand:

* https://github.com/tigerbeetle/tigerbeetle/blob/main/docs/TI...

* https://spinroot.com/gerard/pdf/P10.pdf

matklad commented on Static Allocation with Zig nickmonad.blog/2025/stati... · Posted by u/todsacerdoti

MatthiasPortzel · a month ago

One key thing to understand about TigerBeetle is that it's a file-system-backed database. Static allocation means they limit the number of resources in memory at once (number of connections, number of records that can be returned from a single query, etc). One of the points is that these things are limited in practice anyways (MySQL and Postgres have a simultaneous connection limit, applications should implement pagination). Thinking about and specifying these limits up front is better than having operations time out or OOM. On the other hand, TigerBeetle does not impose any limit on the amount of data that can be stored in the database.

=> https://tigerbeetle.com/blog/2022-10-12-a-database-without-d...

It's always bad to use O(N) memory if you don't have to. With a FS-backed database, you don't have to. (Whether you're using static allocation or not. I work on a Ruby web-app, and we avoid loading N records into memory at once, using fixed-sized batches instead.) Doing allocation up front is just a very nice way of ensuring you've thought about those limits, and making sure you don't slip up, and avoiding the runtime cost of allocations.
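The fixed-size batch idea can be sketched as follows. `BATCH_SIZE`, `readBatch` and `sumAll` are hypothetical names, and `readBatch` merely pretends to read from storage; the one buffer is allocated up front and reused, so memory use does not depend on the number of records.

```typescript
// Hypothetical limit, chosen up front rather than derived from the data size.
const BATCH_SIZE = 4;

// Stand-in for a storage read: fills `batch` with records starting at
// `offset` and returns how many were written. Record i has the value i.
function readBatch(total: number, offset: number, batch: Float64Array): number {
  const count = Math.min(batch.length, total - offset);
  for (let i = 0; i < count; i++) batch[i] = offset + i;
  return count;
}

// Visits all `total` records while holding at most BATCH_SIZE in memory.
function sumAll(total: number): { sum: number; batches: number } {
  const batch = new Float64Array(BATCH_SIZE); // allocated once, reused below
  let sum = 0;
  let batches = 0;
  let offset = 0;
  while (offset < total) {
    const count = readBatch(total, offset, batch);
    batch.subarray(0, count).forEach((value) => {
      sum += value;
    });
    offset += count;
    batches += 1;
  }
  return { sum, batches };
}
```

For example, `sumAll(10)` walks ten records in three batches without ever holding more than four of them.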

This is totally different from OP's situation, where they're implementing an in-memory database. This means that 1) they've had to impose a limit on the number of kv-pairs they store, and 2) they're paying the cost for all kv-pairs at startup. This is only acceptable if you know you have a fixed upper bound on the number of kv-pairs to store.

matklad · a month ago

Yes, very good point, thanks!

As a tiny nit, TigerBeetle isn't a _file system_-backed database: we intentionally limit ourselves to a single "file", and can work with a raw block device or partition, without file system involvement.

matklad commented on TigerBeetle as a File Storage aivarsk.com/2025/12/07/ti... · Posted by u/aivarsk

WJW · 2 months ago

Tigerbeetle is very cool and I would love to see more of it. AFAIR they have been hinting that you could in theory plug in storage engines different from the debit/credit model they've been using for some time. Has any of this materialized? I would love to use it but just don't have any bookkeeping to do at the scale where bringing in Tigerbeetle would make sense. :(

matklad · 2 months ago

It is the other way around --- it is _relatively_ easy to re-use the storage engine, but plug in your custom state machine (implemented in Zig). We have two state machines, an accounting one, and a simple echo one here: https://github.com/tigerbeetle/tigerbeetle/blob/main/src/tes....

I am not aware of any "serious" state machine other than the accounting one though.
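To illustrate the shape of "same engine, different state machine", here is a loose TypeScript sketch. The `StateMachine` interface, `Engine`, and both machines are hypothetical and do not mirror TigerBeetle's actual Zig interfaces; the engine here is just an in-memory log.

```typescript
// Hypothetical plug-in point: the engine only needs `apply`.
interface StateMachine<Op, Result> {
  apply(op: Op): Result;
}

// Simplest possible machine: replies with whatever it was given.
class EchoStateMachine implements StateMachine<string, string> {
  apply(op: string): string {
    return op;
  }
}

type Transfer = { debit: string; credit: string; amount: number };
type TransferResult = "ok" | "invalid_amount";

// A toy accounting machine: moves `amount` from one balance to another.
class AccountingStateMachine implements StateMachine<Transfer, TransferResult> {
  balances = new Map<string, number>();
  apply(op: Transfer): TransferResult {
    if (!(op.amount > 0)) return "invalid_amount";
    this.balances.set(op.debit, (this.balances.get(op.debit) ?? 0) - op.amount);
    this.balances.set(op.credit, (this.balances.get(op.credit) ?? 0) + op.amount);
    return "ok";
  }
}

// Stand-in for the reusable part: records each operation, then applies it.
class Engine<Op, Result> {
  private log: Op[] = [];
  private machine: StateMachine<Op, Result>;
  constructor(machine: StateMachine<Op, Result>) {
    this.machine = machine;
  }
  submit(op: Op): Result {
    this.log.push(op);
    return this.machine.apply(op);
  }
  committed(): number {
    return this.log.length;
  }
}
```

The same `Engine` class drives either machine, which is the direction of reuse described above: the storage side stays fixed and the state machine is swapped.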