Oxidizing Ubuntu: adopting Rust utilities by default

> Of course they are considered problematic in Rust.
>
> And leaks are hard to code by accident in Rust.
>
> ...

I enjoyed this gem and its descendants from the comments. What I see instead, commonly, even in big Rust projects, is that it's easy to accidentally define something with a longer lifetime than you intend. Some of that happens by accident (not recognizing the implications of an extra reference here and there when the language frees things based on static detections of being unused). Much more of it happens because the compiler fights against interesting data structures -- e.g., the pattern of allocating a vec as pseudo-RAM, using indices as pseudo-pointers, and never freeing anything till the container itself is unused.

There's nothing wrong with those techniques per se, but the language tends to paint you into a bit of a corner if you're not very good and very careful, so leaks are a fact of life in basically every major Rust project I've seen not written by somebody like BurntSushi, even when that same sort of project would not have a leak in a GC language.

vlmutolo · a year ago

I think that regardless of what references you have, Rust frees values at the end of their lexical “scope”.

For example, in the linked code below, x is clearly unused past the first line, but its “Drop” implementation executes after the print statement at the end of the function.

The takeaway is that if you want a value to drop early, just explicitly `drop` it. The borrow checker will make sure you don't have any dangling references.

https://play.rust-lang.org/?version=stable&mode=debug&editio...
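The playground link is truncated, so here is a minimal stand-in sketch (not the original snippet), assuming a hypothetical `Noisy` type whose `Drop` impl appends to a shared log. A value that is unused after its first line is still dropped at the end of the scope, while `drop` ends another value's life early:

```rust
use std::cell::RefCell;

// Hypothetical type that records when it is dropped, so drop order is observable.
struct Noisy<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<String>>,
}

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("drop {}", self.name));
    }
}

fn run(log: &RefCell<Vec<String>>) {
    // `_x` is never read again, but unlike a bare `_` pattern it still binds
    // the value, so it lives until the end of this scope.
    let _x = Noisy { name: "x", log };
    let y = Noisy { name: "y", log };
    drop(y); // explicit early drop: y's destructor runs right here
    log.borrow_mut().push("end of function body".to_string());
} // `_x` is dropped here, when it goes out of scope

fn main() {
    let log = RefCell::new(Vec::new());
    run(&log);
    // Prints: drop y, end of function body, drop x
    for line in log.borrow().iter() {
        println!("{line}");
    }
}
```

Note that it is the scope, not the last use, that decides when `_x` is dropped.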

In general, I think "lifetimes" only exist in the context of the borrow checker and have no influence on the semantics of Rust code. The language was designed so that the borrow checker pass could be omitted and everything would compile and run identically.

steveklabnik · a year ago

This is correct.

mmoskal · a year ago

The vec pattern has some advantages though, in particular you can often get away with 32 bit indices (instead of 64 bit pointers), making it a little more cache-friendly. I did it for regex ASTs that are supposed to be hash-consed, so never need to die until the whole matcher dies [0].
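A minimal sketch of the hash-consing pattern, assuming a hypothetical `Interner` for a toy node type (this is not the derivre code linked below): nodes live in one `Vec`, are referred to by `u32` ids, and structurally identical nodes are deduplicated through a `HashMap`. Nothing is freed until the whole interner is dropped.

```rust
use std::collections::HashMap;

// Hypothetical 32-bit handle into the arena (half the size of a 64-bit pointer).
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct NodeId(u32);

// Toy regex-like node; children are referenced by id, not by pointer.
#[derive(Clone, PartialEq, Eq, Hash, Debug)]
enum Node {
    Char(char),
    Concat(NodeId, NodeId),
}

#[derive(Default)]
struct Interner {
    nodes: Vec<Node>,              // the flat "pseudo-RAM"
    lookup: HashMap<Node, NodeId>, // hash-consing table: structure -> id
}

impl Interner {
    // Returns the existing id for a structurally identical node, or appends a new one.
    // Nothing is ever removed; memory is reclaimed only when the Interner is dropped.
    fn intern(&mut self, node: Node) -> NodeId {
        if let Some(&id) = self.lookup.get(&node) {
            return id;
        }
        let id = NodeId(u32::try_from(self.nodes.len()).expect("more than u32::MAX nodes"));
        self.nodes.push(node.clone());
        self.lookup.insert(node, id);
        id
    }

    fn get(&self, id: NodeId) -> &Node {
        &self.nodes[id.0 as usize]
    }
}

fn main() {
    let mut arena = Interner::default();
    let a = arena.intern(Node::Char('a'));
    let b = arena.intern(Node::Char('b'));
    let ab1 = arena.intern(Node::Concat(a, b));
    let ab2 = arena.intern(Node::Concat(a, b));
    assert_eq!(ab1, ab2); // same structure, same id
    println!("{} nodes; {:?} = {:?}", arena.nodes.len(), ab1, arena.get(ab1));
}
```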

A more contrived example is Earley items in a parser, which point inside a grammar rule (all rules are in a flat vec) and back into a previous parser state, so I have 2 u32 offsets. If I had pointers, I would be tempted to have a pointer to the grammar rule, an index inside of it, and a pointer to the previous state [1], so probably 3x the space.
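The "3x the space" estimate can be checked with `std::mem::size_of`. This sketch assumes two hypothetical layouts for an Earley item (not the llguidance types): two `u32` offsets versus a rule pointer, a position, and a pointer to the originating state. The 8 vs. 24 byte figures hold on a 64-bit target; on 32-bit targets the pointer version is smaller.

```rust
use std::mem::size_of;

// Hypothetical grammar rule and parser state set, only used behind references here.
#[allow(dead_code)]
struct Rule {
    symbols: Vec<u32>,
}

#[allow(dead_code)]
struct StateSet {
    items: Vec<IndexItem>,
}

// Index-based item: offset into one flat Vec holding all rules,
// plus the index of the state set where the item started.
#[allow(dead_code)]
struct IndexItem {
    rule_offset: u32,
    origin: u32,
}

// Pointer-based item: which rule, how far into it, and where it started.
#[allow(dead_code)]
struct PtrItem<'a> {
    rule: &'a Rule,
    pos: usize,
    origin: &'a StateSet,
}

fn main() {
    // On a 64-bit target: 8 bytes vs. 24 bytes.
    println!("IndexItem: {} bytes", size_of::<IndexItem>());
    println!("PtrItem:   {} bytes", size_of::<PtrItem<'static>>());
}
```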

In both cases, pointers would be easier but slower. Annoying that Rust doesn't really let you make the choice though...

[0] https://github.com/microsoft/derivre/blob/main/src/hashcons....

[1] https://github.com/guidance-ai/llguidance/blob/main/parser/s...

hansvm · a year ago

By all means. It's a pattern I use all the time, not just in Rust (often getting away with much less than 32-bit indices). You mentioned this and/or alluded to it, but my core complaints are:

1. You don't have much choice in the matter

2. When implementing that strategy, the default (happy path coding) is that you have almost zero choice in when the underlying memory is reclaimed

It's a pattern that doesn't have to be leaky, but especially when you're implementing it in Rust to circumvent borrow-checker limitations, most versions I've seen have been very leaky, not even supporting shrink/resize/reset/... operations to try to at least let the user manually be more careful with leaks.
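A minimal sketch of the kind of reclamation hooks described as usually missing, assuming a hypothetical `Arena<T>` with `u32` handles. `reset` drops every value while keeping the buffer, and `reset_and_release` also hands the buffer back to the allocator. It still cannot free individual slots, and old handles are not invalidated, so this is coarse-grained by design.

```rust
// Hypothetical 32-bit handle into an Arena.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Handle(u32);

// Hypothetical index-based arena that exposes explicit reclamation hooks.
struct Arena<T> {
    items: Vec<T>,
}

impl<T> Arena<T> {
    fn new() -> Self {
        Arena { items: Vec::new() }
    }

    fn alloc(&mut self, value: T) -> Handle {
        let idx = u32::try_from(self.items.len()).expect("arena full");
        self.items.push(value);
        Handle(idx)
    }

    fn get(&self, h: Handle) -> Option<&T> {
        self.items.get(h.0 as usize)
    }

    fn len(&self) -> usize {
        self.items.len()
    }

    fn capacity(&self) -> usize {
        self.items.capacity()
    }

    // Drops every value but keeps the allocation for reuse.
    // Caveat: old handles are not invalidated; after new allocations an old
    // handle may refer to a different value (no generation counter here).
    fn reset(&mut self) {
        self.items.clear();
    }

    // Drops every value and asks the allocator to take the buffer back.
    fn reset_and_release(&mut self) {
        self.items.clear();
        self.items.shrink_to_fit();
    }
}

fn main() {
    let mut arena = Arena::new();
    let h = arena.alloc(String::from("node"));
    println!("{:?}, len {}", arena.get(h), arena.len());
    arena.reset();
    println!("after reset: len {}, capacity {}", arena.len(), arena.capacity());
    arena.reset_and_release();
    println!("after release: capacity {}", arena.capacity());
}
```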

MeetingsBrowser · a year ago

> not recognizing the implications of an extra reference here and there when the language frees things based on static detections of being unused

Maybe I’m misunderstanding, but isn’t the point of the borrow checker to throw errors if a reference “outlives” the memory it references?

How can an extra reference extend the lifetime?

hansvm · a year ago

Rust (most of the time, I'm not arguing about failure modes at all right now, let's pretend it's perfect) drops items when you're done with them. If you use the item longer, you have a longer implicit lifetime. The memory it references will also be claimed for longer (the reference does not outlive its corresponding memory).

You only fix that by explicitly considering lifetimes as you write your code -- adding in concrete lifetimes and letting the compiler tell you when you make a mistake (very hard to do as a holistic strategy, so nobody does), or just "git gud" (some people do this, but it takes time, and it's not the norm; you can code at a very high level without developing that particular skill subset, with the nearly inevitable result of "leaky" rust code that's otherwise quite nice).

humanfromearth9 · a year ago

Why doesn't anyone rewrite some of these tools in a language that can be compiled to a native binary by GraalVM and benefit from all of Java's security guarantees? Would it be too slow? Don't the advantages outweigh the disadvantages?

jeroenhd · a year ago

Why go for Java when you can go for .NET? Or Go? .NET seems to perform on par and seems to produce smaller executables, and Go seems to be faster in general.

Personally I don't really care what language common tools are written in (except for programs written in C(++), but I'll gladly use those after they've had a few years to catch most of the inevitable memory bugs).

I think the difference is that there aren't many languages with projects that actually end up writing full suite replacements. There's a lot of unexpected complexity hidden within simple binaries that need to be implemented in a compatible way so scripts don't explode at you, and that's pretty tedious work. I know of projects in Rust and Zig that intend to be fully compatible, but I don't know of any in Java+GraalVM or Go. I wouldn't pick Zig for a distro until the language hits 1.0, though.

If these projects do exist, someone could probably compare them all in a compatibility and performance matrix to figure out which distribution is the fastest+smallest, but I suspect Rust may just end up winning in both areas there.

nicoburns · a year ago

Why use Java when you can use Rust? In all seriousness, Rust is a joy to work with for these kinds of tools, which typically don't have complex lifetimes or ownership semantics.

On top of that you get better performance and probably smaller binaries. But I would pick Rust over Java for CLI tools just on the strengths of the language itself.

dpe82 · a year ago

You could do similar with the existing C by compiling it to WASM and then compiling that to machine code.

neonsunset · a year ago

Because it’s not a good platform for this at all.

KeplerBoy · a year ago

Rust is trendy, Oracle is the opposite.

Of course this isn't a very nuanced or clever take, but it's certainly part of the truth.

jvanderbot · a year ago

The claim was "safe", not GC equivalent or memory minimal. You're right, all things have tradeoffs, and the borrow checker is no different.

kortilla · a year ago

The claim was “leaks are hard to code by accident”. I agree with GP that this is false.

Preventing leaks is explicitly not a goal of Rust, and making lifetimes correct often involves giving up and making unnecessary static lifetimes. I see this all the time in async RPC stream stuff.
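A small illustration of the point that preventing leaks is outside Rust's safety guarantees: both `Box::leak` and `std::mem::forget` are safe functions, and `Box::leak` is one common way to obtain a `&'static` reference when lifetimes get awkward. The `leak_label` helper is hypothetical.

```rust
use std::mem;

// Hypothetical helper: turn a runtime String into a &'static str by leaking it.
// The allocation is never freed; this compiles without any `unsafe`.
fn leak_label(name: &str) -> &'static str {
    let owned = format!("label:{name}");
    Box::leak(owned.into_boxed_str())
}

fn main() {
    let label: &'static str = leak_label("rpc-stream");
    println!("{label}");

    // mem::forget skips the destructor entirely, so this 1 KiB buffer is
    // never returned to the allocator. Also safe code.
    let buf = vec![0u8; 1024];
    mem::forget(buf);
}
```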

pjmlp · a year ago

Unless one is writing something where pauses are a no go, even a tiny µs, I don't see a reason for rushing out to affine type systems, or variations thereof.

CLI applications are exactly a case where it doesn't matter at all; see Inferno and Limbo.

littlestymaar · a year ago

I genuinely don't understand why this is the top comment, as it is almost complete BS and the author confuses the behavior of the borrow checker with that of a GC (and ironically claims a GC would solve the problem when in fact the problem doesn't exist outside of a GC's world).

hansvm · a year ago

There's no confusion between what a borrow checker and a GC do. The borrow checker enforces safety by adding constraints to the set of valid programs so that it can statically know where to alloc/dealloc, among other benefits. A GC dynamically figures out what memory to drop. My claim is that those Rust constraints force you to write your code differently than you otherwise would (since unsafe is frowned upon, this looks like hand-rolled pointers using vecs as one option fairly frequently), which is sometimes good, but for more interesting data structures it encourages people to write leaks which wouldn't exist if you were to write that same program without those constraints and thus wouldn't exist in a GC language.

Fruitmaniac · a year ago

Memory leaks are easy to code by accident in Java, so it must be even worse in Rust.

mercurial · a year ago

> the pattern of allocating a vec as pseudo-RAM, using indices as pseudo-pointers, and never freeing anything till the container itself is unused

Are you talking about hand-rolled arena allocation? I don't see how a GC language would have a different behaviour as long as you also use arena allocation and you keep a reachable reference.

> There's nothing wrong with those techniques per se, but the language tends to paint you into a bit of a corner if you're not very good and very careful, so leaks are a fact of life in basically every major Rust project I've seen not written by somebody like BurntSushi

If I take 3 random major Rust projects like Serde, Hyper, and Tracing, none of which are written by BurntSushi, your claim is that they all suffer from memory leaks?

burntsushi · a year ago

And when you put people on a pedestal, they're guaranteed to let you down. :-) https://github.com/BurntSushi/aho-corasick/commit/474393be8d...

I wouldn't be surprised if that style of leak were more prevalent than one would expect. It's pretty subtle. But that link is the only such instance I'm aware of it happening to such a degree in crates I maintain. Maybe there are other instances. This is why I try to use `Box<[T]>` when possible, because you know that can't have extra capacity.

I find the GP's overall argument specious. They lack concrete examples.
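A sketch of the `Box<[T]>` point above: a `Vec<T>` can hold far more capacity than it has elements, while a boxed slice stores only a pointer and a length. `into_boxed_slice` discards the excess capacity (reallocating if needed). The `freeze` helper and the sizes are hypothetical.

```rust
// Hypothetical helper: freeze a Vec into a boxed slice. into_boxed_slice
// reallocates (if needed) so the result holds exactly len elements.
fn freeze(v: Vec<u64>) -> Box<[u64]> {
    v.into_boxed_slice()
}

fn main() {
    // Reserve room for a million elements but only use three.
    let mut v: Vec<u64> = Vec::with_capacity(1_000_000);
    v.extend([1, 2, 3]);
    println!("Vec: len {}, capacity {}", v.len(), v.capacity());

    // Box<[u64]> is just pointer + length: there is no spare capacity to leak,
    // and no push(), so it cannot silently grow again.
    let b = freeze(v);
    println!("Box<[u64]>: len {}", b.len());

    // Converting back gives a Vec whose capacity equals its length.
    let back = b.into_vec();
    println!("round trip: len {}, capacity {}", back.len(), back.capacity());
}
```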

> This is not symbolic of any pointed move away from GNU components - it's literally just about replacing coreutils with a more modern equivalent. Sure, the license is different, and it's a consideration, but it's by no means a driver in the decision making.

Sorry, don't believe this one bit. I'm very thankful for everything Canonical/Ubuntu did about 20 years ago, but no thanks. This comes from someone who loves Rust and what it made possible. However, freedoms are too important to not treat anything that looks like an attack on them as such.

imiric · a year ago

Ubuntu lost its way a long time ago. Pushing Snap by default, ads in the Unity Dash search, ads in the terminal... Unity itself was a misstep, and around the time the distro started going downhill for me.

I don't follow it much these days, but nothing really surprises me from Canonical anymore. They leech off free software just like any other corporation.

saghm · a year ago

Whatever happened to the whole debate around Ubuntu including ZFS modules by default in Ubuntu? At the time it originally got proposed, I feel like I remember basically everyone other than Canonical agreeing that this wasn't allowed by the licenses of Linux and ZFS, but they did it anyways, and from what I can tell, they basically got away with it?

Unless I'm remembering it wrong, I'm honestly not surprised that they might just be less worried about licensing in general after that; maybe this is the software licensing equivalent of "too big to fail"?

nosrepa · a year ago

Ubuntu stopped being Ubuntu when it stopped being brown.

queuebert · a year ago

I just realized this today. When trying to upgrade an EOL-ed Ubuntu machine using 'do-release-upgrade', it completely shat the bed. Now I'm in search of an alternative distro that has good GPU support, rolling releases, no systemd. Maybe OpenSUSE?

If there are any SV billionaires out there, can you fund CUDA on OpenBSD please? :-P

sanderjd · a year ago

Why don't you believe it?

From my perspective, it seems eminently plausible. Who cares about the licensing?

But I'm interested in what leads you to the opposite conclusion.

kartoffelmos · a year ago

> Who cares about the licencing.

Well, for one, the author of the Rust-based utils cared enough to change it (or rather to not re-adopt GPL, but IMO that's the same thing). Why shouldn't we care about the licencing?

superb_dev · a year ago

Which freedoms are being attacked?

globular-toast · a year ago

The GPL ensures the software will always be free. "Permissive" licences don't. Permissive licences permit anyone to create a derivative version that isn't free. No source code, and it will be under copyright.

If there's one thing people need to understand it's that corporations will always strive to take as much as they can and give back as little as they can. It's incredibly naive to think they won't do it again after that's exactly what every corporation has done for the past several centuries. Give an inch and they'll take a mile. You can't seriously think companies like Microsoft and Oracle aren't already salivating at the thought of the community rewriting GNU/Linux in a permissive licence?

It's not that this particular case is "freedoms being attacked". It's that freedoms are constantly and relentlessly under attack and we can't let our guard down for one moment. GPL is our protection and voluntarily stepping out of it would be suicide.

endgame · a year ago

Remember when Red Hat tried to lock away the source code to subscribers only? I think they would have preferred not to provide source at all, but the enormous base of GPL code entangled with everything made that impractical.

I think we're really going to miss having the fundamental parts of the system available under strong copyleft, and it will be very hard to go back.

Kbelicius · a year ago

User freedoms. Those don't exist under the so called "more permissive" licenses.

immibis · a year ago

I see no problem with Rust. The problem here is the licensing. The new project is proprietary-compatible.

IshKebab · a year ago

So what? There's zero interest in proprietary extensions to coreutils. This isn't Linux or GCC; they're just basic command line utilities.

Also anyone that really cared could use the BSD versions anyway.


johnny22 · a year ago

i am not that concerned about more permissively licensed versions of what are effectively commodities (like coreutils). I am still very interested in keeping something like the kernel as GPL since it doesn't have a real substitute.

ZoomZoomZoom · a year ago

blueflow · a year ago

The uutils project has the right goals - 1:1 compatibility with GNU coreutils to the extent that any difference in functionality is a bug.

The first comment on LWN is about a bug in the more(1) from uutils. I checked out that code, found some oddities (like doing stat() on a path to check if it exists, right before open()ing it) and went to check against how GNU coreutils does it.

Turns out coreutils does not do it at all, because more(1) is from util-linux. ta-dam.

LeFantome · a year ago

I think this is an excellent point that a lot of people are missing.

There are a lot of GNU utils. For now, Ubuntu is only taking on the ones that are actually in coreutils. Those are the ones that uutils considers "production ready".

secondcoming · a year ago

> > Klode said that it was a bad idea to allow users to select between Rust and non-Rust implementations on a per-command level, as it would make the resulting systems hard to support.

Wouldn't this imply that these aren't actually 1:1 replacements?

zamadatix · a year ago

It would imply not all of the replacements are complete and/or bug free yet, and the article gives examples of just that earlier in the text, but it would not imply the replacements lack the goal of being 1:1 replacements like GP said.

This would seem to be one of the driving factors in the creation of oxidizr: to allow testing how ready the components are to be drop in replacements with easy recourse. You can read more about that in this section of the linked blog post by jnsgruk https://discourse.ubuntu.com/t/carefully-but-purposefully-ox...

> uutils aims to be a drop-in replacement for the GNU utils. Differences with GNU are treated as bugs.

-- https://github.com/uutils/coreutils?tab=readme-ov-file#goals

codeguro · a year ago

>The uutils project has the right goals

Their goals are to _replace GPL's code_. It's a subtle attack on Free Software by providing corporations a workaround so they don't have to abide by the terms of the license.

Nobody seriously doubts the efficiency or safety of the GNU Coreutils. They've been battle tested for over 30 years and whatever safety or optimization risks are left are negligible.

01HNNWZ0MV43FF · a year ago

That's funny, because just today I was reading the Rust docs for std::fs::File and saw a warning about that exact TOCTOU error, and had to fix my code to not hit it.

Starlevel004 · a year ago

std::fs is generally not a good API with the complete lack of directory handles.

cWave · a year ago

the more you know

samtheprogram · a year ago

Not sure why you are being downvoted; I’m curious why. I appreciated this tidbit, thanks.

jgtrosh · a year ago

And? How does that do it?

I checked. It fopen()s the file and then fstat()s it, so it isn't vulnerable to TOCTOU.

However the TOCTOU is completely benign here. It's just an extra check before Rust opens the file so if you were to try to "exploit" it the only thing that would happen is you get a different error message.
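For comparison, a minimal Rust sketch of the two orderings discussed above, assuming the goal is simply to read a regular file. The function names are hypothetical. `Path::exists` followed by `File::open` checks a path that can change in between, while `File::open` followed by `File::metadata` inspects the already-open handle (on Unix, the file descriptor rather than the path), so the check and the read refer to the same file.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

// Racy pattern: the path is checked, then opened; another process can
// replace or remove it between the two calls (TOCTOU).
fn read_racy(path: &Path) -> io::Result<String> {
    if !path.exists() {
        return Err(io::Error::new(io::ErrorKind::NotFound, "no such file"));
    }
    let mut s = String::new();
    File::open(path)?.read_to_string(&mut s)?;
    Ok(s)
}

// Open first, then inspect the open handle: the check and the read
// refer to the same file object.
fn read_checked(path: &Path) -> io::Result<String> {
    let mut file = File::open(path)?; // a missing file fails here with NotFound
    let meta = file.metadata()?; // metadata of the open handle, not of the path
    if meta.is_dir() {
        return Err(io::Error::new(io::ErrorKind::Other, "is a directory"));
    }
    let mut s = String::new();
    file.read_to_string(&mut s)?;
    Ok(s)
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("toctou_sketch.txt");
    std::fs::write(&path, "hello\n")?;
    print!("{}", read_checked(&path)?);
    print!("{}", read_racy(&path)?);
    std::fs::remove_file(&path)?;
    // Missing file: the error comes straight from open(), no separate check needed.
    println!("{:?}", read_checked(&path).unwrap_err().kind());
    Ok(())
}
```

As the comment says, the extra check is benign in this case; the second shape simply avoids having two answers to the same question.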

zoogeny · a year ago

I hate to admit it, because I don't particularly like Rust, but I'm slowly coming around to the idea it should replace things.

The major change is a recent comment I made where I was musing about a future where LLMs actually write most of the code. That isn't a foregone conclusion but the recent improvements in coding LLMs suggests it isn't as far off a future as I once considered.

My thought was simple: if I am using a LLM to do the significant portion of generating code, how does that change what I think about the programming language I am using? Does the criteria I use to select the language change?

Upon reflection, Rust is probably the best candidate. It has strictness in ways that other languages do not have. And if the LLM is paying most of the cost of keeping the types and lifetimes in order, what do I care if the syntax is ugly? As long as I can read the output of the LLM and verify it (code review it) then I actually want the strictness. I want the most statically analyzable code possible since I have low trust in the LLM. The fact that Rust is also, ahem, blazingly fast, is icing on the cake.

As an aside to this aside, I was also thinking about modular kernels, like Minix. I wonder if there is a world where we take a userland like the one Ubuntu is trying and apply it to Minix. And then slowly replace the OS modules with ones written in Rust. I think the modularity of something like Minix might be an advantage, but that might just be because I am completely naïve.

The Rust produced by LLMs is quite bad. It’s overly verbose (misses combinators) and often subtly wrong (swallows errors on Result types when it shouldn’t). A single errant collect or clone call can destroy your performance, and LLMs sprinkle them for no reason.

Unless you are experienced in Rust, you have zero ability to catch the kind of mistakes LLMs make producing Rust code.
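An illustrative sketch of the two patterns named above, using a hypothetical `parse_ports` helper (not taken from any particular model's output). The first version collects into an unnecessary intermediate `Vec` and quietly turns bad input into `0` with `unwrap_or_default`. The second uses iterator combinators and lets `collect` into a `Result` stop at the first parse error.

```rust
use std::num::ParseIntError;

// Subtly wrong: bad input is swallowed and becomes port 0,
// and the intermediate Vec<&str> is an extra allocation.
fn parse_ports_lossy(input: &str) -> Vec<u16> {
    let parts: Vec<&str> = input.split(',').collect();
    let mut out = Vec::new();
    for p in parts.iter() {
        out.push(p.trim().parse::<u16>().unwrap_or_default());
    }
    out
}

// Idiomatic: no intermediate Vec, and the first parse error is returned
// to the caller instead of being hidden.
fn parse_ports(input: &str) -> Result<Vec<u16>, ParseIntError> {
    input.split(',').map(|p| p.trim().parse::<u16>()).collect()
}

fn main() {
    println!("{:?}", parse_ports_lossy("80, 443, http")); // [80, 443, 0]
    println!("{:?}", parse_ports("80, 443"));
    println!("{}", parse_ports("80, 443, http").is_err()); // true
}
```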

sroussey · a year ago

I just had an LLM (something 3.7) propose a 100 line implementation and after I stared at things for a while, I reduced it to one. I’m sure I’m in the minority of not just accepting the 100 line addition.

realusername · a year ago

> Unless you are experienced in rust, you have zero ability to catch the kind of mistakes LLMs make producing rust code.

I'd say it's on par with other languages, then... LLMs are roughly 95% correct on code generation, but that's not nearly enough for using them.

And spending time finding which 5% is looking good but actually wrong is a frustrating experience.

Those programs make different kinds of mistakes than humans, and I find them much harder to spot.

Diederich · a year ago

Is this deficiency likely to persist long-term as LLMs grow more powerful?

Cloudef · a year ago

LLMs truly provide the garbage-in, garbage-out experience.


_factor · a year ago

It’s interesting it took you this long to substitute “junior coder” with LLM. The implicit safety is just as applicable to teams of human error prone devs.

Covenant0028 · a year ago

I've had similar thoughts as well. Rust or C would also be the ideal candidate from an energy consumption point of view, as they consume way less energy than Python, which is the language that LLMs most often default to in my experience.

However, it's unlikely LLMs will generate a lot of Rust code. All these LLMs are generating is the most likely next token, based on what they've been trained on. And in their training set, it's likely there is massively more Python and JS code out there than Rust, simply because those are way more popular languages. So with Rust, it's more likely to hallucinate and make mistakes than with Python, where the path is much better trodden.

phanimahesh · a year ago

However, it is much easier to statically analyse Rust, and Rust has compile-time validations compared to an interpreted language like Python. This makes it easier for an agentic LLM to produce code through a write-compile-fix loop in Rust than in Python.

In my experience LLMs are not particularly bad with Rust compared to Python, though I've only toyed with them minimally.

jdright · a year ago

In practice, that is not what happens. I've been doing AI-assisted Rust for some time, and I'm quite convinced this is the way. I expect it to be basically fully automated within 6 months to a year.

Rust has tons of code out there, and quality code, unlike JS or Python, which have an abundance of low-quality to outright garbage code.

alextingle · a year ago

This is the worst kind of busy-work. Rewriting something for the sake of it is terrible practice.

There's a lot of refreshing energy amongst Rust coders eager to produce better, more modern command-line tools. That's amazing, and a real benefit to everyone. But rewriting simple, decades-old, perfectly functional tools, with the explicit goal of not improving them is just a self-absorbed hobby. And a dangerous one at that - any new code will contain bugs, and the best way to avoid that is not to write code if you don't have to.

It's not for the sake of it - it's to have MIT-licensed (able to be made proprietary) coreutils. GPL is a thorn in the side of anyone who wishes to EEE (embrace, extend, extinguish) it.

hexo · a year ago

This is far from OK. The entire point is to have GPL, not to run away from it. Again, Canonical shows they didn't get the point. Anyway, I'm staying on GNU coreutils, as I see no benefit and zero reason to switch.

That isn't the motivation. It's about "resilience, performance and maintainability".

I doubt they'll get noticeably better performance (the main GNU tools are already very optimised). I'm not sure they really lack in resilience. And I don't think memory safety is a big factor here.

Maintenance is definitely a big plus though. It will be much nicer to work on modern Rust code than ancient C.

TazeTSchnitzel · a year ago

toybox and the BSDs already exist for those wanting permissively licensed utilities, the GPL is not a major motivator.

panick21_ · a year ago

Yeah because non GPL coreutils don't exist.

Previously:

https://news.ycombinator.com/item?id=43360969

https://news.ycombinator.com/item?id=43353240

johnisgood · a year ago

Amazing, another reason for avoiding Ubuntu.

Many of these utilities have had logic errors in them; you can find them in the GitHub issues. sudo, for example, allowed a bypass in some way I cannot remember.

And I bet you they are not a replacement for GNU utilities, i.e. they have fewer features, are possibly not optimized either, and perhaps even have different (or fewer) flags/options.

I have written a lot of Bash scripts; I wonder how well (if at all) they would work on Ubuntu with the Rust utilities.

Almondsetat · a year ago

What an ignorant comment. The uutils README on GitHub explicitly states in the first section that every discrepancy wrt GNU utilities is considered a bug.

It is a comment coming from pragmatism, not ignorance.

At any rate, feel free to look around here: https://github.com/uutils/coreutils/issues?q=is%3Aissue%20st...

It is NOT a replacement of GNU coreutils AT ALL, as of yet.

Granted, under "Goals" it says "uutils aims to be a drop-in replacement for the GNU utils. Differences with GNU are treated as bugs.", but please look around the open (and closed) issues, since they tell you about the major, yet simple, logic bugs that have occurred. It is definitely not ready, and I am not sure when, or if, it will be ready.

FWIW the README also says: "some options might be missing or different behavior might be experienced.".

Future will tell, but right now, it is an extremely bad idea to replace GNU coreutils with Rust's uutils. Do you think otherwise? Elaborate as to why if so.

lolinder · a year ago

Your first sentence is totally unnecessary and relies on an uncharitable reading of what they said. It's entirely possible that they knew that a goal of the project was to treat every discrepancy as a bug and were indicating that they were of the opinion that there were plenty of bugs left and no evidence that they would ever be all squashed.

Also, while we're slinging the README around, here's the first paragraph:

> uutils coreutils is a cross-platform reimplementation of the GNU coreutils in Rust. While all programs have been implemented, some options might be missing or different behavior might be experienced.

The "ignorant comment" above was actually just pointing out the thing that the developers thought was second-most-important for people to know—right after the fact that it's a reimplementation.

yawaramin · a year ago

> uutils aims to be a drop-in replacement for the GNU utils. Differences with GNU are treated as bugs.

> some options might be missing or different behavior might be experienced.

Both of these statements cannot be true at the same time.

bravetraveler · a year ago

Canonical making light of licensing again, no surprise here.

This has ulterior motive written all over it. Best outcome: nothing happens.