I don't get how each file system can have a custom lifecycle for inodes but still use the same functions for inode lifecycle management, apparently with different semantics? That sounds like the opposite of an abstraction layer, if the same function must be used in different ways depending on implementation details.
If the lifecycle of inodes is filesystem-specific, it should be managed via filesystem-specific functions.
>> I don't get how each file system can have a custom lifecycle for inodes but still use the same functions for inode lifecycle management, apparently with different semantics?
I had the same question. They're trying to understand (or even document) all the C APIs in order to do the rust work. It sounds like collecting all that information might lead to some [WTFs and] refactoring so questions like this don't come up in the first place, and that would be a good thing.
I understood it as they're working to abstract as much as is generally and widely possible in the VFS layer, but there will still be (many?) edge cases that don't fit and will need to be handled in FS-specific layers. Perhaps the inode lifecycle was just an initial starting point for discussion?
I assume it's supposed to work by having the compiler track the lifetime of the inodes. The compiler is expected to help with ephemeral references (the file system still has to store the link count to disk).
> but still use the same functions for inode lifecycle management
I'm not an expert by any means, but I'm somewhat knowledgeable. There are different functions that can be used to create inodes and then insert them into the cache. `iget_locked()`, which is the focus here, is one particular pattern for doing it, but not every FS uses it, for one reason or another (or doesn't use it in every situation). For example, FAT doesn't use it because the inode numbers are made up and the FS maintains its own mapping of FAT position to inodes. There are also file systems like `proc` which never cache their inode objects (I'm pretty sure that's the case, I don't claim to understand proc :P )
The inode objects themselves still have the same state flow regardless of where they come from, AFAIK, so from a consumer perspective the usage of the `inode` doesn't change. It's only the creation and internal handling of the inode objects by the FS layer that varies based on what the FS needs.
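To illustrate that "same state flow" idea, here is a minimal sketch in plain Rust with entirely made-up names (this is not the kernel API or the proposed Rust bindings): the lookup either returns an existing inode or a not-yet-initialized one, and the types force you to finish initialization before using it.

    struct Inode {
        ino: u64,
        size: u64,
    }

    /// An inode that was just allocated and is not yet visible to anyone else.
    struct NewInode {
        ino: u64,
    }

    impl NewInode {
        /// Finishing initialization is the only way to obtain a usable `Inode`,
        /// so "use before init" simply doesn't compile.
        fn init(self, size: u64) -> Inode {
            Inode { ino: self.ino, size }
        }
    }

    enum Lookup {
        Existing(Inode),
        New(NewInode),
    }

    fn fill_from_disk(found: Lookup) -> Inode {
        match found {
            // Already cached: nothing to do.
            Lookup::Existing(inode) => inode,
            // Cache miss: the FS-specific code reads the on-disk data and finishes
            // initialization. Either way the caller ends up with an `Inode`.
            Lookup::New(new) => new.init(4096),
        }
    }

    fn main() {
        let inode = fill_from_disk(Lookup::New(NewInode { ino: 2 }));
        println!("inode {} has size {}", inode.ino, inode.size);
        let again = fill_from_disk(Lookup::Existing(inode));
        println!("inode {} again", again.ino);
    }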
Does Rust need to change to make it easier to call C?
I've done a bit of Rust, and (as a hobbyist) it's still not clear to me how to interoperate with C. (I'm sure someone reading this has done it.) In contrast, in C++ and Objective C, all you need to do is include the right header and call the function. Swift lets you include Objective C files, and you can call C from them.
Maybe Rust as a language needs to bend a little in this case, instead of expecting the kernel developers to bend to the language?
extern "C" {
fn abs(input: i32) -> i32;
}
fn main() {
unsafe {
println!("Absolute value of -3 according to C: {}", abs(-3));
}
}
Now, if you have a complex library and don't want to write all of the declarations by hand, you can use a tool like bindgen to automatically generate those extern declarations from a C header file: https://github.com/rust-lang/rust-bindgen
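For reference, the typical setup from bindgen's documentation looks roughly like this; it assumes `bindgen` is listed under [build-dependencies], and `wrapper.h` is a placeholder for whatever header you want bindings for.

    // build.rs
    use std::env;
    use std::path::PathBuf;

    fn main() {
        // Regenerate the bindings whenever the header changes.
        println!("cargo:rerun-if-changed=wrapper.h");

        let bindings = bindgen::Builder::default()
            .header("wrapper.h")
            .generate()
            .expect("unable to generate bindings");

        let out_path = PathBuf::from(env::var("OUT_DIR").unwrap());
        bindings
            .write_to_file(out_path.join("bindings.rs"))
            .expect("couldn't write bindings");
    }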
There's an argument to be made that something like bindgen could be included in Rust itself, so you wouldn't need a third-party dependency and a build.rs set up to invoke it, but that's not really the issue at hand in this article.
The issue is not the low-level bindings, but higher level wrappers that are more idiomatic in Rust. There's no way you're going to be able to have a general tool that can automatically do that from arbitrary C code.
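To make that concrete, here is roughly what such a hand-written layer looks like for a tiny slice of C's stdio. The extern declarations are the part a tool can generate; deciding that NULL becomes None and that the handle is closed in Drop is the part it can't. (In a real project the declarations would come from bindgen or the libc crate rather than being typed out.)

    use std::ffi::CString;
    use std::os::raw::{c_char, c_int, c_long};

    // Opaque C type: we only ever hold a pointer to it.
    #[repr(C)]
    struct CFile {
        _private: [u8; 0],
    }

    extern "C" {
        fn fopen(path: *const c_char, mode: *const c_char) -> *mut CFile;
        fn fclose(stream: *mut CFile) -> c_int;
        fn fseek(stream: *mut CFile, offset: c_long, whence: c_int) -> c_int;
        fn ftell(stream: *mut CFile) -> c_long;
    }

    const SEEK_END: c_int = 2;

    /// Owns the FILE*; NULL-on-error becomes Option for the Rust caller.
    pub struct File {
        raw: *mut CFile,
    }

    impl File {
        pub fn open(path: &str) -> Option<File> {
            let c_path = CString::new(path).ok()?;
            let c_mode = CString::new("rb").ok()?;
            let raw = unsafe { fopen(c_path.as_ptr(), c_mode.as_ptr()) };
            if raw.is_null() {
                None
            } else {
                Some(File { raw })
            }
        }

        pub fn len(&self) -> i64 {
            unsafe {
                fseek(self.raw, 0, SEEK_END);
                ftell(self.raw) as i64
            }
        }
    }

    impl Drop for File {
        fn drop(&mut self) {
            // The file is closed exactly once, on every path out of the caller's scope.
            unsafe {
                fclose(self.raw);
            }
        }
    }

    fn main() {
        match File::open("/etc/hostname") {
            Some(f) => println!("size: {} bytes", f.len()),
            None => println!("could not open file"),
        }
    }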
Passing integers around is easy; sharing structs or strings, and context pointers for use in callbacks that cross the language barrier, is typically much harder.
This is not a notable challenge in rust, nor relevant to the article.
The article is about finding ways of using rust to actually implement kernel fs drivers/etc. Note that any rust code in the kernel is necessarily consuming C interfaces.
Bindgen works quite well for the use case that you are thinking.
Yeah, the Rust proponents are being significantly more ambitious. Not just the ability to code a file system in Rust, but do it in a way that catches a lot of the correctness issues relating to the complex (and changing) semantics of FS development.
It's actually pretty easy. All you need is to declare `extern "C" fn foo() -> T` to be able to call it from Rust, and to pass the link flags either by adding a `#[link]` attribute or by adding them in a build.rs.
You can use the bindgen crate to generate bindings ahead of time, or run it from a build.rs and include!() the generated bindings.
Normally what people do is create a `-sys` crate that contains only bindings, usually generated. Then their code can `use` the bindings from the sys crate as normal.
> in contrast, in C++ and Objective C, all you need to do is include the right header
The point is that Rust can model invariants that C can't. You can call both ways, but if C is incapable of expressing what Rust can, that has important implications for the design of APIs which must be common to both.
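A toy example of what "modeling an invariant" means (my own illustration, nothing from the kernel): the checked constructor is the only way to obtain the type, so every function accepting it can rely on the invariant, which a plain C integer parameter can't promise.

    #[derive(Clone, Copy)]
    struct InodeNumber(u64);

    impl InodeNumber {
        fn new(raw: u64) -> Option<InodeNumber> {
            // 0 is reserved as "no inode" in this toy; reject it once, at the boundary.
            if raw == 0 { None } else { Some(InodeNumber(raw)) }
        }

        fn get(self) -> u64 {
            self.0
        }
    }

    fn lookup(ino: InodeNumber) -> String {
        // No "what if it's 0?" branch needed here: the type rules it out.
        format!("looking up inode {}", ino.get())
    }

    fn main() {
        let ino = InodeNumber::new(42).expect("valid inode number");
        println!("{}", lookup(ino));
        assert!(InodeNumber::new(0).is_none());
    }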
That's not how I interpreted it: There is a clear need to be able to write filesystems in Rust, and the kernel developer(s) who write the filesystem API don't want to have to maintain the bindings to Rust.
> Does Rust need to change to make it easier to call C?
No, because it's already dirt-simple to do. You just declare the C function as 'extern "C"', and then call it. (You will often need to use 'unsafe' and convert or cast references to raw pointers, but that's simple syntax as well.)
There are tools (bindgen being the most used) that can scan C header files and produce the declarations for you, so you don't have to manually copy/paste and type them yourself.
> Maybe Rust as a language needs to bend a little in this case, instead of expecting the kernel developers to bend to the language?
I think you maybe misunderstood the article? There's nothing wrong with the language here. The argument is around how Rust should be used. The Rust-for-Linux developers want to encode semantics into their API calls, using Rust's features and type system, to make these calls safer and less error-prone to use. The people on the C side are afraid that doing so will make it harder for them to evolve the behavior and semantics of their C APIs, because then the Rust APIs will need to be updated as well, and they don't want to sign up for that work.
An alternative that might be more palatable is to not make use of Rust features and the type system to encode semantics into the Rust API. That way, it will be easier for C developers, since updating the Rust API when the C API changes will be mechanical and simple to do. But then we might wonder what the point is of all this Rust work if the Rust-for-Linux developers can't use Rust's features to make better, safer APIs.
> I've done a bit of Rust, and (as a hobbyist,) it's still not clear (to me) how to interoperate with C.
Kinda weird that you currently have the top-voted comment when you admit you don't understand the language well enough to have an informed opinion on the topic at hand.
I wasn't clear, and am not familiar enough with the Linux FS code to know if this Rust API would be wrapping or re-implementing the C APIs. If it's re-implementing (or rather an additional API), it seems keeping the names the same as the C API would be problematic and lead to more confusion over time, even if initially it helped already-familiar developers grok what's going on faster.
I'm not familiar with those functions, but I had the impression they actually shouldn't have the same name.
Since the Rust function has implicit/automatic behavior depending on what its state is and how it's used by the call site, and since the C one doesn't have any implicit/automatic behavior (as in, separate/explicit lifecycle calls must be made "manually"), I don't even see a reason for them to have the same name.
That is to say, having the same name would be somewhat misleading, since the functions do different things and serve different purposes.
But it would make sense, at least on the Rust side, to have documentation referring to the original C name.
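For what it's worth, a hypothetical sketch of what that "implicit/automatic behavior" could look like on the Rust side, with made-up names rather than the actual proposed API: releasing the reference happens in Drop, where the C pattern would require an explicit iput()-style call.

    struct InodeRef {
        ino: u64,
    }

    impl Drop for InodeRef {
        fn drop(&mut self) {
            // In a real binding this is where an iput()-style "release the reference"
            // call would happen, without the caller writing anything.
            println!("releasing reference to inode {}", self.ino);
        }
    }

    fn main() {
        let inode = InodeRef { ino: 42 };
        println!("using inode {}", inode.ino);
        // No explicit release call: Drop runs here, when `inode` goes out of scope.
    }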
Given how those discussions usually go, and the scale of the change, I find that discussion extraordinarily civil.
I disagree with the negative tone of this thread; I'm quite optimistic given how clearly the parties involved were able to communicate the pain points with zero BS.
I found myself reading this more for the excellent notetaking than for the content.
I suspect the discussion was about as charged, meandering, and nitpicky as we all expect a PL debate among deeply opinionated geeks to be, and Jake Edge (who wrote this summary) is exceptionally good at removing all that and writing down substance.
We are talking about extremely competent people who worked on a critical piece of software for years and invested a lot of their lives in it, with all the pain, effort, experience, and responsibilities that come with that.
That this debate is inscribed in a process that is still ongoing, and in fact progressing, is a testament to how healthy the situation is.
I was expecting the whole Rust thing to have been shut down ten times already, in a flood of distasteful remarks.
This means not only that Rust is vindicated as promising for the job, but that both teams are willing and up to the task of working on the integration.
Those projects are exhausting, highly under-pressure situations, and they last a long time.
I still find that the report is showing a positive outcome. What do people expect? Move fast and break things?
Having more options available in the Linux kernel is always beneficial. However, Rust may not be the solution for everything. While Rust does its best to ensure its programming model is safe, it is still a limited model. Memory issues? Use Rust! Concurrency problems? Switch to Rust! But you can't do everything that C does without using unsafe blocks. Rust can offer a fresh perspective to these problems, but it's not a complete solution.
> But you can't do everything that C does without using unsafe blocks
For this particular work the huge benefit of Rust is its enthusiasm for encapsulating such safety problems in types, which is indeed what this article is about.
C and particularly the way C is used in the kernel makes it everybody's responsibility to have total knowledge of the tacit rules. That cannot scale. A room full of kernel developers didn't entirely agree on the rules for a data structure they all use!
Rust is very good at making you aware of rules you need to know, and making it not your problem when it can be somebody else's problem to ensure rules are followed. Sometimes the result will be less optimal, but even in the Linux kernel sub-optimal is often the right default and we can provide an (unsafe) escape hatch for people who can afford to learn six more weird rules to maybe get better performance.
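A toy illustration of that safe-default-plus-escape-hatch shape (my own example, not kernel code): the safe method checks, the unsafe variant skips the check for callers who can prove the index is valid and have measured that it matters.

    pub struct Table {
        slots: Vec<u64>,
    }

    impl Table {
        /// Safe default: bounds-checked, a bad index just gives you None.
        pub fn get(&self, i: usize) -> Option<u64> {
            self.slots.get(i).copied()
        }

        /// Escape hatch for callers who can prove the index is in range.
        ///
        /// # Safety
        /// `i` must be less than `self.slots.len()`.
        pub unsafe fn get_unchecked(&self, i: usize) -> u64 {
            unsafe { *self.slots.get_unchecked(i) }
        }
    }

    fn main() {
        let t = Table { slots: vec![10, 20, 30] };
        assert_eq!(t.get(1), Some(20));
        assert_eq!(t.get(9), None);
        // SAFETY: 2 < t.slots.len() == 3.
        assert_eq!(unsafe { t.get_unchecked(2) }, 30);
        println!("ok");
    }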
Note that unsafe blocks don't have a limited blast radius. The blast that can be caused by a single incorrect unsafe block is unlimited, at least in theory. (In practice the amount of incorrectness may correlate with the effect, but the same could be said about C undefined behavior.)
Unsafe blocks limit the amount of code you need to get correct, but you need to get all of them correct. They are not a blast limiter.
I’d word that differently: it reduces the search space for a bug when something goes wrong, but it doesn't limit the blast radius. You can still spectacularly blow up safe Rust code with an unsafe block (that no-aliasing rule is seriously tough to adhere to!)
> But you can't do everything that C does without using unsafe blocks. Rust can offer a fresh perspective to these problems, but it's not a complete solution.
It's true that you need to have unsafe code to do low level things. But it's a misconception that if you have to use unsafe then Rust isn't a good fit. The point of the safe/unsafe dichotomy in Rust is to clearly mark which bits of the code are unsafe, so that you can focus all your attention on auditing those small pieces and have confidence that everything else will work if you get those bits right.
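A small example of what that looks like in practice (mine, not kernel code): the invariant is established once in the checked constructor, so the single unsafe block is the only place an auditor has to stare at, and everything calling the type stays ordinary safe code.

    pub struct Ascii(Vec<u8>);

    impl Ascii {
        pub fn new(bytes: Vec<u8>) -> Option<Ascii> {
            // The invariant "these bytes are ASCII" is checked exactly once, here.
            if bytes.iter().all(|b| b.is_ascii()) {
                Some(Ascii(bytes))
            } else {
                None
            }
        }

        pub fn as_str(&self) -> &str {
            // SAFETY: the constructor only accepts ASCII bytes, and ASCII is valid UTF-8.
            unsafe { std::str::from_utf8_unchecked(&self.0) }
        }
    }

    fn main() {
        let greeting = Ascii::new(b"hello".to_vec()).unwrap();
        println!("{}", greeting.as_str());
    }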
I have to admit, while I do enjoy Rust in the sense that it makes sense and can really "click" sometimes, for anything asynchronous I find it really rough around the edges. It's not intuitive what's happening under the hood.
Rust async isn't all that pleasant to use. On the other hand for normal threaded concurrency Rust is one of the best languages around. The type system prevents a lot of concurrency bugs. "Effortless concurrency" is a tagline the language really has earned.
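As a small illustration of what the type system enforces: the counter below only compiles because it's wrapped in Arc<Mutex<..>>; swap in Rc<RefCell<..>> and the program is rejected, since those types aren't Send/Sync.

    use std::sync::{Arc, Mutex};
    use std::thread;

    fn main() {
        let counter = Arc::new(Mutex::new(0u64));

        let handles: Vec<_> = (0..4)
            .map(|_| {
                let counter = Arc::clone(&counter);
                thread::spawn(move || {
                    for _ in 0..1000 {
                        // The only way to touch the shared value is through the lock.
                        *counter.lock().unwrap() += 1;
                    }
                })
            })
            .collect();

        for h in handles {
            h.join().unwrap();
        }

        assert_eq!(*counter.lock().unwrap(), 4000);
        println!("final count: {}", counter.lock().unwrap());
    }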
I really hate async Rust. It's really great that Rust forces you at the compiler level to use mutexes, but async is a disease that spreads through your whole project and introduces a lot of complexity that I don't feel in C#, Python, or JS/TS.
It's an overview of the VFS layer, which is how they do all the filesystem-specific stuff while maintaining a consistent interface from the kernel.
> in contrast, in C++ and Objective C, all you need to do is include the right header
and link against the library.
It wasn’t completely straightforward, but on the whole I figured out everything I needed to within a few days in order to be able to do it.
Calling C would surely be very similar.
https://github.com/tree-sitter/tree-sitter/blob/25c718918084...
> ...to know if this Rust API would be wrapping or re-implementing the C APIs?

Seems like the answer is that it's re-implementing and doesn't use the same names.
A barrage of "no" is how it's supposed to go.
Imagine getting this comment about the open source project you contribute to:
"Science advances one funeral at a time"
> That cannot scale.

lol... you're talking about the Linux kernel, written in C.
The vast majority of software over many decades "bottoms out" in C, whether in VMs, operating systems, device drivers, etc.
The scale of the success of C is unparalleled.
> But you can't do everything that C does without using unsafe blocks

Using an unsafe block with a very limited blast radius doesn't negate all the guarantees you get in all the rest of your code.
This is definitely a strong benefit though.
How much of this is actually 100% unambiguously necessary? Is there a good reason why anything in the filesystem code at all needs to be unsafe?
I suspect it's a very small subset needed in a few places.
One of the major wins of Rust is encoding thread safety in the type system with the `Send` and `Sync` traits.