I don't get how each file system can have a custom lifecycle for inodes but still use the same functions for inode lifecycle management, apparently with different semantics? That sounds like the opposite of an abstraction layer, if the same function must be used in different ways depending on implementation details.
If the lifecycle of inodes is filesystem-specific, it should be managed via filesystem-specific functions.
>> I don't get how each file system can have a custom lifecycle for inodes but still use the same functions for inode lifecycle management, apparently with different semantics?
I had the same question. They're trying to understand (or even document) all the C APIs in order to do the rust work. It sounds like collecting all that information might lead to some [WTFs and] refactoring so questions like this don't come up in the first place, and that would be a good thing.
I understood it as they're working to abstract as much as is generally and widely possible in the VFS layer, but there will still be (many?) edge cases that don't fit and will need to be handled in FS-specific layers. Perhaps the inode lifecycle was just an initial starting point for discussion?
I assume it's supposed to work by having the compiler track the lifetime of the inodes. The compiler is expected to help with ephemeral references (the file system still has to store the link count to disk).
> but still use the same functions for inode lifecycle management
I'm not an expert by any means, but I'm somewhat knowledgeable. There are different functions that can be used to create inodes and then insert them into the cache. `iget_locked()`, which is the focus here, is one particular pattern for doing it, but not every FS uses it, for one reason or another (or doesn't use it in every situation). For example, FAT doesn't use it because the inode numbers are made up and the FS maintains its own mapping of FAT position to inodes. There are also file systems like `proc` which never cache their inode objects (I'm pretty sure that's the case, I don't claim to understand proc :P )
The inode objects themselves still have the same state flow regardless of where they come from, AFAIK, so from a consumer perspective the usage of the `inode` doesn't change. It's only the creation and internal handling of the inode objects by the FS layer that varies based on what the FS needs.
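To illustrate that "same state flow" idea, here is a minimal sketch in plain Rust with entirely made-up names (this is not the kernel API or the proposed Rust bindings): the lookup either returns an existing inode or a not-yet-initialized one, and the types force you to finish initialization before using it.

    struct Inode {
        ino: u64,
        size: u64,
    }

    /// An inode that was just allocated and is not yet visible to anyone else.
    struct NewInode {
        ino: u64,
    }

    impl NewInode {
        /// Finishing initialization is the only way to obtain a usable `Inode`,
        /// so "use before init" simply doesn't compile.
        fn init(self, size: u64) -> Inode {
            Inode { ino: self.ino, size }
        }
    }

    enum Lookup {
        Existing(Inode),
        New(NewInode),
    }

    fn fill_from_disk(found: Lookup) -> Inode {
        match found {
            // Already cached: nothing to do.
            Lookup::Existing(inode) => inode,
            // Cache miss: the FS-specific code reads the on-disk data and finishes
            // initialization. Either way the caller ends up with an `Inode`.
            Lookup::New(new) => new.init(4096),
        }
    }

    fn main() {
        let inode = fill_from_disk(Lookup::New(NewInode { ino: 2 }));
        println!("inode {} has size {}", inode.ino, inode.size);
        let again = fill_from_disk(Lookup::Existing(inode));
        println!("inode {} again", again.ino);
    }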
Does Rust need to change to make it easier to call C?
I've done a bit of Rust, and (as a hobbyist) it's still not clear to me how to interoperate with C. (I'm sure someone reading this has done it.) In contrast, in C++ and Objective C, all you need to do is include the right header and call the function. Swift lets you include Objective C files, and you can call C from them.
Maybe Rust as a language needs to bend a little in this case, instead of expecting the kernel developers to bend to the language?
extern "C" {
fn abs(input: i32) -> i32;
}
fn main() {
unsafe {
println!("Absolute value of -3 according to C: {}", abs(-3));
}
}
Now, if you have a complex library and don't want to write all of the declarations by hand, you can use a tool like bindgen to automatically generate those extern declarations from a C header file: https://github.com/rust-lang/rust-bindgen
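For reference, the typical setup from bindgen's documentation looks roughly like this; it assumes `bindgen` is listed under [build-dependencies], and `wrapper.h` is a placeholder for whatever header you want bindings for.

    // build.rs
    use std::env;
    use std::path::PathBuf;

    fn main() {
        // Regenerate the bindings whenever the header changes.
        println!("cargo:rerun-if-changed=wrapper.h");

        let bindings = bindgen::Builder::default()
            .header("wrapper.h")
            .generate()
            .expect("unable to generate bindings");

        let out_path = PathBuf::from(env::var("OUT_DIR").unwrap());
        bindings
            .write_to_file(out_path.join("bindings.rs"))
            .expect("couldn't write bindings");
    }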
There's an argument to be made that something like bindgen could be included in Rust itself, so you wouldn't need a third-party dependency and a build.rs set up to invoke it, but that's not really the issue at hand in this article.
The issue is not the low-level bindings, but higher level wrappers that are more idiomatic in Rust. There's no way you're going to be able to have a general tool that can automatically do that from arbitrary C code.
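To make that concrete, here is roughly what such a hand-written layer looks like for a tiny slice of C's stdio. The extern declarations are the part a tool can generate; deciding that NULL becomes None and that the handle is closed in Drop is the part it can't. (In a real project the declarations would come from bindgen or the libc crate rather than being typed out.)

    use std::ffi::CString;
    use std::os::raw::{c_char, c_int, c_long};

    // Opaque C type: we only ever hold a pointer to it.
    #[repr(C)]
    struct CFile {
        _private: [u8; 0],
    }

    extern "C" {
        fn fopen(path: *const c_char, mode: *const c_char) -> *mut CFile;
        fn fclose(stream: *mut CFile) -> c_int;
        fn fseek(stream: *mut CFile, offset: c_long, whence: c_int) -> c_int;
        fn ftell(stream: *mut CFile) -> c_long;
    }

    const SEEK_END: c_int = 2;

    /// Owns the FILE*; NULL-on-error becomes Option for the Rust caller.
    pub struct File {
        raw: *mut CFile,
    }

    impl File {
        pub fn open(path: &str) -> Option<File> {
            let c_path = CString::new(path).ok()?;
            let c_mode = CString::new("rb").ok()?;
            let raw = unsafe { fopen(c_path.as_ptr(), c_mode.as_ptr()) };
            if raw.is_null() {
                None
            } else {
                Some(File { raw })
            }
        }

        pub fn len(&self) -> i64 {
            unsafe {
                fseek(self.raw, 0, SEEK_END);
                ftell(self.raw) as i64
            }
        }
    }

    impl Drop for File {
        fn drop(&mut self) {
            // The file is closed exactly once, on every path out of the caller's scope.
            unsafe {
                fclose(self.raw);
            }
        }
    }

    fn main() {
        match File::open("/etc/hostname") {
            Some(f) => println!("size: {} bytes", f.len()),
            None => println!("could not open file"),
        }
    }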
Passing integers around is easy; sharing structs or strings, and context pointers for use in callbacks that cross the language barrier, is typically much harder.
This is not a notable challenge in rust, nor relevant to the article.
The article is about finding ways of using rust to actually implement kernel fs drivers/etc. Note that any rust code in the kernel is necessarily consuming C interfaces.
Bindgen works quite well for the use case that you are thinking.
Yeah, the Rust proponents are being significantly more ambitious. Not just the ability to code a file system in Rust, but do it in a way that catches a lot of the correctness issues relating to the complex (and changing) semantics of FS development.
It's actually pretty easy. All you need is to declare `extern "C" fn foo() -> T` to be able to call it from Rust, and to pass the link flags either by adding a `#[link]` attribute or by adding them in a build.rs.
You can use the bindgen crate to generate bindings ahead of time, or run it from a build.rs and include!() the generated bindings.
Normally what people do is create a `-sys` crate that contains only bindings, usually generated. Then their code can `use` the bindings from the sys crate as normal.
> in contrast, in C++ and Objective C, all you need to do is include the right header
The point is that Rust can model invariants that C can't. You can call both ways, but if C is incapable of expressing what Rust can, that has important implications for the design of APIs which must be common to both.
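A toy example of what "modeling an invariant" means (my own illustration, nothing from the kernel): the checked constructor is the only way to obtain the type, so every function accepting it can rely on the invariant, which a plain C integer parameter can't promise.

    #[derive(Clone, Copy)]
    struct InodeNumber(u64);

    impl InodeNumber {
        fn new(raw: u64) -> Option<InodeNumber> {
            // 0 is reserved as "no inode" in this toy; reject it once, at the boundary.
            if raw == 0 { None } else { Some(InodeNumber(raw)) }
        }

        fn get(self) -> u64 {
            self.0
        }
    }

    fn lookup(ino: InodeNumber) -> String {
        // No "what if it's 0?" branch needed here: the type rules it out.
        format!("looking up inode {}", ino.get())
    }

    fn main() {
        let ino = InodeNumber::new(42).expect("valid inode number");
        println!("{}", lookup(ino));
        assert!(InodeNumber::new(0).is_none());
    }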
That's not how I interpreted it: There is a clear need to be able to write filesystems in Rust, and the kernel developer(s) who write the filesystem API don't want to have to maintain the bindings to Rust.
> Does Rust need to change to make it easier to call C?
No, because it's already dirt-simple to do. You just declare the C function as 'extern "C"', and then call it. (You will often need to use 'unsafe' and convert or cast references to raw pointers, but that's simple syntax as well.)
There are tools (bindgen being the most used) that can scan C header files and produce the declarations for you, so you don't have to manually copy/paste and type them yourself.
> Maybe Rust as a language needs to bend a little in this case, instead of expecting the kernel developers to bend to the language?
I think you maybe misunderstood the article? There's nothing wrong with the language here. The argument is around how Rust should be used. The Rust-for-Linux developers want to encode semantics into their API calls, using Rust's features and type system, to make these calls safer and less error-prone to use. The people on the C side are afraid that doing so will make it harder for them to evolve the behavior and semantics of their C APIs, because then the Rust APIs will need to be updated as well, and they don't want to sign up for that work.
An alternative that might be more palatable is to not make use of Rust features and the type system to encode semantics into the Rust API. That way, it will be easier for C developers, since updating the Rust API when the C API changes will be mechanical and simple to do. But then we might wonder what the point is of all this Rust work if the Rust-for-Linux developers can't use Rust's features to make better, safer APIs.
> I've done a bit of Rust, and (as a hobbyist,) it's still not clear (to me) how to interoperate with C.
Kinda weird that you currently have the top-voted comment when you admit you don't understand the language well enough to have an informed opinion on the topic at hand.
I wasn't clear, and am not familiar enough with the Linux FS code to know if this Rust API would be wrapping or re-implementing the C APIs. If it's re-implementing (or rather an additional API), it seems keeping the names the same as the C API would be problematic and lead to more confusion over time, even if initially it helped already-familiar developers grok what's going on faster.
I'm not familiar with those functions, but I had the impression they actually shouldn't have the same name.
Since the Rust function has implicit/automatic behavior depending on what its state is and how it's used by the call site, and since the C one doesn't have any implicit/automatic behavior (as in, separate/explicit lifecycle calls must be made "manually"), I don't even see a reason for them to have the same name.
That is to say, having the same name would be somewhat misleading, since the functions do different things and serve different purposes.
But it would make sense, at least on the Rust side, to have documentation referring to the original C name.
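For what it's worth, a hypothetical sketch of what that "implicit/automatic behavior" could look like on the Rust side, with made-up names rather than the actual proposed API: releasing the reference happens in Drop, where the C pattern would require an explicit iput()-style call.

    struct InodeRef {
        ino: u64,
    }

    impl Drop for InodeRef {
        fn drop(&mut self) {
            // In a real binding this is where an iput()-style "release the reference"
            // call would happen, without the caller writing anything.
            println!("releasing reference to inode {}", self.ino);
        }
    }

    fn main() {
        let inode = InodeRef { ino: 42 };
        println!("using inode {}", inode.ino);
        // No explicit release call: Drop runs here, when `inode` goes out of scope.
    }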
Given how those discussions usually go, and the scale of the change, I find that discussion extraordinarily civil.
I disagree with the negative tone of this thread; I'm quite optimistic given how clearly the parties involved were able to communicate the pain points with zero BS.
I found myself reading this more for the excellent notetaking than for the content.
I suspect the discussion was about as charged, meandering, and nitpicky as we all expect a PL debate among deeply opinionated geeks to be, and Jake Edge (who wrote this summary) is exceptionally good at removing all that and writing down substance.
We are talking about extremely competent people who worked on a critical piece of software for years and invested a lot of their lives in it, with all the pain, effort, experience, and responsibilities that come with that.
That this debate is inscribed in a process that is still ongoing, and in fact progressing, is a testament to how healthy the situation is.
I was expecting the whole Rust thing to have been shut down ten times already, in a flood of distasteful remarks.
This means not only that Rust is vindicated as promising for the job, but that both teams are willing and up to the task of working on the integration.
Those projects are exhausting, highly under-pressure situations, and they last a long time.
I still find that the report is showing a positive outcome. What do people expect? Move fast and break things?
Having more options available in the Linux kernel is always beneficial. However, Rust may not be the solution for everything. While Rust does its best to ensure its programming model is safe, it is still a limited model. Memory issues? Use Rust! Concurrency problems? Switch to Rust! But you can't do everything that C does without using unsafe blocks. Rust can offer a fresh perspective to these problems, but it's not a complete solution.
> But you can't do everything that C does without using unsafe blocks
For this particular work the huge benefit of Rust is its enthusiasm for encapsulating such safety problems in types, which is indeed what this article is about.
C and particularly the way C is used in the kernel makes it everybody's responsibility to have total knowledge of the tacit rules. That cannot scale. A room full of kernel developers didn't entirely agree on the rules for a data structure they all use!
Rust is very good at making you aware of rules you need to know, and making it not your problem when it can be somebody else's problem to ensure rules are followed. Sometimes the result will be less optimal, but even in the Linux kernel sub-optimal is often the right default and we can provide an (unsafe) escape hatch for people who can afford to learn six more weird rules to maybe get better performance.
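A toy illustration of that safe-default-plus-escape-hatch shape (my own example, not kernel code): the safe method checks, the unsafe variant skips the check for callers who can prove the index is valid and have measured that it matters.

    pub struct Table {
        slots: Vec<u64>,
    }

    impl Table {
        /// Safe default: bounds-checked, a bad index just gives you None.
        pub fn get(&self, i: usize) -> Option<u64> {
            self.slots.get(i).copied()
        }

        /// Escape hatch for callers who can prove the index is in range.
        ///
        /// # Safety
        /// `i` must be less than `self.slots.len()`.
        pub unsafe fn get_unchecked(&self, i: usize) -> u64 {
            unsafe { *self.slots.get_unchecked(i) }
        }
    }

    fn main() {
        let t = Table { slots: vec![10, 20, 30] };
        assert_eq!(t.get(1), Some(20));
        assert_eq!(t.get(9), None);
        // SAFETY: 2 < t.slots.len() == 3.
        assert_eq!(unsafe { t.get_unchecked(2) }, 30);
        println!("ok");
    }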
Note that unsafe blocks don't have a limited blast radius. The blast that can be caused by a single incorrect unsafe block is unlimited, at least in theory. (In practice the amount of incorrectness may correlate with the effect, but the same could be said about C undefined behavior.)
Unsafe blocks limit the amount of code you need to get correct, but you need to get all of them correct. They are not a blast limiter.
I’d word that differently: it reduces the search space for a bug when something goes wrong, but it doesn't limit the blast radius. You can still spectacularly blow up safe Rust code with an unsafe block (that no-aliasing rule is seriously tough to adhere to!)
> But you can't do everything that C does without using unsafe blocks. Rust can offer a fresh perspective to these problems, but it's not a complete solution.
It's true that you need to have unsafe code to do low level things. But it's a misconception that if you have to use unsafe then Rust isn't a good fit. The point of the safe/unsafe dichotomy in Rust is to clearly mark which bits of the code are unsafe, so that you can focus all your attention on auditing those small pieces and have confidence that everything else will work if you get those bits right.
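A small example of what that looks like in practice (mine, not kernel code): the invariant is established once in the checked constructor, so the single unsafe block is the only place an auditor has to stare at, and everything calling the type stays ordinary safe code.

    pub struct Ascii(Vec<u8>);

    impl Ascii {
        pub fn new(bytes: Vec<u8>) -> Option<Ascii> {
            // The invariant "these bytes are ASCII" is checked exactly once, here.
            if bytes.iter().all(|b| b.is_ascii()) {
                Some(Ascii(bytes))
            } else {
                None
            }
        }

        pub fn as_str(&self) -> &str {
            // SAFETY: the constructor only accepts ASCII bytes, and ASCII is valid UTF-8.
            unsafe { std::str::from_utf8_unchecked(&self.0) }
        }
    }

    fn main() {
        let greeting = Ascii::new(b"hello".to_vec()).unwrap();
        println!("{}", greeting.as_str());
    }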
I have to admit, while I do enjoy Rust in the sense that it makes sense and can really "click" sometimes, for anything asynchronous I find it really rough around the edges. It's not intuitive what's happening under the hood.
Rust async isn't all that pleasant to use. On the other hand for normal threaded concurrency Rust is one of the best languages around. The type system prevents a lot of concurrency bugs. "Effortless concurrency" is a tagline the language really has earned.
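As a small illustration of what the type system enforces: the counter below only compiles because it's wrapped in Arc<Mutex<..>>; swap in Rc<RefCell<..>> and the program is rejected, since those types aren't Send/Sync.

    use std::sync::{Arc, Mutex};
    use std::thread;

    fn main() {
        let counter = Arc::new(Mutex::new(0u64));

        let handles: Vec<_> = (0..4)
            .map(|_| {
                let counter = Arc::clone(&counter);
                thread::spawn(move || {
                    for _ in 0..1000 {
                        // The only way to touch the shared value is through the lock.
                        *counter.lock().unwrap() += 1;
                    }
                })
            })
            .collect();

        for h in handles {
            h.join().unwrap();
        }

        assert_eq!(*counter.lock().unwrap(), 4000);
        println!("final count: {}", counter.lock().unwrap());
    }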
I really hate async Rust. It's really great that Rust forces you at the compiler level to use mutexes, but async is a disease that spreads through your whole project and introduces a lot of complexity that I don't feel in C#, Python, or JS/TS.
It's an overview of the VFS layer, which is how they do all the filesystem-specific stuff while maintaining a consistent interface from the kernel.
> in contrast, in C++ and Objective C, all you need to do is include the right header
and link against the library.
It wasn’t completely straightforward, but on the whole I figured out everything I needed to within a few days in order to be able to do it.
Calling C would surely be very similar.
https://github.com/tree-sitter/tree-sitter/blob/25c718918084...
> ...to know if this Rust API would be wrapping or re-implementing the C APIs?

Seems like the answer is that it's re-implementing and doesn't use the same names.
A barrage of "no" is how it's supposed to go.
Imagine getting this comment about the open source project you contribute to:
"Science advances one funeral at a time"
> That cannot scale.

lol... you're talking about the Linux kernel, written in C.
The vast majority of software over many decades "bottoms out" in C, whether in VMs, operating systems, device drivers, etc.
The scale of the success of C is unparalleled.
> But you can't do everything that C does without using unsafe blocks

Using an unsafe block with a very limited blast radius doesn't negate all the guarantees you get in all the rest of your code.
This is definitely a strong benefit though.
How much of this is actually 100% unambiguously necessary? Is there a good reason why anything in the filesystem code at all needs to be unsafe?
I suspect it's a very small subset needed in a few places.
One of the major wins of Rust is encoding thread safety in the type system with the `Send` and `Sync` traits.