In Rust, dereferencing a pointer is unsafe and so must be isolated within an `unsafe` block — but creating a dangling pointer is not unsafe.
What this means is that as soon as you add `unsafe` code which accepts a pointer argument, all the supposedly "safe" code which can affect this pointer argument becomes a potential source of memory errors.
To me, this was unintuitive — I expected the `unsafe` block to require an extra level of scrutiny, not the surrounding code. After making an error similar to the one described in this post, I wondered, "why isn't creating a dangling pointer unsafe?" But after following that train of thought I realized that vast amounts of code would be pulled into `unsafe` blocks as a consequence, so it's not a viable approach — and thus it became apparent why only dereferencing is unsafe.
The lesson seems to be that dereferencing requires extreme care, but I wish potential errors weren't so subtle. The `as_ptr` family of functions has been tripping people up for a long time:
If `unsafe` were viral it would be useless. Syntactically ill-typed code can still be proven semantically well-typed, and `unsafe` is intended as a marker that the proof of semantic soundness lies outside of Rust's type system. If there is no proof (at least an informal one), you shouldn't be using unsafe, generally speaking.
This seems like a positive development, but there are other `as_ptr` and `as_mut_ptr` functions. The one that I tripped up with was actually from Vec, not CString.
Zooming out, there are innumerable ways to create a dangling pointer. This is really a vexing problem.
Rust doesn't intend to prevent you from creating dangling pointers... if you want that functionality, use references. The reason this issue is particularly likely to hit is that it's the intersection of three things: (1) one of the very few times when people often need raw pointers when not writing very carefully inspected unsafe code is when calling functions across an FFI boundary, (2) C strings are represented by a char * pointer using a type (3) a Rust value that's created as a temporary will drop on the same line. (1) and (2) are how people can know it's almost always a bug when someone does this with a `CString`, whereas a lint would probably have a lot more false positives for something like a `Vec` (which is rarely passed to C directly since it doesn't understand it).
Keep in mind that even something like borrowing a RefCell creates a temporary, so once you cast to a pointer and end the lifetime it came from it's very hard for the type system to track back the pointer you got to any particular deallocated temporary in an intelligent way. It pretty much has to be done on a case by case basis, I suspect (but maybe that could be improved--it is definitely the case, from a study someone did recently, that a large percentage of UB in unsafe Rust is due to destructors running early unexpectedly!).
> Why am I using my own ffi version of chown instead of the libc crate? The libc crate prototypes chown with unsigned uid_t and gid_t types, and I want to pass -1 because I'm not interested in changing the group. I'll spare you the long rant about how one should never redeclare system interfaces if you can't take the time to do so properly, because it turns out if you dig into it, uid_t boils down to uint32_t, but even so, the manual for chown says -1 should work and I want it to work.
First thing I'd try would be to pass "~0" instead of "-1". Would that work? (Looks like in Rust that would be "!0". I don't know anything about Rust.)
If your reaction to Rust actually following the prototype of chmod but not accepting conversions signed/unsigned silently is to redefine chmod so you can do it like you’d do in C… that seems a little bullheaded to me. Wrapping a C function is dangerous in a literal and real sense, doing “-1 as u32” is not at all.
> If your reaction to Rust actually following the prototype of chmod but not accepting conversions
Eh... as the article points out, the chown documentation specifically says that you are expected to provide negative numbers. The spec is obviously bugged.
This limits the amount of righteousness you can claim over "but we followed the spec (that we knew was wrong)!"
Yes. Specifically, POSIX does not specify whether id_t, uid_t, gid_t, or pid_t are signed or unsigned, so whenever it refers to the "default"/"leave alone"/"unspecified" value, it always does it as a cast of -1.
I actually find ~0 more intuitive. u32::MAX makes me think the relevant concept is "a really big number"; ~0 makes me think the relevant concept is "a bunch of bits that are all 1s".
All modern computers use twos-complement and modern languages like Rust require it. Not assuming twos-complement is a theoretical nitpick along the lines of not assuming a byte is 8 bits. Nobody does that.
From my perspective as someone who knows Rust but primarily works with high-level languages like JavaScript, some developers who from a background of unsafe languages (particularly C and C++) seem incredibly cavalier around `unsafe` blocks in Rust code. It's like they're desensitised to the unsafety.
As someone who is used to working in a language where one absolutely cannot hit memory safety issues or undefined behaviour, you can be damn sure that I'm going to read all the relevant documentation and double/triple the invariants of any unsafe code I write in Rust (and maybe even get it reviewed by someone else).
In some ways unsafe blocks in Rust are more unsafe than the equivalent C or C++ code because you have to uphold Rust's more stringent safety guarantees (e.g. no aliasing of mutable references), but they come with the saving grace that you don't need them very often, so you can afford to really take your time and implement them thoroughly.
Maybe that's all true, but in the specific example in question, he just does a string conversion and calls into a C library via ffi (for a reason he has justified). I think he has a real point about how the compiler should find this error if Rust is to fully match its marketing, so to speak. Clippy (a tool for style issues as much as anything else) doesn't seem to be an appropriate place for such error checking.
On balance Rust does a great job, but "be even more careful in unsafe Rust than in regular C++" is probably a losing battle.
> From my perspective as someone who knows Rust but primarily works with high-level languages like JavaScript, some developers who from a background of unsafe languages (particularly C and C++) seem incredibly cavalier around `unsafe` blocks in Rust code. It's like they're desensitised to the unsafety.
That's true. If you are used to 100% of your code being within the equivalent of a Rust "unsafe" block, having over 90% of your code being verified as safe by the compiler is a luxury. Even if one in ten lines are marked as "unsafe", that's already many times less "unsafe" than what they are used to in their C or C++ code.
Moreover, C and C++ developers are used to all kinds of bizarre memory-saving and cycle-counting code that require "unsafe" in Rust. Things like intrusive double-linked circular linked lists, stashing things in the lower bits of pointers (I mean, it's a pointer to a 32-bit value which will always be 32-bit aligned, the lowest two bits will always be zero, why not use them?), doing XOR on pointers, and plenty of other useful tricks.
> Even if one in ten lines are marked as "unsafe", that's already many times less "unsafe" than what they are used to in their C or C++ code.
As a Rust programmer who's written safe code and some unsafe abstractions, one in ten lines being unsafe means that the remaining 90% of your lines need to be written with the same care as your unsafe code, to avoid feeding invalid arguments into unsafe operations which cause UB. The general approach is to not scatter unsafe code around your entire program, but segregate it into safe abstractions that prevent callers from triggering UB. (Though some APIs make it difficult to achieve, so the alternative is to create unsafe abstractions that expect callers to take care to avoid triggering UB, which requires auditing the callers for security as well.)
I'm somewhat wishing for a `.bind!()` macro or something similar, which will desugar
`let y = x.method().bind!().reference_and_do_something()` into `let tmp = x.method(); let y = tmp.reference_and_do_something()` . This should be available for chaining for convenience.
But I see the issues (e.g. which scope should `tmp` be bound to/what is its lifetime?)
It's freed for the same reason it would be freed in C++.
Like C++, Rust relies heavily on RAII, in which resources are freed once they get out of scope. The "CString::new(filename.as_str()).unwrap()" returns a CString, which owns the memory for the C string. Since it's a temporary (it's not being stored anywhere), its scope ends before the next line, so the resources it owns are freed (by calling its Drop implementation) just after the ".as_ptr()" call.
One solution would be to split it into two lines:
let path = CString::new(filename.as_str()).unwrap();
let path = path.as_ptr();
That way, the CString would only release its resources at the end of the block.
With references, the borrow checker doesn't let you do it the wrong way; however, the borrow checker only applies to references, not to raw pointers (which is what .as_ptr() returns).
So shouldn’t .as_ptr() itself be marked unsafe? That way only the correct version of this code, with the .as_ptr() inside the unsafe block, would compile, right?
(I assume there’s a good reason why this isn’t the case, but I don’t know Rust well enough to know the answer. I’d be interested to learn, if anyone can explain it for me!)
Edit to clarify: what seems very strange to me is that a function that not only does something potentially unsafe, but returns an unsafe value would be callable outside of an “unsafe” block.
I mean, yeah, at the end you only get a raw pointer, the object managing the pointer doesn't exist anymore. I'm not a Rust programmer, but I would imagine it's equivalent to the following C++ code
auto ptr = std::string(something).c_str();
This creates a std::string, calls c_str() on it to get the pointer. However, since the std::string isn't stored anywhere, its lifetime stops and it is destructed (this is why these are called temporaries), and the pointer is now invalid. The right way of doing it is this:
std::string str { something };
auto ptr = str.c_str();
Now, str and ptr both live in the same scope, and ptr's lifetime is entirely contained in str's.
to be clear, not path is freed, but what path points to, meaning the result of CString::new(...).unwrap()
Because it is not bound to a variable, it will get dropped before the next line, even if we create a pointer to it. if you would have borrowed instead of creating a pointer this would have been a compiler error because the borrow would have outlived the original object. But a pointer exists outside of the borrow semantics and lifetimes, that is also why it can only be used inside the unsafe block.
It's freed for the same reason it could be freed in Java. In Java it could be freed immediately after as_ptr() returns if that was the last reference, but may be freed later. In Rust it would be freed immediately after as_ptr() returns because that was the last reference.
Fun fact: I hit the exact same bug when working on FamiTracker in C++, with a MFC type also called CString... "construct a CString, acquire a pointer into the internals, destroy the CString". The only reason it doesn't make FamiTracker crash is because MFC's CString is COW/refcounted, and the refcount never reached zero because it was constructed from a long-lasting string literal, so the underlying buffer was still valid after destroying the string object.
I feel that tying deallocations to destructors is a benefit to safe code, and a hindrance to unsafe code. I feel explicit calls to "destructor" functions that move self (perhaps using defer), plus linear typing to statically ensure you don't forget to call a destructor or call one twice, can provide safety in both situations, but at the expense of verbosity and not interacting well with exceptions. I still believe linear typing (argued against by Gankra in https://gankra.github.io/blah/linear-rust/) will fix multiple deficiencies in Rust:
- It helps write allocation-free code which passes objects around without creating/destroying them (maybe an Alloc effect would help too).
- Making drops explicit makes it harder to screw up when writing unsafe code performing allocations/deallocations. It may or may not help protect against double-frees that occur upon panicking.
- File's Drop implementation ignores the return code of fclose(), since you can't return values from a destructor. So dropping a File directly should be prohibited (except during panic-unwinding, which I feel is a rare scenario where perfect error-checking is less crucial than not risking more panics). Instead you move the file into a close() method/function, and are expected to check the Result<(), error> that comes out.
- From what I've read, trying to perform an asynchronous cancellation operation in the standard Drop method is not possible, since the method lacks access to the executor (unless you hard-code an executor to block on). It would make sense to make explicit dropping illegal (as a lint) and require moving it into a "drop"-like function, but keep Drop around to be called during panic-unwinding. This may result in logically incorrect behavior, but is still memory-safe. And I think it's better than making your async-fn state machines always larger to fit awaiting-a-drop states, and awaiting the async runtime when you panic-unwind.
> File's Drop implementation ignores the return code of fclose()
Yeah, and it bothers me that I can't call `close` and check for errors. All you need is something like this:
pub fn close(self) -> io::Result<()> {
let status = unsafe { libc::close(self.fd) };
mem::forget(self); // prevent drop and invalid double-close
match status {
0 => Ok(()),
_ => Err(io::Error::new(ErrorKind::Other)),
}
}
Note that this function takes `self` not `&mut self`, so it consumes the resource and prevents further use. Then `mem::forget` prevents `drop` from running.
Read the blog, found it somewhat interesting. Prohibiting destructors and only allowing consuming methods is certainly easier when you don't have an exception system.
Bidirectional pointers is definitely something it does better than Rust. Constraint references are an interesting idea. Clasps remind me of some Qt objects automatically detaching other objects listening to them, when destroyed (eg. QSortFilterProxyModel's source model, referenced by QSortFilterProxyModel, referenced by QAbstractItemView), but it feels ad-hoc for every individual type, and I don't know if there's underlying principles that all Qt types follow.
The descriptions of the language and others feel a bit crank-ish at times, I got lost a web of hyperlinks to various pages on the site and didn't read all of them, and apparently the language is not ready yet. I might check it out again when it's more complete.
What this means is that as soon as you add `unsafe` code which accepts a pointer argument, all the supposedly "safe" code which can affect this pointer argument becomes a potential source of memory errors.
To me, this was unintuitive — I expected the `unsafe` block to require an extra level of scrutiny, not the surrounding code. After making an error similar to the one described in this post, I wondered, "why isn't creating a dangling pointer unsafe?" But after following that train of thought I realized that vast amounts of code would be pulled into `unsafe` blocks as a consequence, so it's not a viable approach — and thus it became apparent why only dereferencing is unsafe.
The lesson seems to be that dereferencing requires extreme care, but I wish potential errors weren't so subtle. The `as_ptr` family of functions has been tripping people up for a long time:
https://users.rust-lang.org/t/cstring-as-ptr-is-incredibly-u...
`1234 as *const u8` and `ptr.offset(n)` are also safe, because Rust allows invalid/dangling pointers to exist if they're never dereferenced.
Relevant rustc (merged) PR: https://github.com/rust-lang/rust/pull/75671 - "Uplift temporary-cstring-as-ptr lint from clippy into rustc"
Zooming out, there are innumerable ways to create a dangling pointer. This is really a vexing problem.
Keep in mind that even something like borrowing a RefCell creates a temporary, so once you cast to a pointer and end the lifetime it came from it's very hard for the type system to track back the pointer you got to any particular deallocated temporary in an intelligent way. It pretty much has to be done on a case by case basis, I suspect (but maybe that could be improved--it is definitely the case, from a study someone did recently, that a large percentage of UB in unsafe Rust is due to destructors running early unexpectedly!).
First thing I'd try would be to pass "~0" instead of "-1". Would that work? (Looks like in Rust that would be "!0". I don't know anything about Rust.)
Eh... as the article points out, the chown documentation specifically says that you are expected to provide negative numbers. The spec is obviously bugged.
This limits the amount of righteousness you can claim over "but we followed the spec (that we knew was wrong)!"
I can't be bothered to try it myself but I'm fairly sure the author could just have written (-1i32) as uid_t and gotten some of their life back.
As someone who is used to working in a language where one absolutely cannot hit memory safety issues or undefined behaviour, you can be damn sure that I'm going to read all the relevant documentation and double/triple the invariants of any unsafe code I write in Rust (and maybe even get it reviewed by someone else).
In some ways unsafe blocks in Rust are more unsafe than the equivalent C or C++ code because you have to uphold Rust's more stringent safety guarantees (e.g. no aliasing of mutable references), but they come with the saving grace that you don't need them very often, so you can afford to really take your time and implement them thoroughly.
On balance Rust does a great job, but "be even more careful in unsafe Rust than in regular C++" is probably a losing battle.
That's true. If you are used to 100% of your code being within the equivalent of a Rust "unsafe" block, having over 90% of your code being verified as safe by the compiler is a luxury. Even if one in ten lines are marked as "unsafe", that's already many times less "unsafe" than what they are used to in their C or C++ code.
Moreover, C and C++ developers are used to all kinds of bizarre memory-saving and cycle-counting code that require "unsafe" in Rust. Things like intrusive double-linked circular linked lists, stashing things in the lower bits of pointers (I mean, it's a pointer to a 32-bit value which will always be 32-bit aligned, the lowest two bits will always be zero, why not use them?), doing XOR on pointers, and plenty of other useful tricks.
As a Rust programmer who's written safe code and some unsafe abstractions, one in ten lines being unsafe means that the remaining 90% of your lines need to be written with the same care as your unsafe code, to avoid feeding invalid arguments into unsafe operations which cause UB. The general approach is to not scatter unsafe code around your entire program, but segregate it into safe abstractions that prevent callers from triggering UB. (Though some APIs make it difficult to achieve, so the alternative is to create unsafe abstractions that expect callers to take care to avoid triggering UB, which requires auditing the callers for security as well.)
Deleted Comment
But I see the issues (e.g. which scope should `tmp` be bound to/what is its lifetime?)
> let path = CString::new(filename.as_str()).unwrap().as_ptr();
Is one of those functions calling free behind the scenes?
Like C++, Rust relies heavily on RAII, in which resources are freed once they get out of scope. The "CString::new(filename.as_str()).unwrap()" returns a CString, which owns the memory for the C string. Since it's a temporary (it's not being stored anywhere), its scope ends before the next line, so the resources it owns are freed (by calling its Drop implementation) just after the ".as_ptr()" call.
One solution would be to split it into two lines:
That way, the CString would only release its resources at the end of the block.With references, the borrow checker doesn't let you do it the wrong way; however, the borrow checker only applies to references, not to raw pointers (which is what .as_ptr() returns).
(I assume there’s a good reason why this isn’t the case, but I don’t know Rust well enough to know the answer. I’d be interested to learn, if anyone can explain it for me!)
Edit to clarify: what seems very strange to me is that a function that not only does something potentially unsafe, but returns an unsafe value would be callable outside of an “unsafe” block.
Because it is not bound to a variable, it will get dropped before the next line, even if we create a pointer to it. if you would have borrowed instead of creating a pointer this would have been a compiler error because the borrow would have outlived the original object. But a pointer exists outside of the borrow semantics and lifetimes, that is also why it can only be used inside the unsafe block.
Deleted Comment
I feel that tying deallocations to destructors is a benefit to safe code, and a hindrance to unsafe code. I feel explicit calls to "destructor" functions that move self (perhaps using defer), plus linear typing to statically ensure you don't forget to call a destructor or call one twice, can provide safety in both situations, but at the expense of verbosity and not interacting well with exceptions. I still believe linear typing (argued against by Gankra in https://gankra.github.io/blah/linear-rust/) will fix multiple deficiencies in Rust:
- It helps write allocation-free code which passes objects around without creating/destroying them (maybe an Alloc effect would help too).
- Making drops explicit makes it harder to screw up when writing unsafe code performing allocations/deallocations. It may or may not help protect against double-frees that occur upon panicking.
- File's Drop implementation ignores the return code of fclose(), since you can't return values from a destructor. So dropping a File directly should be prohibited (except during panic-unwinding, which I feel is a rare scenario where perfect error-checking is less crucial than not risking more panics). Instead you move the file into a close() method/function, and are expected to check the Result<(), error> that comes out.
- From what I've read, trying to perform an asynchronous cancellation operation in the standard Drop method is not possible, since the method lacks access to the executor (unless you hard-code an executor to block on). It would make sense to make explicit dropping illegal (as a lint) and require moving it into a "drop"-like function, but keep Drop around to be called during panic-unwinding. This may result in logically incorrect behavior, but is still memory-safe. And I think it's better than making your async-fn state machines always larger to fit awaiting-a-drop states, and awaiting the async runtime when you panic-unwind.
Yeah, and it bothers me that I can't call `close` and check for errors. All you need is something like this:
Note that this function takes `self` not `&mut self`, so it consumes the resource and prevents further use. Then `mem::forget` prevents `drop` from running.See discussion at https://github.com/rust-lang/rust/issues/59567
Bidirectional pointers is definitely something it does better than Rust. Constraint references are an interesting idea. Clasps remind me of some Qt objects automatically detaching other objects listening to them, when destroyed (eg. QSortFilterProxyModel's source model, referenced by QSortFilterProxyModel, referenced by QAbstractItemView), but it feels ad-hoc for every individual type, and I don't know if there's underlying principles that all Qt types follow.
The descriptions of the language and others feel a bit crank-ish at times, I got lost a web of hyperlinks to various pages on the site and didn't read all of them, and apparently the language is not ready yet. I might check it out again when it's more complete.