This is why separating return values from error codes is important.
For example, in Rust, you’d never get into this situation, because a decent fork ffi function would immediately convert -1 into a Result carrying an error, and properly check errno. Java and C++ would throw an exception, etc.
Thus preventing all sorts of bad behavior up the stack.
"Separation" in this case implying "hard to misuse", which the POSIX fork() API certainly is not.
I cannot upvote this enough. I've written some rust code which uses fork, and after reading this, I went to check the docs. And lo and behold it follows the convention you put forth. So I was safe.
Yay Rust, where things either fail (panic and quit the program) or return a result that you can't use until you check for the error.
Fork (without exec) is a very sharp tool because it can violate ownership, doubly so in multi-threaded programs. So just because you get a Result from it doesn't mean all its pitfalls are handled.
… because you cannot fork() safely in Rust.
And do you know why you'd never get into this with Rust? Because Rust doesn't support fork.
> And do you know why you'd never get into this with Rust? Because Rust doesn't support fork.
Yes, it does. https://docs.rs/nix/0.17.0/nix/unistd/fn.fork.html
System calls actually do return the result separately from success or failure. It's libc that smooshes them into one thing.
Specifics vary by platform, but for FreeBSD i386, the result usually comes back in register EAX (but docs say sometimes another register), and the success or failure comes back as the carry flag. Of course, C never made access to the carry flag easy, so libc smooshes things together. But yes, don't do this today, obviously.
System calls can return them separately, but don’t have to. On Linux a failing syscall returns -errno; the syscall wrapper assumes any return value in the range -1 to -4095 is an error, stashes the positive error number in the C errno variable and translates the return value to -1.
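In C, the shape of that convention is something like this (a sketch of the behavior described above, not the actual glibc source):

    #include <errno.h>

    /* A raw Linux syscall returns a negative errno on failure; the
       wrapper treats anything in the range -1..-4095 as an error,
       stores the positive error number in errno, and hands the
       caller -1. */
    static long wrap_result(long raw)
    {
        if (raw < 0 && raw >= -4095) {
            errno = (int)-raw;
            return -1;
        }
        return raw;
    }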
All that is certainly true, but we're talking about a function that dates back to the 70s here. It predates all of those nice error handling concepts (exceptions, Result<T>, multiple return values), so it's pretty sensible from that perspective to have -1 be the error value.
We're still dealing with this today as so many of us deploy and develop software for POSIX that exposes us to these edge cases.
So, yes, while working in Rust and using better APIs designed around these interfaces, we won't do the wrong thing, but there are still a lot of people who are going to get cut by these old '70s interfaces. It would be nice if we didn't have to work with them, but many of us still do.
> All that is certainly true, but we're talking about a function that dates back to the 70s here. It predates all of those nice error handling concepts (exceptions, Result<T>, multiple return values), so it's pretty sensible from that perspective to have -1 be the error value.
It doesn't predate creating a struct which tells you exactly whether things went wrong and how.
Yeah, but in C++ you get people ignoring exceptions too. I'm fine with return codes; I don't mind them, even though exceptions are maybe a little bit better.
If by ignoring you mean not catching anything, I'm often okay with that. If an exception makes it to the top level, you'll get a std::terminate and be done.
The "canonical" example people use a lot is std::bad_alloc. Often, there is no point in catching it -- what cleanup or fallback work are you planning to do when you can't even allocate memory?
Of course, silently swallowing exceptions with an empty catch-block is terrible.
If you ignore or misuse an error code, you carry on in a corrupted state.
... and then you can just do `coredumpctl gdb` and start debugging right where things went wrong?
True, but in this case, ignoring an exception would be better than ignoring a return code, right? It prevents you from accidentally sending a signal to -1 (i.e., to all the processes you can). The danger with the return code is that the error value is a valid (but not intended) input to another function that normally consumes the normal return value.
A similar "sigil" related bug was the cause of the sudo exploit a few weeks back. Option types are often sold as a way to end the dreaded 'null pointer exception' but I think ending the dreaded "sigil we forgot to handle" benefit may be bigger.
Using integer codes like 0 and -1 was the old style of returning errors, often combined with some undocumented features.
It is really wrong, but that is caused by C functions returning only a single integer of data, and by bad C libraries. NULL used to be -1 in some compilers. Mixing pointers and integers is a recipe for disaster and is not compatible with some CPUs, but it is standard C.
In the Win32 API they improved it a bit, with functions that return only an error code and place their results in caller-supplied structs. I think that process-creation structure is also a lot better than the fork() function.
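Sketched in C, that convention looks something like this (hypothetical names, not real Win32 signatures):

    #include <errno.h>
    #include <sys/types.h>
    #include <unistd.h>

    /* Hypothetical "status out, results in a struct" style applied
       to process creation: the return value carries only an error
       code, so it can never be mistaken for a PID. */
    typedef struct { pid_t pid; } proc_info;

    int spawn(proc_info *out)
    {
        pid_t p = fork();
        if (p == -1)
            return errno; /* status only; *out stays untouched */
        out->pid = p;     /* result only; status is 0 */
        return 0;
    }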
It's even more fun when you're mixing libraries, and some of them return 1 on success, some of them return zero, and some -1, and you can't tell by inspection which is which.
Bonus points if the API you're using inconsistently mixes boolean, null-pointer, HRESULT, DWORD, and returning a status in a value you pass a pointer to (you did initialize that value, right?)
Why do we keep doing this to ourselves?
I don't know Rust well, but my understanding is the Rust way is better than Java or C++.
Exceptions are a mess. Either you go the Java route of tagging everything as throwing every subclass of Exception under the sun (which encourages people to write empty catch blocks to silence a noisy compiler), or you go the C++ route, where it is not at all clear, when writing the code or glancing at it, that an error can even occur. (Combine with operator overloading for the most confusing results.)
Having an error result that can perhaps be easily propagated to the caller is the best of both worlds, and I think is the thing that good C code tries to approximate in a more manual way.
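For instance, the manual C version of that propagation pattern (step_one and step_two are hypothetical stand-ins):

    /* Each call returns a status; the caller either handles it or
       passes it straight up, a hand-rolled version of what Rust's
       `?` operator does automatically. */
    int step_one(void); /* hypothetical; 0 on success */
    int step_two(void); /* hypothetical; 0 on success */

    int do_work(void)
    {
        int err;
        if ((err = step_one()) != 0)
            return err; /* propagate to our caller */
        if ((err = step_two()) != 0)
            return err;
        return 0;
    }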
> If a function be advertised to return an error code in the event of difficulties, thou shalt check for that code, yea, even though the checks triple the size of thy code and produce aches in thy typing fingers, for if thou thinkest ``it cannot happen to me'', the gods shall surely punish thee for thy arrogance.
http://www.lysator.liu.se/c/ten-commandments.html
Posts like this always perplex me; the behavior is clearly documented in the man page and clearly indicated by example demonstration code (e.g. [0]), so how could someone fall under the impression that this wasn't the case?
[0] https://en.wikipedia.org/wiki/Fork_(system_call)
The point of this blog post isn't so much "hey, fork can fail" but pointing out that "if you fail to handle fork failing, the outcome is really bad." Fork's error result is a legal input to kill, but one which has really nasty semantics. It's also a legal input to wait, but has somewhat more benign semantics.
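Concretely, the footgun looks like this (a sketch of the buggy pattern, not code from the post):

    #include <signal.h>
    #include <unistd.h>

    /* If fork() fails here, pid is -1, and the unchecked kill()
       below becomes kill(-1, SIGTERM): a signal to every process
       the caller has permission to signal. */
    void buggy(void)
    {
        pid_t pid = fork();
        if (pid == 0) {
            /* ... child work ... */
            _exit(0);
        }
        /* BUG: missing `if (pid == -1)` check before using pid */
        kill(pid, SIGTERM);
    }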
Disagree - the unfortunate interaction with kill is not clearly documented.
Do you implement error checking when calling printf, unlike every C codebase I've ever encountered which uses it? If not, you've implicitly acknowledged some error cases just aren't worth handling, or useful to handle. The question is then - when is it important?
PSAs like this one make it clear where the documentation may not have: Error handling fork() is important, and unlike error handling malloc, where free(nullptr)ing later is a safe no-op, kill(-1)ing later is an unsafe hazard that must be avoided. Additionally, it's frequently the case that the documentation is poor and would not help you even when you do bother to read it. Here's me previously ranting that the vast majority of documentation about atoi fails to clearly and adequately call out that atoi("a") is undefined behavior, and citing my sources: https://news.ycombinator.com/item?id=14861447
> so how could someone fall under the impression that this wasn't the case?
Continuing past my atoi example...
Maybe they looked at alternative, poorer documentation. Maybe they looked at poor example code that didn't bother with error handling. Maybe they looked at decent documentation that failed to adequately stress the importance of error checking (EDIT: I'd argue this includes your Wikipedia example). Hell, maybe they looked at great documentation - about a specific platform's implementation of fork, which perhaps makes fork() failing fatal to the calling process and thus "infallible". Maybe they looked at the documentation for their favorite language's wrapper of fork, which throws an exception instead.
Maybe they didn't look at the documentation at all.
Maybe they learned of fork through word of mouth when the internet was down on a system without manpages. "Can fork ever fail?" "Hmm... I've never seen it fail." "Good enough for me!"
Perhaps this lack of knowledge can only come about by foolishness - but human nature and statistics mean at least one of your generally smart coworkers has probably fallen prey to such foolishness.
> Do you implement error checking when calling printf, unlike every C codebase I've ever encountered which uses it?
There's a reason that Unix sends EPIPE to a process when writing to a broken pipe, and why the default handler for SIGPIPE is to terminate the process. Interestingly, the Rust runtime blocks SIGPIPE, which was a naive and dumb thing to do [1], but which is now impossible to undo.
Similarly, in C the fail flag for FILE objects is persistent to permit alternative error management strategies. This is even carried over into Go, AFAIU. Basically, it's okay to leave a series of I/O statements unchecked so long as you check at the end of the block or transaction, or at the very least at close time.
The printf case is a bad example because the inconvenience of checking for failure on every call has already been accounted for.
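That strategy in miniature (a sketch, treating stdout as the transaction):

    #include <stdio.h>

    /* Individual writes go unchecked; the sticky error flag is
       inspected once at the end of the block. */
    int write_report(void)
    {
        printf("header\n");
        printf("body\n");
        if (fflush(stdout) == EOF || ferror(stdout))
            return -1; /* some earlier write failed */
        return 0;
    }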
I don't think I've ever seen C code that fails to check fork for an error condition, though I'm usually only ever reading my code and the code of widely used open source projects.
[1] Considering how much Rust touts the ease of FFI and integration with C and C++ projects.
Comments like this always perplex me. Every driver knows to not run red lights, yet people regularly do it. How could someone fall under the impression that people always act with full care and concentration on every task they undertake?
Should cars prevent people from running red lights? There are a few unusual circumstances when this is something you want to do intentionally and knowing the risks.
A post telling people they aren't supposed to run red lights would also be strange, since that is well known. Are you saying that people who don't check fork for errors are doing so knowingly? That is not the impression I got from the post.
To me the takeaway shouldn’t be that fork() can fail, rather that kill(-1, ...) has effects that extend outside the scope of your process. This is documented too, but it is far less often used intentionally. (Giving a pid_t an initial value of -1 could be considered a good practice, right up until you hit a code path that fails to check it and you kill your parent!)
A saner design would have been to split that API up, e.g. “killgrp” to kill groups of processes by whatever identifier (current process group, foreign process group, all processes, etc.). This way, your intent is encoded in the function you call, which is much harder to screw up.
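Something like this, say (kill_one/kill_pgrp/kill_all are made-up names for illustration):

    #include <errno.h>
    #include <signal.h>

    /* Hypothetical split of kill(2): the scope of the signal is
       chosen by the function you call, never by a magic pid value,
       so a stray -1 can't silently mean "everyone". */
    int kill_one(pid_t pid, int sig)
    {
        if (pid <= 0) {
            errno = EINVAL; /* refuse the magic values outright */
            return -1;
        }
        return kill(pid, sig);
    }

    int kill_pgrp(pid_t pgrp, int sig); /* one process group */
    int kill_all(int sig);              /* explicit broadcast */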
I'm having trouble understanding the mindset / reasoning a programmer would use for not checking the return code of a syscall that could fail.
For someone who codes for a living and would be inclined to not check `fork` for an error code, would you mind sharing a bit about why you use that approach?
I don't - the default behavior of simply ignoring I/O failures if, say, stdout's pipe was broken is usually what I want. In fact, I've had bugs in exception throwing languages where such stdout write failures threw, and I failed to explicitly catch and ignore them.
I also may skip error checking malloc. A null pointer exception / sigsegv / access violation "must" be fine if malloc is failing - even if I handle it nonfatally in our code, some of our closed source middleware doesn't, and neither do some system libraries. At best I can make slightly better fatal error messages for a subset of the resulting failures. If I'm trying to build super reliable software, I need to avoid exhausting/fragmenting memory badly enough for malloc to fail in the first place.
I have a decent chance of checking fork() for failure as I'm on the more paranoid end of error checking. I've seen enough weird junk like SetCurrentDirectory on a real directory failing due to NTFS filesystem corruption leading to an infinite loop - that I assume all documented error conditions will eventually occur somehow, as well as some undocumented ones.
But I've never seen it fail, and I'm probably just going to make it a fatal error.
Folks wonder why not every error code is checked. My friend Mike Rowe said it this way: it's like having an altimeter in your car. So if you drive off a cliff, you know how far it is to the ground.
Not only is that a dumb analogy in this context, it's a plain dumb analogy. That's not how an altimeter works. Not only would you need an altimeter, but also fast and accurate GPS positioning, and current topographical maps of your area in order to know how far it is to the ground below that cliff.
> If pid equals -1, then sig is sent to every process for which the calling process has permission to send signals, except for process 1 (init)
Has this behavior with init always been this way? I could swear in the past that `kill -9 -1` used to kill init too (and thus cause a reboot), it was one of my favorite "fuck it and reboot" methods.
Linux will not let you send a signal to init that it has not installed a signal handler for (to avoid unexpected terminate-by-default signals wrecking your machine), and no process can install a signal handler for SIGKILL; ergo, you cannot SIGKILL init.
Other operating systems may vary, of course.
So do people just not read man pages? It's right there.
    On success, the PID of the child process is returned in the parent,
    and 0 is returned in the child. On failure, -1 is returned in the
    parent, no child process is created, and errno is set appropriately.
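Which maps directly onto the standard three-way check:

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void)
    {
        pid_t pid = fork();
        if (pid == -1) {
            perror("fork"); /* errno is set appropriately */
            return EXIT_FAILURE;
        }
        if (pid == 0) {
            /* child: fork() returned 0 */
            _exit(0);
        }
        /* parent: pid is the child's PID */
        return 0;
    }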
When I went through my university's C programming course, the instructor barely knew C. For the students with limited Linux experience, the concept of man pages was unknown to them, let alone that there are man pages for C functions. So no, people don't read man pages because they don't know they exist.