This is why separating return values from error codes is important.
For example, in Rust, you’d never get into this situation, because a decent fork ffi function would immediately convert -1 into a Result carrying an error, and properly check errno. Java and C++ would throw an exception, etc.
Thus preventing all sorts of bad behavior up the stack.
"Separation" in this case implying "hard to misuse", which the POSIX fork() API certainly is not.
I cannot upvote this enough. I've written some rust code which uses fork, and after reading this, I went to check the docs. And lo and behold it follows the convention you put forth. So I was safe.
Yay Rust, where things either fail (panic and quit the program) or return a result that you can't use until you check for the error.
Fork (without exec) is a very sharp tool because it can violate ownership, doubly so in multi-threaded programs. So just because you get a Result from it doesn't mean all its pitfalls are handled.
… because you cannot fork() safely in Rust.
And do you know why you'd never get into this with Rust? Because Rust doesn't support fork.
> And do you know why you'd never get into this with Rust? Because Rust doesn't support fork.
Yes, it does. https://docs.rs/nix/0.17.0/nix/unistd/fn.fork.html
System calls actually do return the result separately from success or failure. It's libc that smooshes them into one thing.
Specifics vary by platform, but for FreeBSD i386, the result usually comes back in register EAX (but docs say sometimes another register), and the success or failure comes back as the carry flag. Of course, C never made access to the carry flag easy, so libc smooshes things together. But yes, don't do this today, obviously.
System calls can return them separately, but don’t have to. On Linux a failing syscall returns -errno; the syscall wrapper assumes any return value in the range -1 to -4095 is an error, stashes the positive error number in the C errno variable and translates the return value to -1.
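In C, the shape of that convention is something like this (a sketch of the behavior described above, not the actual glibc source):

    #include <errno.h>

    /* A raw Linux syscall returns a negative errno on failure; the
       wrapper treats anything in the range -1..-4095 as an error,
       stores the positive error number in errno, and hands the
       caller -1. */
    static long wrap_result(long raw)
    {
        if (raw < 0 && raw >= -4095) {
            errno = (int)-raw;
            return -1;
        }
        return raw;
    }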
All that is certainly true, but we're talking about a function that dates back to the 70s here. It predates all of those nice error handling concepts (exceptions, Result<T>, multiple return values), so it's pretty sensible from that perspective to have -1 be the error value.
We're still dealing with this today as so many of us deploy and develop software for POSIX that exposes us to these edge cases.
So, yes, while working in Rust and using better APIs designed around these interfaces, we won't do the wrong thing, but there are still a lot of people who are going to get cut by these old '70s interfaces. It would be nice if we didn't have to work with them, but many of us still do.
> All that is certainly true, but we're talking about a function that dates back to the 70s here. It predates all of those nice error handling concepts (exceptions, Result<T>, multiple return values), so it's pretty sensible from that perspective to have -1 be the error value.
It doesn't predate creating a struct which tells you exactly whether things went wrong and how.
Yeah, but in C++ you get people ignoring exceptions too. I'm fine with return codes; I don't mind them, even though exceptions are maybe a little bit better.
If by ignoring you mean not catching anything, I'm often okay with that. If an exception makes it to the top level, you'll get a std::terminate and be done.
The "canonical" example people use a lot is std::bad_alloc. Often, there is no point in catching it -- what cleanup or fallback work are you planning to do when you can't even allocate memory?
Of course, silently swallowing exceptions with an empty catch-block is terrible.
If you ignore or misuse an error code, you carry on in a corrupted state.
... and then you can just do `coredumpctl gdb` and start debugging right where things went wrong?
True, but in this case, ignoring an exception would be better than ignoring a return code, right? It prevents you from accidentally sending a signal to -1 (i.e., to all the processes you can). The danger with the return code is that the error value is a valid (but not intended) input to another function that normally consumes the normal return value.
A similar "sigil" related bug was the cause of the sudo exploit a few weeks back. Option types are often sold as a way to end the dreaded 'null pointer exception' but I think ending the dreaded "sigil we forgot to handle" benefit may be bigger.
Using integer codes like 0 and -1 was the old style of returning errors, often combined with some undocumented features.
It is really wrong, but that is caused by C functions returning only a single integer of data, and by bad C libraries. NULL used to be -1 in some compilers. Mixing pointers and integers is a recipe for disaster and is not compatible with some CPUs, but it is standard C.
In the Win32 API they improved it a bit, with functions that return only an error code and place their results in caller-supplied structs. I think that process-creation structure is also a lot better than the fork() function.
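Sketched in C, that convention looks something like this (hypothetical names, not real Win32 signatures):

    #include <errno.h>
    #include <sys/types.h>
    #include <unistd.h>

    /* Hypothetical "status out, results in a struct" style applied
       to process creation: the return value carries only an error
       code, so it can never be mistaken for a PID. */
    typedef struct { pid_t pid; } proc_info;

    int spawn(proc_info *out)
    {
        pid_t p = fork();
        if (p == -1)
            return errno; /* status only; *out stays untouched */
        out->pid = p;     /* result only; status is 0 */
        return 0;
    }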
It's even more fun when you're mixing libraries, and some of them return 1 on success, some of them return zero, and some -1, and you can't tell by inspection which is which.
Bonus points if the API you're using inconsistently mixes boolean, null-pointer, HRESULT, DWORD, and returning a status in a value you pass a pointer to (you did initialize that value, right?)
Why do we keep doing this to ourselves?
I don't know Rust well, but my understanding is the Rust way is better than Java or C++.
Exceptions are a mess. Either you go the Java route of tagging everything as throwing every subclass of Exception under the sun (which encourages people to write empty catch blocks to silence a noisy compiler), or you go the C++ route, where it is not at all clear, when writing the code or glancing at it, that an error can even occur. (Combine with operator overloading for the most confusing results.)
Having an error result that can perhaps be easily propagated to the caller is the best of both worlds, and I think is the thing that good C code tries to approximate in a more manual way.
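For instance, the manual C version of that propagation pattern (step_one and step_two are hypothetical stand-ins):

    /* Each call returns a status; the caller either handles it or
       passes it straight up, a hand-rolled version of what Rust's
       `?` operator does automatically. */
    int step_one(void); /* hypothetical; 0 on success */
    int step_two(void); /* hypothetical; 0 on success */

    int do_work(void)
    {
        int err;
        if ((err = step_one()) != 0)
            return err; /* propagate to our caller */
        if ((err = step_two()) != 0)
            return err;
        return 0;
    }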
> If a function be advertised to return an error code in the event of difficulties, thou shalt check for that code, yea, even though the checks triple the size of thy code and produce aches in thy typing fingers, for if thou thinkest ``it cannot happen to me'', the gods shall surely punish thee for thy arrogance.
http://www.lysator.liu.se/c/ten-commandments.html
Posts like this always perplex me; the behavior is clearly documented in the man page and clearly indicated by example demonstration code (e.g. [0]), so how could someone fall under the impression that this wasn't the case?
[0] https://en.wikipedia.org/wiki/Fork_(system_call)
The point of this blog post isn't so much "hey, fork can fail" but pointing out that "if you fail to handle fork failing, the outcome is really bad." Fork's error result is a legal input to kill, but one which has really nasty semantics. It's also a legal input to wait, but has somewhat more benign semantics.
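Concretely, the footgun looks like this (a sketch of the buggy pattern, not code from the post):

    #include <signal.h>
    #include <unistd.h>

    /* If fork() fails here, pid is -1, and the unchecked kill()
       below becomes kill(-1, SIGTERM): a signal to every process
       the caller has permission to signal. */
    void buggy(void)
    {
        pid_t pid = fork();
        if (pid == 0) {
            /* ... child work ... */
            _exit(0);
        }
        /* BUG: missing `if (pid == -1)` check before using pid */
        kill(pid, SIGTERM);
    }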
Disagree - the unfortunate interaction with kill is not clearly documented.
Do you implement error checking when calling printf, unlike every C codebase I've ever encountered which uses it? If not, you've implicitly acknowledged some error cases just aren't worth handling, or useful to handle. The question is then - when is it important?
PSAs like this one make it clear where the documentation may not have: Error handling fork() is important, and unlike error handling malloc, where free(nullptr)ing later is a safe no-op, kill(-1)ing later is an unsafe hazard that must be avoided. Additionally, it's frequently the case that the documentation is poor and would not help you even when you do bother to read it. Here's me previously ranting that the vast majority of documentation about atoi fails to clearly and adequately call out that atoi("a") is undefined behavior, and citing my sources: https://news.ycombinator.com/item?id=14861447
> so how could someone fall under the impression that this wasn't the case?
Continuing past my atoi example...
Maybe they looked at alternative, poorer documentation. Maybe they looked at poor example code that didn't bother with error handling. Maybe they looked at decent documentation that failed to adequately stress the importance of error checking (EDIT: I'd argue this includes your Wikipedia example). Hell, maybe they looked at great documentation - about a specific platform's implementation of fork, which perhaps makes fork() failing fatal to the calling process and thus "infallible". Maybe they looked at the documentation for their favorite language's wrapper of fork, which throws an exception instead.
Maybe they didn't look at the documentation at all.
Maybe they learned of fork through word of mouth when the internet was down on a system without manpages. "Can fork ever fail?" "Hmm... I've never seen it fail." "Good enough for me!"
Perhaps this lack of knowledge can only come about by foolishness - but human nature and statistics mean at least one of your generally smart coworkers has probably fallen prey to such foolishness.
> Do you implement error checking when calling printf, unlike every C codebase I've ever encountered which uses it?
There's a reason that Unix sends EPIPE to a process when writing to a broken pipe, and why the default handler for SIGPIPE is to terminate the process. Interestingly, the Rust runtime blocks SIGPIPE, which was a naive and dumb thing to do [1], but which is now impossible to undo.
Similarly, in C the fail flag for FILE objects is persistent to permit alternative error management strategies. This is even carried over into Go, AFAIU. Basically, it's okay to leave a series of I/O statements unchecked so long as you check at the end of the block or transaction, or at the very least at close time.
The printf case is a bad example because the inconvenience of checking for failure on every call has already been accounted for.
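That strategy in miniature (a sketch, treating stdout as the transaction):

    #include <stdio.h>

    /* Individual writes go unchecked; the sticky error flag is
       inspected once at the end of the block. */
    int write_report(void)
    {
        printf("header\n");
        printf("body\n");
        if (fflush(stdout) == EOF || ferror(stdout))
            return -1; /* some earlier write failed */
        return 0;
    }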
I don't think I've ever seen C code that fails to check fork for an error condition, though I'm usually only ever reading my code and the code of widely used open source projects.
[1] Considering how much Rust touts the ease of FFI and integration with C and C++ projects.
Comments like this always perplex me. Every driver knows to not run red lights, yet people regularly do it. How could someone fall under the impression that people always act with full care and concentration on every task they undertake?
Should cars prevent people from running red lights? There are a few unusual circumstances when this is something you want to do intentionally and knowing the risks.
A post telling people they aren't supposed to run red lights would also be strange, since that is well known. Are you saying that people who don't check fork for errors are doing so knowingly? That is not the impression I got from the post.
To me the takeaway shouldn’t be that fork() can fail, rather that kill(-1, ...) has effects that extend outside the scope of your process. This is documented too, but it is far less often used intentionally. (Giving a pid_t an initial value of -1 could be considered a good practice, right up until you hit a code path that fails to check it and you kill your parent!)
A saner design would have been to split that API up, e.g. “killgrp” to kill groups of processes by whatever identifier (current process group, foreign process group, all processes, etc.). This way, your intent is encoded in the function you call, which is much harder to screw up.
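Something like this, say (kill_one/kill_pgrp/kill_all are made-up names for illustration):

    #include <errno.h>
    #include <signal.h>

    /* Hypothetical split of kill(2): the scope of the signal is
       chosen by the function you call, never by a magic pid value,
       so a stray -1 can't silently mean "everyone". */
    int kill_one(pid_t pid, int sig)
    {
        if (pid <= 0) {
            errno = EINVAL; /* refuse the magic values outright */
            return -1;
        }
        return kill(pid, sig);
    }

    int kill_pgrp(pid_t pgrp, int sig); /* one process group */
    int kill_all(int sig);              /* explicit broadcast */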
I'm having trouble understanding the mindset / reasoning a programmer would use for not checking the return code of a syscall that could fail.
For someone who codes for a living and would be inclined to not check `fork` for an error code, would you mind sharing a bit about why you use that approach?
I don't - the default behavior of simply ignoring I/O failures if, say, stdout's pipe was broken is usually what I want. In fact, I've had bugs in exception throwing languages where such stdout write failures threw, and I failed to explicitly catch and ignore them.
I also may skip error checking malloc. A null pointer exception / sigsegv / access violation "must" be fine if malloc is failing - even if I handle it nonfatally in our code, some of our closed source middleware doesn't, and neither do some system libraries. At best I can make slightly better fatal error messages for a subset of the resulting failures. If I'm trying to build super reliable software, I need to avoid exhausting/fragmenting memory badly enough for malloc to fail in the first place.
I have a decent chance of checking fork() for failure as I'm on the more paranoid end of error checking. I've seen enough weird junk like SetCurrentDirectory on a real directory failing due to NTFS filesystem corruption leading to an infinite loop - that I assume all documented error conditions will eventually occur somehow, as well as some undocumented ones.
But I've never seen it fail, and I'm probably just going to make it a fatal error.
Folks wonder why not every error code is checked. My friend Mike Rowe said it this way: it's like having an altimeter in your car. So if you drive off a cliff, you know how far it is to the ground.
Not only is that a dumb analogy in this context, it's a plain dumb analogy. That's not how an altimeter works. Not only would you need an altimeter, but also fast and accurate GPS positioning, and current topographical maps of your area in order to know how far it is to the ground below that cliff.
> If pid equals -1, then sig is sent to every process for which the calling process has permission to send signals, except for process 1 (init)
Has this behavior with init always been this way? I could swear in the past that `kill -9 -1` used to kill init too (and thus cause a reboot), it was one of my favorite "fuck it and reboot" methods.
Linux will not let you send a signal to init that it has not installed a signal handler for (to avoid unexpected terminate-by-default signals wrecking your machine), and no process can install a signal handler for SIGKILL; ergo, you cannot SIGKILL init.
Other operating systems may vary, of course.
So do people just not read man pages? It's right there.
    On success, the PID of the child process is returned in the parent,
    and 0 is returned in the child. On failure, -1 is returned in the
    parent, no child process is created, and errno is set appropriately.
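Which maps directly onto the standard three-way check:

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void)
    {
        pid_t pid = fork();
        if (pid == -1) {
            perror("fork"); /* errno is set appropriately */
            return EXIT_FAILURE;
        }
        if (pid == 0) {
            /* child: fork() returned 0 */
            _exit(0);
        }
        /* parent: pid is the child's PID */
        return 0;
    }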
When I went through my university's C programming course, the instructor barely knew C. For the students with limited Linux experience, the concept of man pages was unknown to them, let alone that there are man pages for C functions. So no, people don't read man pages because they don't know they exist.