I'm reminded of Raymond Chen's many, many blog posts [1][2][3] (there are a lot more) on why TerminateThread is a bad idea. I'm not at all surprised the same is true elsewhere. In my own code, this is why I tend to prefer cancellable system calls that are alertable. That way the thread can wake up, check whether it needs to die, and then GTFO.
One of my more annoying gotchas on Windows is that, despite this advice sounding very reasonable, the runtime itself (I believe it actually happens in the kernel) essentially calls TerminateThread on all child threads before running global destructors and atexit hooks. Good luck following this advice when the kernel actively fights you when it comes time to shut down.
So there is a reason that, in the C++ spec, if a std::thread is still joinable when its destructor is called, it calls std::terminate [1]. That reason is exactly this case. If the house is being torn down, it's not safe to try to save the curtains [2]. Just let the house get torn down as quickly as possible. If you wanted to save the curtains (e.g. do things on the threads before they exit), you need to do it before the end of main, when global destructors start getting called.
Global destructors and atexit are called by the C/C++ runtime; Windows has nothing to do with that. The C and C++ specs require that returning from main() has the same effect as calling exit(), which ends the process, meaning they can't allow any still-running threads to continue running. Given these constraints, would you prefer the threads to keep running until after global destructors and atexit have run? That would be at least as likely to wreak havoc. No, in C/C++, you need to make sure that other threads are not running anymore before returning from main().
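A minimal C/pthreads sketch of "make sure other threads are not running before returning from main()", assuming a worker that polls a hypothetical `stop_requested` flag between bounded units of work. The hypothetical `run_and_join` does what main() would do before returning: set the flag, then join.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical flag the worker polls between bounded units of work. */
static atomic_bool stop_requested = false;

static void *worker(void *arg) {
    (void)arg;
    while (!atomic_load(&stop_requested)) {
        struct timespec ts = {0, 1000000};   /* stand-in for ~1 ms of work */
        nanosleep(&ts, NULL);
    }
    /* Clean up thread-owned state here, while the process is still intact. */
    return NULL;
}

/* What main() would do before returning: stop and join every thread, so
   nothing is still running when exit() runs atexit handlers and destructors. */
static int run_and_join(void) {
    pthread_t t;
    if (pthread_create(&t, NULL, worker, NULL) != 0) return -1;
    struct timespec ts = {0, 20000000};      /* let it run ~20 ms */
    nanosleep(&ts, NULL);
    atomic_store(&stop_requested, true);
    return pthread_join(t, NULL);
}
```

The point is ordering: the join happens inside main(), so by the time the runtime starts tearing the house down, there is nobody left inside it.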
> Well, since thread cancellation is implemented using exceptions, and thread cancellation can happen in arbitrary places
No, thread cancelation cannot happen in arbitrary places. Or doesn't have to.
There are two kinds of cancelation: asynchronous and deferred.
POSIX provides an API to configure this for a thread, dynamically: pthread_setcanceltype.
Furthermore, cancelation can be enabled and disabled also.
int pthread_setcancelstate(int state, int *oldstate); // PTHREAD_CANCEL_ENABLE, PTHREAD_CANCEL_DISABLE
int pthread_setcanceltype(int type, int *oldtype); // PTHREAD_CANCEL_DEFERRED, PTHREAD_CANCEL_ASYNCHRONOUS
Needless to say, a thread would only turn on asynchronous cancelation over some code where it is safe to do so, where it won't be caught in the middle of allocating resources, or manipulating data structures that will be in a bad state, and such.
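A sketch of deferred cancellation combined with pthread_setcancelstate, assuming a hypothetical invariant `pair[0] == pair[1]` that is temporarily broken inside the loop. While cancellation is disabled, a pthread_cancel request stays pending (even across nanosleep(), normally a cancellation point); it is acted on only once the thread re-enables cancellation and reaches pthread_testcancel().

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <time.h>

/* Hypothetical shared data whose invariant is pair[0] == pair[1]. */
static int pair[2];

static void *worker(void *arg) {
    (void)arg;
    int old;
    /* Deferred is already the default; set it explicitly to show intent. */
    pthread_setcanceltype(PTHREAD_CANCEL_DEFERRED, &old);
    for (;;) {
        /* The invariant is broken between the two increments, so refuse
           cancellation here, even at cancellation points like nanosleep(). */
        pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &old);
        pair[0]++;
        struct timespec ts = {0, 100000};    /* 0.1 ms */
        nanosleep(&ts, NULL);
        pair[1]++;
        pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, &old);
        /* Invariant restored: act on any pending cancellation request now. */
        pthread_testcancel();
    }
    return NULL;
}

/* Returns 0 if the worker was cancelled with its invariant intact. */
static int cancel_demo(void) {
    pthread_t t;
    void *res;
    if (pthread_create(&t, NULL, worker, NULL) != 0) return -1;
    struct timespec ts = {0, 5000000};       /* let it run ~5 ms */
    nanosleep(&ts, NULL);
    if (pthread_cancel(t) != 0) return -1;
    if (pthread_join(t, &res) != 0) return -1;
    return (res == PTHREAD_CANCELED && pair[0] == pair[1]) ? 0 : 1;
}
```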
I talk about the cancelability state and how it can help us shortly after that statement: https://mazzo.li/posts/stopping-linux-threads.html#controlle... . In hindsight I should have made a forward reference to that section when talking about C++. My broad point was that combining C++ exceptions and thread cancellation is fraught with danger and imo best avoided.
I regret to be informed that they still haven't figured this out. I was very active in threading on Linux over 20 years ago, working on glibc and whatnot. That was before C++ had threads, needless to say. There was a time when cancelation didn't do C++ unwinding and only ran the pthread_cleanup_push handlers. So of course cancellation and C++ exceptions were a bad cocktail then.
For interrupting long-running syscalls there is another solution:
Install an empty SIGINT signal handler (without SA_RESTART), then run the loop.
When the thread should stop:
* Set stop flag
* Send a SIGINT to the thread, using pthread_kill or tgkill
* Syscalls will fail with EINTR
* Check for EINTR and the stop flag; then we know we have to clean up and stop
Of course a lot of code will just retry on EINTR, so that requires having control over all the code that does syscalls, which isn't really feasible when using any libraries.
EDIT: The post describes exactly this method, and what the problem with it is, I just missed it.
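A sketch of the signal-plus-EINTR approach, assuming a worker blocked in read() on a pipe. It uses SIGUSR1 instead of SIGINT so Ctrl-C keeps its default meaning; otherwise the mechanism is the one described above. Note the window between the flag check and read(): a signal landing there is swallowed by the empty handler and read() then blocks anyway, which is the race the post goes on to discuss. The repeated pthread_kill() loop only papers over it for this demo; it is not a real fix.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

static atomic_bool stop_flag = false;
static atomic_bool worker_done = false;

/* Empty handler: its only job is to make blocking syscalls fail with EINTR. */
static void on_wakeup(int sig) { (void)sig; }

static void *reader(void *arg) {
    int fd = *(int *)arg;
    char buf[64];
    for (;;) {
        if (atomic_load(&stop_flag)) break;     /* flag check ... */
        ssize_t n = read(fd, buf, sizeof buf);  /* ... then block (racy window) */
        if (n < 0 && errno == EINTR) continue;  /* interrupted: re-check the flag */
        if (n <= 0) break;                      /* EOF or a real error */
    }
    atomic_store(&worker_done, true);
    return NULL;
}

static int signal_stop_demo(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_wakeup;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;                            /* no SA_RESTART */
    if (sigaction(SIGUSR1, &sa, NULL) != 0) return -1;

    int fds[2];
    if (pipe(fds) != 0) return -1;
    pthread_t t;
    if (pthread_create(&t, NULL, reader, &fds[0]) != 0) return -1;

    atomic_store(&stop_flag, true);
    /* Re-send until the worker notices, in case a signal hit the window
       between its flag check and read(). */
    while (!atomic_load(&worker_done)) {
        pthread_kill(t, SIGUSR1);
        struct timespec ts = {0, 1000000};
        nanosleep(&ts, NULL);
    }
    pthread_join(t, NULL);
    close(fds[0]);
    close(fds[1]);
    return 0;
}
```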
This article does a nice job of explaining why pthread cancellation is hopeless.
> If we could know that no signal handler is ran between the flag check and the syscall, then we’d be safe.
If you're willing to write assembly, you can accomplish this without rseq. I got it working many years ago on a bunch of platforms. [1] It's similar to what they did in this article: define a "critical region" between the initial flag check and the actual syscall. If the signal happens here, ensure the instruction pointer gets adjusted in such a way that the syscall is bypassed and EINTR returned immediately. But it doesn't need any special kernel support that's Linux-only and didn't exist at the time, just async signal handlers.
(rseq is a very cool facility, btw, just not necessary for this.)
No you can't since the compiler will likely inline the syscall (or vsyscall) in your functions. So there's no way to know the instruction pointer is in the right section. The only way is to pay for no-inline cost and have a wrapper that's calling the syscall, so it's a huge cost to pay for a very rare feature (cancelling a thread abruptly is a no-no in most coding conventions).
I'm afraid that every part of what you just wrote was wrong.
> No you can't since the compiler will likely inline the syscall (or vsyscall) in your functions.
Do you mean the SYSCALL instruction? The standard practice is to make syscalls through glibc's wrappers. The compiler can't inline stuff across a shared library boundary because it doesn't know what version of the shared library will be requested at runtime. Using alternate non-inlineable wrappers (with some extra EINTR magic) does not newly impose the cost of out-of-lined functions.
It'd be possible to allow this instruction to be inlined into your binary's code (rather than using glibc shared library calls), but basically no one does, because this cost is insignificant compared to the context switch.
In general, inlining can be a big performance win, but mostly not because of the actual cost of the function call itself. It's more that sometimes huge optimizations are possible when the caller and callee are considered together. But these syscall wrappers don't have a lot of expense for the compiler to optimize away.
Do you mean the actual syscall (kernel code)? This is a different binary across a protection boundary; even more reason it can't be inlined.
vsyscall (or its modern equivalent, vDSO) is not relevant here. That's only for certain calls such as `gettimeofday` that do not block and so never return EINTR and (in pthread cancellation terms) are not "cancellation points". There is just no reason to do this for them. And again, the compiler can't inline it, because it doesn't know what code the kernel will supply at runtime.
> The only way is to pay for no-inline cost and have a wrapper that's calling the syscall, so it's a huge cost to pay for a very rare feature (cancelling a thread abruptly is a no-no in most coding conventions).
It's an insignificant cost that you're already paying.
The article is proposing a much safer alternative to cancelling a thread abruptly: using altered syscall wrappers that ensure EINTR is returned if a signal arrives before (even immediately before) entering kernel space. That's the same thing my sigsafe library does.
If you can swing it (don't need to block on IO indefinitely), I'd suggest just the simple coordination model.
* Some atomic bool controls if the thread should stop or not;
* The thread doesn't make any unbounded wait syscalls;
* And the thread uses pthread_cond_wait (or equivalent C++ std wrappers) in place of sleeping while idle.
To kill the thread, set the stop flag and cond_signal the condvar. (Under the hood on Linux, this uses futex.)
The tricky part is really point 2 there; it can be harder than it looks (e.g. even simple file I/O can hit network drives). Async I/O can really shine here, though designing async cancellation isn't exactly trivial either.
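A sketch of this coordination model with a hypothetical `struct worker`. One deliberate deviation from the bullets: the stop flag is a plain bool guarded by the same mutex the condition variable uses. Setting a bare atomic and signalling without taking that mutex can lose the wakeup: the worker sees the flag clear, the stopper sets it and signals, and only then does the worker start waiting.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdbool.h>

struct worker {
    pthread_mutex_t mu;
    pthread_cond_t cv;
    bool stop;      /* guarded by mu, the mutex passed to pthread_cond_wait */
    int pending;    /* hypothetical count of queued work, guarded by mu */
    int done;       /* work items completed, guarded by mu */
};

static void worker_init(struct worker *w) {
    pthread_mutex_init(&w->mu, NULL);
    pthread_cond_init(&w->cv, NULL);
    w->stop = false;
    w->pending = 0;
    w->done = 0;
}

static void *worker_main(void *arg) {
    struct worker *w = arg;
    pthread_mutex_lock(&w->mu);
    for (;;) {
        while (!w->stop && w->pending == 0)
            pthread_cond_wait(&w->cv, &w->mu);  /* idle: sleep, don't poll */
        if (w->stop) break;
        w->pending--;
        pthread_mutex_unlock(&w->mu);
        /* ... one bounded unit of work, without holding the lock ... */
        pthread_mutex_lock(&w->mu);
        w->done++;
    }
    pthread_mutex_unlock(&w->mu);
    return NULL;
}

static void worker_submit(struct worker *w) {
    pthread_mutex_lock(&w->mu);
    w->pending++;
    pthread_cond_signal(&w->cv);
    pthread_mutex_unlock(&w->mu);
}

static void worker_stop(struct worker *w) {
    pthread_mutex_lock(&w->mu);
    w->stop = true;                 /* set under mu so the wakeup can't be lost */
    pthread_cond_signal(&w->cv);
    pthread_mutex_unlock(&w->mu);
}
```

A stop request wins over queued work in this version; to drain first, test `pending` before `stop` in the loop.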
Relying heavily on a check for an atomic bool is prone to race conditions. I think it's cleaner to structure the event loop as a message queue and have a queued message that indicates it's time to stop.
Queuing a stop means you have to process the queue before stopping. Which certainly is stopping cleanly, but if you wanted to stop the thread because its queue was too long and the work requests were stale, it doesn't help much.
You could maybe allow a queue skipping feature to be used for stop messages... But if it's only for stop messages, set an atomic bool stop, then send a stop message. If the thread just misses the stop bool and waits for messages, you'll get the stop message; if the queue is large, you'll get the stop bool.
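A sketch of the flag-then-message protocol, with a hypothetical fixed-size queue of `MSG_WORK`/`MSG_STOP` items. The consumer checks the atomic flag before each pop, so a long backlog is skipped; if it is idle in pthread_cond_wait, the stop message wakes it. The message is pushed under the queue's mutex, so that wakeup cannot be lost even though the flag itself is read without the lock.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

#define QCAP 1024

enum msg_kind { MSG_WORK, MSG_STOP };

struct queue {
    pthread_mutex_t mu;
    pthread_cond_t nonempty;
    enum msg_kind items[QCAP];
    int head, count;          /* guarded by mu */
    atomic_bool stop;         /* lets the consumer skip a long backlog */
    int processed;            /* written by the consumer, read after join */
};

static void queue_init(struct queue *q) {
    pthread_mutex_init(&q->mu, NULL);
    pthread_cond_init(&q->nonempty, NULL);
    q->head = q->count = 0;
    atomic_init(&q->stop, false);
    q->processed = 0;
}

/* Returns false if the queue is full. */
static bool queue_push(struct queue *q, enum msg_kind m) {
    pthread_mutex_lock(&q->mu);
    bool ok = q->count < QCAP;
    if (ok) {
        q->items[(q->head + q->count) % QCAP] = m;
        q->count++;
        pthread_cond_signal(&q->nonempty);
    }
    pthread_mutex_unlock(&q->mu);
    return ok;
}

static enum msg_kind queue_pop(struct queue *q) {
    pthread_mutex_lock(&q->mu);
    while (q->count == 0)
        pthread_cond_wait(&q->nonempty, &q->mu);
    enum msg_kind m = q->items[q->head];
    q->head = (q->head + 1) % QCAP;
    q->count--;
    pthread_mutex_unlock(&q->mu);
    return m;
}

static void *consumer(void *arg) {
    struct queue *q = arg;
    for (;;) {
        if (atomic_load(&q->stop)) break;     /* backlog: bail out early */
        if (queue_pop(q) == MSG_STOP) break;  /* idle: the message wakes us */
        q->processed++;                       /* ... handle one work item ... */
    }
    return NULL;
}

/* Producer side: set the flag first, then send the message. */
static void request_stop(struct queue *q) {
    atomic_store(&q->stop, true);
    /* If the queue is full, the consumer is not blocked and will see the flag. */
    (void)queue_push(q, MSG_STOP);
}
```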
Disagree. I think then it's too tempting down the line for someone to add a message whose processing blocks.
A simple, clear loop that checks a "stop requested" flag and sets a "stop confirmed" flag works pretty well. This can be built into a synchronous "stop" function for the caller that sets the request flag and then does a timed wait on the confirmation (using condition variables and pthread_cond_timedwait, or WaitForSingleObject and friends if you're on Windows).
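A sketch of the requested/confirmed pair with a synchronous, bounded stop on the POSIX side. `struct stoppable`, `loop` and `stop_sync` are hypothetical names. The deadline passed to pthread_cond_timedwait is absolute and measured on CLOCK_REALTIME, the default clock for a condition variable created with default attributes.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

struct stoppable {
    pthread_mutex_t mu;
    pthread_cond_t cv;
    bool stop_requested;   /* set by the caller, guarded by mu */
    bool stop_confirmed;   /* set by the worker as it leaves its loop */
};

static void stoppable_init(struct stoppable *s) {
    pthread_mutex_init(&s->mu, NULL);
    pthread_cond_init(&s->cv, NULL);
    s->stop_requested = s->stop_confirmed = false;
}

static void *loop(void *arg) {
    struct stoppable *s = arg;
    pthread_mutex_lock(&s->mu);
    while (!s->stop_requested) {
        pthread_mutex_unlock(&s->mu);
        struct timespec ts = {0, 1000000};  /* one bounded unit of "work" */
        nanosleep(&ts, NULL);
        pthread_mutex_lock(&s->mu);
    }
    s->stop_confirmed = true;
    pthread_cond_broadcast(&s->cv);
    pthread_mutex_unlock(&s->mu);
    return NULL;
}

/* Returns 0 if the worker confirmed within timeout_ms, otherwise the error
   from pthread_cond_timedwait (normally ETIMEDOUT). */
static int stop_sync(struct stoppable *s, long timeout_ms) {
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }
    int rc = 0;
    pthread_mutex_lock(&s->mu);
    s->stop_requested = true;
    while (!s->stop_confirmed && rc == 0)
        rc = pthread_cond_timedwait(&s->cv, &s->mu, &deadline);
    if (s->stop_confirmed) rc = 0;
    pthread_mutex_unlock(&s->mu);
    return rc;
}
```

After a successful stop_sync the caller should still pthread_join the thread; on a timeout it has to decide what to do with a thread that never answered.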
libcurl dealt with this a few months ago, and the sentiment is about the same: thread cancellation in glibc is hairy. The short summary (which I think is accurate) is that a hostname query via libnss ultimately had to read a config file, and glibc's `open` is a thread cancellation point, so if the thread is canceled there, it won't free memory that was allocated before the `open`.
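This is not libcurl's or glibc's actual fix. It is a sketch of the discipline cancellation demands from any code that owns resources across a cancellation point: the allocation made before open() needs a pthread_cleanup_push handler, and the file descriptor needs its own. `lookup`, `free_buffer` and `close_fd` are hypothetical stand-ins for a libnss-style lookup.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <unistd.h>

static atomic_int buffers_freed = 0;

static void free_buffer(void *p) {
    free(p);
    atomic_fetch_add(&buffers_freed, 1);
}

static void close_fd(void *p) {
    close(*(int *)p);
}

/* Hypothetical lookup: allocates, then repeatedly reads a config file
   through cancellation points (open() and read()). */
static void *lookup(void *arg) {
    const char *path = arg;
    char *buf = malloc(4096);
    if (buf == NULL) return NULL;
    pthread_cleanup_push(free_buffer, buf);       /* frees buf if cancelled */
    for (;;) {
        int fd = open(path, O_RDONLY);            /* cancellation point */
        if (fd >= 0) {
            pthread_cleanup_push(close_fd, &fd);  /* the fd needs its own handler */
            ssize_t n = read(fd, buf, 4096);      /* cancellation point */
            (void)n;
            pthread_cleanup_pop(1);               /* normal path: close now */
        }
    }
    pthread_cleanup_pop(1);                       /* never reached; pairs with push */
    return NULL;
}
```

Drop either pthread_cleanup_push here and a cancel at the wrong moment leaks exactly the kind of resource the libcurl write-up describes.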
Note that the situation with libcurl is very specific: lookup with libnss is only available as a synchronous call. All other syscalls they make can be done with async APIs, which can easily be cancelled without any of the trickery discussed here.
This was a fun read, I didn't know about rseq until today! And before this I reasonably assumed that the naive busy-wait thing would typically be what you'd do in a thread in most circumstances. Or that at least most threads do loop in that manner. I knew that signals and such were a problem but I didn't think just wanting to stop a thread would be so hard! :)
IIRC rseq was originally proposed by Google to support their pure-userspace read-copy-update (RCU) implementation, which relied on per-CPU rather than per-thread data.
[1] https://devblogs.microsoft.com/oldnewthing/20150814-00/?p=91...
[2] https://devblogs.microsoft.com/oldnewthing/20191101-00/?p=10...
[3] https://devblogs.microsoft.com/oldnewthing/20140808-00/?p=29...
There are a lot more; I'm not linking them all here.
[1] https://en.cppreference.com/w/cpp/thread/thread/~thread.html
[2] https://devblogs.microsoft.com/oldnewthing/20120105-00/?p=86...
[1] Here's the Linux/x86_64 syscall wrapper: https://github.com/scottlamb/sigsafe/blob/master/src/x86_64-... and the signal handler: https://github.com/scottlamb/sigsafe/blob/master/src/x86_64-...
This is a race condition. When you "spin" on a condition variable, the stop flag you check must be guarded by the same mutex you give to cond_wait.
See this article for a thorough explanation:
https://zeux.io/2024/03/23/condvars-atomic/
ps, hi
It is not, actually. This extremely simple protocol is race-free.
The write-up on how they're dealing with it starts at https://eissing.org/icing/posts/pthread_cancel/.
Hopefully this improves eventually? Who knows?
I think a good shorthand for this stuff is
> One can either preemptively or cooperatively schedule threads, and one can also either preemptively or cooperatively cancel processes, but one can only cooperatively cancel threads.