> A frequent problem with goroutines in long-running applications is handling panics. A goroutine spawned without a panic handler will crash the whole process on panic. This is usually undesirable.
In go land, this seems desirable. Recoverable errors should be propagated as return values, not as panics.
I heavily used errgroup before creating conc, so the design is likely strongly influenced by that of errgroup even if not consciously. Conc was partially built to address the shortcomings of errgroup (from my perspective). Probably worth adding a "prior art" section to the README, but many of the ideas in conc are not unique.
> In go land, this seems desirable.
I mostly agree, which is why `Wait()` propagates the panic rather than returning it or logging it. This keeps panics scoped to the spawning goroutine and enables getting stacktraces from both the spawning goroutine and the spawned goroutine, which is quite useful for debugging.
That said, crashing the whole webserver because of one misbehaving request is not necessarily a good tradeoff. Conc moves panics into the spawning goroutine, which makes it possible to do things like catch panics at the top of a request and return a useful error to the caller, even if that error is just "nil pointer dereference" with a stacktrace. It's up to the user to decide what to do with propagated panics.
> That said, crashing the whole webserver because of one misbehaving request is not necessarily a good tradeoff. Conc moves panics into the spawning goroutine, which makes it possible to do things like catch panics at the top of a request and return a useful error to the caller, even if that error is just "nil pointer dereference" with a stacktrace. It's up to the user to decide what to do with propagated panics.
The problem is that panics aren't "goroutine scoped" in terms of their potential impact. So it really shouldn't be up to the user to decide how to handle a panic. Application code shouldn't handle panics at all! They're not just a different way to yield an error, they're critical bugs which shouldn't occur at all.
It is the way of things in an imperative language. If you catch a panic, you are also declaring to the runtime that there is nothing dangling, no locks in a bad state, etc. This is often the case. (Although since I don't think this is a well-understood aspect of what catching a panic means, it is arguably only usually true by a certain amount of coincidence.) But if you don't say that to the runtime, it can't safely assume it, and terminating the program, while violent, is arguably either the best option or the only correct option.
Other paradigms, like the Erlang paradigm, can have better behaviors even if a top-level evaluation fails. But in an imperative language, there really isn't anything else you should do. It is arguably one of the Clues (in the "cluestick" sense) that the imperative paradigm is perhaps not the one that should be the base of our technology. But that's a well-debated matter.
I don’t think this is really a question of whether your code is imperative, since Haskell code will terminate just as surely as Go code if you try to access an array element out of range.
(Haskell’s lazy evaluation just makes it a bit harder to catch, since you need to force evaluation of the thunk within the catch statement, and it’s far too easy to end up passing your thunk to somebody who won’t catch the exception.)
As a matter of Go style, of course, you should almost always defer unlock() after you lock(), but some people sometimes get clever and think that they can just lock() and unlock() manually without using defer. This is not hypothetical, and it causes other problems besides leaving dangling locks after a panic(). Somebody sticks a “return” between lock() and unlock(), without noticing, for example.
So my impression of catching panic() is that it is about as safe as not catching panic(). What I mean by that is that if recover() is not safe in your code base, there is a good chance that there are other, related bugs in your code base, and being a bit more strict about using defer and not trying to be clever will go a long way.
A goroutine created inside an HTTP request handler (itself a goroutine) which then panics will, by default, crash the whole server, not just the single request. The panic could be something as simple as an out-of-bounds access. That should not crash the whole server.
It’s a logic bug, but you can’t “not panic”. You can trap and recover it though.
Bit orthogonal to OP but relevant to your reasoning.
This isn't catching the panic though, this is propagating the panic through the parent goroutine. The whole program will still shut down, but the stacktrace that shows up in the panic contains not only information about the goroutine that panicked, but also about the launching goroutine. That can help you figure out why the panic happened to begin with.
If you've ever had to deal with a zombie JVM where an uncaught exception eventually put the whole application into a state where it wasn't, oh, hoovering the tens of thousands of temporary files it was creating because the vacuumer thread died undetected, the Go behavior is _extremely_ attractive.
How the heck should a library author know whether an error is recoverable or not? In many situations that depends entirely on the caller.
Not-so-contrived example: a config option contains a value that leads to a division by zero in some library I'm using. I don't want that to crash my whole program, I want to tell the user "hey this part didn't work" but be able to continue with the other parts. Without writing a ton of boilerplate code.
I started to write Go recently and I very much prefer panics to errors. For example in web apps. Panic produces nice stack trace. Panic bubbles automatically, I can reduce amount of code significantly. Errors are awful and panics allow to write code without losing sanity. It's like good old exceptions. And if I really need it, I can catch panic and act as needed, e.g. return HTTP code or whatever.
You should rarely reach for panic. Panics are not like exceptions, really. It's very frowned upon for panics to pass API boundaries. Use errors always, unless the state of the world is irrecoverably broken.
It's difficult to come up with examples of when this may be the case. Often it's a "you're holding it wrong" kind of thing. For example, a common idiom is wrapping something like
func Parse(s string) (Foo, error)
to be usable in contexts where error handling isn't possible, such as variable initialization.
func Must(f Foo, err error) Foo {
    if err != nil {
        panic(err)
    }
    return f
}
which could be used at the global level like so:
var someFoo = Must(Parse("hey"))
This is acceptable because this code is static and will either panic at boot always or never.
Hi! Author here. Conc is the result of generalizing and cleaning up an internal package I wrote for use within Sourcegraph. Basically, I got tired of rewriting code that handled panics, limited concurrency, and ensured goroutine cleanup. Happy to answer questions or address comments.
I literally started drafting my own structured concurrency proposal for Go 2 today, due to exactly the same frustrations you mention. Such a coincidence, and thanks for writing this lib. I will most certainly use it.
Please could you tell me if you have any thoughts on how to integrate these ideas into the language?
One thing I think should be solved (and that appears not addressed by your lib?) is the function coloring problem of context. I would really, really love it if context were an implicit and universal cancel/deadline mechanism, all the way down to IO. That way, only the parts of the chain that NEED to use the context would be affected (which is a minority in practice). Any thoughts on that?
Finally, I think somebody should create a more visionary and coherent proposal for improved concurrency in Go2, because individual proposals on Go’s issue tracker are shut down quickly due to not making enough sense in isolation - and status quo bias. It’s a shame because I really love Go and want to see reduced footguns and boilerplate – especially with concurrency often being the largest source of complexity in a project. Please find my email in profile if you’re interested to pursue this further.
> I would really, really love it if context were an implicit and universal cancel/deadline mechanism, all the way down to IO.
I don't think this is an improvement. Implicit behavior is difficult to identify and reason about. The only criticism of context that seems valid to me, aside from arbitrary storage being a huge antipattern, is that it was added long after nearly the entire standard library was authored, and its usage is still relatively sparse.
We can agree that concurrency is difficult to use correctly, but since the introduction of generics it's much easier to wrap channels and goroutines.
Aside, in my experience if you're worried about boilerplate you're almost always looking at the problem wrong and optimizing for convenience over simplicity.
Is the alternative to crash and restart the whole process on panic? It would make sense if someone wants to write an in-process supervisor (similar to Erlang?) but this would be basically a main-wrapper - not a per-http-request thing (because Golang http itself would be corrupted).
I don’t know enough to say where the crash isolation boundary should best lie, BUT, assuming that you can catch panics at all, it makes a lot of sense that they are propagated upwards in the call stack. The idea of structured concurrency is that concurrent code is attached to the caller’s stack, at least in spirit.
Great project. It seems like channels are just the wrong tool for a lot of concurrency problems. More powerful than needed and easy to get wrong. Lots of nice ways to make go concurrency safer.
The problem that bothers me (and isn't in Conc) is how hard it is to run different things in the background and gather the results in different ways. Particularly when you start doing those things conditionally and reusing results.
> The problem that bothers me (and isn't in Conc) is how hard it is to run different things in the background and gather the results in different ways. Particularly when you start doing those things conditionally and reusing results.
Do you have any examples? About the only thing I can think of is "parse something to a bunch of different types" and that can be solved easily enough. What do you mean by "reusing results"?
Just took a glance but it seems like this is exactly the kind of project I saw coming out of generics going live. I was really surprised to see how subtly hard go concurrency was to do right when initially learning it. Something like this that formalizes patterns and keeps you from leaking goroutines / deadlocking without fuss is great.
It was initially something that surprised me about Go. It was famous for good concurrency and yet, compared to many functional languages and contemporary OO languages, it had a lot of footguns. There is a lot of repetition of code that languages like Java had long since factored into common abstractions. Go seems to lack obvious concurrency abstractions.
When they announced generics the first thing I did with it was rewrite my common slice parallel algorithm and my limited concurrency pool. It is an obvious area needing improvement for common use cases.
That’s because people confuse “easy” and “good”.
Go has easy concurrency, but it’s not good. The opposite, really. At best it’s the same problematic model you get in other, older procedural / OO languages. At worst it’s a lot worse, given how difficult it is (and even more so was, before generics) to build abstractions and concurrent data structures.
I think one of the examples they give is a bit misleading. This
func process(stream chan int) {
    var wg sync.WaitGroup
    for i := 0; i < 10; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for elem := range stream {
                handle(elem)
            }
        }()
    }
    wg.Wait()
}
And
func process(stream chan int) {
    p := pool.New().WithMaxGoroutines(10)
    for elem := range stream {
        elem := elem
        p.Go(func() {
            handle(elem)
        })
    }
    p.Wait()
}
Do slightly different things. The first one has 10 independent, long-lived, go-routines that are all consuming from a single channel. The second one has the current thread read from the channel and dynamically spawn go-routines. They have the same effect, but different performance characteristics.
I haven't looked at the implementation at all, but it is possible that the pool is keeping goroutines alive, and the `Go()` method writes to a single `chan func()` that those goroutines read off of.
Which still isn't exactly equivalent, there's still an additional channel read due to the `for elem := range stream {}` loop, and likely an allocation due to the closure.
This is exactly correct. Behavior is equivalent, performance is not. It's probably still not a great example because if you're already reading from a channel, you're probably better off spawning 10 tasks that read off that channel, but the idea of the example was that it can handle unbounded streams with bounded concurrency.
Ahh having a generic pool abstraction that collects results is tempting... nice work. I likely won't use this library though, since
- I don't want to mess with panics.
- The default concurrency GOMAXPROCS is almost never what I want.
- Aggregated errors are almost never what I want. (I haven't read the implementation, but I worry about losing important error codes, or propagating something huge across an RPC boundary).
- Using it would place a burden on any reader that isn't familiar with the library
> The default concurrency GOMAXPROCS is almost never what I want.
FWIW, the default concurrency has been changed to "unlimited" since the 0.1.0 release.
> Aggregated errors are almost never what I want.
Out of curiosity, what do you want? There is an option to only keep the first error, and it's possible to unwrap the error into the slice of errors that compose it if you just want a slice of errors.
> Using it would place a burden on any reader that isn't familiar with the library
Using concurrency in general places a burden on the reader :) I personally find using patterns like this to significantly reduce the read/review burden.
> FWIW, the default concurrency has been changed to "unlimited" since the 0.1.0 release.
Nice! Will that end up on Github?
> Out of curiosity, what do you want
Most often I want to return just the first error. Some reasons: (1) smaller error messages passed across RPC boundaries (2) original errors can be inspected as intended (e.g. error codes) (3) when the semantics are to cancel after the first error, the errors that come after that are just noise.
A couple other thoughts
- I think the non-conc comparison code is especially hairy since it's using goroutine pools. There's nothing wrong with that and it's fast, just not the easiest to work with. Often goroutine overhead is negligible and I would bound concurrency in dumber ways e.g. by shoving a sync.Semaphore into whatever code I otherwise have
- I like that errgroup has .Go() block when concurrency limit is reached. Based on a quick skim of the code I think(?) conc does that too, but it could use documentation
If you click through to the actual API doc it makes a lot more sense: "WithCollectErrored configures the pool to still collect the result of a task even if the task returned an error. By default, the result of tasks that errored are ignored and only the error is collected."
If your code panics, your process should probably just crash (unless you're abusing panics to pass state around to yourself, but that's another story). The overall program state could be invalid and continuing to run may be dangerous. Your program needs to be able to recover from an unexpected termination anyway.
Not recovering panics goes extra for generic packages that are executing user code. You have no idea how badly broken things are.
> The problem is that panics aren't "goroutine scoped" in terms of their potential impact. So it really shouldn't be up to the user to decide how to handle a panic. Application code shouldn't handle panics at all! They're not just a different way to yield an error, they're critical bugs which shouldn't occur at all.
This is exactly what's undesirable. (IMO and from my reading the GP agrees.)
If you have any care for quality at all, there's nothing "simple" about your invariants being violated.
> The problem that bothers me (and isn't in Conc) is how hard it is to run different things in the background and gather the results in different ways. Particularly when you start doing those things conditionally and reusing results.
Something like go-future helps. https://github.com/stephennancekivell/go-future
> Something like go-future helps. https://github.com/stephennancekivell/go-future
That looks pretty awkward. With channels it would just be written more concisely.
f := New(func() Type { return t })
v := f.Get()
w := f.Get()
I'd be curious to see an example of the type of task you want to be able to do more safely