> A frequent problem with goroutines in long-running applications is handling panics. A goroutine spawned without a panic handler will crash the whole process on panic. This is usually undesirable.
In go land, this seems desirable. Recoverable errors should be propagated as return values, not as panics.
I heavily used errgroup before creating conc, so the design is likely strongly influenced by that of errgroup even if not consciously. Conc was partially built to address the shortcomings of errgroup (from my perspective). Probably worth adding a "prior art" section to the README, but many of the ideas in conc are not unique.
> In go land, this seems desirable.
I mostly agree, which is why `Wait()` propagates the panic rather than returning it or logging it. This keeps panics scoped to the spawning goroutine and enables getting stacktraces from both the spawning goroutine and the spawned goroutine, which is quite useful for debugging.
That said, crashing the whole webserver because of one misbehaving request is not necessarily a good tradeoff. Conc moves panics into the spawning goroutine, which makes it possible to do things like catch panics at the top of a request and return a useful error to the caller, even if that error is just "nil pointer dereference" with a stacktrace. It's up to the user to decide what to do with propagated panics.
> That said, crashing the whole webserver because of one misbehaving request is not necessarily a good tradeoff. Conc moves panics into the spawning goroutine, which makes it possible to do things like catch panics at the top of a request and return a useful error to the caller, even if that error is just "nil pointer dereference" with a stacktrace. It's up to the user to decide what to do with propagated panics.
The problem is that panics aren't "goroutine scoped" in terms of their potential impact. So it really shouldn't be up to the user to decide how to handle a panic. Application code shouldn't handle panics at all! They're not just a different way to yield an error, they're critical bugs which shouldn't occur at all.
It is the way of things in an imperative language. If you catch a panic, you are also declaring to the runtime that there is nothing dangling, no locks in a bad state, etc. This is often the case. (Although since I don't think this is a well-understood aspect of what catching a panic means, it is arguably only usually true by a certain amount of coincidence.) But if you don't say that to the runtime, it can't safely assume it, and terminating the program, while violent, is arguably either the best option or the only correct option.
Other paradigms, like the Erlang paradigm, can have better behaviors even if a top-level evaluation fails. But in an imperative language, there really isn't anything else you should do. It is arguably one of the Clues (in the "cluestick" sense) that the imperative paradigm is perhaps not the one that should be the base of our technology. But that's a well-debated matter.
I don’t think this is really a question of whether your code is imperative, since Haskell code will terminate just as surely as Go code if you try to access an array element out of range.
(Haskell’s lazy evaluation just makes it a bit harder to catch, since you need to force evaluation of the thunk within the catch statement, and it’s far too easy to end up passing your thunk to somebody who won’t catch the exception.)
As a matter of Go style, of course, you should almost always defer unlock() after you lock(), but some people sometimes get clever and think that they can just lock() and unlock() manually without using defer. This is not hypothetical, and it causes other problems besides leaving dangling locks after a panic(). Somebody sticks a “return” between lock() and unlock(), without noticing, for example.
So my impression of catching panic() is that it is about as safe as not catching panic(). What I mean by that is that if recover() is not safe in your code base, there is a good chance that there are other, related bugs in your code base, and being a bit more strict about using defer and not trying to be clever will go a long way.
A goroutine created inside an HTTP request handler (itself a goroutine) which then panics will, by default, crash the whole server, not just the single request. The panic could be something as simple as an out-of-bounds access. That should not crash the whole server.
It’s a logic bug, but you can’t “not panic”. You can trap and recover it though.
Bit orthogonal to OP but relevant to your reasoning.
This isn't catching the panic though, this is propagating the panic through the parent goroutine. The whole program will still shut down, but the stacktrace that shows up in the panic contains not only information about the goroutine that panicked, but also about the launching goroutine. That can help you figure out why the panic happened to begin with.
If you've ever had to deal with a zombie JVM where an uncaught exception eventually put the whole application into a state where it wasn't, oh, hoovering the tens of thousands of temporary files it was creating because the vacuumer thread died undetected, the Go behavior is _extremely_ attractive.
How the heck should a library author know whether an error is recoverable or not? In many situations that depends entirely on the caller.
Not-so-contrived example: a config option contains a value that leads to a division by zero in some library I'm using. I don't want that to crash my whole program, I want to tell the user "hey this part didn't work" but be able to continue with the other parts. Without writing a ton of boilerplate code.
I started to write Go recently and I very much prefer panics to errors. For example in web apps. Panic produces nice stack trace. Panic bubbles automatically, I can reduce amount of code significantly. Errors are awful and panics allow to write code without losing sanity. It's like good old exceptions. And if I really need it, I can catch panic and act as needed, e.g. return HTTP code or whatever.
You should rarely reach for panic. Panics are not like exceptions, really. It's very frowned upon for panics to pass API boundaries. Use errors always, unless the state of the world is irrecoverably broken.
It's difficult to come up with examples of when this may be the case. Often it's a "you're holding it wrong" kind of thing. For example, a common idiom is wrapping something like
func Parse(s string) (Foo, error)
to be usable in contexts where error handling isn't possible, such as variable initialization.
func Must(f Foo, err error) Foo {
    if err != nil {
        panic(err)
    }
    return f
}
which could be used at the global level like so:
var someFoo = Must(Parse("hey"))
This is acceptable because this code is static and will either panic at boot always or never.
Hi! Author here. Conc is the result of generalizing and cleaning up an internal package I wrote for use within Sourcegraph. Basically, I got tired of rewriting code that handled panics, limited concurrency, and ensured goroutine cleanup. Happy to answer questions or address comments.
I literally started drafting my own structured concurrency proposal for Go 2 today, due to exactly the same frustrations you mention. Such a coincidence, and thanks for writing this lib. I will most certainly use it.
Please could you tell me if you have any thoughts on how to integrate these ideas into the language?
One thing I think should be solved (and that appears not addressed by your lib?) is the function coloring problem of context. I would really, really love it if context were an implicit and universal cancel/deadline mechanism, all the way down to IO. That way, only the parts of the chain that NEED to use the context would be affected (which is a minority in practice). Any thoughts on that?
Finally, I think somebody should create a more visionary and coherent proposal for improved concurrency in Go2, because individual proposals on Go’s issue tracker are shut down quickly due to not making enough sense in isolation - and status quo bias. It’s a shame because I really love Go and want to see reduced footguns and boilerplate – especially with concurrency often being the largest source of complexity in a project. Please find my email in profile if you’re interested to pursue this further.
> I would really, really love it if context were an implicit and universal cancel/deadline mechanism, all the way down to IO.
I don't think this is an improvement. Implicit behavior is difficult to identify and reason about. The only criticism of context that seems valid to me, aside from arbitrary storage being a huge antipattern, is that it was added long after nearly the entire standard library was authored, and its usage is still relatively sparse.
We can agree that concurrency is difficult to use correctly, but since the introduction of generics it's much easier to wrap channels and goroutines.
Aside, in my experience if you're worried about boilerplate you're almost always looking at the problem wrong and optimizing for convenience over simplicity.
Is the alternative to crash and restart the whole process on panic? It would make sense if someone wants to write an in-process supervisor (similar to Erlang?) but this would be basically a main-wrapper - not a per-http-request thing (because Golang http itself would be corrupted).
I don’t know enough to say where the crash isolation boundary should best lie, BUT, assuming that you can catch panics at all, it makes a lot of sense that they are propagated upwards in the call stack. The idea of structured concurrency is that concurrent code is attached to the caller’s stack, at least in spirit.
Great project. It seems like channels are just the wrong tool for a lot of concurrency problems. More powerful than needed and easy to get wrong. Lots of nice ways to make go concurrency safer.
The problem that bothers me (and isn't in Conc) is how hard it is to run different things in the background and gather the results in different ways. Particularly when you start doing those things conditionally and reusing results.
> The problem that bothers me (and isn't in Conc) is how hard it is to run different things in the background and gather the results in different ways. Particularly when you start doing those things conditionally and reusing results.
Do you have any examples? About the only thing I can think of is "parse something to a bunch of different types" and that can be solved easily enough. What do you mean by "reusing results"?
Just took a glance but it seems like this is exactly the kind of project I saw coming out of generics going live. I was really surprised to see how subtly hard go concurrency was to do right when initially learning it. Something like this that formalizes patterns and keeps you from leaking goroutines / deadlocking without fuss is great.
It was initially something that surprised me about Go. It was famous for good concurrency and yet, compared to many functional languages and contemporary OO languages, it had a lot of footguns. There is a lot of repetition of code that languages like Java had long since factored into common abstractions. Go seems to lack obvious concurrency abstractions.
When they announced generics the first thing I did with it was rewrite my common slice parallel algorithm and my limited concurrency pool. It is an obvious area needing improvement for common use cases.
That’s because people confuse “easy” and “good”.
Go has easy concurrency, but it’s not good. The opposite, really. At best it’s the same problematic model you get in other, older procedural / OO languages. At worst it’s a lot worse, given how difficult it is (and even more so was, before generics) to build abstractions and concurrent data structures.
I think one of the examples they give is a bit misleading. This
func process(stream chan int) {
    var wg sync.WaitGroup
    for i := 0; i < 10; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for elem := range stream {
                handle(elem)
            }
        }()
    }
    wg.Wait()
}
And
func process(stream chan int) {
    p := pool.New().WithMaxGoroutines(10)
    for elem := range stream {
        elem := elem
        p.Go(func() {
            handle(elem)
        })
    }
    p.Wait()
}
Do slightly different things. The first one has 10 independent, long-lived, go-routines that are all consuming from a single channel. The second one has the current thread read from the channel and dynamically spawn go-routines. They have the same effect, but different performance characteristics.
I haven't looked at the implementation at all, but it is possible that the pool is keeping goroutines alive, and the `Go()` method writes to a single `chan func()` that those goroutines read off of.
Which still isn't exactly equivalent, there's still an additional channel read due to the `for elem := range stream {}` loop, and likely an allocation due to the closure.
This is exactly correct. Behavior is equivalent, performance is not. It's probably still not a great example because if you're already reading from a channel, you're probably better off spawning 10 tasks that read off that channel, but the idea of the example was that it can handle unbounded streams with bounded concurrency.
Ahh having a generic pool abstraction that collects results is tempting... nice work. I likely won't use this library though, since
- I don't want to mess with panics.
- The default concurrency GOMAXPROCS is almost never what I want.
- Aggregated errors are almost never what I want. (I haven't read the implementation, but I worry about losing important error codes, or propagating something huge across an RPC boundary).
- Using it would place a burden on any reader that isn't familiar with the library
> The default concurrency GOMAXPROCS is almost never what I want.
FWIW, the default concurrency has been changed to "unlimited" since the 0.1.0 release.
> Aggregated errors are almost never what I want.
Out of curiosity, what do you want? There is an option to only keep the first error, and it's possible to unwrap the error into the slice of errors that compose it if you just want a slice of errors.
> Using it would place a burden on any reader that isn't familiar with the library
Using concurrency in general places a burden on the reader :) I personally find using patterns like this to significantly reduce the read/review burden.
> FWIW, the default concurrency has been changed to "unlimited" since the 0.1.0 release.
Nice! Will that end up on Github?
> Out of curiosity, what do you want
Most often I want to return just the first error. Some reasons: (1) smaller error messages passed across RPC boundaries (2) original errors can be inspected as intended (e.g. error codes) (3) when the semantics are to cancel after the first error, the errors that come after that are just noise.
A couple other thoughts
- I think the non-conc comparison code is especially hairy since it's using goroutine pools. There's nothing wrong with that and it's fast, just not the easiest to work with. Often goroutine overhead is negligible and I would bound concurrency in dumber ways e.g. by shoving a sync.Semaphore into whatever code I otherwise have
- I like that errgroup has .Go() block when concurrency limit is reached. Based on a quick skim of the code I think(?) conc does that too, but it could use documentation
If you click through to the actual API doc it makes a lot more sense: "WithCollectErrored configures the pool to still collect the result of a task even if the task returned an error. By default, the result of tasks that errored are ignored and only the error is collected."
If your code panics, your process should probably just crash (unless you're abusing panics to pass state around to yourself, but that's another story). The overall program state could be invalid and continuing to run may be dangerous. Your program needs to be able to recover from an unexpected termination anyway.
Not recovering panics goes extra for generic packages that are executing user code. You have no idea how badly broken things are.
> The problem is that panics aren't "goroutine scoped" in terms of their potential impact. So it really shouldn't be up to the user to decide how to handle a panic. Application code shouldn't handle panics at all! They're not just a different way to yield an error, they're critical bugs which shouldn't occur at all.
This is exactly what's undesirable. (IMO and from my reading the GP agrees.)
If you have any care for quality at all, there's nothing "simple" about your invariants being violated.
> The problem that bothers me (and isn't in Conc) is how hard it is to run different things in the background and gather the results in different ways. Particularly when you start doing those things conditionally and reusing results.
Something like go-future helps. https://github.com/stephennancekivell/go-future
> Something like go-future helps. https://github.com/stephennancekivell/go-future
That looks pretty awkward. With channels it would just be written more concisely.
f := New(func() Type { return t })
v := f.Get()
w := f.Get()
I'd be curious to see an example of the type of task you want to be able to do more safely