Progress on No-GIL CPython

Interesting discussion there too.

With modern computers I wonder if explicit parallelism is more fundamental to what our computer science will be than is in vogue in textbooks. Perhaps we should always be writing explicitly parallel code at this point.

Gh0stRAT · 2 years ago

Humans are bad at reasoning about multiple threads simultaneously, so I suspect the more practical shift is the trend we've already been seeing toward more declarative syntax.

eg `for` loops are being replaced by `foreach` loops,`map` and `filter` operations, etc. These tell the compiler/interpreter that you want to do some operation to all the items in your datastructure, leaving it up to the compiler/runtime whether and how to parallelize the work for you.

MR4D · 2 years ago

I would upvote this 100 times if I could.

I've thought this way ever since MacOS added the Grand Central Dispatch [1]. Of course, I thought industry would follow quickly and that tooling would coalesce around this concept pretty quickly. Seems the industry wants to take its sweet time.

[1] - https://en.wikipedia.org/wiki/Grand_Central_Dispatch

tester756 · 2 years ago

>eg `for` loops are being replaced by `foreach` loops,`map` and `filter` operations, etc. These tell the compiler/interpreter that you want to do some operation to all the items in your datastructure, leaving it up to the compiler/runtime whether and how to parallelize the work for you.

There's difference between doing it in order 1, 2, 3 and 3, 1, 2.

foreach will not be replaced behind the scenes into multithreaded version since it changes behaviour.

for is replaced with foreach because usually you dont need index and foreach is just handier and safer, that's it.

.NET's std lib has Parallel.ForEach for such a thing.

We really don't need magic to write multithreaded code. All we need is just really, really well designed APIs and primitives.

Affric · 2 years ago

I agree to a large extent but I am referring more to our teaching of Computer Science. For our teaching of Software Engineering I think you're largely correct.

> Humans are bad at reasoning about multiple threads simultaneously

I am not so sure this is true, I do believe that people are poorly practiced. My experiences have led me to believe Universities silo explicit parallel programming too much. It's generally it's own non-compulsory subject in a Comp-Sci major.

lionkor · 2 years ago

C++ has had std::execution_policy for a long time now - you pass that with an algorithm like sort, for_each, etc. and it will choose a way to parallelize that for you.

psalminen · 2 years ago

I like the way you word this. Similar to the product I make, I describe my mind as an asynchronous queue. I can only reason about one thing at a time, but when I do that is fairly random.

How this has played out in my life gives me caution about making this standard in computing.

Tarq0n · 2 years ago

Considering the developments in data engineering land I wonder if we'll be describing our operations as a DAG rather than maps and folds specifically.

pjmlp · 2 years ago

It is more like the mainstream world is finally catching up with the world promised by functional programming since Lisp and parallel computing exists.

Only now I can enjoy in modern hardware what I had to imagine when reading papers about Star Lisp and the Connection Machine, alongside other similar approaches.

johnloeber · 2 years ago

Yep! The only thing that remains is to focus on that code being properly functional; i.e. avoiding side-effects. Side-effects and parallelism don't mix well. Wonder if this will give rise to more functional languages.

likeabbas · 2 years ago

There will still be cases where more fine tuned control is warranted. Rust has done this very intelligently by moving data race controls to the compiler level.

imtringued · 2 years ago

How about some HDL semantics with implicit pipelining...

Every statement in a HDL language runs in parallel but you can still write implicitly sequential code in VHDL processes.

graemep · 2 years ago

Is the difficulty reasoning about threads a bit more specific than that? I think it is reasoning about threads with shared mutable state.

FpUser · 2 years ago

>"Humans are bad at reasoning about multiple threads simultaneously"

Humans are bad at reasoning about way too many things. I think mostly because many are lazy and do not want to learn. The ones who do have little problems. I do not find thread management particularly hard for the most parts (there are some exceptions but those are very uncommon).

minikomi · 2 years ago

Shades of clojure's transducers

dehrmann · 2 years ago

Parallelism ended up going off in a few different directions.

For things like running a web service, requests are fast enough, and the real win from parallelism is in handling lots of requests side-by-side. This is where No-GIL comes in.

Within handling a single request, if there are a lot of sub-requests, that's usually handled by async code, but not so much for the async performance win as much as spinning up threads is either expensive or thread pools are a hassle. Remember that async is better for throughput, but worse for latency, and if you're parallelizing a service request, you're probably more worried about latency. Async won mostly on ergonomics.

The other place you see parallelism is large offline jobs. Things like Map-reduce and Presto. Those tend to look like divide-and-conquer problems. GPU model training looks something like this.

What never happened is local, highly parallel algorithms. For a web service, data size is too small to see a latency win, they're complicated, and coordination between threads become costly. The small exceptions are vectorized algorithms, but these run one one core, so there isn't coordination overhead, and online inference, but again, this is heavily vectorized.

rsaxvc · 2 years ago

> What never happened is local, highly parallel algorithms.

GPUs maybe? Also, excellent answer.

eslaught · 2 years ago

Parallelism in CS is a bit like security in CS. People know it matters in the abstract senses but you really only get into it if you look for the training specifically. We're getting better at both over time: just as more languages/libraries/etc. are secure by default, more now are parallel by default. There's a ways to go, but I'm glad we didn't do this prematurely, because the technology has improved a lot in the last decade. Look for example at what we can do (safely!) with Rayon in Rust vs (unsafely!) with OpenMP in C++.

And there are things even further afield like what I work on [1][2][3].

[1]: https://legion.stanford.edu/

[2]: https://regent-lang.org/

[3]: https://github.com/nv-legate/cunumeric

eyegor · 2 years ago

See also

https://github.com/cupy/cupy

https://github.com/inducer/pyopencl

winter_blue · 2 years ago

What's the difference between Legion and Regent, by the way?

I noticed the Regent code is inside the Legion repo. Is Legion the system, and Regent the language?

Can Legion be used without Regent, or vice versa?

menaerus · 2 years ago

> Look for example at what we can do (safely!) with Rayon in Rust

"Safely" for a certain kind of definition of safety: https://github.com/search?q=repo%3Arayon-rs%2Frayon+unsafe&t...

xboxnolifes · 2 years ago

As I see it, parallelism is in the same vein as memory management. Most of what we program can, and should, use some form of automatic management, and manual management is reserved for the areas where it is needed for performance.

It's an implementation detail, and if we can abstract it away to make it easier to utilize, we should.

samsquire · 2 years ago

LMAX Disruptor has on their wiki that average latency to send a message from one thread to another at 53 nanoseconds. For comparison a mutex is like 25 nanoseconds and more if Contended but a mutex is point to point synchronization.

The great thing about the disruptor it is that multiple threads can receive the same message without much more effort.

https://github.com/LMAX-Exchange/disruptor/wiki/Performance-...

https://gist.github.com/rmacy/2879257

I am dreaming of language that is similar to Smalltalk that stays single threaded until it makes sense to parallise.

I am looking for problems for parallelism that are not big data. Parallelism is like adding more cars to the road rather than increasing the speed of the car. But what does a desktop or mobile user need to do locally that could take advantage of the mathematical power of a computer? I'm still searching.

I am thoughtful of the Itanium and VLIW architecture for parallelism ideas.

Affric · 2 years ago

> I am looking for problems for parallelism that are not big data. Parallelism is like adding more cars to the road rather than increasing the speed of the car. But what does a desktop or mobile user need to do locally that could take advantage of the mathematical power of a computer? I'm still searching.

The things we currently let servers do but it would mean we can keep user data local and not hand it over to service providers. I believe that is a worthy end goal.

sitkack · 2 years ago

Pervasive parallelism could make massive efficiency gains in computation possible. If we could move many work loads to hundreds or thousands of threads we could run it much lower clock frequencies and thus lower power. It could also enable the use of cheap, small in order cores, further boosting core counts.

Multithreading doesn’t always have to be around increasing speed, it can also reduce power

tmountain · 2 years ago

It sounds like you are thinking about concurrency more than parallelism. The answer to your question is very general at a high level. Any task that can be broken up into chunks benefits. In the simplest terms, tasks that can be computed in buckets with a final result computed from those buckets will benefit from concurrency. Think of a video game as a good example. Environment calculations are happening in the background while the main game loop is processing. There are almost infinite use cases and examples, so I won’t try to enumerate them all.

imtringued · 2 years ago

I am more of a fan of finding implicit parallelism. You could think of a given problem as a heap of spaghetti. If you can untangle the mess, then each noodle might still be sequential, but you can process them in parallel.

If you can find enough independent sequential problems in your programs then you can easily fill up cores, mostly because we don't have that many. I only have eight.

The problem is that this requires additional graph processing and there it makes your programs slower, which kind of defeats the point. The goal is to find the right tradeoff.

Jtsummers · 2 years ago

Do you mean implicit parallelism? Because what we have now is typically explicit paralellism. Creating a thread or forking a subprocess is explicit parallelism, the programmer chooses that option. A function like `map` that can utilize a parallel or sequential implementation depending on circumstances the runtime or compiler are aware of without direct input from the programmer would be implicit parallelism. If the `map` function takes an execution context that signifies parallel or sequential execution then we're back to explicit.

jacquesm · 2 years ago

Reasoning about parallel execution is hard. You need high level language and library support for that unless you want to spend the rest of your life in tricky debugging territory.

jeremycarter · 2 years ago

I think this is why I'm a huge advocate of the Actor Model.

jacquesm · 2 years ago

Likewise, it is one of the best solutions to this problem. And it also nicely maps onto language constructs, much nicer than the other options that I've worked with.

fulafel · 2 years ago

It seems serious parallel programming has mostly gone with shader/ISPC style data-parallel computation with low-level languages and the old school threads & locks model has been relegated to a side support role lost except on the CPU side.

There's interesting stuff going on in the VHLL world with languages like Futhark, Jax, Mojo, etc that would be a better peer group for Python and its high level of abstraction.

ketralnis · 2 years ago

Can you talk about the difference between what you're calling explicit parallelism and said textbooks?

spullara · 2 years ago

As soon as computers had 4+ cores we should have been going hard on it.

dmead · 2 years ago

Aren't we already?

User23 · 2 years ago

The formalisms for nondeterminism are over half a century old now. This is fundamentally a solved problem, although the typical case analysis technique many programmers tend to prefer falls down hard. Incidentally that’s why unix signals suck.

I always openly wonder with this proposal — how are they going to do this while making sure programs are still correct? So much existing multithreaded Python code is written in an unsafe manner.

Specifically, talking about data races I’ve seen time and again in codebases across companies and OSS projects. The programs don’t break only because they implicitly rely on the GIL providing execution to a single thread at a time. If the GIL is gone, then these programs will break. And since Python is such a dynamically typed language, I seriously doubt that there exists a static analyzer that could identify these issues in existing Python programs. More likely, they’ll be insidious bugs that crop up at runtime in a non-deterministic fashion. Ideally leading to a crash, With this class of bugs, it’s likely to just result in incorrect operations being performed.

Perhaps this GIL-less proposal isn’t actually intended to be used on the overwhelming majority of programs? Maybe it’s just a hyper specialized tool for a very few number of circumstances where the programmer knows, there won’t be a GIL, and can program against that fact?

pdonis · 2 years ago

If you have a multi-threaded program with data races, you already have a problem. The GIL does not mean no data races are possible. It just means that only one thread at a time can run Python bytecode. But the interpreter with the GIL can switch threads between bytecodes, and many Python operations, including built-in methods on built-in types that many people think of as "atomic", require multiple bytecodes. That's why Python already provides you with things like locks, mutexes, and semaphores, even though it currently has the GIL.

rtpg · 2 years ago

To put a finer point on this, I’ve had the misunderstanding in the past that the GIL made Python like JavaScript in some sense (only releasing the GIL on some explicit parts of code like sleep). But really Python threads can switch in the “middle” of your code. The reason the GIL is annoying is mostly performance related for Python code itself.

My understanding is the GIL does not protect against Python-side bugs, and bugs from GIL removal would only be introduced from C extensions.

shrimpx · 2 years ago

I think by “data races” the parent means that code doesn’t lock around operations like += and len(). The data races are there in theory but do not exhibit due to the GIL.

fulafel · 2 years ago

CPython is not really in a good position to change behaviour in incompatible ways and defend it by language lawyering, users expect the same code to mostly work in a backwards compatible way.

KeplerBoy · 2 years ago

Just a little fun fact: The GIL absolutely does not prevent all race condition bugs. Threads contending for the GIL can already steal it from each other at unfavorable times and cause havoc.

ynik · 2 years ago

In fact, it prevents very few race condition bugs.

Even inside a C extension where the Python API feels like it gives you control over when you release the GIL (with functions you'd have to call explicitly to release the GIL), it turns out that:

* any operation that allocates new Python objects might trigger garbage collection

* garbage collection may run `__del__` of objects completely unrelated to the currently running C code

* `__del__` can be implemented in python, thus releasing the GIL between bytecode instructions

Thus there's a lot of (rarely exercised) potential for concurrency even in C extensions that don't explicitly release the GIL themselves. nogil will make it easier to trigger data race bugs, but many of them will already have been theoretically possible before.

plonk · 2 years ago

I think the point was to let libraries declare whether they support nogil mode (opt-in), and your program would only run with no GIL if all the dependencies allow it? So they have all the time in the world to iron out those bugs.

eptcyka · 2 years ago

At what point can an interpreter establish that a given python script will not be importing any more modules?

tgv · 2 years ago

Yes, but the plan is to remove the opt-in in time. That will put a lot of pressure on the eco system. I expect many libraries written in C or relying on C-based extensions to simply get dropped. Which will make that users will stay on the last GIL-supporting version. It's Python 2->3, but potentially worse.

baggiponte · 2 years ago

I suppose no-GIL Python will be here in no less than 3/4 release cycles. 3.11 has been out for a year and most Python code in prod is what, 3.8? So I guess we won’t have to deal with this at scale before idk 2030 is approaching. I also don’t see Python runtimes in prod being updated from whatever they’re on now to newest releases. I don’t want to sound harsh, but the SC stated they don’t want to have another 2-to-3 migration, so people won’t update lightly. Yes, most of the content online right now might be dangerous to copy paste

sgarland · 2 years ago

Tangentially, 3.11 was the first release in quite some time to have major speed improvements across the board. The average is 25%, sometimes far more.

Anyone who hasn’t upgraded to it by now is needlessly spending extra on compute.

dragonwriter · 2 years ago

> most Python code in prod is what, 3.8?

3.8 is the oldest supported version, so I would hope not, but probably.

bratao · 2 years ago

The GIL only protect the Interpreter. The only thing it may do is to make it infrequently. There are MANY threading bugs in actual Python code.

perryizgr8 · 2 years ago

> So much existing multithreaded Python code is written in an unsafe manner.

Even multi-process Python code is often broken. The "recommended" way to serve a Django app is to run multiple workers (processes) using gunicorn. If you point the default logs to a file, even with log rotation enabled, all workers will keep stepping over each other because nobody knows which file to use. Keep in mind that this is broken by default, and this is the recommended way to use all this.

amelius · 2 years ago

On the other hand, perhaps translating existing modules to a No-GIL API is tedious but straightforward, and something that can be done using automated tools (perhaps even LLMs).

jacquesm · 2 years ago

The easiest way would be to have the GIL behind a feature flag that defaults to 'on'. That way you avoid yet another language split and if you don't want any possibly breaking changes you simply don't do anything at all. But if you want to run with the performance gains that a GIL free CPython would give you then you will have to do some extra testing to make sure that your stuff really is bullet proof with that flag set to the 'No-GIL' position.

mkoubaa · 2 years ago

Even with nogil, libraries can explicitly hold a global lock before any call. They just don't have to. I imagine some libraries will do that, and others will target performance. Users will vote with their tomls

usrbinbash · 2 years ago

> how are they going to do this while making sure programs are still correct? So much existing multithreaded Python code is written in an unsafe manner.

a) It isn't a language maintainers job to make sure unsafe code written by someone else in that language runs correctly.

b) The GIL doesn't prevent data races. It keeps the internal running of the interpreter sane, that's it's only job. There is a reason the threading library has a plethora of lock-classes

lisper · 2 years ago

There is a very simple practical solution: let the GIL (or something like it that forces single-threaded execution) be an option that you can turn on so that you can run broken legacy code.

Jtsummers · 2 years ago

This is, effectively, the plan for the next few Python releases. The plan is for a no-GIL and a GIL execution mode with GIL being the default. At some point, IIRC, the plan is to swap the default and then to eliminate the GIL option.

colordrops · 2 years ago

Wouldn't the most practical solution be for a script or module to turn it off instead? Then you don't break any legacy code. Anyone writing code that is meant to work without the GIL would know to turn it off.

dataflow · 2 years ago

I imagine the way to do this is to start Python with some flag saying that it's in no-GIL mode. That way it's up to the user to decide if their libraries can handle it.

fulafel · 2 years ago

Good points. Re analyzability - it wouldn't have to be static analysis to be useful, you could do this with dynamic analysis.

Deleted Comment