Overhead of Python asyncio tasks

Async Python is still confusing af - when do I need it, what happens under the hood, does it actually help with performance, sometimes the GIL comes into play and sometimes it doesn't, why do we ever use threads at all if there's a GIL, why is it called asyncio if we can use it for anything. My mind is kind of scattered and people seems to be using a lot of async Python for some reason.

Any good resources to clear things up?

uniqueuid · 3 years ago

I agree that especially within the standard context of python and its syntax, async seems weird (because it's sprinkled into an existing paradigm).

The best mental model for me always was to think: Here's an await, that means "interpreter, go and do something else that's currently waiting while this is not done yet."

And that's all about IO, because what you can wait on is essentially IO.

By the way, I really wish there was a better story to executors in async python. To think we still have the same queue/pickle-based multiprocessing to send async tasks to another core is kind of sad. Hoping for 3.12 and beyond there.

[edit] one really neat example that helped me get asyncio was Guido van Rossum's crawler in 500 lines of python [1]. A lot of the syntax is deprecated now, but it's still a great walk-through

[1] http://aosabook.org/en/500L/a-web-crawler-with-asyncio-corou...

philote · 3 years ago

It's great for when you can do concurrent I/O tasks. For example running a web backend, web scraping, or many API calls. FastAPI (and Starlette that it's built off of) is an async web framework and in my experience performs well.

Basically, your program will normally halt when doing I/O, and won't proceed until that I/O is done. During that halt, your program is doing nothing (no cpu being used). With asyncio, you can schedule multiple tasks to run, so if one is halted doing I/O, another can run.

Edit: And AFAIK, the GIL does not come into play at all with async. Only when multithreading.

morelisp · 3 years ago

> the GIL does not come into play at all with async.

In the sense that the GIL is still held and you can have at most one path of Python code executing at a time regardless of whether you use async or threads, sure. Most blocking I/O was already releasing the GIL so the difference is purely in how you can design your modules; for any reasoning about performance the GIL behaves the same way whether you use asyncio or not.

323 · 3 years ago

It makes concurrent programming much simpler than using threads.

Very few locking and care is needed with asyncio, as opposed to using threads. Race conditions are basically not a thing if you write reasonably idiomatic code.

It might be (or not) faster than using threads, but that's not the main benefit in my view - this easiness of use is.

scrlk · 3 years ago

I've found SuperFastPython to be helpful for understanding Python concurrency: https://superfastpython.com/python-concurrency-choose-api/

bombolo · 3 years ago

It's basically a fancy way to do epoll() around file descriptors, but hide the need to keep a main loop and state, and keep it hidden as functions that run in fake concurrency, stopping whenever they block, and being executed again when their file descriptor has activity.

It doesn't necessarily improve performance. It's just a much easier way to do non-blocking I/O (note that blocking and threaded I/O is easier to do but much much heavier).

loa_in_ · 3 years ago

In personal experience, when you write anything more complicated using threads in python, I end up writing lots of boilerplate code that I wished I could just shove somewhere - Events, 1 element Queues, ContextVar contexts, all of which don't benefit for being named and make appearance only in two spots in code. Async removed a lot of this for me while also unifying the future ecosystem and making my code more composable and easy to integrate.

franga2000 · 3 years ago

When working in a fully async context, it becomes very logical and natural. A great example of this is the FastAPI web framework. You never write any top-level code, only functions that the framework calls, so you never have to deal with the event loop directly. You basically just sprinkle some async and await keywords around your IO bottlenecks and things suddenly run smoother.

x-complexity · 3 years ago

> Async Python is still confusing af - when do I need it,

It's needed when you're spending a lot of time waiting for an I/O request to complete (network/HTTP requests, disk reads/writes, database reads/writes)

> what happens under the hood,

https://tenthousandmeters.com/blog/python-behind-the-scenes-...

(Please read the entire blog post - It goes through the necessary concepts like generators, event loops, & coroutines)

> does it actually help with performance,

Refer to the first answer: You'll see improved performances if your workloads are mainly comprised of waiting for other stuff to complete. If you're compute-heavy, it'll be better to use the 'multiprocessing' library instead.

> sometimes the GIL comes into play and sometimes it doesn't,

The GIL comes into play when you have a lot of compute-heavy tasks: Otherwise, you'll rarely encounter it.

It's only when you have that many compute-heavy tasks that you start to use the 'multiprocessing' library.

> why do we ever use threads at all if there's a GIL,

Threads exist because it was there before asyncio & event loops came into Python.

> why is it called asyncio if we can use it for anything.

Its name came from PEP 3156, proposing the asyncio library back in 2012.

https://peps.python.org/pep-3156/

As for why, asynchronous I/O stands in contrast to synchronous I/O, where the program/thread had to wait for the I/O request to complete before it can do anything else. Making tasks asynchronous allows it to do other stuff while it waits for a task's request to complete, increasing CPU & I/O utilization.

> My mind is kind of scattered and people seems to be using a lot of async Python for some reason. Any good resources to clear things up?

Highly recommend this video from mcoding: It's fairly simple & goes through a sample implementation.

https://www.youtube.com/watch?v=ftmdDlwMwwQ

Also, this article:

https://realpython.com/async-io-python/

ActorNightly · 3 years ago

Async looks like parallelism, but its just smart scheduling. The core concept of async is saying "hey, im waiting for something to complete thats not under my control (like waiting for data on a socket to be able to be read), go ahead and do other things in the mean time".

If you ever coded in sockets in C (and its a good exercise to do so), you probably have at some point ran across `select` which is essentially a non blocking way to check which sockets have data available to read, and then sequentially read the data. This gives the ability for a program to appear parallelized in the sense that it can handle multiple client connections, but its not truly parallel. Different clients can be handled at different time depending on which order they connect, which is asynchronous in nature (versus processing each client in sequence and waiting on each one to connect and disconnect before moving on to the next one)

Async in Python is basically this concept, with a core fundamental feature of time limited execution. Functions can say that they are pausing for x seconds, allowing other functions to run, or functions can say that they give a certain function x seconds to run before resuming execution. If you async code (along with any library you may use) doesn't contain any sleeps or timeouts, its exactly equivalent to synchronous code (since the event loop never really recieves a message that it can suspend a routine or cancel it). With sleeps and timeouts, you gain control over things that can potentially block, both from a caller perspective of not having a function call block your own, and from a callee perspective of not making your function blocking.

The use case is for it is that it is good for I/O bound operations like Threading is, but with the addition that you don't have to worry about synchronization or race conditions, since by design your code will have predictable access patterns. The downside is that your code and any libraries that you use within your code has to be implemented as async libraries, and any library that is async has to have async wrappers around the calls to its methods, which in turn means that your entire code has to be async.

Threading with Python is generally not useful, as its not true parallelism because of GIL. GIL allows only one thread in Python to run. Threading is safer in Python because of this, however it obviously has drawbacks. In general its best used if you want asyncio like performance with a library that is not written with async, since GIL is smart enough to detect when a thread is waiting for input and switch context.

True parallelism in Python is achieved with multiprocessing, however the use case is a little different. Rather than spinning off processes, you generally launch a bunch of worker processes up front (to avoid the larger overhead), then use smart scheduling to distribute work between these processes. Here though you do have to worry about race conditions and synchronization, and use things like locks and mutexes.

100,000 tasks 77,108 tasks per/s 200,000 tasks 69,945 tasks per/s 300,000 tasks 72,453 tasks per/s 400,000 tasks 74,636 tasks per/s 500,000 tasks 66,253 tasks per/s 600,000 tasks 77,576 tasks per/s 700,000 tasks 69,673 tasks per/s 800,000 tasks 68,176 tasks per/s 900,000 tasks 73,846 tasks per/s 1,000,000 tasks 68,013 tasks per/s

100000 Tasks 523000 Tasks/s 200000 Tasks 550000 Tasks/s 300000 Tasks 550000 Tasks/s 400000 Tasks 559000 Tasks/s 500000 Tasks 547000 Tasks/s 600000 Tasks 539000 Tasks/s 700000 Tasks 547000 Tasks/s 800000 Tasks 540000 Tasks/s 900000 Tasks 560000 Tasks/s 1000000 Tasks 542000 Tasks/s

would you mind sharing the go and python results running on your machine too? It is apples to orange comparation otherwise.

EDIT. My results on a 5950x (undervolted)

python3.8.exe test.py

100,000 tasks 139,130 tasks per/s

200,000 tasks 121,905 tasks per/s

300,000 tasks 120,000 tasks per/s

400,000 tasks 114,286 tasks per/s

500,000 tasks 119,403 tasks per/s

600,000 tasks 117,073 tasks per/s

700,000 tasks 130,612 tasks per/s

800,000 tasks 122,488 tasks per/s

900,000 tasks 120,000 tasks per/s

1,000,000 tasks 110,155 tasks per/s

python3.11.exe .\test.py

100,000 tasks 206,452 tasks per/s

200,000 tasks 185,507 tasks per/s

300,000 tasks 186,408 tasks per/s

400,000 tasks 179,021 tasks per/s

500,000 tasks 167,539 tasks per/s

600,000 tasks 177,778 tasks per/s

700,000 tasks 188,235 tasks per/s

800,000 tasks 180,919 tasks per/s

900,000 tasks 168,421 tasks per/s

.\test.exe (go 1.20 compiled)

100000 tasks 2710563.336378 tasks per/s

200000 tasks 3076885.207567 tasks per/s

300000 tasks 3332292.917434 tasks per/s

400000 tasks 3040479.422795 tasks per/s

500000 tasks 2810232.844653 tasks per/s

600000 tasks 3004138.200371 tasks per/s

700000 tasks 2738877.029117 tasks per/s

800000 tasks 2893730.985022 tasks per/s

900000 tasks 3043877.494077 tasks per/s

1000000 tasks 2857992.089078 tasks per/s

async function time_tasks(count=100) { async function nop_task() { return performance.now(); } const start = performance.now() let tasks = Array(count).map(nop_task) await Promise.all(tasks) const elapsed = performance.now() - start return elapsed / 1e3 } for (let count = 100000; count < 1000000 + 1; count += 100000) { const ct = await time_tasks(count) console.log(`${count}: ${1 / (ct / count)} tasks/sec`) }

% bun textual.ts 100000: 3767797.000743159 tasks/sec 200000: 9001406.4697609 tasks/sec 300000: 8281002.001242148 tasks/sec 400000: 10038491.340232708 tasks/sec 500000: 8976653.913474608 tasks/sec 600000: 10437550.828698047 tasks/sec 700000: 9443895.154523576 tasks/sec 800000: 11021991.118011119 tasks/sec 900000: 9790550.215324111 tasks/sec 1000000: 10263937.143648934 tasks/sec % python3 textual.py 100,000 tasks 303,063 tasks per/s 200,000 tasks 270,058 tasks per/s 300,000 tasks 271,621 tasks per/s 400,000 tasks 261,945 tasks per/s 500,000 tasks 251,070 tasks per/s 600,000 tasks 272,520 tasks per/s 700,000 tasks 250,977 tasks per/s 800,000 tasks 253,131 tasks per/s 900,000 tasks 244,696 tasks per/s 1,000,000 tasks 266,061 tasks per/s

$ deno run tasks.js 100000: 2777777.777777778 tasks/sec 200000: 3225806.4516129033 tasks/sec ... 800000: 2395209.580838323 tasks/sec 900000: 1679104.4776119404 tasks/sec 1000000: 1851851.8518518517 tasks/sec

$ python3.11 Synchronous 100,000 tasks 22,716,947 tasks per/s 200,000 tasks 22,706,630 tasks per/s 300,000 tasks 22,742,779 tasks per/s 400,000 tasks 22,614,202 tasks per/s 500,000 tasks 22,760,379 tasks per/s 600,000 tasks 22,799,818 tasks per/s 700,000 tasks 22,842,971 tasks per/s 800,000 tasks 22,778,395 tasks per/s 900,000 tasks 22,854,241 tasks per/s 1,000,000 tasks 22,470,395 tasks per/s await 100,000 tasks 10,336,986 tasks per/s 200,000 tasks 10,405,286 tasks per/s 300,000 tasks 10,451,505 tasks per/s 400,000 tasks 10,482,455 tasks per/s 500,000 tasks 10,451,287 tasks per/s 600,000 tasks 10,485,478 tasks per/s 700,000 tasks 10,508,302 tasks per/s 800,000 tasks 10,505,167 tasks per/s 900,000 tasks 10,492,568 tasks per/s 1,000,000 tasks 10,457,516 tasks per/s asyncio.create_task() 100,000 tasks 219,858 tasks per/s 200,000 tasks 196,281 tasks per/s 300,000 tasks 201,530 tasks per/s 400,000 tasks 193,674 tasks per/s 500,000 tasks 187,611 tasks per/s 600,000 tasks 201,972 tasks per/s 700,000 tasks 187,505 tasks per/s 800,000 tasks 191,531 tasks per/s 900,000 tasks 198,127 tasks per/s 1,000,000 tasks 173,259 tasks per/s asyncio.gather() 100,000 tasks 291,095 tasks per/s 200,000 tasks 193,324 tasks per/s 300,000 tasks 129,177 tasks per/s 400,000 tasks 107,024 tasks per/s 500,000 tasks 123,023 tasks per/s 600,000 tasks 122,304 tasks per/s 700,000 tasks 121,674 tasks per/s 800,000 tasks 106,530 tasks per/s 900,000 tasks 135,841 tasks per/s 1,000,000 tasks 106,153 tasks per/s asyncio.TaskGroup.create_task() 100,000 tasks 319,629 tasks per/s 200,000 tasks 283,560 tasks per/s 300,000 tasks 204,328 tasks per/s 400,000 tasks 203,584 tasks per/s 500,000 tasks 200,968 tasks per/s 600,000 tasks 214,506 tasks per/s 700,000 tasks 206,512 tasks per/s 800,000 tasks 204,556 tasks per/s 900,000 tasks 210,298 tasks per/s 1,000,000 tasks 202,523 tasks per/s