BiteCode_dev · 6 years ago
Don't follow this blog post advice.

Manually dealing with threads and processes is useful if you want to build a framework or a very complex workflow. But chances are you just want to run stuff concurrently, in the background.

In that case (which is most people's case), you really want to use one of the stdlib pools: they take care of synchronization, serialization, communication with queues, worker life cycle, task distribution, etc. for you.

Plus in such case, waiting for results is super easy:

    import time
    import random

    from concurrent.futures import ThreadPoolExecutor, as_completed

    # the work to distribute
    def hello():
        seconds = random.randint(0, 5)
        print(f'Hi {seconds}s')
        time.sleep(seconds)
        print(f'Bye {seconds}s')
        return seconds

    # max concurrency is 2
    executor = ThreadPoolExecutor(max_workers=2)
   
    # submit the work
    a = executor.submit(hello)
    b = executor.submit(hello)

    # and here we wait for results
    for future in as_completed((a, b)):
        print(future.result())
Want multiple processes instead? It's the same API:

    import time
    import random

    from concurrent.futures import ProcessPoolExecutor, as_completed

    def hello():
        seconds = random.randint(0, 5)
        print(f'Hi {seconds}s')
        time.sleep(seconds)
        print(f'Bye {seconds}s')
        return seconds

    # Don't forget this for processes, or you'll get in trouble
    if __name__ == "__main__":

        executor = ProcessPoolExecutor(max_workers=2)

        a = executor.submit(hello)
        b = executor.submit(hello)

        for future in as_completed((a, b)):
            print(future.result())
This is Python. Don't make your life harder than it needs to be.

ZeroCool2u · 6 years ago
This example still involves a lot of manual work. It's often even easier.

    from concurrent.futures import ProcessPoolExecutor
    import string
    
    def hello() -> int:
        seconds = random.randint(0, 5)
        print(f'Hi {seconds}s')
        time.sleep(seconds)
        print(f'Bye {seconds}s')
        return seconds
    
    # Don't forget this for processes, or you'll get in trouble
    if __name__ == "__main__":
    
    inputs = list(string.printable)
    results = []
    
    # You can sub out ProcessPool with ThreadPool. 
    with ProcessPoolExecutor() as executor:
        results += executor.map(hello, inputs)
    
    [print(s) for s in results]

BiteCode_dev · 6 years ago
"with" is always a good idea indeed.

But be careful, map() and submit() + as_completed() don't have the same effect.

The first one will give you the results in submission order, while the latter gives you the results in the order they complete.
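
The difference can be sketched like this (sleep durations are chosen here so that completion order differs from submission order):

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def work(delay):
    time.sleep(delay)
    return delay

with ThreadPoolExecutor(max_workers=2) as executor:
    # map(): results come back in submission order, even though
    # the first task finishes last
    mapped = list(executor.map(work, [0.2, 0.05]))

    # submit() + as_completed(): results come back as tasks finish
    futures = [executor.submit(work, d) for d in [0.2, 0.05]]
    completed = [f.result() for f in as_completed(futures)]

print(mapped)     # submission order: [0.2, 0.05]
print(completed)  # completion order: [0.05, 0.2]
```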

BSVogler · 6 years ago
The code needs some minor changes (imports, missing parameter and indent) to make it runnable.

  from concurrent.futures import ProcessPoolExecutor
  import string
  import random
  import time

  def hello(output) -> int:
    seconds = random.randint(0, 5)
    print(f'Hi {output} {seconds}s')
    time.sleep(seconds)
    print(f'Bye {seconds}s')
    return seconds

  # Don't forget this for processes, or you'll get in trouble
  if __name__ == "__main__":

    inputs = list(string.printable)
    results = []

    # You can sub out ProcessPool with ThreadPool. 
    with ProcessPoolExecutor() as executor:
        results += executor.map(hello, inputs)

    [print(s) for s in results]
EDIT: I also struggle with proper code indentation on HN.

vxNsr · 6 years ago
Quintessential HN: the top comment totally contradicts the posted article. Thanks. I was just having an issue with this and was excited to see some gains from the linked approach, and then finding out there's an even better way is terrific!

If you wouldn't mind going into a little more detail about what you're doing I'd really appreciate it!

BiteCode_dev · 6 years ago
> If you wouldn't mind going into a little more detail about what you're doing I'd really appreciate it!

What do you want to know?

jangid · 6 years ago
That is true. Python has better ways to deal with concurrency. As I wrote elsewhere in the comments, I started reading about asyncio. But I found that for a newbie this article is good for grasping the basic concepts.
Siecje · 6 years ago
What if you are waiting on something that is not your Python code?

What about this?

https://github.com/rwarren/SystemEvent

BiteCode_dev · 6 years ago
You put the call in a function where you wait for it, and pass it to the thread pool.

Or you use asyncio for network based stuff.
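
For example, if the slow thing is an external program, a minimal sketch of the first option might look like this (the subprocess command here is just a placeholder; substitute your own blocking call):

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

def run_external(code):
    # The worker thread blocks on the subprocess, not your main thread
    completed = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True,
    )
    return completed.stdout.strip()

with ThreadPoolExecutor(max_workers=2) as executor:
    future = executor.submit(run_external, "print('done')")
    # the main thread stays free while the subprocess runs
    output = future.result()

print(output)
```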

hiisukun · 6 years ago
Great example. Can I ask how I would discover or remember this import line, in case I forget?

I often remember useful standard libraries that I hear about by name, after using them once, but this is a long one.

Alternatively - is this an example in the python readthedocs yet? I try to fall back to that when possible, so my code stays simple for others.

BiteCode_dev · 6 years ago
I don't. I keep a huge knowledge base with this kind of thing on my computers. E.g. this snippet is almost verbatim from a file I wrote on my laptop months ago and kept so I wouldn't have to rewrite it.
zer0faith · 6 years ago
This is solid advice.
amelius · 6 years ago
Or just create a Queue(), give it as a parameter to the long calculation so it can store the result in it, and in the main thread just do q.get(). There are probably a dozen other synchronization primitives you can use, but this one is very versatile and you only need to keep one API in your head. Also, this approach somewhat mimics the concept of channels in Go.
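
A minimal sketch of that pattern (the calculation here is a stand-in for real work):

```python
import threading
import queue

q = queue.Queue()

def long_calculation(result_queue):
    total = sum(i * i for i in range(1000))  # placeholder "long" work
    result_queue.put(total)                  # hand the result back over the queue

t = threading.Thread(target=long_calculation, args=(q,))
t.start()

result = q.get()  # blocks in the main thread until the worker puts its result
t.join()
print(result)
```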
hangonhn · 6 years ago
This is among the best pieces of concurrent programming advice I've ever received or given out. Use a queue and a pool of workers. It makes your synchronization so much easier and simpler.
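
The queue-plus-workers pattern can be sketched roughly like this (squaring a number stands in for real work, and `None` is used as a stop sentinel):

```python
import queue
import threading

tasks = queue.Queue()
results = queue.Queue()

def worker():
    while True:
        item = tasks.get()
        if item is None:          # sentinel: time to stop
            break
        results.put(item * item)  # placeholder "work"

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()

for n in range(10):
    tasks.put(n)
for _ in threads:
    tasks.put(None)               # one sentinel per worker
for t in threads:
    t.join()

print(sorted(results.queue))      # squares of 0..9, in some order
```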
baq · 6 years ago
automatic progress bars for console programs in Python: https://tqdm.github.io/
pyjonista · 6 years ago
I often find myself overwhelmed thinking about concurrency/parallelism in Python: time.sleep, concurrent.futures, multiprocessing, threads, queues, asyncio, uvloop, tornado, twisted, events, coroutines, curio, trio, select, gevent, eventlet...

What is a good solution that the majority of us simple humans with limited time and resources should pursue? I want to write software that competes with similar solutions written in Go. Does that exist? Is there one solution that's safe to adopt and generally recommended by the community? I used to think that asyncio was the solution, but I was unable to fully understand its API. Please help!

rickycook · 6 years ago
i’d say that multiprocessing probably covers 90% of “do it in the background” tasks, and asyncio covers 90% of async networking tasks

futures, uvloop, tornado, twisted, coros, gevent, etc. are kinda all related to, or do something similar to, asyncio

threading is kinda not as useful as you might like in python because of the GIL (simplistically, assume that python can only do 1 thing at once regardless of how many threads it has, until you know why that's not always the case)
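
one of the cases where it's not true: blocking I/O calls (and time.sleep) release the GIL, so threads still overlap nicely when they're mostly waiting. a rough sketch:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def io_bound(delay):
    time.sleep(delay)  # blocking call; releases the GIL while waiting
    return delay

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=2) as executor:
    returned = list(executor.map(io_bound, [0.2, 0.2]))
elapsed = time.perf_counter() - start

# the two sleeps overlap, so this takes roughly 0.2s rather than 0.4s
print(f"{elapsed:.2f}s")
```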

pjc50 · 6 years ago
The general strategy of "use the OS threading primitives" like join and events works in most languages.
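
In Python those primitives map onto `threading.Event` and `Thread.join`; a minimal sketch:

```python
import threading

done = threading.Event()

def background():
    # ... long calculation would go here ...
    done.set()            # signal completion to anyone waiting

t = threading.Thread(target=background)
t.start()

done.wait(timeout=5)      # block until the event is set (or timeout)
t.join()
print(done.is_set())
```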
Akababa · 6 years ago
Nice, I learned about a couple new things. I take it these can all be implemented with the lock/semaphore primitives?
williamDafoe · 6 years ago
The idea of waiting with a progress indicator has a large bug: on an 8-bit or 16-bit machine you cannot read or write a progress variable atomically. I guess the code works because of the interpreter lock that cripples Python, but relying on that is very bad hygiene in any language.
beering · 6 years ago
Can you elaborate on 1) why you can't read or write atomically from a progress variable (even if it's a byte), 2) why we'd be writing Python code for 8-bit or 16-bit machines, and 3) why it's bad to take advantage of our language's guarantees on execution, e.g. with Python and JavaScript?
saagarjha · 6 years ago
Not every processor supports atomic memory accesses.
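
Rather than relying on any interpreter guarantee, an explicit lock makes the intent portable; a minimal sketch of guarding a shared progress counter:

```python
import threading

progress = 0
lock = threading.Lock()

def worker(steps):
    global progress
    for _ in range(steps):
        with lock:         # guard the read-modify-write explicitly
            progress += 1  # += is not guaranteed to be atomic

threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(progress)  # always 4000 with the lock in place
```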
2rsf · 6 years ago
Good article on an important topic for novices, but why is it so long?
PostOnce · 6 years ago
Because it helps you learn.

You can tell someone a brief answer to memorize, but to teach them anything takes a little more explaining.

For example, Question #1 is often "how do I do this in some particular way", but 2 and 3, "why" and "when" should I do it that particular way -- they take more explaining than #1 -- and through that explanation, you start to get good at your craft.

HeadInTheClouds · 6 years ago
Actually I also liked the way each iteration built on the previous concept. Having the full code at each step adds a lot to the length, but for beginners, this is really useful.
jangid · 6 years ago
Actually, I started reading about async/await from asyncio package. But for a newbie the above article is much simpler to grasp concurrency and parallelism concepts. And the sample code makes it a lot easier.
pintable · 6 years ago
Didn't seem that long to me? The examples are additive and easy to follow, too.