Beazley's "Concurrency From the Ground Up" is one of my favorite tech talks ever: in about 45 minutes he builds an async framework using generators, live coding in an Emacs window that shows only about 20 lines, with no syntax highlighting, and never breaking stride with his commentary or his engagement with the audience.
Exactly what I thought while watching that video; it's as if he's spitting out the characters as he speaks:
"A fantastic, entertaining and highly educational talk. It always bothers me that I can't play the piano and talk at the same time (my wife usually asks me things while I'm playing). But David can even type concurrent Python code in Emacs in Allegro vivace speed and talk about it at the same time. An expert in concurrency in every sense of the word. How enviable!"
There's also the one where he live codes a WebAssembly interpreter. But my favourite is his talk on lambda calculus. It's incredibly fun to follow along.
It is entertaining and intelligent, but you won't learn Python from it and you won't get anywhere near a production-ready implementation, since it glosses over all the hard parts.
Since we are sharing resources: Fluent Python is my favorite reference on Python. It covers many advanced features, like concurrency, functools, etc. It's not the kind of book you read cover to cover; it's one that you go to as you need it. When I was working on Python stuff I would read it once a month.
My favorite introductory book (not an introduction to programming but an introduction to the language) is “Introducing Python” by Lubanovic, because it's one of the only beginner books that actually covers the Python module system with enough depth, and the second half of the book gives a quick overview of a lot of different Python libraries.
Am I the only one who is not particularly impressed by any of these links? Maybe I should see them as illustrative examples, but they would not make it through a code review.
-> Currently serving as Application Architect for a medium-sized Python application.
You shouldn't. The author seems pretty tongue-in-cheek about it:
> lambda has the benefit of making the code compact and foreboding. Plus, it prevents people from trying to add meaningful names, documentation or type-hints to the thing that is about to unfold.
David Beazley will forever have my respect for his talk where he uses Python to untangle 1.5T of C++ code on an airgapped computer, as an expert witness in a court case:
I taught this course to corporate clients for three or four years before developing my own materials.
The course materials for this course and the introductory course (“Practical Python”[1]) are quite thorough, but I've always found the portfolio analysis example very hokey.
There's enormous, accessible depth to these kinds of P&L reporting examples, but the course evolves this example in a much less interesting direction. Additionally, while the conceptual and theoretical material is solid, the analytical and technical approach that the portfolio example takes quickly diverges from how we would actually solve a problem like this. (These days, attendees are very likely to have already been exposed to tools like pandas!) This requires additional instructor guidance to bridge the gap and reconcile the pure Python and “PyData” approaches. (Of course, no other Python materials or Python instruction properly address and reconcile these two universes, and most Python materials that cover the “PyData” universe—especially those about pandas—are rife with foundational conceptual errors.)
Overall, David is an exceptional instructor, and his explanations and his written materials are top notch. He is one of the most thoughtful, most intelligent, and most engaging instructors I have ever worked with.
I understand from David that he rarely teaches this course or Practical Python to corporate audiences, instead preferring to teach courses directly to the public. (In fact, I took over a few of his active corporate clients when he transitioned away from this work, which is what led me to draft my own curricula.) I'm not sure if he still teaches this course at all anymore.
However, I would strongly encourage folks to look into his new courses, which cover a much broader set of topics (and are not Python-specific)! [2]
Also, if you do happen to be a Python programmer, be sure to check out his most recent book, “Python Distilled”[3]!
[1] https://dabeaz-course.github.io/practical-python/
[2] https://www.dabeaz.com/courses.html
[3] https://www.amazon.com/Python-Essential-Reference-Developers...
Something I hate in my own code is this pattern of instantiating an empty list and then appending to it in a loop when reading files. Is there a better way than starting with lst = [] and then later doing lst.append()?
It's not better than a generator, but I'm surprised nobody has mentioned the very terse and still mostly readable
header, *records = [row.strip().split(',') for row in open(filename).readlines()]
but then you need a way to parse the records, which could be Template() from the string library or something like...
type_record = lambda r: (r[0], int(r[1]), float(r[2]))
At this point, the two no longer mesh well, unless you would be able to unpack into a function/generator/lambda rather than into a variable. (I don't know but my naive attempts and quick SO search were unfruitful.) Also, you're potentially giving up benefits of the CSV reader. Plus, as others have clarified, brevity does not equal readability or relative lack of bugs:
In the course example, it's reasonably easy to add some try blocks/error handling/default values while assigning records, giving you the chance to salvage valid rows without affecting speed or readability. In fact, error handling would be a necessity if that CSV file is externally accessible. Contrast that with my two lines, where there's not an elegant way to handle a bad row or escaped comma or missing file or virtually any other surprise.
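For illustration, a rough sketch of what that per-row salvaging could look like in the course-style loop (the function name and the skip-and-report policy here are just examples, not taken from the course materials):

```
import csv

def read_portfolio(filename):
    portfolio = []
    with open(filename) as f:
        rows = csv.reader(f)
        next(rows)  # skip the header row
        for lineno, row in enumerate(rows, start=2):
            try:
                portfolio.append({'name': row[0], 'shares': int(row[1]), 'price': float(row[2])})
            except (IndexError, ValueError) as err:
                # Salvage the rest of the file: report the bad row and keep going.
                print(f'{filename}:{lineno}: skipping bad row {row!r} ({err})')
    return portfolio
```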
Anything else I can think of off-hand (defaultdict, UserList, "if not portfolio:") has the same initialization step, endures some performance degradation, is more fragile, and/or is needlessly unreadable, like this lump of coal:
portfolio = [record] if 'portfolio' not in globals() else portfolio + [record]
So... your technique and generators. Those are safe-ish, readable, relatively concise, etc.
> It's not better than a generator, but I'm surprised nobody has mentioned the very terse and still mostly readable
> header, *records = [row.strip().split(',') for row in open(filename).readlines()]
Better would be:
header, *records = [row.strip().split(',') for row in open(filename)]
No need to read the lines all into memory first.
Edit: Also if you want to be explicit with the file closing, you could do something like:
with open(filename) as infile:
    header, *records = [row.strip().split(',') for row in infile]
That is if we wanted to protect against future changes to semantics for garbage collection/reference counting. I always do this, but I kind of doubt it will ever really matter in any code I write.
import csv

def read_portfolio(filename):
    # Build one dict per CSV row.
    record = lambda r: {
        'name': r[0],
        'shares': int(r[1]),
        'price': float(r[2]),
    }
    with open(filename) as f:
        rows = csv.reader(f)
        headers = next(rows)  # consume the header row
        return [record(r) for r in rows]
You could use a list comprehension, but that can be unclear and hard to extend, depending on the situation. It can be a nice option if most of the parts in the generator can be broken out into functions with their own name, though.
You could turn it into a generator, which can cause some fun bugs (e.g. everything works fine when you first iterate over it, but not afterwards), so IMO that's best used when it needs to be a generator, for semantics or performance.
You could turn it into a generator, then add a wrapper that turns it into a list (keeping the inner function private), or use a decorator that does the same, but it's less clear than this pattern.
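To make that concrete, here's a rough sketch of the wrapper variant (the names _iter_portfolio and read_portfolio are just illustrative); a decorator version would simply apply the list() call for you:

```
import csv

def _iter_portfolio(filename):
    # "Private" generator: yields one record dict per CSV row.
    with open(filename) as f:
        rows = csv.reader(f)
        next(rows)  # skip the header row
        for row in rows:
            yield {'name': row[0], 'shares': int(row[1]), 'price': float(row[2])}

def read_portfolio(filename):
    # Public wrapper: callers always get a plain list.
    return list(_iter_portfolio(filename))
```

Note that if you hand out the raw generator instead, iterating it a second time silently yields nothing, which is the "fun bug" mentioned above.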
Yeah, I think using a list comprehension is overkill. The main reason I like list comprehensions is that I don't introduce variables (even temporarily) that I don't really need; I think that clarifies the code. But putting the code in a separate function also avoids introducing those variables into the current scope, only at the cost of putting the code somewhere else (which I personally think has a cost). In this case I would just use a function or (probably) just inline it in the way you don't like.
So `portfolio = [{'name': row[0], 'shares': int(row[1]), 'price': float(row[2])} for row in rows]`
But if it's more complicated than this (like if there are conditionals inside the loop), I'd recommend just sticking with the current approach. It's possible to have multiple conditionals in a list comprehension, but it's not really very readable. If you do want to, the walrus operator can make things better
(something like `numbers = [m[1] for s in array if (m := re.search(r'^.*(\d+).*$', s))]`)
They can be more readable than that at least, e.g.:
keys = "name", "shares", "price"
portfolio = [
    dict(zip(keys, row))
    for row in rows
]
If I had to do more complex stuff than building a dict like this I'd move it into a function. That tends to make the purpose more clear anyway.
That said, it's fine to append to a list too; I just prefer comprehensions when they fit the job. In particular, if you're just going to iterate once over this list anyway, you can turn it into a generator expression by replacing [] with () and save some memory.
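For example, reusing the keys from above (a sketch; the sample rows stand in for the csv.reader output, and the generator can only be consumed once):

```
rows = [['AA', '100', '32.20'], ['IBM', '50', '91.10']]  # made-up stand-in for the CSV rows
keys = "name", "shares", "price"

# Same comprehension, but lazy: each record dict is built only as you iterate.
portfolio = (dict(zip(keys, row)) for row in rows)
total_cost = sum(int(r['shares']) * float(r['price']) for r in portfolio)
print(total_cost)  # 7775.0
```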
You could “yield” the record instead of constructing the list. This turns “read_portfolio” into a generator function that returns an iterator instead of a list. Use a list comprehension or the list constructor to convert the iterator to a list if needed.
I have also encountered this quite often. I'd say the ideal solution would be "postfix streaming methods" like `.filter` and `.map`. Unfortunately, Python doesn't have those (the prefix `filter` and `map` are not even close), and you have comprehension expressions at best. To make things worse, complex comprehensions can also create confusion, so for your particular example I'd probably say it's acceptable. It could be better if you used unpacking instead of indexing, though, as others have pointed out.
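To make the wish concrete, here's a purely hypothetical sketch of what postfix chaining could look like with a tiny wrapper class; nothing like this Stream class exists in the standard library, and the sample rows are made up for illustration:

```
class Stream:
    """Hypothetical chainable wrapper around an iterable (not a real library)."""
    def __init__(self, iterable):
        self._it = iter(iterable)

    def map(self, fn):
        return Stream(fn(x) for x in self._it)

    def filter(self, pred):
        return Stream(x for x in self._it if pred(x))

    def to_list(self):
        return list(self._it)

# Illustrative usage on CSV-style rows like the ones discussed above.
rows = [['AA', '100', '32.20'], ['IBM', '50', '91.10'], ['bad row']]
records = (
    Stream(rows)
    .filter(lambda row: len(row) == 3)
    .map(lambda row: {'name': row[0], 'shares': int(row[1]), 'price': float(row[2])})
    .to_list()
)
```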
Another good (and entertaining) resource is James Powell's talk "So you want to be a Python expert" [1], the best explanation I've seen of decorators, generators and context managers. Good intro to the Python data (object) model too.
[1] https://youtu.be/cKPlPJyQrt4
I saw his talk live in 2014 and the dude is amazing. I loved his summary of building Python libraries from the ground up during legal discovery: he discovered a hidden Python installation on the terminal his opponents gave him, which allowed him to parse thousands of documents very quickly.
This is very cool, good for Beazley for making this freely available. I really should take the time to work through this material. For 40 years I have been a “Lisp guy”, slightly looking down on other languages I sometimes used at work like C++, Java, etc.
However, because of available ML/DL/LLM frameworks and libraries in Python, Python has been my go-to language for years now. BTW, I love the other comment here that Beazley is the Jimi Hendrix of Python. Only those of us who enjoyed hearing Hendrix live really can get this.
David wrote https://www.dabeaz.com/generators/ which remains one of my all-time favourite Python tutorials. Looking forward to digging into this.
It's 8 years old, but definitely worth a watch: https://www.youtube.com/watch?v=MCs5OvhV9S4
"A fantastic, entertaining and highly educational talk. It always bothers me that I can't play the piano and talk at the same time (my wife usually asks me things while I'm playing). But David can even type concurrent Python code in Emacs in Allegro vivace speed and talk about it at the same time. An expert in concurrency in every sense of the word. How enviable!"
My favorite introductory book (not an introduction to programming but an introduction to the language) is “Introducing Python by Lubanovic” because it’s one of the only beginner books that actually covers the python module system with enough depth and the second half of the book gives a quick overview of a lot of different python libraries.
1. Test-Driven Development with Python
2. Architecture Patterns with Python
The 2nd one is the closest you're gonna get to a production-grade tutorial book.
Related to this topic, these resources by @dbeazley:
Barely an Interface
https://github.com/dabeaz/blog/blob/main/2021/barely-interfa...
Now You Have Three Problems
https://github.com/dabeaz/blog/blob/main/2023/three-problems...
A Different Refactoring
https://github.com/dabeaz/blog/blob/main/2023/different-refa...
His youtube channel:
https://youtube.com/@dabeazllc
> lambda has the benefit of making the code compact and foreboding. Plus, it prevents people from trying to add meaningful names, documentation or type-hints to the thing that is about to unfold.
Disclaimer: I did not read the entire post.
https://youtu.be/RZ4Sn-Y7AP8
It's 47 minutes and totally worth it.
Well, unfortunately, the 5-day courses listed on his site are $1500 each.
If the free-of-charge course discussed here is really that good, it's a nice promo to go and pay for another. Ed tech, lo-fi style.
This is an example from the linked course https://github.com/dabeaz-course/python-mastery/blob/main/Ex...:

```
# readport.py

import csv

# A function that reads a file into a list of dicts
def read_portfolio(filename):
    portfolio = []
    with open(filename) as f:
        rows = csv.reader(f)
        headers = next(rows)
        for row in rows:
            portfolio.append({'name': row[0], 'shares': int(row[1]), 'price': float(row[2])})
    return portfolio
```
So, I'd just learn to live with it.
> There should be one-- and preferably only one --obvious way to do it.
Personally, the way you've done it is the most Pythonic, IMO. List comprehensions are great but would be less readable in this case.
Honestly, this is the approach I've been using even though I hate it. Especially if your code is going to be read by anyone other than you.