It’s an old wives' tale that AWS came out of “excess capacity” from Amazon Retail.
Source: Ex-AMZN
The latter course (a) was built on a mathematical formalism that had been developed at the university proper and not used anywhere else, (b) used PVM: <https://www.netlib.org/pvm3/>, <https://en.wikipedia.org/wiki/Parallel_Virtual_Machine>, for labs.
Since then, I've repeatedly felt that I've seriously benefited from my formal languages courses, while the same couldn't be said about my parallel programming studies. PVM is dead technology (I think it must have counted as "nearly dead" right when we were using it). And the only aspect I recall about the formal parallel stuff is that it resembles nothing that I've read or seen about distributed and/or concurrent programming ever since.
A funny old memory regarding PVM. (This was a time when we used landlines with 56 kbit/s modems and pppd to dial in to university servers.) I bought a cheap second computer just so I could actually "distribute" PVM over a "cluster". I didn't have money for two Ethernet cards, so I connected the two machines with Linux's PLIP implementation. IIRC, PLIP allowed for 40 kbyte/s transfers! <https://en.wikipedia.org/wiki/Parallel_Line_Internet_Protoco...>
- Consistency models (can I really count on data being there? What do I have to do to make sure that stale reads/write conflicts don't occur?)
- Transactions (this has really fallen off, especially in larger companies outside of BI/Analytics)
- Causality (how can I handle write conflicts at the app layer? Are there data structures, e.g. CRDTs, that can help in certain cases?)
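To make the last point concrete: one of the simplest conflict-free replicated data types is a grow-only counter (G-Counter). This is only a minimal sketch; the class name, replica IDs, and method names are made up for illustration and don't come from any particular library. Each replica only bumps its own slot, and merging takes the per-slot maximum, so concurrent increments never conflict.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class GCounter:
    """Grow-only counter: one slot per replica, merged with a per-slot max."""

    replica_id: str  # hypothetical identifier for the local replica
    counts: dict[str, int] = field(default_factory=dict)

    def increment(self, amount: int = 1) -> None:
        # A replica may only increase its own slot.
        if amount < 0:
            raise ValueError("a G-Counter can only grow")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self) -> int:
        # The counter's value is the sum over all replica slots.
        return sum(self.counts.values())

    def merge(self, other: GCounter) -> None:
        # Per-slot max is commutative, associative, and idempotent,
        # so replicas converge regardless of merge order or repetition.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)


# Two replicas accept writes independently, then exchange state.
a = GCounter("a")
b = GCounter("b")
a.increment(2)
b.increment(3)
a.merge(b)
b.merge(a)
```

After the exchange both replicas report the same total, and merging the same state again changes nothing. The catch is that this only works for operations that can be expressed as a monotonic merge; it doesn't help with something like "sell the last ticket exactly once".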
Even basic things like "use system time/monotonic clocks to measure elapsed time instead of wall-clock time" aren't well known; I've personally corrected dozens of CRs for this. Yes, this can be built into libs, AI agents, etc., but it never seems to actually be, and I see the same issues repeated over and over. So something is missing at the education layer.
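For anyone who hasn't run into it, here is roughly what the fix looks like in Python. The `timed` helper is a made-up name for illustration. `time.time()` is wall-clock time and can jump forwards or backwards when the system clock is adjusted (NTP, manual changes), whereas `time.monotonic()` is guaranteed never to go backwards.

```python
import time


def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed_seconds) using a monotonic clock."""
    # time.monotonic() is unaffected by system clock adjustments, so the
    # difference between two readings is never negative.
    start = time.monotonic()
    result = fn(*args, **kwargs)
    elapsed = time.monotonic() - start
    return result, elapsed


result, elapsed = timed(sum, range(1000))
```

The same reasoning applies to timeouts and retry deadlines: compute them from a monotonic reading, and use wall-clock time only when you need a timestamp that humans or other machines will read.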
Some time has passed since then — and yet, most people still develop software using sequential programming models, thinking about concurrency occasionally.
It is a durable paradigm. There has been no revolution of the sort that the author of this post yearns for. If "Distributed Systems Programming Has Stalled", it stalled a long time ago, and perhaps for good reasons.
- MIT course with Robert Morris (of Morris Worm fame): https://www.youtube.com/watch?v=cQP8WApzIQQ&list=PLrw6a1wE39...
- Martin Kleppmann (author of DDIA): https://www.youtube.com/watch?v=UEAMfLPZZhE&list=PLeKd45zvjc...
If you can work through the above (and DDIA), you'll have a solid understanding of the issues in distributed systems, like consensus, causality, split brain, etc. You'll also gain a critical eye for cloud services and be able to articulate their drawbacks (e.g., did you know that replication to DynamoDB Secondary Indexes is eventually consistent? What effects can that have on your applications?)
Simply, the mad crushing dash to get the last bit of committed inventory.
Ticketmaster has 50,000 General Admission Taylor Swift tickets and 1M fans eager to hoover them up.
This is a crushing load on a shared resource.
I don't know if there's any reasonable outcome from this besides the data center not catching on fire.
A good async setup can easily handle 100k+ TPS.
If you want to go the synchronous route, it's more complicated, but it amounts to partitioning and creating separate swim lanes (copies of the system, at both the compute and data layers).
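A rough sketch of the routing half of that idea, assuming each request carries some partition key (the `fan-N` IDs and the lane count here are invented for illustration). A stable hash of the key picks the lane, so the same key always lands on the same copy of the system and no lane has to coordinate with another.

```python
import hashlib

NUM_LANES = 4  # hypothetical number of independent swim lanes


def lane_for(key: str, num_lanes: int = NUM_LANES) -> int:
    """Map a partition key to a lane index using a stable hash."""
    # hashlib is used instead of the built-in hash(), which is randomized
    # per process for strings and so would route inconsistently across hosts.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_lanes


# Route 1,000 made-up requests into their lanes.
lanes = [[] for _ in range(NUM_LANES)]
for i in range(1000):
    fan_id = f"fan-{i}"
    lanes[lane_for(fan_id)].append(fan_id)
```

Note that partitioning by requester only spreads the front-end load. If all lanes still contend for one pool of inventory, the inventory itself has to be split across lanes (or partitioned by something like seat section) for the lanes to be truly independent.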
Today it just seems odd that anybody is still using MySQL. Postgres? Sure. SQLite? Hell yeah! DuckDB? Of course. MySQL? Not so much.
- Single Write Leader per partition
- Backup Write Leader that is set up with synchronous replication (so WL -> WLB, and the WL waits for the backup's commit)
- Read Followers all connected asynchronously using either binlog replication (not recommended anymore) or GTID-based row replication (recommended)
In the above scenario, the odds of data loss are pretty small, since the Write Leader has a direct backup and any of the Read Followers can be promoted to a Write Leader/Backup. DDIA calls the above semi-synchronous replication, although MySQL now supports a similar but slightly different version out of the box: https://dev.mysql.com/doc/refman/8.4/en/replication-semisync...
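Here is a toy in-memory model of that topology, just to show where the leader waits and where it doesn't. This is not MySQL's actual protocol, and every class and method name is invented for illustration: a write is acknowledged only once the backup has it, while read followers are fed later from a queue.

```python
from collections import deque


class Node:
    """A replica that just appends entries to an in-memory log."""

    def __init__(self, name):
        self.name = name
        self.log = []

    def apply(self, entry):
        self.log.append(entry)
        return True  # acknowledgement


class WriteLeader(Node):
    def __init__(self, name, backup, followers):
        super().__init__(name)
        self.backup = backup        # replicated synchronously
        self.followers = followers  # replicated asynchronously
        self.pending = deque()      # entries not yet shipped to followers

    def write(self, entry):
        """Commit locally, then block until the backup acknowledges."""
        self.log.append(entry)
        if not self.backup.apply(entry):
            self.log.pop()  # no ack from the backup: do not report success
            return False
        self.pending.append(entry)
        return True

    def ship_to_followers(self):
        """Asynchronous step: followers catch up some time after the commit."""
        while self.pending:
            entry = self.pending.popleft()
            for follower in self.followers:
                follower.apply(entry)


backup = Node("wlb")
followers = [Node("r1"), Node("r2")]
leader = WriteLeader("wl", backup, followers)

acked = leader.write("INSERT 1")
lag_snapshot = [list(f.log) for f in followers]  # followers are still behind
leader.ship_to_followers()
```

The snapshot taken between the write and the shipping step is the replication lag that makes reads from followers potentially stale, even though the write itself is already safe on two nodes.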