"We're moving towards a simpler world where most tabular data can be processed on a single large machine1 and the era of clusters is coming to an end for all but the largest datasets."
This claim has become very debatable. Depending on how you want to pivot, scale, or augment your data, even datasets that seemingly "fit" on a large box will quickly OOM you.
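As a back-of-the-envelope illustration (file name and numbers are made up, Polars shown here but pandas behaves the same way): a long-format table that fits comfortably in RAM can blow up the moment you pivot it wide against a high-cardinality key.

```python
import polars as pl

# ~50M rows of (sensor_id, timestamp, value) in long format: a few GB on disk,
# no problem at all for one big box. (Hypothetical file and sizes.)
long = pl.scan_parquet("readings.parquet")

# Pivot to one column per sensor. With ~200k distinct sensor_ids and ~1M
# distinct timestamps this materialises a dense matrix of roughly
# 1M rows x 200k cols x 8 bytes ≈ 1.6 TB, before any actual analysis happens.
wide = (
    long
    .collect()  # pivot is only available on eager DataFrames
    .pivot(on="sensor_id", index="timestamp", values="value")
)
```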
The author also has another article where they claim that:
"SQL should be the first option considered for new data engineering work. It’s robust, fast, future-proof and testable. With a bit of care, it’s clear and readable." (over polars/pandas etc)
This does not match my experience at all, outside the realm of nicely parsed datasets that don't require much complicated analysis or augmentation.
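To make "augmentation" concrete with a made-up example: deriving a usable numeric column from messy strings, with a group-level fallback for the unparseable values, is a couple of chained expressions in Polars, and each step is easy to unit-test in isolation. The equivalent SQL tends to turn into nested CASE/regexp incantations.

```python
import polars as pl

df = pl.DataFrame({
    "raw_price": ["$1,234.50", "n/a", "99", None],  # hypothetical messy input
    "region": ["US", "US", "EU", "EU"],
})

cleaned = df.with_columns(
    # Strip currency formatting; unparseable values become null instead of erroring.
    price=pl.col("raw_price").str.replace_all(r"[$,]", "").cast(pl.Float64, strict=False)
).with_columns(
    # Fall back to the per-region mean for anything that didn't parse.
    price=pl.col("price").fill_null(pl.col("price").mean().over("region"))
)
```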
To expand on this: Polars' `LazyFrame` implementation makes it straightforward to add new execution backends such as GPU, streaming, and now distributed computing (though the distributed backend is currently locked to a single vendor). The DuckDB codebase just doesn't have this flexibility, although there are ways to get it running on a GPU via external software.
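Concretely, the same `LazyFrame` query can be dispatched to a different engine at `collect()` time. Rough sketch below; the exact keywords depend on your Polars version, and the GPU path needs the separate cudf-polars/RAPIDS package installed:

```python
import polars as pl

q = (
    pl.scan_parquet("events.parquet")  # hypothetical file
    .filter(pl.col("status") == "ok")
    .group_by("user_id")
    .agg(pl.col("latency_ms").mean().alias("avg_latency"))
)

df_default = q.collect()                     # default in-memory engine
df_streamed = q.collect(engine="streaming")  # out-of-core streaming engine
df_gpu = q.collect(engine="gpu")             # NVIDIA cuDF backend, if available
```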