"We're moving towards a simpler world where most tabular data can be processed on a single large machine1 and the era of clusters is coming to an end for all but the largest datasets."
This claim has become very debatable. Depending on how you want to pivot, scale, or augment your data, even datasets that seemingly "fit" on a large box will quickly OOM you.
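As a back-of-the-envelope illustration (file name and numbers are made up, Polars shown here but pandas behaves the same way): a long-format table that fits comfortably in RAM can blow up the moment you pivot it wide against a high-cardinality key.

```python
import polars as pl

# ~50M rows of (sensor_id, timestamp, value) in long format: a few GB on disk,
# no problem at all for one big box. (Hypothetical file and sizes.)
long = pl.scan_parquet("readings.parquet")

# Pivot to one column per sensor. With ~200k distinct sensor_ids and ~1M
# distinct timestamps this materialises a dense matrix of roughly
# 1M rows x 200k cols x 8 bytes ≈ 1.6 TB, before any actual analysis happens.
wide = (
    long
    .collect()  # pivot is only available on eager DataFrames
    .pivot(on="sensor_id", index="timestamp", values="value")
)
```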
The author also has another article where they claim that:
"SQL should be the first option considered for new data engineering work. It’s robust, fast, future-proof and testable. With a bit of care, it’s clear and readable." (over polars/pandas etc)
This does not match my experience at all, outside the realm of nicely parsed datasets that don't require much complicated analysis or augmentation.
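To make "augmentation" concrete with a made-up example: deriving a usable numeric column from messy strings, with a group-level fallback for the unparseable values, is a couple of chained expressions in Polars, and each step is easy to unit-test in isolation. The equivalent SQL tends to turn into nested CASE/regexp incantations.

```python
import polars as pl

df = pl.DataFrame({
    "raw_price": ["$1,234.50", "n/a", "99", None],  # hypothetical messy input
    "region": ["US", "US", "EU", "EU"],
})

cleaned = df.with_columns(
    # Strip currency formatting; unparseable values become null instead of erroring.
    price=pl.col("raw_price").str.replace_all(r"[$,]", "").cast(pl.Float64, strict=False)
).with_columns(
    # Fall back to the per-region mean for anything that didn't parse.
    price=pl.col("price").fill_null(pl.col("price").mean().over("region"))
)
```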
To expand on this: Polars' `LazyFrame` implementation makes it straightforward to add new execution backends such as GPU, streaming, and now distributed computing (though the distributed backend is currently locked to a single vendor). The DuckDB codebase just doesn't have this flexibility, although there are ways to get it running on a GPU via external software.
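Concretely, the same `LazyFrame` query can be dispatched to a different engine at `collect()` time. Rough sketch below; the exact keywords depend on your Polars version, and the GPU path needs the separate cudf-polars/RAPIDS package installed:

```python
import polars as pl

q = (
    pl.scan_parquet("events.parquet")  # hypothetical file
    .filter(pl.col("status") == "ok")
    .group_by("user_id")
    .agg(pl.col("latency_ms").mean().alias("avg_latency"))
)

df_default = q.collect()                     # default in-memory engine
df_streamed = q.collect(engine="streaming")  # out-of-core streaming engine
df_gpu = q.collect(engine="gpu")             # NVIDIA cuDF backend, if available
```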