henrydark commented on Hashed sorting is typically faster than hash tables   reiner.org/hashed-sorting... · Posted by u/Bogdanp
gregorygoc · 6 months ago
> For small values of N, log(N) is essentially a constant, <= 32, so we can just disregard it, making sorting simply O(N). For large values, even so-called linear algorithms (e.g. linear search) are actually O(N log(N)), as the storage requirements for a single element grow with log(N) (i.e. to store distinct N=2^32 elements, you need N log(N) = 2^32 * 32 bits, but to store N=2^64 elements, you need 2^64 * 64 bits).

How can I unread this?

henrydark · 6 months ago
I've given the following exercise to developers in a few workplaces:

What's the complexity of computing the nth Fibonacci number? Make a graph of computation time for n = 1..300 that visualizes your answer.

There are those who very quickly reply "linear" but admit they can't get a graph to corroborate it, and there are those who very quickly say linear and even produce the graph! (though without correct Fibonacci numbers...)
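A minimal sketch of the exercise: the iterative loop does n additions, but with arbitrary-precision integers each addition of ~n-bit numbers costs O(n), so the total work grows faster than linearly; fib(n) has roughly 0.694·n bits, so fib(300) overflows a 64-bit integer, which is one way to "produce the graph" with wrong numbers.

```python
def fib(n):
    # Iterative Fibonacci with Python's arbitrary-precision ints.
    # n additions, but the operands grow to ~n bits, so the true cost
    # in bit operations is O(n^2), not O(n).
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

print(fib(10))                 # 55
print(fib(300).bit_length())   # well over 64 bits: a fixed-width int overflows
```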

henrydark commented on Left to Right Programming   graic.net/p/left-to-right... · Posted by u/graic
henrydark · 7 months ago
words_on_lines = [ret for line in text.splitlines() for ret in [line.split()]]
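A minimal sketch of what the one-liner does (the sample `text` is assumed): the inner `for ret in [line.split()]` iterates over a one-element list, which acts as a let-binding so the expression reads left to right; it is equivalent to the plain comprehension.

```python
text = "a b\nc d e"

# Left-to-right form: `for ret in [line.split()]` binds a name mid-comprehension.
words_on_lines = [ret for line in text.splitlines() for ret in [line.split()]]

# Equivalent plain form.
assert words_on_lines == [line.split() for line in text.splitlines()]
print(words_on_lines)  # [['a', 'b'], ['c', 'd', 'e']]
```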
henrydark commented on Opsqueue: Lightweight batch processing queue for heavy loads – now open-source   channable.com/tech/introd... · Posted by u/qqwy
henrydark · 7 months ago
Dask for Python satisfies exactly these requirements, just within the Python ecosystem. A pattern I've used at multiple workplaces over the last decade has been to start a Dask cluster (ten years ago it would have been an ipyparallel cluster) on any node that does computation, then spin up new nodes and connect them as needed. This gives dynamic, effectively infinite scalability with almost no overhead or code debt - the Dask interfaces are great even without distributed computing. When I wasn't allowed to use containers, I would sneakily add code on other machines to join my Dask clusters; I would connect any and all computing devices. One company pushed us to use Databricks and Spark, and I never got it: why would we commit to a cluster size before starting a computation?
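A minimal sketch of the "great even without distributed computing" point, with a hypothetical task function: `dask.delayed` builds a lazy task graph that runs on the local scheduler by default, and the same code runs on a distributed cluster if a `dask.distributed` Client is active.

```python
import dask
from dask import delayed

@delayed
def square(x):  # hypothetical unit of work
    return x * x

# Build a lazy graph of ten tasks plus a reduction; nothing runs yet.
total = delayed(sum)([square(i) for i in range(10)])

# .compute() executes on the local threaded scheduler, or on whatever
# distributed cluster a Client has connected to.
print(total.compute())  # 285
```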
henrydark commented on Researchers accurately dating a 7k-year-old settlement using cosmic rays   phys.org/news/2024-05-suc... · Posted by u/wglb
gaoryrt · 2 years ago
Sorry but what's the difference between this and C14[1] dating?

[1]: https://en.wikipedia.org/wiki/Radiocarbon_dating

henrydark · 2 years ago
The innovation is to find traces of a global cosmic-ray event that connects the dating of objects in one local area (Greece, where the dendrochronological record is not continuous) with those in faraway areas (for example England/Ireland, where we have continuous dendrochronological data).
henrydark commented on PRQL as a DuckDB Extension   github.com/ywelsch/duckdb... · Posted by u/tosh
klysm · 2 years ago
it sounds minor, but having `from` before `select` means you can get autocomplete
henrydark · 2 years ago
I get the sentiment, but personally I can easily imagine myself writing an autocompleter that would work fine with `select` before `from`. (I don't write much SQL, so I don't.)

Just to clarify, my point is that when we do write SQL, most of us start by writing the `from` part, and even if we didn't, I could just offer all columns from all tables I know about, with some heuristic for their order, when autocompleting in the `select` part.
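A hypothetical sketch of that heuristic (the schema, the `recent_tables` signal, and `complete_select` are all made up for illustration): offer every known column, ranked so columns from recently used tables come first.

```python
# Hypothetical schema and a recency signal, e.g. from query history.
schema = {"users": ["id", "name", "email"], "orders": ["id", "user_id", "total"]}
recent_tables = ["orders"]

def complete_select(prefix):
    # Rank recently used tables ahead of the rest.
    def rank(table):
        return 0 if table in recent_tables else 1
    candidates = [
        f"{table}.{col}"
        for table in sorted(schema, key=rank)
        for col in schema[table]
    ]
    # Keep candidates whose column name matches what was typed so far.
    return [c for c in candidates if c.split(".")[1].startswith(prefix)]

print(complete_select("na"))  # ['users.name']
print(complete_select("id"))  # ['orders.id', 'users.id']
```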
