henrydark commented on Hashed sorting is typically faster than hash tables   reiner.org/hashed-sorting... · Posted by u/Bogdanp
gregorygoc · 6 months ago
> For small values of N, log(N) is essentially a constant, <= 32, so we can just disregard it, making sorting simply O(N). For large values, even so-called linear algorithms (e.g. linear search) are actually O(N log(N)), as the storage requirements for a single element grow with log(N) (i.e. to store distinct N=2^32 elements, you need N log(N) = 2^32 * 32 bits, but to store N=2^64 elements, you need 2^64 * 64 bits).

How can I unread this?

henrydark · 6 months ago
I've given the following exercise to developers in a few workplaces:

What's the complexity of computing the nth Fibonacci number? Make a graph of computation time for n = 1..300 that visualizes your answer.

There are those who very quickly reply "linear" but admit they can't get a graph to corroborate it, and there are those who very quickly say linear and even produce the graph! (though without correct Fibonacci numbers...)
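A minimal sketch of the exercise: the iterative loop does n additions, but with arbitrary-precision integers each addition of ~n-bit numbers costs O(n), so the total work grows faster than linearly; fib(n) has roughly 0.694·n bits, so fib(300) overflows a 64-bit integer, which is one way to "produce the graph" with wrong numbers.

```python
def fib(n):
    # Iterative Fibonacci with Python's arbitrary-precision ints.
    # n additions, but the operands grow to ~n bits, so the true cost
    # in bit operations is O(n^2), not O(n).
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

print(fib(10))                 # 55
print(fib(300).bit_length())   # well over 64 bits: a fixed-width int overflows
```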

henrydark commented on Left to Right Programming   graic.net/p/left-to-right... · Posted by u/graic
henrydark · 7 months ago
words_on_lines = [ret for line in text.splitlines() for ret in [line.split()]]
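A minimal sketch of what the one-liner does (the sample `text` is assumed): the inner `for ret in [line.split()]` iterates over a one-element list, which acts as a let-binding so the expression reads left to right; it is equivalent to the plain comprehension.

```python
text = "a b\nc d e"

# Left-to-right form: `for ret in [line.split()]` binds a name mid-comprehension.
words_on_lines = [ret for line in text.splitlines() for ret in [line.split()]]

# Equivalent plain form.
assert words_on_lines == [line.split() for line in text.splitlines()]
print(words_on_lines)  # [['a', 'b'], ['c', 'd', 'e']]
```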
henrydark commented on Opsqueue: Lightweight batch processing queue for heavy loads – now open-source   channable.com/tech/introd... · Posted by u/qqwy
henrydark · 7 months ago
Dask for Python satisfies exactly these requirements, just within the Python ecosystem. A pattern I've used at multiple workplaces over the last decade has been to start a Dask cluster (ten years ago it would have been an ipyparallel cluster) on any node that does computation, then spin up new nodes and connect them as needed. This gives dynamic, effectively infinite scalability with almost no overhead or code debt - the Dask interfaces are great even without distributed computing. When I wasn't allowed to use containers, I would sneakily add code on other machines to join my Dask clusters; I would connect any and all computing devices. One company pushed us to use Databricks and Spark, and I never got it: why would we commit to a cluster size before starting a computation?
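A minimal sketch of the "great even without distributed computing" point, with a hypothetical task function: `dask.delayed` builds a lazy task graph that runs on the local scheduler by default, and the same code runs on a distributed cluster if a `dask.distributed` Client is active.

```python
import dask
from dask import delayed

@delayed
def square(x):  # hypothetical unit of work
    return x * x

# Build a lazy graph of ten tasks plus a reduction; nothing runs yet.
total = delayed(sum)([square(i) for i in range(10)])

# .compute() executes on the local threaded scheduler, or on whatever
# distributed cluster a Client has connected to.
print(total.compute())  # 285
```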
henrydark commented on Researchers accurately dating a 7k-year-old settlement using cosmic rays   phys.org/news/2024-05-suc... · Posted by u/wglb
gaoryrt · 2 years ago
Sorry but what's the difference between this and C14[1] dating?

[1]: https://en.wikipedia.org/wiki/Radiocarbon_dating

henrydark · 2 years ago
The innovation is to find traces of a global cosmic-ray event that connects the dating of objects in one local area (Greece, where the dendrochronological record is not continuous) with those in faraway areas (for example England/Ireland, where we have continuous dendrochronological data).
henrydark commented on PRQL as a DuckDB Extension   github.com/ywelsch/duckdb... · Posted by u/tosh
klysm · 2 years ago
it sounds minor, but having `from` before `select` means you can get autocomplete
henrydark · 2 years ago
I get the sentiment, but personally I can easily imagine myself writing an autocompleter that would work fine with `select` before `from`. (I don't write much SQL, so I don't.)

Just to clarify, my point is that when we do write SQL, most of us start by writing the `from` part, and even if we didn't, I could just offer all columns from all tables I know about, with some heuristic for their order, when autocompleting in the `select` part.
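A hypothetical sketch of that heuristic (the schema, the `recent_tables` signal, and `complete_select` are all made up for illustration): offer every known column, ranked so columns from recently used tables come first.

```python
# Hypothetical schema and a recency signal, e.g. from query history.
schema = {"users": ["id", "name", "email"], "orders": ["id", "user_id", "total"]}
recent_tables = ["orders"]

def complete_select(prefix):
    # Rank recently used tables ahead of the rest.
    def rank(table):
        return 0 if table in recent_tables else 1
    candidates = [
        f"{table}.{col}"
        for table in sorted(schema, key=rank)
        for col in schema[table]
    ]
    # Keep candidates whose column name matches what was typed so far.
    return [c for c in candidates if c.split(".")[1].startswith(prefix)]

print(complete_select("na"))  # ['users.name']
print(complete_select("id"))  # ['orders.id', 'users.id']
```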
