“Ducklake DuckDB extension” really rolls off the tongue /s.
They should be using the best and cheapest technical solution; they owe it to their investors. At their scale they will never be able to use anything other than a cloud solution.
They could solve these issues at the number of users they report, for a monthly bill below 25 million dollars.
"6,311 database instances running the PostgreSQL-compatible and MySQL-compatible editions of Amazon Aurora processed more than 376 billion transactions, stored 2,978 terabytes of data, and transferred 913 terabytes of data" - https://aws.amazon.com/blogs/aws/how-aws-powered-prime-day-2...
Why are they not sharding by user/org yet? It is so simple and would fix the primary issue they are running into.
All these workarounds they go through to avoid a straightforward fix.
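To make the suggestion concrete, here is a minimal sketch of routing by org, assuming the intent is sharding on an org ID. The shard count, the `db-shard-NN.internal` host naming, and both function names are hypothetical, not anything from the article.

```python
import hashlib

NUM_SHARDS = 16  # hypothetical shard count, chosen only for illustration


def shard_for_org(org_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map an org ID to a shard index using a stable hash.

    A cryptographic hash is used instead of Python's built-in hash(),
    which is randomized per process and so not stable across restarts.
    """
    digest = hashlib.sha256(org_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def dsn_for_org(org_id: str) -> str:
    """Build a connection string for the shard that owns this org.

    The host naming scheme is an assumption made for this sketch.
    """
    return f"postgresql://db-shard-{shard_for_org(org_id):02d}.internal/app"
```

Plain modulo routing like this makes adding shards painful, since most orgs would move; a lookup table from org to shard is the usual way around that, at the cost of one more thing to keep consistent.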
IME sampling is simpler to reason about, and with the sampling rate included in each message, deriving metrics from logs works pretty well.
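A rough sketch of what I mean, assuming each emitted record carries the rate it was sampled at. The record layout and the function names are made up for illustration.

```python
import random
from typing import Dict, Iterable, Optional


def maybe_log(event: str, sample_rate: float, rng: random.Random) -> Optional[dict]:
    """Emit a record with probability sample_rate, tagging it with that rate."""
    if rng.random() < sample_rate:
        return {"event": event, "sample_rate": sample_rate}
    return None


def estimate_counts(records: Iterable[dict]) -> Dict[str, float]:
    """Estimate true event counts from sampled records.

    Each surviving record stands in for 1 / sample_rate original events,
    so summing those weights gives an unbiased estimate of the total.
    """
    totals: Dict[str, float] = {}
    for rec in records:
        weight = 1.0 / rec["sample_rate"]
        totals[rec["event"]] = totals.get(rec["event"], 0.0) + weight
    return totals
```

Because the rate travels with the message, different events (or the same event at different times) can be sampled at different rates and the estimate still adds up.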
The example in the article is a little contrived. Healthchecks often originate from multiple hosts and/or logs contain the remote address+port, leading to each log message being effectively unique. So sure, one could parse the remote address into remote_address=192.168.12.23 remote_port=64780 and then decide to drop the port in the aggregation, but is it worth the squeeze?
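For reference, the parse-then-drop step would look roughly like this, assuming space-separated `key=value` log lines. That format and the helper names are my assumption, not the article's.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def parse_fields(line: str) -> Dict[str, str]:
    """Split a line of space-separated key=value tokens into a dict."""
    fields = {}
    for token in line.split():
        if "=" in token:
            key, value = token.split("=", 1)
            fields[key] = value
    return fields


def aggregate(lines: Iterable[str], drop: Tuple[str, ...] = ("remote_port",)) -> Counter:
    """Count lines that are identical once the high-cardinality fields are dropped."""
    counts: Counter = Counter()
    for line in lines:
        fields = parse_fields(line)
        key = tuple(sorted((k, v) for k, v in fields.items() if k not in drop))
        counts[key] += 1
    return counts
```

It works, but every log source needs its own parser and its own list of fields to drop, which is the part I'm not sure pays off.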
An approachable paper on the topic is "Effective Computation of Biased Quantiles over Data Streams" http://dimacs.rutgers.edu/%7Egraham/pubs/papers/bquant-icde....
Why not dump all metrics, events, and logs into ClickHouse and purge data as necessary? For small to medium-sized businesses or solution ecosystems, would this be enough?
If ICANN-approved root.zone and ICANN-approved registries are the only options.
As an experiment I created my own registry, not shared with anyone. For many years I have run my own root server, i.e., I serve my own custom root.zone to all computers I own. I have a search experiment that uses a custom TLD that embeds a well-known classification system. The TLD portion of the domain name can categorise any product or service on Earth.
ICANN TLDs are vague, ambiguous, sometimes even deceptive.