> They are initially a new physical part that is later merged into the main table structure
> A single DELETE is an ALTER TABLE operation
Can you explain these two further?
The reason I mentioned it is that it's a huge surprise to some people. From the docs: "The ALTER TABLE prefix makes this syntax different from most other systems supporting SQL. It is intended to signify that unlike similar queries in OLTP databases this is a heavy operation not designed for frequent use. ALTER TABLE is considered a heavyweight operation that requires the underlying data to be merged before it is deleted."
There's also a "lightweight delete" available in many circumstances https://clickhouse.com/docs/sql-reference/statements/delete. Something really nice about the ClickHouse docs is that they devote quite a bit of text to describing the design and performance implications of using an operation. It reiterates the focus on performance that is pervasive across the product.
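A rough sketch of the two forms, assuming a hypothetical `events` table with a `user_id` column:

    -- Heavyweight mutation: kicks off a background rewrite of every part
    -- that contains matching rows
    ALTER TABLE events DELETE WHERE user_id = 42;

    -- Lightweight delete: marks rows as deleted so reads filter them out;
    -- the data is physically dropped during later merges
    DELETE FROM events WHERE user_id = 42;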
Edit: Per the other part of your question, why inserts create new parts and how they are merged is best described here https://clickhouse.com/docs/engines/table-engines/mergetree-...
The heart of ClickHouse is its table engines (they don't exist in Postgres) https://clickhouse.com/docs/engines/table-engines . Data is sorted by the primary column (or columns), and values that sit next to each other in memory come from the same column of the table. Index entries span wide areas (EG: by default there's only one key record in the primary index for every 8192 rows) because most operations in ClickHouse are aggregate in nature. Inserts are also expected to be in bulk (they are initially a new physical part that is later merged into the main table structure). A single DELETE is an ALTER TABLE operation in the MergeTree engine. :)
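To make that concrete, here's a rough sketch (table and column names are made up) of a MergeTree table and a bulk insert; the ORDER BY clause defines the physical sort order described above:

    -- Hypothetical table: data in each part is sorted by (site_id, event_time)
    CREATE TABLE events
    (
        site_id    UInt32,
        event_time DateTime,
        event_type LowCardinality(String),
        value      Float64
    )
    ENGINE = MergeTree
    ORDER BY (site_id, event_time)
    SETTINGS index_granularity = 8192;  -- one primary-index entry per 8192 rows (the default)

    -- Bulk insert: each INSERT creates a new part on disk, and background
    -- merges later combine those parts into larger ones
    INSERT INTO events VALUES
        (1, now(), 'view',  1.0),
        (1, now(), 'click', 2.5);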
This structure allows it to literally crunch billions of values per second (by brute force, not with pre-processing, erm, "tricks", although there is a lot of support for those in ClickHouse as well). I've had tables with hundreds of columns and 100+ billion rows that are nearly as performant as a million-row table if I can structure the query to work with the table's physical ordering.
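For example (continuing the hypothetical table above), a query that filters on the leading ORDER BY columns lets the sparse primary index skip most of the table:

    -- Filter on site_id and event_time (the leading sort-key columns) so
    -- ClickHouse can prune whole granules instead of scanning everything
    SELECT event_type, count(), avg(value)
    FROM events
    WHERE site_id = 1
      AND event_time >= now() - INTERVAL 1 DAY
    GROUP BY event_type;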
ClickHouse recommends not using Nullable fields because of the performance implications (a null flag has to be stored somewhere for each value). That's how much they care about perf, and how close their in-memory layout stays to the raw data type. :)
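Roughly, the difference looks like this (hypothetical tables): the Nullable column carries an extra null mask per value, while the plain column just uses a sentinel default:

    -- Nullable variant: an extra per-value null flag has to be stored and checked
    CREATE TABLE measurements_nullable
    (
        id      UInt64,
        reading Nullable(Float64)
    )
    ENGINE = MergeTree ORDER BY id;

    -- Plain variant: sentinel default instead of NULL, no mask overhead
    CREATE TABLE measurements_plain
    (
        id      UInt64,
        reading Float64 DEFAULT 0
    )
    ENGINE = MergeTree ORDER BY id;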
With $200/month I have a good database. $1-5M revenue?
If you are at all interested in the current challenges being grappled with in this space, this does a great job of illuminating some of them. There are many, many interesting passages in here, and the text transcript has links to relevant papers when their topics are brought up. Really like that aspect and would love to see it done a lot more often.
There are just so many ways to solve a problem now that it's more or less impossible for someone to be familiar with all of them.
That said, some of the, erm, "new ways" to solve problems have been significant advancements. EG: Async/Await was a huge improvement over Combine for a wide variety of scenarios.
However, luckily in my case, it was caught immediately in the staging env since collisions caused exceptions.
It's pretty easy to miss when an expression is actually evaluated. That code is probably live somewhere else right now surreptitiously causing issues.
I'm surprised that VSCode extensions don't have a permissions system (EG: "Request network access").
Sure, it's not going to trend on Apple Music... but it's the best we've ever done and a genuine step above previous efforts.
Edit: This no longer repros and only the correct solution works now from what I can tell.