But credit where it is due: ClickHouse is obviously an industry leader.
https://youtu.be/mNneCaZewTg?si=N68fsBlYS3tuvLe3 begins at 34:32
…rebranded with a different name, again.
Again complex, again no obvious way to query the storage directly, again unclear performance characteristics, and again no obvious reason to believe the networking costs don't render the savings from it largely meaningless.
You have to admit it's a bit of a hard sell when there's no comeback after you literally just said that people keep inventing new names for minor variations on tiered storage…
We've designed WarpStream to work extremely well on the slower, harder-to-use tier first, and that is how 95+% of our workloads run in production. The tiered storage solutions from other streaming vendors do the opposite: they were designed for local SSDs first, with object storage bolted on later.
The equivalent would be if we were pitching our support for an even slower, cheaper tier of object storage like AWS S3 Glacier.
Another thing worth looking into is Mountpoint for Amazon S3, with or without read caching; it offers a POSIX-like interface to S3 for applications that don't natively support it.
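As a quick sketch (bucket name and paths here are placeholders, not anything from the thread), the basic usage looks like this:

```shell
# Mount an S3 bucket at a local path (bucket and paths are placeholders).
mount-s3 my-bucket /mnt/my-bucket

# Same, but with a local read cache to reduce repeat-read latency
# and S3 GET request costs.
mount-s3 my-bucket /mnt/my-bucket --cache /tmp/mount-s3-cache

# Applications can now read objects through ordinary file I/O:
cat /mnt/my-bucket/some/key.txt

# Unmount when done.
umount /mnt/my-bucket
```

Note it's optimized for sequential reads and writes of whole objects, so it suits read-heavy and append-style workloads better than general random-access file use.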
That's still a trade-off. Object storage, simply due to the overhead of HTTP + TLS, has higher latency than EFS, which has higher latency than EBS, which has higher latency than a local SSD. So in the end your service (whether it's Kafka or anything else) has _higher_ latency if you also want consistency (i.e. resilience against "everything goes dark in an instant"), since every write on every machine in the pool has to be committed to storage.
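To make that concrete, here's a toy model (the latency numbers are illustrative orderings, not benchmarks): if a write is only acknowledged once every machine in the pool has committed it to its storage backend, the acknowledged latency is bounded below by the slowest commit, so a single object-storage replica dominates the whole write path.

```python
# Toy model: synchronous commit latency across a pool of machines.
# A write is durable only once every replica has committed it, so the
# acknowledged latency is the max of the individual commit latencies.
# Numbers below are made up to reflect the ordering in the thread:
# local SSD < EBS < EFS < object storage.
STORAGE_LATENCY_MS = {
    "local_ssd": 0.1,
    "ebs": 1.0,
    "efs": 3.0,
    "s3": 30.0,   # HTTP + TLS overhead dominates
}

def commit_latency_ms(backends: list[str]) -> float:
    """Latency to acknowledge a write committed on all given backends."""
    return max(STORAGE_LATENCY_MS[b] for b in backends)

# Three replicas on local SSDs: fast.
print(commit_latency_ms(["local_ssd"] * 3))                  # 0.1

# Swap one replica to object storage: S3 now gates every write.
print(commit_latency_ms(["local_ssd", "local_ssd", "s3"]))   # 30.0
```

The point of the sketch: durability-on-commit makes write latency a `max()`, not an average, so the slowest tier you include sets the floor.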
The only way a "zero disk" anything makes sense is if you have enough machines in enough diverse locations, with enough RAM to cover the entire workload, and you pray there's never an event that takes the entire cloud provider offline.
We're not talking about no disks as in no storage, just nothing other than object storage. This does have a latency trade-off, but with the advent of S3 Express One Zone and Azure's equivalent high-performance tier (with GCP surely not far behind), a system designed purely around object storage can now trade cost for latency where it makes sense. WarpStream already has support for writing to a quorum of S3 Express One Zone buckets to provide regional availability, so there's not an availability trade-off here either.
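A toy model of why a quorum of zonal buckets avoids the availability trade-off (this is my sketch of the general technique, not WarpStream's implementation): the write is durable once a majority of buckets acknowledge it, so the observed latency is the k-th fastest ack rather than the slowest, and losing one zone loses neither data nor availability.

```python
# Sketch of a quorum write across independent one-zone buckets.
# Durable once `quorum` of the buckets have acknowledged the write,
# so the observed latency is the quorum-th fastest ack, not the max.
def quorum_latency_ms(ack_latencies_ms: list[float], quorum: int) -> float:
    """Latency until `quorum` buckets have acknowledged the write."""
    if quorum > len(ack_latencies_ms):
        raise ValueError("quorum larger than number of buckets")
    return sorted(ack_latencies_ms)[quorum - 1]

# Three zonal buckets; one zone is having a slow moment (made-up numbers).
acks = [4.0, 5.5, 250.0]

# With a 2-of-3 quorum, the slow zone doesn't stall the write:
print(quorum_latency_ms(acks, quorum=2))  # 5.5

# Requiring all three acks would expose the tail:
print(quorum_latency_ms(acks, quorum=3))  # 250.0
```

Compare this with the all-replicas `max()` model above the quorum turns a tail-latency and single-zone-outage problem into a majority problem.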
As a consequence of compaction, which deletes the build-up of many small add/delete files, a format like Iceberg loses the ability to time travel to those earlier states.
With DuckLake's ability to refer to parts of Parquet files, we can preserve the ability to time travel even after deleting the old Parquet files.
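A toy model of the difference (the names and structures here are mine for illustration, not either project's actual metadata format): if snapshots reference (file, row range) pairs, compaction can rewrite every old snapshot to point at row ranges inside the merged file before deleting the small files, so old snapshots still read back identically.

```python
# Toy model: time travel surviving compaction when snapshots reference
# row ranges *within* files, rather than whole files only.
files = {
    "a.parquet": [1, 2],   # each "file" is just a list of rows here
    "b.parquet": [3],
    "c.parquet": [4, 5],
}

# A snapshot is a list of (file, start_row, end_row) references.
snapshots = {
    1: [("a.parquet", 0, 2)],                                       # [1, 2]
    2: [("a.parquet", 0, 2), ("b.parquet", 0, 1)],                  # [1, 2, 3]
    3: [("a.parquet", 0, 2), ("b.parquet", 0, 1), ("c.parquet", 0, 2)],
}

def read(snapshot_id):
    return [row
            for f, lo, hi in snapshots[snapshot_id]
            for row in files[f][lo:hi]]

def compact():
    """Merge all small files into one, rewrite every snapshot's refs to
    row ranges inside the merged file, then delete the small files."""
    merged, offsets, pos = [], {}, 0
    for name, rows in files.items():
        offsets[name] = pos
        merged.extend(rows)
        pos += len(rows)
    for sid, refs in snapshots.items():
        snapshots[sid] = [("merged.parquet", offsets[f] + lo, offsets[f] + hi)
                          for f, lo, hi in refs]
    files.clear()
    files["merged.parquet"] = merged

before = {sid: read(sid) for sid in snapshots}
compact()                   # the small files are gone...
after = {sid: read(sid) for sid in snapshots}
assert before == after      # ...but every old snapshot still reads the same
print(after[1], after[3])   # [1, 2] [1, 2, 3, 4, 5]
```

In a whole-file-reference model, deleting `a.parquet`, `b.parquet`, and `c.parquet` after compaction would break snapshots 1–3; the sub-file references are what let you reclaim the small files without giving up history.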