I’m also not sure whether DNA-seq data refers to the human host or to all the DNA they were able to sequence (which would include bacteria as well, I guess?)
Do these operational and scaling problems include AWS's managed services? MSK, Kinesis Data Streams?
At small scale, why wouldn't someone go with one of those? And at large scale, where's the Total Cost of Ownership comparison to show that it's worth it to ditch Kafka's local disks for a model built on object storage?
RE: numbers: https://www.warpstream.com/blog/warpstream-benchmarks-and-tc...
You don't have to keep the data stored in S3 Express One Zone forever. You can land it there and then immediately compact it to S3 Standard. You still pay the higher fee to write to S3EOZ, but not the higher storage fee.
WarpStream does this, data gets compacted out within seconds usually. Of course this is now... tiered storage. But implemented over two "infinitely scalable" remote storage systems so it gets rid of all the operational and scaling problems you have with a typical tiered storage Kafka setup that uses local volumes as the landing zone.
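To make the "pay the write fee, not the storage fee" point concrete, here's a rough cost-model sketch. The prices are placeholders I made up for illustration, not actual AWS pricing, and the model ignores request/GET fees for reading the data back out during compaction. `TierPrices`, `land_and_compact_cost` and `land_and_keep_cost` are hypothetical names. It just shows that if data only sits in the expensive landing tier for seconds, its storage cost there rounds to almost nothing and you mostly pay Standard storage for the retention period.

```python
from dataclasses import dataclass

SECONDS_PER_MONTH = 30 * 24 * 3600  # simplification: a 30-day billing month


@dataclass(frozen=True)
class TierPrices:
    # Hypothetical per-GB prices; placeholders, NOT real AWS pricing.
    write_per_gb: float          # one-time cost to write a GB into this tier
    storage_per_gb_month: float  # cost to keep a GB in this tier for a month


def storage_cost(prices: TierPrices, gb: float, seconds_resident: float) -> float:
    """Prorated storage cost for keeping `gb` in a tier for `seconds_resident`."""
    return gb * prices.storage_per_gb_month * seconds_resident / SECONDS_PER_MONTH


def land_and_keep_cost(landing: TierPrices, gb: float, retention_seconds: float) -> float:
    # Write into the landing tier and leave it there for the whole retention period.
    return gb * landing.write_per_gb + storage_cost(landing, gb, retention_seconds)


def land_and_compact_cost(landing: TierPrices, standard: TierPrices, gb: float,
                          landing_seconds: float, retention_seconds: float) -> float:
    # Pay the landing tier's write fee, keep the data there only briefly,
    # then rewrite it to the standard tier for the rest of the retention period.
    # (Read/request fees for the compaction step are ignored in this sketch.)
    return (gb * landing.write_per_gb
            + storage_cost(landing, gb, landing_seconds)
            + gb * standard.write_per_gb
            + storage_cost(standard, gb, retention_seconds - landing_seconds))


if __name__ == "__main__":
    landing = TierPrices(write_per_gb=0.01, storage_per_gb_month=0.16)    # made-up
    standard = TierPrices(write_per_gb=0.0, storage_per_gb_month=0.023)   # made-up
    week = 7 * 24 * 3600
    print("keep in landing tier:", land_and_keep_cost(landing, 1000, week))
    print("land + compact:      ", land_and_compact_cost(landing, standard, 1000, 10, week))
```

Plug in real prices and your own retention to get an actual comparison; the shape of the result is what matters here, not the placeholder numbers.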
RE: comparing to a single-zone Kafka cluster: a lot of people really dislike operating Kafka. Some people don't mind it, and that's cool too, but it's not the majority in my experience.
Sure, you can click around to figure it out, but this always annoys me. It's like everyone is supposed to already know what your product is, what it does, and all your service names. Put it front and center at the top!