Is integrating with ClickHouse on the roadmap? ClickHouse was designed and built for massive web event and click streams. We'd be happy to help; just let us know. Best wishes ahead.
Any discussion of SQL at scale must include ClickHouse [https://clickhouse.com/docs/en/install#self-managed-install], given its broad open-source use, the integrations available for Spark via JDBC [https://github.com/ClickHouse/clickhouse-jdbc/] or the open-source Spark-ClickHouse Connector [https://github.com/housepower/spark-clickhouse-connector], and its capability to scale SQL as a network service.
Disclosure: I work for ClickHouse
I understand why folks want options. At the end of the day, folks want an easy-to-use, ALWAYS CORRECT, stable database with a minimal set of well-documented, predictable knobs, correct distributed execution plans, no OOMs, separation of storage and compute, and standard SQL, and ClickHouse struggles with all of the above.
(co-founder of MotherDuck)
I am interested to learn more about your point of view, as well as tangentially the strategic vision of MotherDuck as a company.
(VP Support at ClickHouse)
https://clickhouse.com/docs/en/integrations/migration/
Depending on what you are trying to achieve, you could use clickhouse-local with the MySQL engine to move data, or you could use an ETL/ELT tool to migrate or sync.
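To make the clickhouse-local route a bit more concrete, here is a minimal sketch that only builds the command; it does not run anything. It assumes the `mysql(...)` and `remoteSecure(...)` table functions with positional host, database/table, user, and password arguments. The helper names, hostnames, table names, and credentials are all hypothetical placeholders, and a real migration would need the target table created first, so please check the migration docs linked above before relying on it.

```python
import shlex


def sql_literal(value: str) -> str:
    """Quote a Python string as a single-quoted SQL string literal."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def build_copy_query(mysql_addr, mysql_db, mysql_table, mysql_user, mysql_password,
                     ch_addr, ch_table, ch_user, ch_password):
    """Build an INSERT ... SELECT that reads from MySQL and writes to ClickHouse.

    Assumption: mysql('host:port', 'db', 'table', 'user', 'password') and
    remoteSecure('host:port', 'db.table', 'user', 'password') argument order.
    """
    source_args = [mysql_addr, mysql_db, mysql_table, mysql_user, mysql_password]
    target_args = [ch_addr, ch_table, ch_user, ch_password]
    source = "mysql(" + ", ".join(sql_literal(a) for a in source_args) + ")"
    target = "remoteSecure(" + ", ".join(sql_literal(a) for a in target_args) + ")"
    return f"INSERT INTO FUNCTION {target} SELECT * FROM {source}"


def build_command(query):
    """Return the argv list for running the query through clickhouse-local."""
    return ["clickhouse-local", "--query", query]


if __name__ == "__main__":
    # Hypothetical endpoints and credentials, for illustration only.
    query = build_copy_query(
        "mysql.example:3306", "shop", "orders", "reader", "mysql-secret",
        "ch.example:9440", "shop.orders", "default", "ch-secret",
    )
    # Print a shell-safe command line instead of executing it.
    print(shlex.join(build_command(query)))
```

Printing the command rather than executing it keeps the sketch safe to try; passing the list to `subprocess.run` would be the natural next step once the endpoints are real.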
There are other responses from ClickHouse in the comments on pricing, so I'll defer to their expertise on that topic. Thank you for your feedback and ideas; normalizing a price-based benchmark is an interesting concept (and one where ClickHouse would also expect to lead, given its architecture and efficiency).
Good to see transparent comparisons now available for Cloud performance vs. self-hosted or bare-metal results, as well as results from our peers. The ClickHouse team will continue to optimize further, as scale and performance are a relentless pursuit here at ClickHouse, and something we expect to be measured transparently and in a reproducible manner. Public benchmarking benefits all of us in the tech industry, as we learn from each other by sharing the best techniques for attaining high performance within a cloud architecture.
Full disclosure: I do work for ClickHouse, although I have also been a member of SPEC in the past, developing and advocating for public, standardized benchmarks.
Impressiveness of the effort notwithstanding, I also want to encourage people to do their own research. As a database author myself (I work on Apache Druid), I have really mixed feelings about publishing benchmarks. They're fun, especially when you win. But I always want to caution people not to put too much stock in them. We published one a few months ago showing Druid being faster than ClickHouse (https://imply.io/blog/druid-nails-cost-efficiency-challenge-...) on a different workload, but we couldn't resist writing it in a tongue-in-cheek way that poked fun at the whole concept of published benchmarks. It just seems wrong to take them too seriously. I hope most readers took the closing message to heart: benchmarks are just one data point among many.
That's why I appreciate the comment "All Benchmarks Are Liars" in the "limitations" section of this benchmark -- something we can agree on :)