thomoco (u/thomoco) - Readit News

thomoco commented on My Fediverse use – I'm hosting everything myself – PeerTube, Mastodon and Lemmy tube.jeena.net/w/nivehRx8... · Posted by u/jeena

thomoco · 2 years ago

This is inspiring. We're seeing some momentum now with the internet pendulum swinging back in the direction of the distributed and federated platform that it was originally intended to be. Interoperability via standards-based protocols provides choice of providers and methods, and helps to limit the influence of walled gardens and ad-based services. I expect to see a number of services and providers arising that will offer good data privacy controls and an ad-free experience with funding via a reasonable monthly fee. Well done!

thomoco commented on Show HN: Building musical synthesizers with SQL queries github.com/ClickHouse/Noi... · Posted by u/zX41ZdbW

IIAOPSW · 2 years ago

Is it vulnerable to SQL injection?

thomoco · 2 years ago

frequency modulation attack

thomoco commented on Send emails directly on top of Snowflake (YC W22) castled.io... · Posted by u/aruntdharan

aruntdharan · 3 years ago

Hi HN, We're Arun, Frank and Abhilash from Castled Data (https://castled.io). Castled is a marketing automation platform built directly on top of modern data warehouses like Snowflake, BigQuery, Redshift, and Postgres. Here is a quick demo:

https://www.loom.com/share/671cc9fb11c648cfb00ea5d4fe1d8ec4

thomoco · 3 years ago

hi Arun - this is Thom from ClickHouse

I was wondering whether integrating with ClickHouse is on the roadmap. ClickHouse was designed and built for massive web event and click streams. We'd be happy to help; let us know, and best wishes ahead.

thomoco commented on SQL should be the default choice for data transformation logic robinlinacre.com/recommen... · Posted by u/RobinL

thomoco · 3 years ago

I agree with the OP that SQL will almost certainly still be in use 20+ years from now, given the simplicity and flexibility of the declarative language and its standardization. It is as applicable to our big data problems today as it was when it was introduced.

Any discussion of SQL at scale must include ClickHouse [https://clickhouse.com/docs/en/install#self-managed-install], given its broad open-source use, the integrations available for Spark with JDBC [https://github.com/ClickHouse/clickhouse-jdbc/] or the open-source Spark-ClickHouse Connector [https://github.com/housepower/spark-clickhouse-connector], and its capability to scale SQL as a network service.

Disclosure: I work for ClickHouse

thomoco commented on Apache Hudi vs. Delta Lake vs. Apache Iceberg Lakehouse Feature Comparison onehouse.ai/blog/apache-h... · Posted by u/bhasudha

vgt · 3 years ago

It would be good if you labeled your posts so as to reveal your bias.

I understand why folks want options. At the end of the day, folks want an easy to use, ALWAYS CORRECT stable database, with minimal well-documented predictable knobs, correct distributed execution plan, no OOMs, separation of storage and compute, and standard SQL, and Clickhouse struggles with all of the above.

(co-founder of MotherDuck)

thomoco · 3 years ago

Could you please elaborate on your comments and possible misconceptions about ClickHouse? Proven stability, massive scale, predictability, native SQL, and industry-best performance are all well-recognized characteristics of ClickHouse, so your comments here seem a bit biased.

I am interested to learn more about your point of view, as well as tangentially the strategic vision of MotherDuck as a company.

(VP Support at ClickHouse)

thomoco commented on ClickHouse Cloud is now in Public Beta clickhouse.com/blog/click... · Posted by u/taubek

base · 3 years ago

Is there an easy way to have ClickHouse Cloud ingest data from MySQL hosted in Amazon RDS?

thomoco · 3 years ago

There are a few options for migrating or synchronizing data from MySQL - I'd recommend starting with this page in the ClickHouse Docs - there is a nice video there that explains some of those options:

https://clickhouse.com/docs/en/integrations/migration/

Depending on what you are trying to achieve, you could use clickhouse-local with the MySQL engine to move data, or you could use an ETL/ELT tool to migrate or sync.

thomoco commented on ClickHouse Cloud is now in Public Beta clickhouse.com/blog/click... · Posted by u/taubek

avereveard · 3 years ago

Is lower time the right metric here? Seems normalizing per price would make a more useful metric for big data as long as the response time is reasonable

thomoco · 3 years ago

Yes, ClickBench results are presented as Relative Time, where lower is better. You can read more on the specifics of ClickBench methodology in the GitHub repository here: https://github.com/ClickHouse/ClickBench/

There are other responses from ClickHouse in the comments on pricing, so I'll defer to their expertise on that topic. Thank you for your feedback and ideas: a price-normalized benchmark is an interesting concept (and one where ClickHouse would also expect to lead, given its architecture and efficiency).
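For anyone who wants to play with the price-normalization idea, here is a rough sketch in Python. It is not the ClickBench methodology (see the repository for that), just one way to put a relative-time score and a cost-per-query figure side by side. The function names are hypothetical, and it assumes you already have per-query times in seconds and an hourly price for each system.

```python
from math import prod


def relative_times(times_by_system):
    """Divide each query time by the fastest time any system achieved on that query.

    times_by_system maps a system name to a list of query times in seconds.
    All lists are assumed to cover the same queries in the same order.
    """
    n_queries = len(next(iter(times_by_system.values())))
    best = [min(t[q] for t in times_by_system.values()) for q in range(n_queries)]
    return {
        name: [t[q] / best[q] for q in range(n_queries)]
        for name, t in times_by_system.items()
    }


def geometric_mean(values):
    """Summarize ratios with a geometric mean so no single query dominates."""
    return prod(values) ** (1.0 / len(values))


def cost_per_query(times, hourly_price):
    """Dollars spent per query, assuming the system is billed by the hour."""
    return [t * hourly_price / 3600.0 for t in times]
```

Lower is better for both numbers. Comparing the geometric mean of the relative times with the geometric mean of the per-query cost shows whether a faster system is also the cheaper one for the same workload.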

thomoco commented on ClickHouse Cloud is now in Public Beta clickhouse.com/blog/click... · Posted by u/taubek

thomoco · 3 years ago

I wanted to note that ClickHouse Cloud results are now also being reported in the public ClickBench results: https://benchmark.clickhouse.com/

Good to see transparent comparisons now available for Cloud performance vs. self-hosted or bare metal results, as well as results from our peers. The ClickHouse team will continue to optimize further, as scale and performance are a relentless pursuit here at ClickHouse, and something we expect to be pursued transparently and in a reproducible manner. Public benchmarking benefits all of us in the tech industry, as we learn from each other by sharing the best techniques for attaining high performance within a cloud architecture.

Full disclosure: I work for ClickHouse, and I was also previously a member of SPEC, developing and advocating for public, standardized benchmarks.

thomoco commented on Show HN: A benchmark for analytical databases (Snowflake, Druid, Redshift) benchmark.clickhouse.com/... · Posted by u/zX41ZdbW

gianm · 3 years ago

This is impressive work: it's time consuming to set up and benchmark so many different systems!

Impressiveness of the effort notwithstanding, I also want to encourage people to do their own research. As a database author myself (I work on Apache Druid) I have really mixed feelings about publishing benchmarks. They're fun, especially when you win. But I always want to caution people not to put too much stock in them. We published one a few months ago showing Druid being faster than Clickhouse (https://imply.io/blog/druid-nails-cost-efficiency-challenge-...) on a different workload, but we couldn't resist writing it in a tongue-in-cheek way that poked fun at the whole concept of published benchmarks. It just seems wrong to take them too seriously. I hope most readers took the closing message to heart: benchmarks are just one data point among many.

That's why I appreciate the comment "All Benchmarks Are Liars" on the "limitations" section of this benchmark -- something we can agree on :)

thomoco · 3 years ago

Benchmarks can be quite difficult to interpret when the test parameters vary between tests. However, I think the point of providing the open-source ClickBench benchmark [https://github.com/ClickHouse/ClickBench] is exactly to allow users to do their own research, by providing a standardized client and workload across any SQL-based DBMS. Standardized benchmarking is an important technique, both for comparing different applications and for comparing the same application across different environments (compute, storage, cloud vs. self-managed, etc.). SPEC [https://www.spec.org] used to do a great job of developing and releasing standardized benchmarks, although their activity has waned of late.