Smrchy · 4 years ago
I'd like to thank the creators of ClickHouse, as I hope they are reading here. We've been using it since 2019 in a single-server setup with billions of rows. No problems at all, and query speeds that seem unreal compared to MySQL and Postgres.

As we did not want to go into the HA/backup/restore details at that time, we created a setup that can be quickly recreated from data in other databases.

Interesting presentation from Alexey about features and the roadmap, from May 2021:

https://www.youtube.com/watch?v=t7mA1aOx3tM

parth_patil · 4 years ago
I have similar first-hand experience with ClickHouse. In the past I moved a custom analytics solution I had built on HBase to a single-node ClickHouse deployment and had no issues whatsoever. In my current startup I am again using ClickHouse with great success. It's mind-bogglingly fast. Thanks to the ClickHouse team for building such an amazing system and for making it open source.
seektable · 4 years ago
That's exactly the use case I meant below. Do you use any BI tool to visualize CH queries?
pachico · 4 years ago
I use Grafana for that. By now, we have developed entire internal products based on ClickHouse + Grafana.
mlazowik · 4 years ago
There’s a community connector for metabase https://github.com/enqueue/metabase-clickhouse-driver
Smrchy · 4 years ago
No, the results are embedded in a web app.
whitepoplar · 4 years ago
Haven't used any of these yet, but how does ClickHouse compare to Postgres extensions like TimescaleDB and Citus (which recently launched a columnar feature)? I remember reading in the ClickHouse docs some time ago that it does not have DELETE functionality. Does this pose any problems with GDPR and data deletion requests?
fiddlerwoaroof · 4 years ago
I benchmarked ClickHouse vs. Timescale, Citus, Greenplum and Elasticsearch for a real-time analytics application. With a couple of hours of learning for each (although I’ve used Postgres extensively, so Postgres-backed databases had a bit of an advantage), ClickHouse’s performance was easily an order of magnitude or two better than anything except ES. ES had its own downsides with respect to the queries we could run (which is why we were leaving ES in the first place).
eastdakota · 4 years ago
Cloudflare’s analytics have been powered by Clickhouse for a long time. And I was an early investor in Timescale. They’re both excellent products.
someguy101010 · 4 years ago
What did you end up going with?
germandiago · 4 years ago
I know of a database that claims to perform even faster than that one. It is commercial, though. It's aimed at very massive data, basically in a time-series setup.

https://www.hydrolix.io/

stingraycharles · 4 years ago
In a nutshell, my extremely subjective and biased take on it:

* Citus has a great clustering story, and a small data warehousing story, afaik no timeseries story;

* TimescaleDB has a great timeseries story, and an average data warehousing story;

* Clickhouse has a great data warehousing story, an average timeseries story, and a bit meh clustering story (YMMV).

(Disclaimer: I work for a competitor)

akulkarni · 4 years ago
[Timescale co-founder]

This is a really great comparison. I might borrow it in the future :-)

But yes, if you have classic OLAP-style queries (e.g., queries that need to touch every database row), Clickhouse is likely the better option.

For anything time-series related, and/or if you like/love Postgres, that is where TimescaleDB shines. (But please make sure you turn on compression!)

TimescaleDB also has a good clustering story, which is also improving over time. [0][1]

[0] https://news.ycombinator.com/item?id=23272992

[1] https://news.ycombinator.com/item?id=24931994

zX41ZdbW · 4 years ago
> Disclaimer: I work for a competitor

What competitor btw? I tried to open a link from your profile but it does not work.

ericb · 4 years ago
ClickHouse wins on licensing--Apache.

The TimeScale licensing approach, the way it is written, perhaps accidentally, has lots of hidden landmines. The TimeScale license slants toward cloud giant defense to the extent that normal use is perilous.

For example, timescale can be used for normal (postgres) data as well, so any rules seem to apply to all your data in the database. The free license is only available if:

the customer is prohibited, either contractually or technically, from defining, redefining, or modifying the database schema or other structural aspects of database objects, such as through use of the Timescale Data Definition Interfaces, in a Timescale Database utilized by such Value Added Products or Services.

My read is that if you let a customer do anything that adds a custom field, or table, or database, or trigger, or anything that is "structural" (even in the regular relational stuff) anywhere in your database (metrics or not), you are in violation. There doesn't seem to be a distinction about whether this is "direct" control or not, or whether a setting indirectly adds a trigger. I don't want to be in a courtroom debating whether a new metric is a "structural change!"

Now, none of that might be the intent of the license, but you have to go by what it says, not intentions.

The sad part of that is, I, and I'm sure many folks, have no interest in starting a database company, but we can't really use Timescale because of legal risk. Looks awesome otherwise, though.

akulkarni · 4 years ago
[Timescale co-founder here]

Hi Eric, thanks for taking a close look at our license.

I'd like to dispel some misconceptions:

The core of TimescaleDB is Apache2. Advanced features are under the Timescale License.

Regarding this:

  the customer is prohibited, either contractually or technically, from defining, redefining, or modifying the database schema or other structural aspects of database objects, such as through use of the Timescale Data Definition Interfaces, in a Timescale Database utilized by such Value Added Products or Services.

  My read is that if you let a customer do anything that adds a custom field, or table, or database, or trigger, or anything that is "structural" (even in the regular relational stuff) anywhere in your database (metrics or not), you are in violation. There doesn't seem to be a distinction about whether this is "direct" control or not, or whether a setting indirectly adds a trigger. I don't want to be in a courtroom debating whether a new metric is a "structural change!"
That's not correct, and we took pains to clarify that in the license:

  3.5 "Timescale Data Definition Interfaces" means SQL commands and other interfaces of the Timescale Software that can be used to define or modify the database schema and other structural aspects of database objects in a Timescale Database, including Data Definition Language (DDL) commands such as CREATE, DROP, ALTER, TRUNCATE, COMMENT, and RENAME. [0]

Strictly speaking, if you provide Data Definition Interfaces (DDL) to customers via a SaaS service (ie you are running a TimescaleDBaaS - which applies to < 0.000001% of all possible users) you are in violation of the license. But otherwise you are fine.

If you are looking for more votes of confidence, today there are literally millions of active TimescaleDB instances, including at large companies like Walmart, Comcast, IBM, Cisco, Electronic Arts, Bosch, Samsung, and many, many smaller ones. [2]

If you have any other questions, I'm happy to answer them here, or offline (ajay at timescale dot com).

[0] https://www.timescale.com/legal/licenses#section-3-5-timesca...

[2] https://www.timescale.com/

goodpoint · 4 years ago
> ClickHouse wins on licensing--Apache

How so? An end user should prefer a database under a license that protects developers and users from cloudification/proprietization/SaaS.

didip · 4 years ago
ClickHouse competes with OLAP storages like Druid or Pinot.

I don't know about ClickHouse, but the other two use bitmap indexes to make storing petabytes of data affordable.

Row-oriented databases would struggle to compete against ClickHouse; they are easily an order of magnitude slower.

hodgesrm · 4 years ago
ClickHouse uses skip indexes. They basically answer the question "is the value I'm seeking definitely not in this block?"

For example, there are a couple of varieties of Bloom filters, which let you test for the presence of string sequences in blocks. This allows ClickHouse to avoid unnecessarily reading and decompressing blocks (actually called granules).
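To make that concrete, here is a sketch of what such an index declaration can look like (table, column, and index names are invented for illustration; the `tokenbf_v1` parameters are filter size in bytes, number of hash functions, and a seed):

```sql
CREATE TABLE logs
(
    ts      DateTime,
    message String,
    -- Bloom-filter skip index over tokens of `message`: for a block of
    -- granules it can answer "this token is definitely not here", letting
    -- ClickHouse skip reading and decompressing that block entirely
    INDEX message_tokens message TYPE tokenbf_v1(10240, 3, 0) GRANULARITY 4
)
ENGINE = MergeTree
ORDER BY ts;

-- A query like this can consult the index before touching the data:
SELECT count() FROM logs WHERE hasToken(message, 'timeout');
```

Because Bloom filters can give false positives but never false negatives, the index can only rule blocks out, which is exactly the "is it definitely not here" question above.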

ddbennett · 4 years ago
Sentry.io settled on Clickhouse for error and transaction data after reviewing several options including Citus and Elastic. We've been happy with both the performance and how well it scales from Open Source installs to our SaaS clusters.
zX41ZdbW · 4 years ago
There are many independent comparisons of ClickHouse vs TimescaleDB:

By Splitbee: https://github.com/ClickHouse/ClickHouse/issues/22398#issuec...

By GitLab: https://github.com/ClickHouse/ClickHouse/issues/22398#issuec...

And others: https://github.com/ClickHouse/ClickHouse/issues/22398#issuec... and https://github.com/ClickHouse/ClickHouse/issues/22398#issuec...

If you find more, please post them there.

TimescaleDB can work quite well in time-series scenarios but does not shine on analytical queries. For most time-series queries it is below ClickHouse in terms of performance, but for small (point) queries it can be better.

The main advantage of TimescaleDB is that it better integrates with Postgres (for obvious reasons).

There are also many comparisons of ClickHouse vs Citus. The most notable is here: https://blog.cloudflare.com/http-analytics-for-6m-requests-p...

ClickHouse can do batch DELETE operations for data cleanup. https://clickhouse.com/docs/en/sql-reference/statements/alte... It is not for frequent single-record deletions, but it can fulfill the needs of data cleanup, retention, and GDPR requirements.

You can also tune TTL rules in ClickHouse, per table or per column (say, replace all IP addresses with zeroes after three months).
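As a sketch of the two mechanisms mentioned above (table and column names are invented, not from the thread):

```sql
-- Batch deletion is an asynchronous "mutation", suitable for periodic
-- cleanup or GDPR requests rather than frequent single-row deletes:
ALTER TABLE events DELETE WHERE user_id = 12345;

-- Column-level TTL: the `ip` value reverts to its default (zero)
-- three months after the event date
CREATE TABLE events
(
    event_date Date,
    user_id    UInt64,
    ip         IPv4 TTL event_date + INTERVAL 3 MONTH
)
ENGINE = MergeTree
ORDER BY (event_date, user_id);
```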

ryanbooz · 4 years ago
[Timescale DevRel here]

@zX41ZdbW - Thanks for pointing out the various benchmarks that other companies have run between Clickhouse and TimescaleDB using TSBS[1]. As we mentioned, we'll dig deeper into a similar benchmark with much more detail than any of those examples in an upcoming blog post.

One notable omission on all of the benchmarks that we've seen is that none of them enable TimescaleDB compression (which also transforms row-oriented data into a columnar-type format). In our detailed benchmarking, queries on compressed columnar data in Timescale outperformed Clickhouse in most queries, particularly as cardinality increases, often by 5x or more. And with compression of 90% or more, storage is often comparable. (Again, blog post coming soon - we are just making sure our results are accurate before rushing to publish.)

The beauty of TimescaleDB's columnar compression model is that it allows the user to decide when their workload can benefit from deep/narrow queries of data that doesn't change often (although it can still be modified just like regular row data), versus shallow/wide queries for things like inserting data and near-time queries.

It's a hybrid model that provides a lot of flexibility for users AND significantly improves the performance of historical queries. So yes, we do agree that columnar storage is a huge performance win for many types of queries.

And of course, with TimescaleDB, one also gets all of the benefits of PostgreSQL and its vibrant ecosystem.

Can't wait to share the details in the coming weeks!

[1]: https://github.com/timescale/tsbs

dagi3d · 4 years ago
ClickHouse can delete rows but work as batch/async operations: https://clickhouse.com/docs/en/faq/operations/delete-old-dat...
zepearl · 4 years ago
Correct + wanted to mention that "lightweight/point-deletes" might come as a new feature.

Initial discussion: https://github.com/ClickHouse/ClickHouse/issues/19627

Being implemented: https://github.com/ClickHouse/ClickHouse/pull/24755

lima · 4 years ago
> I remember reading in the ClickHouse docs some time ago that it does not have DELETE functionality. Does this pose any problems with GDPR and data deletion requests?

Clickhouse has ALTER ... DELETE and ALTER ... UPDATE functionality now! (and TTLs)

ryanbooz · 4 years ago
(Timescale DevRel here)

We've recently been working through a detailed benchmark of TimescaleDB and Clickhouse. The DELETE/UPDATE question has been an intriguing story to follow - and I honestly hadn't considered the GDPR angle.

ATM, Clickhouse is still OLAP focused and their MergeTree implementation does not allow direct DELETE (or UPDATE) of any data. All DELETE/UPDATE requests are applied asynchronously by (essentially) re-writing/merging the table data (it's referred to as a "mutation") without whatever data was referenced in the DELETE/UPDATE. [1]

[1]: https://clickhouse.com/docs/en/sql-reference/statements/alte...

hkolk · 4 years ago
We are using Clickhouse combined with GDPR data deletion requests. We store the user ids in a separate system and run the ALTER/DELETE statements once per week. Works pretty smoothly, though I would prefer some more automation within Clickhouse for them.

Data for inactive users gets deleted because our Clickhouse retention period is shorter than the inactive-user timeout.
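A sketch of what that weekly job might issue (all names invented; the side table holding the collected user ids is an assumption):

```sql
-- Delete all rows for users who filed GDPR requests this week.
-- This runs as an asynchronous mutation; the rows are gone once
-- the rewrite of the affected parts completes.
ALTER TABLE events DELETE
WHERE user_id IN (SELECT user_id FROM gdpr_deletion_requests);
```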

DevKoala · 4 years ago
ClickHouse does allow delete and update operations. They are just asynchronous functions.

I use them every now and then, but I prefer working with partition strategies when I have to do these programmatically.

nezirus · 4 years ago
You are correct, the proper way to do deletions in ClickHouse is to use partitions, and drop partitions. That is probably good enough for most analytical use cases, but YMMV.
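For illustration, a sketch of the partition-drop approach (names invented): partition by month at table creation, then retire whole partitions as they age out. Dropping a partition is a cheap file-level operation, unlike a mutation that rewrites data.

```sql
CREATE TABLE metrics
(
    ts    DateTime,
    value Float64
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(ts)   -- one partition per month
ORDER BY ts;

-- Drop January 2021 in one cheap metadata operation:
ALTER TABLE metrics DROP PARTITION 202101;
```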
kwillets · 4 years ago
This question is becoming critical right now, as nonrecoverable deletes are required within 30 days for both GDPR and CCPA.

Most products do the asynchronous rewrite, especially if they're based on immutable storage. That's fine, but it should be tested to verify that it's not triggering on every delete, for example, and that it's resource-efficient.

hodgesrm · 4 years ago
> I remember reading in the ClickHouse docs some time ago that it does not have DELETE functionality. Does this pose any problems with GDPR and data deletion requests?

Altinity is fixing this. The project is called Lightweight Delete and it's for exactly the GDPR reason cited. The idea is that there will be a SQL DELETE command that causes rows to disappear instantly. What actually will happen is that they will be marked as deleted, then garbage collected on the next merge.
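In other words, the idea described above amounts to a standard-looking statement (a sketch; the final syntax was still under discussion at the time):

```sql
-- Rows are marked deleted immediately and become invisible to queries;
-- physical removal happens on a later background merge
DELETE FROM user_events WHERE user_id = 12345;
```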

Disclaimer: I work for Altinity.

jenny91 · 4 years ago
There's some really great technology coming out of Russia in the information retrieval/database world: ClickHouse, a bunch of Postgres stuff that Yandex is working on, 2gis.ru (a super detailed vector map on a completely different stack to Google/MapBox), etc.
whitepoplar · 4 years ago
Definitely! Do you have any further info about what Postgres stuff Yandex is working on?
zxcq544 · 4 years ago
There is a company in Russia called Postgres Pro https://postgrespro.ru/ and they are the people who added JSON functionality to Postgres. As far as I know, they are working on full-text search for Postgres now.
jenny91 · 4 years ago
There's a bunch of stuff scattered around on mailing lists and conferences: I think it's the main data source for Yandex's email offering (gmail equivalent). They've got an async C++ library for postgres called Ozo, and they're quite active in the community!
didibus · 4 years ago
> Most other database management systems don’t even permit benchmarks (through the infamous "DeWitt clause"). But we don’t fear benchmarks; we collect them.

I love the confidence here.

zbentley · 4 years ago
I don't have firsthand experience, but everyone I know with second- (colleague) and third- (acquaintance) -hand experience says the performance promises hold up.
nemo44x · 4 years ago
Of course it does - it’s purpose built for a narrow use case. However it’s an extremely popular use case.

Clickhouse optimizes on the 2 most important things for OLAP - minimal disk space due to compression benefits of columnar storage and minimal compute for the same reason - and therefore fast.

However it isn’t flexible when you want to expand the use case. You can’t do any sort of text search, there are no complex joins (there are no foreign keys), and you need to order your tables the way you want to sort them.

For certain things it’s perfect. It was built to solve a problem Yandex had, and that’s notable. But it doesn’t have anywhere near the flexibility of Elasticsearch, for example.

But yes it’s purpose built to be extremely fast and minimize storage for the types of use cases it is built for.

PeterZaitsev · 4 years ago
Interesting news indeed! I very much wonder what it means long term in terms of licensing. I would imagine a much better future if ClickHouse became a foundation-driven project, which gives good protection from license changes (though I'm biased here). Currently ClickHouse, fully under the Apache 2.0 license, may look too good to be true compared to where many successful VC-funded projects took their licenses (think Elastic, MongoDB, Redis).

In any case, I expect a lot of growth in the ClickHouse community now, and investment in both engineering and, most importantly, marketing - I think ClickHouse technology has a lot more adoption potential than it currently has.

puppet-master · 4 years ago
I'd much rather they relicensed early if they can, to set expectations and to ensure talented people actually get to sustain its development, rather than parasitic jobsworth FAANG types who will inevitably drive development at Amazon. Free software in this context is very dead; let's not pretend the network and channel effects of AWS were ever envisaged in the 1980s when most "contemporary" free software licenses were designed.
chrismorgan · 4 years ago
Canonical link: https://clickhouse.com/blog/en/2021/clickhouse-inc/

But I presume the GitHub link (https://github.com/ClickHouse/ClickHouse/blob/master/website...) has been submitted because clickhouse.com is going to be blocked for a large fraction of HN users (Peter Lowe’s Ad and tracking server list, which I think uBlock Origin has enabled by default, includes ||clickhouse.com^). I’m actually a bit curious why clickhouse.com (or more likely a subdomain?) would be being used this way; I’d have thought that they’d separate any such uses to a different domain so as not to hinder their main domain which is about the software and nothing to do with ads or tracking at all (even if that’s probably the main end use of such an OLAP DBMS).

pgl · 4 years ago
Someone just reported this to me and I've removed the entry from my blocklist.

This was a very old entry - it was added on Fri, 06 Jun 2003 19:53:00. Back then it was a marketing company that served ads.

I pride myself on knowing the entries in my list very well, but I have to admit I forgot about this one, which is ironic because I use Clickhouse at my job these days.

f0e4c2f7 · 4 years ago
I'm surprised and impressed that you would remember what's on the list.

Thank you for your hard work. Every day it makes my experience of the internet 100x better.

chrismorgan · 4 years ago
Great! That makes much more sense.

Thank you for maintaining such lists. You and a few others like you save me much time and aggravation.

sikhnerd · 4 years ago
Thank you so much for the hard work you do! It makes the web sooo much more usable.
Maxburn · 4 years ago
Thank you for this service!
wayneftw · 4 years ago
Thanks! This worked instantly just now as I purged cache and updated all lists in uBlock Origin.

I don't look forward to when Chrome enforces Manifest v3 when I'll probably have to wait for a whole extension to be updated instead of just a list file.

newman314 · 4 years ago
FWIW, clickhouse.com is also blocked by "Malvertising filter list by Disconnect"
nacs · 4 years ago
It's also blocked on my Pi-Hole install network-wide apparently.
AlfeG · 4 years ago
I assume this is mostly because of Ukraine. There is a blacklist of any Russian products. There are also people who try to get this internal-to-UA list added to various ad lists.
data_ders · 4 years ago
I'm betting we'll see a "Clickhouse Cloud" product announcement in the next 12 months. I'm curious to see if they can provide enough add-on value to their open source product to be profitable. But I'm certainly rooting for them!
petr_tik · 4 years ago
Worth keeping in mind that Yandex and Russian technology companies in general are used to running lean and profitable operations, something unfathomable in the land of 0% interest rates and VC money on tap. If they continue as they are now (15 people) and convert customers like Cloudflare into paying engagements, there is nothing stopping them from being profitable.
pm90 · 4 years ago
How exactly do they get to operate that lean? It seems the most straightforward way is by paying employees less.
ignoramous · 4 years ago
There's definitely a market for a managed first-party ClickHouse product. It remains to be seen whether the product is substantial enough to challenge the incumbents. The engineering pedigree is ample, so that's already 50% of the way there. With money in the bank, it is all about how they suit it up with their sales and marketing. Interesting times ahead for them.
yigitkonur35 · 4 years ago
Aiven will offer a managed Clickhouse service too: https://landing.aiven.io/2020-upcoming-aiven-services-webina...

Altinity is also a trusted partner with great know-how about Clickhouse internals. They have started to offer managed instances on AWS: https://altinity.com/altinity-cloud-test-drive/

Lastly, Alibaba Cloud has an option to use: https://www.alibabacloud.com/product/clickhouse

Are there any other ones?

polskibus · 4 years ago
encoderer · 4 years ago
The only value they need to add is that you no longer need to run database clusters yourself. I would hope they don’t try to add any special paid features.
nemo44x · 4 years ago
I'm guessing that's the entire purpose here. Build Snowflake with Clickhouse branding.
hkolk · 4 years ago
I wonder how this will play out with https://altinity.com, who have been doing enterprise support for quite some time.
hodgesrm · 4 years ago
I run Altinity. We think it's great. This is going to help grow adoption which benefits everyone. Watch our blog for a post in a couple hours.

BTW congrats to Alexey on the new company.

acidbaseextract · 4 years ago
As a sidenote, I saw your talk on Clickhouse to the CMU database group [1] back when and was extremely impressed with your deep technical knowledge yet down-to-earth presentation. Still haven't had an opportunity to use Clickhouse for production work, but would welcome it.

[1] https://www.youtube.com/watch?v=fGG9dApIhDU

jordanthoms · 4 years ago
We recently set up Clickhouse on GKE using the Altinity operator (and signed up for Altinity support).

There have been so many queries where I've thought 'that's going to need a join and aggregation across tens of billions of rows, no way!' - and then Clickhouse spits back a result in 10 seconds...

tnolet · 4 years ago
We are using Altinity too. Great support so far. We are about to go live with it. For us (see my bio for the company link) having a company manage the cluster was paramount. We just want to use the data and API, not manage the machines/VMs and k8s clustering stuff.
zX41ZdbW · 4 years ago
Thank you! This is an important milestone for ClickHouse and will benefit the entire ecosystem.

mobileexpert · 4 years ago
I think it's similar to other situations, e.g. Starburst with Presto/Trino. There really is a limited number of devs pushing along the core projects and a lot of people needing support. Each startup in the space can likely grow the pie for support and adoption, and a few big enterprises will still hire in-house devs.