When RecordLayer launched, I tested it out by building a catalog system that we could evolve, adding new services from a single repository of protobuf schemas.
The only real downside is that the onramp for running FoundationDB at scale is quite a bit steeper than for a traditional distributed database.
Other reasons include their wide range of input formats and special table functions (e.g. querying a URL).
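To make "query a URL" concrete, here's a rough sketch of the url() table function; the URL and column layout are invented for illustration:

    SELECT *
    FROM url(
        'https://example.com/exported_metrics.csv',  -- hypothetical endpoint
        'CSVWithNames',                               -- input format
        'ts DateTime, metric String, value Float64'   -- explicit schema
    )
    LIMIT 10;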
- Stability. It OOMs; your CTO mentioned that last week.
- Correctness. It is not always correct. I believe your team is aware of cases in which your own benchmarks revealed ClickHouse to be incorrect.
- Scale. The distributed plan is broken, and I'm not sure ClickHouse even has shuffle.
- SQL. It is very non-standard.
- Knobs. Lots of knobs that are poorly documented. It's unclear which are mandatory. You have to restart for most.
Don't get me wrong: I love open source, and I love what ClickHouse has done. But I am not a fan of overselling. There are problems with ClickHouse, and trying to sell it as a superset of the modern CDW is not doing users any favors.
> Stability. It OOMs; your CTO mentioned that last week.
I ran ClickHouse clusters for years with zero stability issues (even as a beginner at the time) at a video game studio with extremely high data volumes and real-time needs. Using online materialized views, I was able to build rollups of vital KPIs at millisecond-level latency while sustaining multi-thousand QPS. Stability was never a concern of ours; quite frankly, we were kind of blown away.
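For what it's worth, here's a minimal sketch of the kind of rollup I mean (table and column names are made up): the materialized view re-aggregates on every insert, so dashboards only ever scan a small pre-aggregated table.

    -- Raw events land here (hypothetical schema)
    CREATE TABLE events
    (
        event_time DateTime,
        player_id  UInt64,
        revenue    Float64
    )
    ENGINE = MergeTree
    ORDER BY event_time;

    -- Pre-aggregated per-minute KPIs, updated incrementally on every insert
    CREATE TABLE kpi_per_minute
    (
        minute  DateTime,
        players AggregateFunction(uniq, UInt64),
        revenue SimpleAggregateFunction(sum, Float64)
    )
    ENGINE = AggregatingMergeTree
    ORDER BY minute;

    CREATE MATERIALIZED VIEW kpi_per_minute_mv TO kpi_per_minute AS
    SELECT
        toStartOfMinute(event_time) AS minute,
        uniqState(player_id)        AS players,
        sum(revenue)                AS revenue
    FROM events
    GROUP BY minute;

    -- Dashboards query the rollup, not the raw events
    SELECT
        minute,
        uniqMerge(players) AS unique_players,
        sum(revenue)       AS total_revenue
    FROM kpi_per_minute
    GROUP BY minute
    ORDER BY minute;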
> Scale. The distributed plan is broken, and I'm not sure ClickHouse even has shuffle.
First, I hate the word "broken" with zero explanation of what you mean by it. Based on your language, I'm assuming you're just saying the distributed plans aren't as efficient as they could be, a limitation the engineers are not shy about admitting.
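For context, my (hedged) understanding of what it does offer today: there is no shuffle stage, but a GLOBAL JOIN broadcasts the right-hand side to every shard, which works fine when that side is small. The table names below are hypothetical Distributed tables.

    -- Without GLOBAL, each shard joins only against its local data;
    -- with GLOBAL, the subquery result is sent to all shards.
    SELECT
        user_id,
        country,
        count() AS events
    FROM distributed_events AS e
    GLOBAL INNER JOIN
    (
        SELECT user_id, country
        FROM distributed_users
    ) AS u USING (user_id)
    GROUP BY user_id, country;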
> SQL. It is very non-standard.
I would argue the dialect is more of a superset than "non-standard". Almost everything just worked for us, and I often found I could significantly shorten my SQL thanks to the "non-standard" extras they've added. For example: did you know they have built-in aggregate functions for computing retention?!
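A rough sketch of what I mean, using the built-in retention() aggregate (the table and dates are invented): it returns an array of 0/1 flags per user, one per condition, where later flags only fire if the first one did.

    SELECT
        sum(r[1]) AS day0_users,     -- users seen on the first day
        sum(r[2]) AS day1_retained,  -- of those, users also seen the next day
        sum(r[3]) AS day7_retained   -- of those, users also seen a week later
    FROM
    (
        SELECT
            user_id,
            retention(
                event_date = '2024-01-01',
                event_date = '2024-01-02',
                event_date = '2024-01-08'
            ) AS r
        FROM events
        GROUP BY user_id
    );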
> Knobs. Lots of knobs that are poorly documented. It's unclear which are mandatory. You have to restart for most.
Yes, there are a lot of knobs. ClickHouse works wonderfully out of the box with the default knobs, but you're free to tinker because that's how flexible the technology is.
You worked at Google for over a decade? You should know: Google's internal tech (e.g. BigTable) is notorious for having a TON of knobs. Just because the knobs are there doesn't mean they must be tuned; it just means the engineers thought ahead. Also, the vast majority of configuration changes I've made never required a restart... I'm not even sure why you pointed this out.
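To illustrate that last point: most of those knobs are query- or session-level settings you can change on the fly, and you can see exactly which ones you've touched. The setting names below are real; the values and the events table are just examples.

    -- Session-level: applies to subsequent queries in this session
    SET max_threads = 8;

    -- Query-level: applies to this query only, no restart involved
    SELECT count()
    FROM events
    SETTINGS max_memory_usage = 10000000000, max_threads = 4;

    -- Which settings differ from the defaults?
    SELECT name, value
    FROM system.settings
    WHERE changed;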
(Disclaimer: I have been using ClickHouse successfully for several years)
Has anyone migrated from InfluxDB to ClickHouse?
At a previous company, I wrote a simple TCP server to receive InfluxDB line protocol, parse it, and write it to ClickHouse. I was absolutely blown away by how fast I could chart data in Grafana [1]. The compression was stellar as well... I was able to store and chart years of historical data. We basically just stopped sending data to Influx and migrated everything over to the ClickHouse backend.
[1] https://grafana.com/grafana/plugins/grafana-clickhouse-datas...
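For anyone curious what the target side can look like: the schema below is an illustrative guess at a landing table for parsed line-protocol points, not the exact one I ran. Specialized codecs (DoubleDelta for timestamps, Gorilla for float samples, plus ZSTD) are a big part of why the compression is so good.

    -- Hypothetical landing table for parsed line-protocol points
    CREATE TABLE metrics
    (
        metric LowCardinality(String),
        tags   Map(String, String),
        ts     DateTime64(3) CODEC(DoubleDelta, ZSTD),
        value  Float64       CODEC(Gorilla, ZSTD)
    )
    ENGINE = MergeTree
    PARTITION BY toYYYYMM(ts)
    ORDER BY (metric, ts);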
It's basically trouble-free unless you drop below 10% free disk space on any instance, at which point things go bad.
That is a simple but valuable insight that seems to be missing from the wiki page. Also from the wiki page <https://en.wikipedia.org/wiki/FoundationDB>, there is this:
--
The design of FoundationDB results in several limitations:
Long transactions - FoundationDB does not support transactions running over five seconds.
Large transactions - Transaction size cannot exceed 10 MB of total written keys and values.
Large keys and values - Keys cannot exceed 10 kB in size. Values cannot exceed 100 kB in size.
--
Those (unless worked around) would be absolute blockers for several systems I've worked on.