bhouston · 6 months ago
Just to be clear, standard SQL databases are not great for large-scale analytics. I know from first-hand experience and a lot of pain.

We tried using Postgres for large-scale analytics at my previous company https://threekit.com but it was an absolute pain. Basically we started to collect detailed analytics and thus had a rapidly growing table of around 2B records of user events during their sessions. As it grew past 500 million records it turned out to be impossible to query this table in anything close to real-time - it was basically untouchable because it was so slow.

I know I could have used some type of daily aggregation combined with a weekly aggregation, etc., to roll up the data incrementally. A dev tried this and yeah, it hid the slow queries, but then it became inflexible in terms of reporting. And writing and maintaining these cron jobs is a lot of work.

But then I tried using BigQuery on my https://web3dsurvey.com side project and I just recorded raw data and then wrote queries to do real-time aggregation (with a caching layer) in a large variety of ways. And it was near-instant, easy, and also very cheap.

So then I deployed this strategy over at Threekit.com and it was also easy, fast, and cheap. Even more flexible queries than Web3DSurvey, at basically no developer or infrastructure cost. No more need for aggregating cron jobs or trying to decide ahead of time how users want to slice and dice the data. Real-time, flexible queries on the fly via BigQuery is the way.

Also, the BigQuery bill for https://web3dsurvey.com is like $0.25/month and it is dealing with millions of records in its 3-month window of stored data. Whereas just running the cheapest Postgres SQL server on Google Cloud is like >$25/month, and it is a slow one.

I would never go back to traditional SQL for analytics - it was hard, slow, expensive and inflexible. Worst of all worlds.

atombender · 6 months ago
Did you use plain Postgres tables or a columnar extension like Timescale, Citus, or pg_mooncake?

Nobody in their right mind would argue that Postgres without columnar storage is good for analytics. However, it looks like these extensions can be quite decent, at least for queries (you might still run into write performance issues due to Postgres's OLTP architecture, but I've never benchmarked it). In OLAP terms the size of your data is tiny and this would probably work just fine.

Personally I'm a huge fan of dedicated databases like ClickHouse. BigQuery gets very expensive as your data and query volume grows.

bhouston · 6 months ago
> Did you use plain Postgres tables or a columnar extension like Timescale or pg_mooncake?

Plain Postgres tables hosted on Google Cloud SQL on a decent machine.

> Nobody in their right mind would argue that Postgres without columnar storage is good for analytics.

Which is what I am saying. Standard means without extensions.

> Personally I'm a huge fan of dedicated databases like ClickHouse.

Where do you run it? What type of machine? Can it be serverless?

> BigQuery gets very expensive as your data and query volume grows.

I didn't experience this; it has been hard for me to make BigQuery cost a lot. Storage costs were basically nothing, and we just set the clustering/partitioning on table creation to be users/sessions/events and dates respectively. It aligned really well with general query patterns.
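For reference, a partitioned and clustered BigQuery table along those lines looks roughly like this (a sketch with made-up table and column names, not the actual schema):

    -- Illustrative only: partition on event date, cluster on the common filter columns.
    CREATE TABLE analytics.user_events (
      user_id    STRING,
      session_id STRING,
      event_type STRING,
      payload    JSON,
      created_at TIMESTAMP
    )
    PARTITION BY DATE(created_at)
    CLUSTER BY user_id, session_id, event_type;

Queries that filter on the partition date and the clustered columns only scan the blocks they need, which is what keeps the bytes billed per query (and therefore the cost) low.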

dig1 · 6 months ago
As someone dealing with billions of records on it, I can say BigQuery is far from cheap; Google will not charge you much for storage, but they will charge you for queries and data transfer.

AFAIK, the cheapest Postgres server on GCP is very expensive compared to the usual Postgres installation (price/performance).

sgarland · 6 months ago
Yep. I used it once for a personal project involving GHTorrent [0], and each query was about $20. That may be peanuts for a business, but for someone playing around on their own, it's incredibly prohibitive.

[0]: https://github.com/ghtorrent/ghtorrent.org/blob/master/gclou...

renegade-otter · 6 months ago
That and RDS are kind of a rip-off if you are trying to have a sandbox out there, short of running your own instance.

For that, I found Digital Ocean to be very reasonable.

michaelmior · 6 months ago
What do you mean by "the usual Postgres installation"?
rapfaria · 6 months ago
> 2B records of user events during their sessions. As it grew past 500 million records it turned out to be impossible to query this table in anything close to real-time - it was basically untouchable because it was so slow.

This is a solved problem, and it seems the technical folks over there lacked the skills to make it work. Having indexes is just the tip of the iceberg. Composite indexes, partitioning, sharding, caching, etc., can lower reads to a few seconds on disk.
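A rough sketch of that kind of tuning on plain Postgres, using an illustrative events table (names and partition scheme are made up for the example):

    -- Illustrative only: monthly range partitions plus a composite index
    -- matching the common access pattern (filter by user, order by time).
    CREATE TABLE user_events (
      user_id    bigint      NOT NULL,
      event_type text        NOT NULL,
      created_at timestamptz NOT NULL,
      payload    jsonb
    ) PARTITION BY RANGE (created_at);

    CREATE TABLE user_events_2025_01 PARTITION OF user_events
      FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');

    CREATE INDEX ON user_events (user_id, created_at DESC);

Partition pruning keeps date-bounded queries on a small slice of the data, and the composite index covers the per-user lookups.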

bhouston · 6 months ago
> This is a solved problem, and it seems the technical folks over there lacked the skills to make it work. Having indexes is just the tip of the iceberg. Composite indexes, partitioning, sharding, caching, etc., can lower reads to a few seconds on disk.

Or just use BigQuery: it just works, it is cheaper to run (by 10x to 100x), and it can be done by a junior dev rather than a PhD in database configuration.

I prefer simple solutions though - I also hate Kubernetes: https://benhouston3d.com/blog/why-i-left-kubernetes-for-goog...

Isn0gud · 6 months ago
Doesn't sound like a solved problem to me if you have to employ more than four different mitigation strategies.
TuringNYC · 6 months ago
>> Just to be clear, standard SQL databases are not great for large-scale analytics.

What do you mean by "standard" SQL? Were you using an OLTP or OLAP db when you faced these difficulties? Also, what makes BigQuery not a "standard SQL database" -- they use GoogleSQL, true, but many DBs have slight SQL variants.

bhouston · 6 months ago
> What do you mean by "standard" SQL?

No extensions, just the default Postgres Cloud SQL offering from Google Cloud.

> Also, what makes BigQuery not a "standard SQL database" -- they use GoogleSQL, true, but many DBs have slight SQL variants.

I view BigQuery as not a standard DB because it really doesn't do transactions, rollbacks, foreign key constraints, indices, stored procedures, etc. I view it as a non-SQL database pretending to be SQL for query convenience.

braiamp · 6 months ago
According to their other comment, standard means what comes in the box by default.
zhousun · 6 months ago
Thanks for the comment, but you are mixing some terminologies.

The core idea of mooncake is to build upon an open columnar format + a substitutable vectorized engine, while integrating natively with Postgres.

So it is indeed closer to BigQuery (especially the newer BigQuery with Iceberg tables) than a 'standard SQL database'. It has all the nice properties of BigQuery (ObjectStore-native, ColumnStore, vectorized execution...), and scaling is also not impossible.

Simon_O_Rourke · 6 months ago
> it was basically untouchable because it was so slow.

I've worked with a 5B row table on Snowflake with (maybe) no indexes, and while somewhat slow you could still run reasonable queries on it in a minute or two.

bhouston · 6 months ago
I guess I liked the convenience of BigQuery being able to do significant queries in a second or two on large data sets. It means I can do the query live for the user as they navigate the analytics dashboard, and this simplifies design and implementation complexity significantly.
Galanwe · 6 months ago
> a rapidly growing table of around 2B records of user events during their sessions. As it grew past 500 million records it turned out to be impossible to query this table in anything close to real-time

I mean, I don't know what you call "close to real time", and what kind of query you did, but I have Postgres serving requests from a 20B-row table just fine, with some light tweaking of indexes and partitions (I'm by no means a DBA).

Unless we are digging much deeper than that in the specifics of the data and queries, I don't think any conclusion can be taken from either experience.

jandrewrogers · 6 months ago
The OP is trying to run a mixed analytic workload which Postgres does poorly even at relatively small scales. No amount of "light tweaking" can fix it, it is intrinsic to the architecture. You can put 100B records in a Postgres table (I have) and it works if your workload is trivial but not well. The Postgres internals are not designed for data this large, it is an OLTP system.

You can do fast-twitch analytical queries (sub-second, a few seconds at the tail) on complex operational data models where you are inserting millions of records per second on 100B record tables while those queries are running. Just not on Postgres.

Eikon · 6 months ago
Same for https://www.merklemap.com/

The biggest table contains 30B records. A query that uses a B-tree index completes in tens of microseconds.

    EXPLAIN ANALYZE SELECT * FROM table_name WHERE id = [ID_VALUE];

    Index Scan using table_name_pkey on table_name  (cost=0.71..2.93 rows=1 width=32) (actual time=0.042..0.042 rows=0 loops=1)
     Index Cond: (id =  '[ID_VALUE]'::bigint)
    Planning Time: 0.056 ms
    Execution Time: 0.052 ms

crazygringo · 6 months ago
These are two totally different use cases.

Analytics generally means a full table scan or similar for each query, because you're e.g. taking an average of every value.

You're talking about indexes which are to optimize retrieving a handful (or few thousand) of rows.
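To make the contrast concrete, using the same hypothetical table_name from the example above (created_at and some_value are made-up columns):

    -- OLTP-style point lookup: the B-tree index narrows this to a handful of rows.
    SELECT * FROM table_name WHERE id = [ID_VALUE];

    -- OLAP-style aggregation (illustrative): every row in the window has to be read,
    -- so row-oriented storage pays the full scan cost no matter what indexes exist.
    SELECT count(*), avg(some_value)
    FROM table_name
    WHERE created_at >= now() - interval '30 days';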

bhouston · 6 months ago
Real time means a user can load a web page and see query results without a significant wait. So less than a few seconds.
sgarland · 6 months ago
> Basically we started to collect detailed analytics and thus had a rapidly growing table of around 2B records of user events during their sessions. As it grew past 500 million records it turned out to be impossible to query this table in anything close to real-time

Analytics isn't typically something that needs real-time capabilities, for one.

> a rapidly growing table [emphasis mine]

I think I see part of the problem here. If you had a single table, that means it's completely denormalized, so your schema probably looked something like this (or wider):

    CREATE TABLE UserEvent (
      id UUID PRIMARY KEY,
      user_id UUID NOT NULL,
      user_ip_address TEXT NOT NULL,
      user_agent TEXT NOT NULL,
      event_data JSONB,
      created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
      event_type TEXT
    );

    CREATE INDEX UserEvent_user_id_idx ON UserEvent (user_id);
    CREATE INDEX UserEvent_created_at_idx ON UserEvent (created_at);
  
The JSON blob might be anywhere from a few hundred bytes to well over a kilobyte, and probably duplicates data already present as a scalar, like IP address, user agent string, timestamp, etc. I'll use the middle ground and say the JSONB objects are on average 500 bytes when stored. Now, the rest.

A UUID, if stored as its native type (or BINARY(16) in MySQL - don't sleep on this, MySQL folks; it makes a huge difference at scale) is 16 bytes. That's double the size of a BIGINT, and quadruple the size of an INT4. Also, unless you're using UUIDv7 (or UUIDv1, but no one does), it's not k-sortable. Since Postgres doesn't cluster tuples around the PK [yes, I know all indices in Postgres are technically secondary] like MySQL/InnoDB does, this doesn't immediately thrash the B+tree in the same way, but it does thrash the visibility map, and it does bloat the WAL. There are various arguments for why you shouldn't use a monotonic integer as a surrogate key, but IMO they're largely overblown, and there are workarounds to not publicly disclose it.

IPv4 addresses, stored in dotted-quad as a string, are a maximum of 15 characters, stored in 16 bytes as TEXT or VARCHAR. If stored instead in the Postgres native INET type, that drops to 7 bytes, plus you get built-in validation. If you had INT4 UNSIGNED available (as MySQL does natively), you could even store them in their numeric form and save another 3 bytes, though you lose the validation.

User Agent strings are huge, usually over 100 bytes. They're also not that unique, relatively speaking. Even if you need to know the patch version of the browser, anyone with a browser doing automatic updates is going to stay more or less in sync. The point is this could easily be a lookup table, with either a SMALLINT (2^15 - 1 maximum values, or 2^16 - 1 if you use unsigned values; possible with an extension in Postgres) or an INT (2^31 - 1 maximum values) as the PK.
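A rough sketch of those changes (INET for addresses, a lookup table for user agents, a BIGINT surrogate key), keeping the rest of the hypothetical schema above:

    -- Illustrative normalization of the UserEvent table sketched earlier.
    CREATE TABLE user_agent (
      id    integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
      value text NOT NULL UNIQUE
    );

    CREATE TABLE user_event (
      id              bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
      user_id         uuid NOT NULL,
      user_ip_address inet NOT NULL,
      user_agent_id   integer NOT NULL REFERENCES user_agent (id),
      event_type      text,
      event_data      jsonb,
      created_at      timestamptz NOT NULL DEFAULT now()
    );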

Not going to touch on JSON objects because the things you might want to know are endless. TOAST and de-TOAST can be slow; if you need low latency, you should normalize your data.

There may or may not be extracted scalars, which can be beneficial during queries. Again, though, lookup tables (or even native ENUM types, if the values are limited) are crucial at scale.

As it stands, the table will have an average row size of 664 bytes (assuming an average of 12 bytes stored for the IP, 100 bytes stored for the UA, 500 bytes stored for the JSONB, and 12 bytes stored for the event type). That's 332 GB for 500,000,000 rows. You could shave a couple of bytes off by aligning columns [0], which saves 1 GB. If the IP addresses and UA strings were lookup tables, each with an INT4, that saves 104 bytes per row. If you made the PK for the table a BIGINT, that saves another 8 bytes per row. The total savings between column alignment and basic normalization is 114 bytes per row, or 57 GB.

This doesn't touch on the indices, either. If you're using PG 13+, you get B+tree de-duplication [1] for free, which can help with some denormalized data, but not if you have anything with high cardinality, like a timestamp, or a UUID. With lookup tables, you would of course need to index those FKs (whether or not you're enforcing constraints), which adds some size, but is still a huge net gain.

> I know I could have used some type of daily aggregation combined with a weekly aggregation, etc., to roll up the data incrementally. A dev tried this and yeah, it hid the slow queries, but then it became inflexible in terms of reporting. And writing and maintaining these cron jobs is a lot of work.

And shifting your entire analytics workload isn't a lot of work? Between ROLLUP [2] and MATERIALIZED VIEW [3], which can automatically refresh itself with a cron, this doesn't seem that burdensome.
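A minimal sketch of that approach against the hypothetical UserEvent table above (the rollup columns are illustrative):

    -- Daily rollup as a materialized view, refreshed from cron (or pg_cron).
    CREATE MATERIALIZED VIEW daily_event_counts AS
    SELECT date_trunc('day', created_at) AS day,
           event_type,
           count(*)                AS events,
           count(DISTINCT user_id) AS unique_users
    FROM UserEvent
    GROUP BY 1, 2;

    -- The unique index is what allows REFRESH ... CONCURRENTLY, so readers aren't blocked.
    CREATE UNIQUE INDEX ON daily_event_counts (day, event_type);

    REFRESH MATERIALIZED VIEW CONCURRENTLY daily_event_counts;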

> Also BigQuery bill for https://web3dsurvey.com is like $0.25 month and it is dealing with millions of records in its 3 month window of stored data.

Then you're in the free tier (<= 1 TiB/month of processed data), because after that it's $6.25/TiB. Also worth noting there is a massive difference between millions of rows and billions of rows. The former can be handled by practically any RDBMS on any hardware, with a completely unoptimized schema. The latter requires some thought if you want it to be performant.
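For a rough, back-of-the-envelope sense of scale: at $6.25/TiB, a dashboard whose queries each scan 50 GB and that serves 1,000 queries a month processes on the order of 49 TiB, which comes out to roughly $300/month, so per-query bytes scanned is the number to watch.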

This isn't at all to say that specialized DBs don't have their place, because they absolutely do. If you need a KV store, use a KV store, not an RDBMS. If you need OLAP, use something designed for OLAP. The difference is scale. At startup or side project scale, you can easily do everything (including pub/sub) with an RDBMS, and if you put thought into its design and usage, you can take it a lot farther than you'd think. Eventually, you may hit a point where it's counter-productive to do so, and then you should look into breaking tasks out.

The issue I see happening time and time again is devs have little to no expertise in DBs of any kind, but since everyone says "Postgres is all you need," they decide to use it for everything, except they don't know what they're doing. If you do that, yeah, you're gonna have problems fairly early on, and then you'll either throw your hands up and decide you really need a bevy of specialized DBs, caches, and message queues (which introduces a lot of complexity), or you'll vertically scale the DB.

If you choose the latter, by the time you hit scaling limits, you're easily spending $25K/month on the DB alone. If you opt to hire someone with DB expertise at this point, you'll spend about that if not more in personnel costs, and not only will it take them weeks if not months to unravel everything, your devs will be constantly complaining that queries are now "too complex" because they have to do some JOINs, and they're being told to stop chucking everything into JSON.

If instead you took at most a week to learn some RDBMS basics by (a) reading its manual front-to-back and (b) getting hands-on experience trying things out, you could almost certainly get much farther on much less.

[0]: https://www.enterprisedb.com/blog/rocks-and-sand

[1]: https://www.postgresql.org/docs/current/btree.html#BTREE-DED...

[2]: https://www.postgresql.org/docs/current/queries-table-expres...

[3]: https://www.postgresql.org/docs/current/rules-materializedvi...

cyptus · 6 months ago
Same. Started writing data into Parquet and analyzing it from there, with big wins in both performance and hosting costs.
saisrirampur · 6 months ago
Sai from PeerDB/ClickHouse here. Nice to see the progress on this project! I wanted to leave a few notes:

For analytics on transactional data, it looks like you'd still need to use logical replication (https://github.com/Mooncake-Labs/pg_mooncake/issues/90). Logical replication is somewhat similar to an ETL/CDC experience, though it's more Postgres-native. Managing logical replication at a production grade isn't trivial — it's quite common for customers to use PeerDB for homogeneous replication due to performance, manageability and observability issues with logical replication.

One potential value of extensions is Postgres compatibility for your queries. However, I'm curious about the SQL coverage on columnar tables — for example, advanced constructs like partitioned tables, prepared statements, correlated subqueries, RCTEs, triggers, and more. While it seems there’s ongoing work to improve compatibility, achieving full Postgres compatibility is challenging. In some cases, you might need to fall back to Postgres (instead of DuckDB), sacrificing performance for certain queries.

The biggest challenge we faced at Citus was the constant trade-off between Postgres compatibility and performance — both are complex and constantly moving targets. This was the key reason why Citus couldn't compete with purpose-built databases like Snowflake, ClickHouse, SingleStore. While DuckDB didn’t exist when we built Citus, it's still fundamentally different from Postgres. Even though the DuckDB dialect is somewhat similar to Postgres (as is Snowflake’s), retrofitting two distinct databases — each evolving independently for world's best performance — isn't trivial.

In short, relying on ETL (logical replication) without providing full Postgres compatibility raises the question: is it better to lean on a purpose-built database with a laser focus on performance, or adopt a more generalized approach?

Anyway, I come from a different school of thought — using the right tool for the right job. That said, I love seeing all the progress and evolution in the Postgres community — Postgres will always be my first love!

zhousun · 6 months ago
Hi, Zhou From Mooncake labs here.

Love your work on PeerDB; it's inspiring the evolution of pg_mooncake (logical replication will be the killer feature for V2).

The core idea of mooncake is to build upon an open columnar format + a substitutable vectorized engine, while integrating natively with Postgres:

1. For small devs, we allow the whole stack to be embedded as a Postgres extension for ease of use

2. For enterprise, our stack is also a purpose-built stack similar to PeerDB + ClickHouse, not a more generalized approach

We allow a gradual transition from 1 to 2.

saisrirampur · 6 months ago
Thank you for the kind words! :)

1, makes sense.

On 2, I understand your thinking around purpose-built — but you're retrofitting an analytical database into a transactional database without fully supporting all the features (both in terms of functionality and performance) of either. It's really hard to be truly purpose-built this way. As a result, users might not get the best of both worlds.

PeerDB is different. We keep Postgres and ClickHouse separate and just move data reliably between them. Users get to query Postgres and ClickHouse in isolation and make the best of each of them.

Anyway, keep up the good work! Just wanted to share some challenges we've seen before when building an analytics extension (Citus), particularly around chasing both Postgres compatibility and performance.

theLiminator · 6 months ago
Do you believe that things like cedardb might one day remove the distinction between OLAP and OLTP dbs?
zhousun · 6 months ago
Interestingly enough, the mooncake team was building SingleStore before, and so far it is the best production-ready HTAP system. One lesson I really learned is that people don't want to switch their system-of-record OLTP system.

Spoiler alert: Mooncake will be supporting HTAP use cases soon, and it comes with better trade-offs: keep your OLTP Postgres tables as is, and mooncake adds analytics capacity to those tables on up-to-date data.

dleeftink · 6 months ago
> To enhance query execution speed, we embedded DuckDB as the execution engine for columnstore queries

So is it Postgres or DuckDB that cracked the analytics top?

moonikakiss · 6 months ago
well, pg_mooncake is a Postgres extension, and Postgres + pg_mooncake is still just Postgres. Users deploy pg_mooncake as a Postgres extension and write and query all tables through psql.

Fast analytic databases need two key things: columnar storage and a vectorized execution engine. We introduce a columnstore table access method in Postgres (with data stored in Parquet) and execute queries on those tables using DuckDB.

By leveraging DuckDB's execution engine, we avoid reinventing vectorized query execution while keeping everything managed through Postgres.

More on our architecture: https://www.mooncake.dev/blog/how-we-built-pgmooncake
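For reference, working with a columnstore table looks roughly like this; the table and columns are made up, and the exact syntax should be checked against the pg_mooncake README:

    -- A columnstore table declared via the columnstore table access method
    -- (illustrative; see the pg_mooncake docs for exact syntax).
    CREATE TABLE analytics_events (
      user_id    bigint,
      event_type text,
      created_at timestamptz
    ) USING columnstore;

    -- Regular SQL through psql, with execution handled by DuckDB under the hood.
    INSERT INTO analytics_events VALUES (42, 'page_view', now());

    SELECT event_type, count(*) FROM analytics_events GROUP BY 1;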

moonikakiss · 6 months ago
One thing I forgot to mention. We're actually faster than DuckDB on parquet. We implemented segment elimination based on parquet metadata.

We blogged about this: https://www.mooncake.dev/blog/duckdb-parquet

jbergens · 6 months ago
I'm personally on the side that thinks that pg + x != pg.

It might not even be possible to install x on some hosted pg servers.

It is great that you can extend pg and that you and others have, but I don't rate it as highly as a built-in solution.

gavinray · 6 months ago
Can I suggest you put this sentence somewhere in the README near the beginning?

> Fast analytic databases need two key things: columnar storage and a vectorized execution engine. We introduce a columnstore table access method in Postgres (with data stored in Parquet) and execute queries on those tables using DuckDB.

fifilura · 6 months ago
What is the cost of scaling up number of CPUs for parallel processing?

That was always the culprit for me compared to AWS/Athena and BigQuery. They are dirt cheap on analytics workloads when you can parallelize the calculations to 100 CPUs without really paying any extra.

With Postgres you are stuck with a linear cost for scaling up the number of CPUs, so everything is slow anyway.

nextn · 6 months ago
Fast analytics databases need a third key technology. I can't say what. You seem unaware of it.
tonyhart7 · 6 months ago
Both??? I mean, seeing their homepage, this is clearly their business model, no?
rubenvanwyk · 6 months ago
DuckDB
nikita · 6 months ago
This is an exciting project. A few highlights:

- The query processor is DuckDB - as long as it translates the PG type system to the DuckDB type system well, it will be very fast.

- Data is stored on S3 in Parquet with Delta or Iceberg metadata. This is really cool. You don't need to push analytical data through the WAL - only metadata goes into the WAL. This means fast loading, at least in theory, and compatibility with the whole Delta/Iceberg ecosystem.

- Once they build real-time ingest, you can just push timeseries into this system and you don't need a second system like ClickHouse.
tarun_anand · 6 months ago
Data is also stored in the local filesystem. How does that compare with vanilla PG and/or S3-based storage?
dsiegel2275 · 6 months ago
A question that I have had for a while that I can't seem to find an answer: for teams that are using various columnar store extensions to turn Postgres into a viable OLAP solution - are they doing so in the same instance of their Postgres that they are using for OLTP? Or are they standing up a separate Postgres instance?

I'm trying to understand if there is any potential performance impact on the OLTP workload by including the OLAP in the same process.

bach4ants · 6 months ago
And further, with this pg_mooncake extension allowing you to store the data in S3, is Postgres simply providing compute to run DuckDB? I suppose it's also providing a standardized interface and "data catalog."
zhousun · 6 months ago
Transactions are also managed by Postgres as if they were native tables, so you don't need to worry about coordinating commits between Postgres and the S3 data.


owenthejumper · 6 months ago
What's the business model? It's an extension that's MIT licensed, yet it has a company and a VC behind it. Sounds like a rug pull waiting to happen.
tesch1 · 6 months ago
Counterpoint: It's actually very astute of them to start like this.

In order to have a viable business model they need to create value for users. Users are intelligent and will not even consider trying something that has no exit path, should the company disappear.

What every company hopes to have is customers who are thrilled with the value they deliver and tell everyone what a great deal it is so the company can grow.

What no company hopes for is to end up like HashiCorp, where they end up spending more investment $ than the value they provide, never achieve profitability, and eventually just piss everyone off; with everyone trying to make the best of a leveraged situation, they end up having to pull the rug. The user's leverage in that situation is something like OpenTofu, made possible by the license, same as what's being offered here.

The price of the big columnar dbs is very high, so there's a lot of ground to capture / value to arbitrage/offer in this space - as evidenced by other comments in this thread, how fast the benchmarks are changing, the likelihood of memory prices coming down, etc.

Aside from that, you have to wonder, big picture, if the AI space will put significant downward pressure on memory and compute prices with everyone wanting to run local LLMs, which might change some fundamental tradeoffs made in db systems. If in 10 years I can get a phone with 1 TB of fast-access RAM to ask Siri how to sauté mushrooms, what will 99% of us need a columnar store for?

zhousun · 6 months ago
Zhou from mooncake labs here.

Mooncake is built upon open table formats and substitutable query engines, so it doesn't need to be just a Postgres extension.

pg_mooncake will stay open-source under MIT; for small devs where everything fits in their Postgres, we hope everyone enjoys it!

And we would love to help companies that have outgrown Postgres transition into the modern stack with Postgres + mooncake, potentially outside pg + Iceberg/Delta.


spapas82 · 6 months ago
As others have mentioned the results are from an extension, not postgres.

Unfortunately, using an extension that's not "part" of Postgres (the way pg_stat_statements is) is not trivial for most people, since there are both technical and legal issues.

moonikakiss · 6 months ago
try us on Neon Postgres https://neon.tech/docs/extensions/pg_mooncake.

It should be fairly trivial to get started

kyrra · 6 months ago
What are the legal issues?
kbolino · 6 months ago
Some extensions, like TimescaleDB, have licenses explicitly forbidding their use in hosted products which compete with the companies backing them. In concrete terms, this means they can't be used in any of the major cloud providers' managed database solutions (e.g AWS RDS).

Others, like Citus, are licensed under the AGPL, which is basically an even stronger version of the GPL that requires anyone hosting a service to provide their modifications to the source code. While this doesn't strictly rule out use in e.g. RDS, it does basically say that Amazon would be required to open source any changes they make to get it working in RDS, which is a burden they don't seem interested in taking on, and which might reveal details about how their services work under the hood, something they are generally fairly cagey about.

The AGPL is also viewed even more suspiciously than the GPL by corporate lawyers who fear it may lead to virally forcing the company to reveal all of its source code. This fear is probably unfounded, but all copyleft licenses are primarily about expanding end-user freedom and so are treated as undesirable in most businesses that make money from selling software.

spapas82 · 6 months ago
Licensing. Postgres has a well-known, open-source licence, making it easy to use. Some extensions do not follow that practice, making them more difficult to use.

Even if the extension has a liberal licence, e.g. Apache or MIT, nothing guarantees that the licence won't change in the future.

xkgt · 6 months ago
I read the title a couple of times and I'm still not sure it isn't misleading. The benchmarks are not just for Postgres but for Postgres with the Mooncake extension. There are also other results for Postgres with different extensions. While it does rank among the top fastest databases, it is not the fastest and not even within the top 10.