Using Postgres for Everything

antirez · a year ago

When I was doing databases, I formulated this joke-theorem of modern computing: Whatever reasonable and widespread database/language you use to model your application, if you do things sensibly, it will work well. A corollary is that: 1. It's a lot more about taste and less about technology than we may believe. 2. Good programmers can bend an imperfect tool to work well for a given use case, while bad programmers will mess everything up even with perfect tools, so in general it is better to spend time becoming better programmers than learning all the new cool tools.

This idea, if true (and I believe it is), tells us how lucky we are compared to programmers of the past. Modern databases and languages are so advanced that they can do almost everything, even if they are not good at everything. And hardware and RAM are so generous today that they will mask a lot of potential issues unless you are at a big scale.

SavageBeast · a year ago

And run it all on a single piece of big iron:

https://thmsmlr.com/cheap-infra

I'm not joking either.

tkiolp4 · a year ago

I agree with everything said there except one thing: high availability is difficult to achieve with a single box. So, I would use two.

ffsm8 · a year ago

I will have to disagree with your disagreement.

High availability is not difficult to achieve with a single box; it just depends on luck.

I've had boxes run with 100% availability for multiple years

hobs · a year ago

It's mostly right, but the problem arises when requests per second deviate from the average. Even then, most people just do some unscalable stuff on the cloud for at least 10x the cost.

gymbeaux · a year ago

The industry really has become bloated. Now we have DevOps and Agile Coaches and Cloud Engineers and microservices and JavaScript Everywhere(TM).

It’s all bullshit. DevOps is really so complex that a medium-sized company needs an entire team?

Earlier today I was trying to set up some elaborate Postgres database backup and couldn’t get it to work so I just put the command in the cron tab and called it a day. Took 1 minute.

vineyardmike · a year ago

> DevOps is really so complex that a medium-sized company needs an entire team?

Depends on your goals, but most people probably don't need it.

> I just put the command in the cron tab and called it a day

Obviously this will lead to data loss if any data is written and then a corruption event or issue occurs in-between cron events. The road to over-complicated infrastructure is paved with mistakes and outages. The home-lab server running on a pi in my closet can use cron jobs and scripts, but at work we maintain higher standards because our data loss would cost more than a few extra days of engineer time.

Everything should be as simple as possible, but let's not pretend everyone is a chump because their engineering tolerances don't allow for data loss.
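To put a number on the trade-off in this exchange, here is a minimal sketch. It assumes the cron command was a `pg_dump` call (the thread does not say), and the function names, paths, and database name are hypothetical. It builds a timestamped dump command and estimates how much data a crash could cost.

```python
import datetime as dt


def pg_dump_command(dbname: str, out_dir: str, now: dt.datetime) -> list[str]:
    """Build a pg_dump invocation that writes a timestamped custom-format archive."""
    stamp = now.strftime("%Y%m%dT%H%M%S")
    return [
        "pg_dump",
        "--format=custom",  # archive format that pg_restore can read
        f"--file={out_dir}/{dbname}-{stamp}.dump",
        f"--dbname={dbname}",
    ]


def worst_case_loss(interval: dt.timedelta, dump_duration: dt.timedelta) -> dt.timedelta:
    """Upper bound on lost writes when dumps are the only backup.

    A dump reflects the database as of its start. If the server dies just
    before the next dump finishes, the newest usable dump is one full
    interval plus one dump duration old.
    """
    return interval + dump_duration


if __name__ == "__main__":
    print(" ".join(pg_dump_command("app", "/var/backups", dt.datetime.now())))
    print(worst_case_loss(dt.timedelta(hours=24), dt.timedelta(minutes=10)))
```

Whether a loss window of that size is acceptable is exactly the "engineering tolerances" question; shrinking it much further usually means continuous WAL archiving or replication rather than more frequent dumps.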

ndriscoll · a year ago

Where big iron might be something between a raspberry pi with an NVMe hat and a gaming PC.

My 6th gen i5 (4 cores) can render ~15k page views per second with database queries, or several times that serving cached pages from nginx. 150 requests/second wouldn't be noticeable if I were using it for gaming at the same time.

kgeist · a year ago

Another team's product at our company runs inside a Kubernetes cluster which consists of lots of microservices. CPUs are at 80% all the time. Redis, RabbitMQ, MySQL, Elasticsearch, etc. There's no number crunching, mostly CRUD with some business logic on top. It requires a separate operations team to manage it (otherwise this Rube Goldberg machine easily explodes).

A few days ago I asked our ops team, what's our rps? ~500. Out of curiosity, I ran a few benchmarks to stress test my monolithic pet project in Go with a single DB for everything and it easily handled 15k/s on my home computer, too, without any special optimizations. I understand its load is totally different from our production server, and my benchmark is most likely flawed, but the order of magnitude of difference is interesting.

I was told the production cluster also has an additional 500 rps just for communication between the microservices. I suspect there's just too much overhead in their setup: communication between services, serialization of data between different DBs, etc. If it were a single modular monolith with a single DB, I suspect it could be faster and easier to maintain... Another issue is that they also sell it as an on-premise solution. And with the complexity growing, it now often fails to run on clients' infra.

vineyardmike · a year ago

> Where big iron might be something between a raspberry pi with an NVMe hat and a gaming PC.

At my last startup (with ~3 technical employees), I was talking to the CTO about our cloud spend and planning for properly scaling infra after we got our first customer. We weren't "a SaaS website", but (hand-waving) we did complex physics modeling and analytics on data streams. The (physicist) CTO had heard of k8s and cloud stuff and wanted me to investigate how we should scale our cloud. He was convinced it'd be expensive to do "all those physics calculations with 64 bit numbers".

I showed him our entire cloud - gateway, database(s), physics modeling, metrics collection, log aggregation, etc - running at 1000x estimated 1-customer load from a MacBook. I offered to set him up a personal raspberry pi to stress test before a k8s cluster.

We ended up with a sensible single medium EC2 instance running everything, with some extra stuff for fail-over. AFAIK the only change made after I left was using a cloud-vendor DB.

SavageBeast · a year ago

Thanks for the metric - I would have naively assumed that kind of throughput wasn't possible on such a setup.

westurner · a year ago

Postgres notes:

ElectricSQL, SQLedge, pglite: https://news.ycombinator.com/item?id=38690588

pgreplay, pgkit: https://news.ycombinator.com/item?id=37959945

pg_timeseries: https://news.ycombinator.com/item?id=40417347

pgvector, postgresml: https://news.ycombinator.com/item?id=37812219

cmu ottertune, pg_auto_tune, postgresqltuner, timescaledb-tune: https://github.com/timescale/timescaledb-tune

awesome-postgres: https://github.com/dhamaniasad/awesome-postgres

samwillis · a year ago

The best thing that's happening in the Postgres ecosystem right now is the separation of compute and storage. Breaking that link opens up Postgres as a generic query processing engine that can sit in front of multiple different table and storage types. It allows joining between all these different types of data. It really will enable you to use Postgres as a generic database for everything.

My theory is that every new database tech starts as a new domain-specific db, then over time gets merged into the generic engines to serve the 99%.

nojvek · a year ago

I failed a system design interview for saying use postgres for the whole solution. App data to support about a million users and a job/task table to support asynchronous tasks. The throughput was a few tasks per user per day.

I feel they were expecting a more complex solution with kafka queues.

Heh! Interviews don't go well for saying "just use postgres"
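For scale: if "a few" means roughly 3 tasks per user per day (an assumption), a million users produce about 3,000,000 / 86,400 ≈ 35 tasks per second on average. Below is a minimal sketch of the kind of job/task table described above. The schema and function names are hypothetical, and it uses Python's built-in `sqlite3` as a stand-in so it runs without a server; it is not the design from the interview.

```python
import sqlite3

# Hypothetical schema for illustration only.
SCHEMA = """
CREATE TABLE tasks (
    id      INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL,
    payload TEXT    NOT NULL,
    status  TEXT    NOT NULL DEFAULT 'pending'
);
"""


def enqueue(conn, user_id, payload):
    """Insert a pending task and return its id."""
    with conn:  # commit on success, roll back on error
        cur = conn.execute(
            "INSERT INTO tasks (user_id, payload) VALUES (?, ?)",
            (user_id, payload),
        )
    return cur.lastrowid


def claim_next(conn):
    """Mark the oldest pending task as running and return it, or None."""
    with conn:
        # In Postgres this SELECT would end with FOR UPDATE SKIP LOCKED so
        # that concurrent workers never claim the same row. This
        # single-connection stand-in has no concurrent workers.
        row = conn.execute(
            "SELECT id, user_id, payload FROM tasks "
            "WHERE status = 'pending' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        conn.execute("UPDATE tasks SET status = 'running' WHERE id = ?", (row[0],))
        return row


def complete(conn, task_id):
    """Mark a task as done."""
    with conn:
        conn.execute("UPDATE tasks SET status = 'done' WHERE id = ?", (task_id,))


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    enqueue(conn, 1, "send-welcome-email")
    task = claim_next(conn)
    print(task)
    complete(conn, task[0])
```

The appeal of this shape is that tasks and app data live in the same database, so a task can be enqueued in the same transaction as the change that caused it.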

eska · a year ago

You dodged a bullet. My company burst out laughing and applauded when I suggested we should write boring software (we’re a milling company)

the-golden-one · a year ago

I’m imagining the bell curve meme with ‘I’m going to put everything in a SQL database’ at both ends.

akulkarni · a year ago

You don't need to imagine ;-)

https://x.com/avthars/status/1788195573989806448

cpursley · a year ago

Related thread: "PostgreSQL is enough"

https://news.ycombinator.com/item?id=39273954

ZitchDog · a year ago

One thing that postgres does not solve very well is reactive UIs. Postgres does have listen/notify but it requires much more boilerplate and infrastructure to set up than Firebase/Firestore. For example I'd like to be able to run a query and get notified that the results have updated when any of the affected rows change.

samwillis · a year ago

Solving this is high on my list with PGlite (https://github.com/electric-sql/pglite). I have a bunch of ideas/thoughts that I hope to get to at some point (IVM, streaming queries, result diffs).

When PGlite (or SQLite) is used with Electric (https://electric-sql.com/), we already provide good reactive primitives. But we hope to improve this so that where possible the full queries don't have to be re-run.
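For readers unfamiliar with the baseline being improved on, here is a minimal sketch of the naive "re-run the full query and diff the results" approach. It is not how PGlite or Electric work; the `LiveQuery` class and its callback signature are hypothetical, and `sqlite3` is used only so the example runs standalone.

```python
import sqlite3


class LiveQuery:
    """Re-run a query on demand and report only when its result set changes."""

    def __init__(self, conn, sql, on_change):
        self.conn = conn
        self.sql = sql
        self.on_change = on_change  # called as on_change(added_rows, removed_rows)
        self.last = None  # None means the query has never been run

    def refresh(self):
        """Call this whenever a change signal arrives (for example a NOTIFY)."""
        rows = self.conn.execute(self.sql).fetchall()
        if rows == self.last:
            return  # nothing changed, so subscribers are not woken up
        previous = self.last or []
        added = [r for r in rows if r not in previous]
        removed = [r for r in previous if r not in rows]
        self.last = rows
        self.on_change(added, removed)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE todos (id INTEGER PRIMARY KEY, title TEXT)")
    q = LiveQuery(
        conn,
        "SELECT id, title FROM todos ORDER BY id",
        lambda added, removed: print("added:", added, "removed:", removed),
    )
    q.refresh()  # initial run
    conn.execute("INSERT INTO todos (title) VALUES ('write docs')")
    q.refresh()  # reports the new row
```

The cost is that every refresh re-executes the whole query, which is the work that incremental view maintenance and result diffs aim to avoid.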

SparkyMcUnicorn · a year ago

I've been very happy using Hasura on top of pg.

There are several fantastic options that remove the need to manually write resolvers, set up subscriptions, solve N+1, reinvent the wheel, etc.

jamwt · a year ago

Agree. If you want a fully reactive database (and not just hierarchy-based reactivity like Firebase) but want to keep your ACID + relational data modeling, check out https://convex.dev .

Plug notice: this is my company.

cpursley · a year ago

Check out WalEx for this, it doesn't suffer from the character limits of listen/notify: https://github.com/cpursley/walex
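For context on that limit: in Postgres's default configuration a NOTIFY payload must be shorter than 8000 bytes. A common workaround when staying on listen/notify is to send only a pointer to the row when the full row would not fit. The sketch below assumes a JSON payload shape and function name that are purely illustrative and unrelated to WalEx.

```python
import json

# NOTIFY payloads must be shorter than 8000 bytes in the default configuration.
MAX_NOTIFY_BYTES = 7999

# pg_notify(channel, payload) is the function form of NOTIFY; the %s
# placeholders follow the parameter style used by psycopg.
NOTIFY_SQL = "SELECT pg_notify(%s, %s)"


def notify_payload(table: str, row_id: int, row: dict) -> str:
    """Return a JSON payload that fits in one NOTIFY message.

    If the full row fits, include it. Otherwise send only the table and id,
    and let the listener re-fetch the row with a normal query.
    """
    full = json.dumps({"table": table, "id": row_id, "row": row})
    if len(full.encode("utf-8")) <= MAX_NOTIFY_BYTES:
        return full
    return json.dumps({"table": table, "id": row_id, "row": None})


if __name__ == "__main__":
    small = notify_payload("todos", 1, {"title": "write docs"})
    large = notify_payload("todos", 2, {"body": "x" * 10_000})
    print(len(small), len(large))
```

Reading changes from the write-ahead log sidesteps the payload limit entirely, at the cost of setting up logical replication.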

j45 · a year ago

Could a simplification of reactive UIs be a possible angle?

Part of me is thinking about htmx or something from alpine and wondering if it’s enough. Shoutout to livewire too.

ZitchDog · a year ago

Htmx doesn't help with this problem (knowing whether the underlying data has changed in order to update the UI). It's a common problem in any kind of collaborative app.