I've used both NoSQL and SQL databases. Proper replication topologies (e.g. multi-master configurations) and JSON columns killed the NoSQL movement. I'm glad the NoSQL movement existed and forced the SQL camp into out-of-the-box thinking, but I wouldn't go for any NoSQL solution today on a greenfield project.
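For illustration, a JSON column lets a relational table hold document-shaped data and still be queried with plain SQL. A minimal sketch in Python using the stdlib sqlite3 module as a stand-in (Postgres would use a jsonb column and the `->`/`->>` operators instead of `json_extract`); the table and payload here are invustrative, not from the thread:

```python
import sqlite3

# In-memory database standing in for "a relational DB with JSON columns".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
conn.execute(
    "INSERT INTO events (payload) VALUES (?)",
    ('{"user": "alice", "action": "login", "meta": {"ip": "10.0.0.1"}}',),
)

# Query inside the JSON document, as you would with jsonb in Postgres
# (e.g. payload->'meta'->>'ip'); SQLite's JSON1 uses json_extract instead.
row = conn.execute(
    "SELECT json_extract(payload, '$.meta.ip') FROM events "
    "WHERE json_extract(payload, '$.user') = 'alice'"
).fetchone()
print(row[0])  # 10.0.0.1
```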
In what scenarios does one work on data without any relations?
Honest question - perhaps I'm too used to thinking in SQL, but even with e.g. a completely flat bunch of documents one usually needs at least access control (i.e. document owners, which is a relation).
I had a project where I just needed to save a constant flow of data (id + 100-byte records) somewhere - no persistence needed, no sharding (single machine) - but lots of it. I first turned to Redis and it worked nicely, but since I had PostgreSQL running, I ran a performance test with it too. To my surprise, once I optimized both (memory only, pg WAL disabled, persistent connections, ...) there was almost no difference in insert rate, so I ended up using PgSQL only.
This is anecdotal, of course, and it is possible that I simply didn't optimize Redis enough (though I did give it my best!). Let me also say that Redis is great otherwise, it was just in this specific case that PostgreSQL surprised me and fulfilled the need itself.
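The setup described above is easy to reproduce in miniature. A rough sketch in Python, with the stdlib sqlite3 in-memory database standing in for the tuned, memory-only stores (the original test compared Redis and PostgreSQL; record count and sizes here are arbitrary):

```python
import os
import sqlite3
import time

conn = sqlite3.connect(":memory:")  # memory only, like the tuned setup described
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, data BLOB)")

# id + 100-byte records, as in the anecdote.
rows = [(i, os.urandom(100)) for i in range(100_000)]

start = time.perf_counter()
with conn:  # one transaction; per-row commits would dominate the timing
    conn.executemany("INSERT INTO records (id, data) VALUES (?, ?)", rows)
elapsed = time.perf_counter() - start

count = conn.execute("SELECT COUNT(*) FROM records").fetchone()[0]
print(f"{count} rows in {elapsed:.3f}s ({count / elapsed:,.0f} rows/s)")
```

Batching into one transaction and reusing the connection are the same kinds of optimizations (persistent connections, relaxed durability) mentioned above.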
There are plenty of modern, distributed RDBMSes that make sharding transparent to the user (e.g. CockroachDB, YugabyteDB, Vitess, many cloud offerings, etc.). Most NoSQL databases end up adding transactions because they are important, and thus the scale advantages of NoSQL systems over relational databases are diminishing, if they remain at all.
As a former Oracle DBA, and general skeptic of most of the use cases the NoSQL ends up getting chosen for, I’ll gladly say that document databases are pretty good at storing documents. I’ve seen them used very well in content management applications, and the document-database-as-a-materialized-view use case can make a lot of sense. Just like graph databases can be very good at storing graphs. As far as the bandwagoning goes though, I think Mongo has taught a lot of people about the strengths of the relational model.
If you just need a key-value store, a database like MongoDB is rather oversized. Redis or an object storage system (now that they actually include locking) is much easier to operate.
As soon as you're back to needing complex queries (with or without relations), Postgres it is again.
I'm less sure about this. One advantage of NoSQL is that the serverless offerings blow away any SQL DB-as-a-service offering that I've seen. I think that's because NoSQL databases give you minimal, but easy-to-reason-about scaling properties whereas SQL databases do more for you, but at the cost of greatly increased complexity. I think pretty highly of AWS Aurora Serverless and the UX is still garbage in comparison to S3 or DynamoDB.
Now if you start trying to add transactions to a key-value store based architecture, NoSQL gets pretty ugly, but that's misusing the tool IMO.
Most of my projects still rely on NoSQL DBs (usually S3) because they have really excellent UX and I have plenty of data where I don't need relational properties.
Frankly it sounds like you just don't need a relational database. Their whole selling point is the strong guarantees they provide about your data, and if you aren't using those then you're basically just writing files to disk in a very convoluted way.
Is there a proper replication topology that guarantees all writes survive single-node failures, e.g. in a 3-node cluster, with reads also behaving as expected? i.e. any sequence of reads and writes executes correctly in any single-node failure scenario?
No SQL database has anywhere near the search capabilities of Elasticsearch/Lucene. I also wouldn't choose Postgres for storing logs; I would use a document store (e.g. the ELK stack).
ClickHouse suits storing logs well, especially if a log message has some fixed structure which can be mapped to separate columns. And most logs have some number of predefined fields which are easy to map to columns (e.g. timestamp, IP, request time, etc.). You can store free-form JSON which doesn't have a predefined schema in a string column, but ELK would probably be better in that case.
And then you can query logs with the full power of SQL.
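A sketch of that workflow, with Python's stdlib sqlite3 standing in for ClickHouse (the columns follow the predefined fields mentioned above; the log data is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Fixed log fields mapped to columns; the free-form remainder stays a string.
conn.execute(
    "CREATE TABLE logs (ts TEXT, ip TEXT, request_ms REAL, message TEXT)"
)
conn.executemany(
    "INSERT INTO logs VALUES (?, ?, ?, ?)",
    [
        ("2021-01-01T00:00:00", "10.0.0.1", 12.5, "GET /"),
        ("2021-01-01T00:00:01", "10.0.0.1", 250.0, "GET /slow"),
        ("2021-01-01T00:00:02", "10.0.0.2", 8.0, "GET /"),
    ],
)

# "The full power of SQL": aggregate request time per client IP.
stats = list(conn.execute(
    "SELECT ip, AVG(request_ms) FROM logs GROUP BY ip ORDER BY ip"
))
for ip, avg_ms in stats:
    print(ip, avg_ms)
```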
MongoDB and Postgres have very different strengths and weaknesses. It sounds like they didn't consider their read and write patterns before they decided to go with MongoDB, and then blamed it for not being a good fit for their use case. The article doesn't give any insight at all into the cause of the performance issues, just that they happened. It could've been that they were doing something inefficiently. They also don't give any reasons for the performance increase when switching to Postgres. It could've been anything: for example, doing application-layer joins with Mongo, so that things got faster with a relational database because the joins run in one process without serially fetching data over the network. They also touch on schema on read vs schema on write, but instead of discussing the tradeoffs and use cases of both patterns, they just claim schema on read is bad.
The article doesn't give any insight into the cause of the performance issue because they couldn't work out what that cause was:
> In another instance we noticed degraded performance of our applications and managed to trace it to our MongoDB cluster. However, upon further inspection we were unable to find the actual cause of the problem. No matter what metrics we installed, tools we used or commands we ran we couldn’t find the cause.
Is there a good table charting the unique strengths and weaknesses of databases (or database types), so that people who are looking to build things but don't have the background can choose the right tool for the job?
If not, that may make a popular HN post!
Cassandra, CouchDB or Mongo? Redis? SQLite? Mysql/Maria/Percona or Postgres? Cockroach? Timescale or Influx? Why in the world do all these tools exist?
I learned a framework several years ago and its default db was MongoDB, so I worked with MongoDB for a few years. Then at my job I used PostgreSQL for several months. I will probably never use MongoDB again.
I think postgres is a way more robust solution than mongo, but I've had my worst db experiences having to deal with a postgres db with almost all the business logic in triggers. That was a huge pain in the ass to maintain and understand.
Mongo is definitely a limited solution, but for certain cases it just works pretty well, with less complexity than postgres. At one of my jobs, I set up and maintained a Mongo cluster that did its job without causing any fuss for years.
"I don't know what was wrong" articles like this come up really often with Mongo. My team manages thousands of Mongo nodes and we have issues, but few are directly due to Mongo. Mongo's documentation could be friendlier to the operator/maintainer, however the tools for debugging Mongo tend to be sufficient (metric counters, various status/state dumps, general Linux tooling).
You need to understand databases, folks. There is no way around it. "Porting to Postgres" does not replace understanding database internals and figuring out how to debug bad access patterns and explain why they are bad. Blindly migrating data seems like one of the worst things you could ever do - you're not fixing the underlying problem, which is that you don't understand how databases work. I get it - ain't nobody got time for that - but it will come back and bite you with your shiny new tool too if you aren't careful.
Anyone who brings up something like master-master hopefully realizes they are talking about a completely different problem now, with completely different constraints. Expecting the same access pattern to perform just as well in master-master is wishful thinking - it requires careful design to avoid coordination problems. The meat of the story with databases is: coordination is expensive, and distributed coordination is even more expensive.
I'm starting to realize all the people complaining about data loss in Mongodb just didn't understand the tradeoffs between weak consistency and strong consistency.
There were many questions asking about recent experience with MongoDB. I have some recent experience, so I wanted to share it with everyone.
My background:
2007-2010: Used MySQL to run a social network for 50M+ users. Would send 2B messages on peak days. All powered by MySQL.
2015-2020: Used Mongo to power a top-5k site. Zero downtime and zero data loss in 5 years. Used Postgres for an internal, non-user-facing database.
So I have familiarity with all of these databases at a decent scale. I am going to list the pros and cons of each and why I would use them. Again, a tool is a tool. Just because I love a wrench doesn't mean that I am going to use it instead of a hammer when I need to drive some nails.
MongoDB:
They have come a long way since I started using them in 2012. If your use case is CRUD, you will be fine. There is still a lot of marketing fluff about sharding, multi-document transactions etc. which might not be fully reliable. I stay away from those things. But if you just do CRUD and your data can fit in a single db, you are going to be fine at decent scale.
Following are some reasons I prefer MongoDB. It's possible that Postgres/MySQL has some of these even though I couldn't find them:
1. I don't like to manage schema migrations. During development you constantly add columns and it's a pain to make sure that this column is added to the sql databases on development, testing, staging, prod etc. With mongo, you just add a column and you are done.
2. No downtime on migrations: You can change your server type or mongo version with zero downtime. I've been doing that for 6 years. I couldn't figure out how to do that with postgres (even with Aurora).
3. If your data structure fits a document-and-subdocument schema, mongo is really easy to use. I tried jsonb with postgres and didn't find it as easy to use as mongo. If your data fits this paradigm, you can essentially get atomicity for multiple updates because they are all in a single document.
4. The admin interface of MongoDB Atlas is really developer-friendly. You can easily add read-only replicas. You can easily add replicas in different regions. I shudder at the thought of managing Postgres myself, and even Amazon Aurora is not as easy to admin as Atlas.
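The single-document atomicity in point 3 can be sketched in a few lines, here emulated with a JSON column and Python's stdlib sqlite3 (the cart document and its fields are invented for illustration):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE carts (id INTEGER PRIMARY KEY, doc TEXT)")
conn.execute("INSERT INTO carts VALUES (1, ?)", ('{"items": 2, "total": 20.0}',))

# Both fields change in a single statement, so no reader ever sees a
# half-updated document; a single-document update in MongoDB behaves the same.
conn.execute(
    "UPDATE carts SET doc = json_set(doc, '$.items', 3, '$.total', 30.0) "
    "WHERE id = 1"
)

cart = json.loads(conn.execute("SELECT doc FROM carts WHERE id = 1").fetchone()[0])
print(cart)
```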
At this point I would consider Postgres only in the following case:
If my data structure is like a non-tree graph. In that case I would expect to do multiple complex joins and expect transactional consistency between tables.
If I am building a financial product which directly handles money or money instruments (bank, stock trading) I would definitely not use mongo for that.
> With mongo, you just add a column and you are done.
No, usually you are not done. Your data doesn't have the column. So either you write your code in a way that it can handle the column being optional or you write code to read each document and add the column manually.
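Both options can be sketched in a few lines (plain dicts stand in for documents returned by the driver; the field names are invented):

```python
# Two ways to cope when older documents predate a newly added field.

docs = [
    {"_id": 1, "name": "alice"},                   # old document, no "status"
    {"_id": 2, "name": "bob", "status": "active"}, # new document
]

# Option 1: treat the field as optional at read time, with a default.
def get_status(doc):
    return doc.get("status", "unknown")

assert [get_status(d) for d in docs] == ["unknown", "active"]

# Option 2: backfill, i.e. "read each document and add the column manually"
# (with a real driver this would be a bulk update against the collection).
for doc in docs:
    doc.setdefault("status", "unknown")

assert all("status" in d for d in docs)
```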
I started out with MongoDB at Fit Analytics originally, until we noticed at some point that our data was an inconsistent mess and we had to implement constraints in our back-end code to keep the data consistent. Eventually it became just easier to have an SQL database take care of all those constraints, and we luckily migrated away.
Thanks for raising this good point. I didn't get a chance to elaborate on this in my original post because it was getting too long and it was the end of a long day.
The cost of adding a column is zero in terms of schema migration, so that helps with developer velocity. We used an ORM (mongoose) so that we could add constraints, including default values. But Mongo did help us avoid the friction of adding columns.
Because you didn't put strict validation in place before data entered Mongo, and I guess you didn't use a typed language, which makes such validation a no-brainer and quite natural.
SQL gives you type checking for free, Mongo does not. However, if you have a typed language and proper validation in place, you just have to define those in one place, rather than in both the database and your app (DRY).
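A sketch of that define-once idea in Python, with a dataclass as the single typed schema and a validation step in front of the schemaless store (the names are invented; a real setup would hook this into the driver layer):

```python
from dataclasses import dataclass, fields

# The one place where the schema lives; validation happens before anything
# reaches the (schemaless) store.
@dataclass
class User:
    name: str
    age: int

def validate(doc: dict) -> dict:
    """Check a document against the dataclass before inserting it."""
    allowed = {f.name: f.type for f in fields(User)}
    if set(doc) != set(allowed):
        raise ValueError(f"unexpected fields: {set(doc) ^ set(allowed)}")
    for key, expected in allowed.items():
        if not isinstance(doc[key], expected):
            raise ValueError(f"{key} must be {expected.__name__}")
    return doc

validate({"name": "alice", "age": 30})     # ok
# validate({"name": "bob", "age": "30"})   # would raise: age must be int
```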
This thread is like a piñata that gets pulled out every 6 months or so to take a whack at MongoDB and talk about how everyone should just use PostgreSQL for every database problem, no matter whether it's relational or not.
No-one cares about whether MongoDB is useful or not.
If you're using a replica set, as you should for anything other than development (and even in development you sometimes need a replica set for some features), you just sequentially upgrade each node. It goes down, clients figure out the new topology and continue working.
I'm also running on MongoDB, a top-6k site, and while I wouldn't say there weren't any issues, uptime is wonderful, I sleep well, and I have zero problems with modeling the domain as document collections instead of relational tables. I can easily imagine that it would not be a great fit for many other use cases. I'm not using Atlas as GP is, so it's more work, but I'm trying to be scrappy.
> If I am building a financial product which directly handles money or money instruments (bank, stock trading) I would definitely not use mongo for that.
My friend used to work for Citibank; his team managed high-value transactions (in the billions). I was _stunned_ to hear that they were storing all their data in MongoDB.
I am not sure what the exact use case is. They could be using it for non-critical data warehousing which could easily be recreated.
If you can clearly determine your use case and are confident that mongo will fit it, it might be a good idea.
My concern with mongo is that if you push the feature set, it will start falling apart at the seams.
Even though Postgres/MySQL have some issues which can reduce the velocity of your team, if you absolutely want to be sure about data integrity and want full flexibility for an evolving use case, they might serve you better than mongo.
> During development you constantly add columns and it's a pain to make sure that this column is added to the sql databases on development, testing, staging, prod etc. With mongo, you just add a column and you are done.
I meant that mongo's support for sharding might be marketing fluff. As in, they promise it works, but it might be unreliable.
I have come to trust Mongo with CRUD on a document. The rest of the features might not work as well. Look at the recent Jepsen test for mongo ( https://jepsen.io/analyses/mongodb-4.2.6 )
Just did some POCs with Hasura, a service on top of PG that provides:
* GraphQL interface (I think even a REST one in the latest release)
* authorization (role based)
* API gateway (put other API behind your GraphQL interface)
* a UI for managing the schema
* a way to do migrations
Now GraphQL provides a schema (that includes the types of the values) that can be used to generate client libs in many languages. This effectively gives me a type-safe interface + authorization to my db, while I have not programmed anything yet. Slowly SPAs start to make sense (if there is barely any self-written server-side code running).
While Hasura is nice, it's too much of a black box, too all-in, too much risk, so we decided against it.
2021 is great because of TypeScript, types and decorators (substitute any other typed language with decorators if you don't like TS). If your db is the leading system for your schema, everything else is derived and code-generated, with all the restrictions SQL has. Which is not great if you want to decorate and enhance. Yes, you could use interfaces, but it's not the same: you can't decorate code-generated type definitions, and your architecture wouldn't stay DRY either.
If you treat TS types as the leading schema, you don't need a type-checked db like Postgres anymore; anything which can store data works just fine, and you have all schema definitions in one place, which is not the DB. Good examples enabling such an architecture are TypeORM and TypeGraphQL. However, most people stick to SQL and "my schema must be in my DB, there is no other way to enforce a schema" (but there is!) when you just need a good architecture (types and decorators!) and not another mainstream off-the-shelf solution.
Interesting. My db and its types have whole migration procedures to ensure everything works in all environments; Hasura also does this.
How would that work with just TS? The schema changes, but the documents in my KV store (I assume that's what you mean by "no types on the storage") have not changed accordingly. How do you prevent this obvious trouble?
I've been using hasura in production for about a year now and am very happy with it.
A little recommendation: use their TypeScript metadata SDK instead of the client UI if you want to be able to build your metadata as part of your CI pipeline.
> They also touch on schema on read vs schema on write but instead of talking about the tradeoffs and use cases of both patterns, they just claim schema on read is bad.
- schema on write: You know what future queries you will ask. Use SQL.
- schema on read: You don't know what future queries you will ask. Use object storage.
Of course, in practice, part of your data will be schema on write and part on read.
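The distinction above can be sketched side by side, with Python's stdlib sqlite3 standing in for both kinds of store (the data is invented):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")

# Schema on write: the future query is known, so shape the data for it up front.
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.execute("INSERT INTO orders VALUES (1, 9.99)")
total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]

# Schema on read: store the raw document, interpret it only when a question
# arrives (as you would with blobs in object storage).
conn.execute("CREATE TABLE raw_events (doc TEXT)")
conn.execute(
    "INSERT INTO raw_events VALUES (?)", ('{"order_id": 1, "amount": 9.99}',)
)
amounts = [
    json.loads(doc)["amount"]  # the schema is applied here, at read time
    for (doc,) in conn.execute("SELECT doc FROM raw_events")
]

print(total, sum(amounts))
```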
> Most of my projects still rely on NoSQL DBs (usually S3) because they have really excellent UX and I have plenty of data where I don't need relational properties.
- S3
- Redis/Memcached
- Elasticsearch
aren't really NoSQL databases in the marketing sense, since they fulfill such a specific purpose.
> No-one cares about whether MongoDB is useful or not.

So, not really.
But I agree that this joke is an old one and doesn't really have a place in a constructive discussion on Hacker News.
> This thread is like a piñata that gets pulled out every 6 months or so to take a whack at MongoDB
Kudos for having master-class control of the word. My favorite punching bag is play stores, the root of all evil.

A list of recurring subjects that split opinions about 50-50 in this news group would be hilarious.

In the end it is a tool which fits some scenarios.
How so? All the documentation on upgrading says you need to shut down each instance to then step up to the next minor level.

For instance 3.4 -> 3.6 -> 4.0, etc.
https://docs.mongodb.com/manual/release-notes/4.0-upgrade-st...
> I don't like to manage schema migrations.

I have found liquibase (https://www.liquibase.org/) to be extremely helpful for this.