I've used both NoSQL and SQL databases. Proper replication topologies (e.g. multi-master configurations) and JSON columns killed the NoSQL movement. I'm glad the NoSQL movement existed and forced the SQL camp into out-of-the-box thinking, but I wouldn't go for any NoSQL solution today on a greenfield project.
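For illustration, a JSON column lets a relational table hold document-shaped data and still be queried with plain SQL. A minimal sketch in Python using the stdlib sqlite3 module as a stand-in (Postgres would use a jsonb column and the `->`/`->>` operators instead of `json_extract`); the table and payload here are invustrative, not from the thread:

```python
import sqlite3

# In-memory database standing in for "a relational DB with JSON columns".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
conn.execute(
    "INSERT INTO events (payload) VALUES (?)",
    ('{"user": "alice", "action": "login", "meta": {"ip": "10.0.0.1"}}',),
)

# Query inside the JSON document, as you would with jsonb in Postgres
# (e.g. payload->'meta'->>'ip'); SQLite's JSON1 uses json_extract instead.
row = conn.execute(
    "SELECT json_extract(payload, '$.meta.ip') FROM events "
    "WHERE json_extract(payload, '$.user') = 'alice'"
).fetchone()
print(row[0])  # 10.0.0.1
```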
In what scenarios does one work on data without any relations?
Honest question - perhaps I'm too used to thinking in SQL, but even with e.g. a completely flat bunch of documents one usually needs at least access control (i.e. document owners, which is a relation).
I had a project where I just needed to save a constant flow of data (id + 100-byte records) somewhere - no persistence needed, no sharding (single machine) - but lots of it. I first turned to Redis and it worked nicely, but since I had PostgreSQL running, I ran a performance test with it too. To my surprise, once I optimized both (memory only, pg WAL disabled, persistent connections, ...) there was almost no difference in insert rate, so I ended up using PgSQL only.
This is anecdotal, of course, and it is possible that I simply didn't optimize Redis enough (though I did give it my best!). Let me also say that Redis is great otherwise, it was just in this specific case that PostgreSQL surprised me and fulfilled the need itself.
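The setup described above is easy to reproduce in miniature. A rough sketch in Python, with the stdlib sqlite3 in-memory database standing in for the tuned, memory-only stores (the original test compared Redis and PostgreSQL; record count and sizes here are arbitrary):

```python
import os
import sqlite3
import time

conn = sqlite3.connect(":memory:")  # memory only, like the tuned setup described
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, data BLOB)")

# id + 100-byte records, as in the anecdote.
rows = [(i, os.urandom(100)) for i in range(100_000)]

start = time.perf_counter()
with conn:  # one transaction; per-row commits would dominate the timing
    conn.executemany("INSERT INTO records (id, data) VALUES (?, ?)", rows)
elapsed = time.perf_counter() - start

count = conn.execute("SELECT COUNT(*) FROM records").fetchone()[0]
print(f"{count} rows in {elapsed:.3f}s ({count / elapsed:,.0f} rows/s)")
```

Batching into one transaction and reusing the connection are the same kinds of optimizations (persistent connections, relaxed durability) mentioned above.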
There are plenty of modern, distributed RDBMSes that make sharding transparent to the user (e.g. CockroachDB, YugabyteDB, Vitess, many cloud offerings, etc.). Most NoSQL databases end up adding transactions because they are important, and thus the scale advantages of NoSQL systems over relational databases are diminishing, if they remain at all.
As a former Oracle DBA, and general skeptic of most of the use cases the NoSQL ends up getting chosen for, I’ll gladly say that document databases are pretty good at storing documents. I’ve seen them used very well in content management applications, and the document-database-as-a-materialized-view use case can make a lot of sense. Just like graph databases can be very good at storing graphs. As far as the bandwagoning goes though, I think Mongo has taught a lot of people about the strengths of the relational model.
If you just need a key-value store, a database like MongoDB is rather oversized. Redis or an object storage system (now that they actually include locking) is much easier to operate.
As soon as you're back to needing complex queries (with or without relations), Postgres it is again.
I'm less sure about this. One advantage of NoSQL is that the serverless offerings blow away any SQL DB-as-a-service offering that I've seen. I think that's because NoSQL databases give you minimal, but easy-to-reason-about scaling properties whereas SQL databases do more for you, but at the cost of greatly increased complexity. I think pretty highly of AWS Aurora Serverless and the UX is still garbage in comparison to S3 or DynamoDB.
Now if you start trying to add transactions to a key-value store based architecture, NoSQL gets pretty ugly, but that's misusing the tool IMO.
Most of my projects still rely on NoSQL DBs (usually S3) because they have really excellent UX and I have plenty of data where I don't need relational properties.
Frankly it sounds like you just don't need a relational database. Their whole selling point is the strong guarantees they provide about your data, and if you aren't using those then you're basically just writing files to disk in a very convoluted way.
Is there a proper replication topology that guarantees all writes survive single-node failures, e.g. in a 3-node cluster, with reads also behaving as expected? i.e. any sequence of reads and writes executes correctly in any single-node failure scenario?
No SQL database has anywhere near the search capabilities of Elasticsearch/Lucene. I also wouldn't choose Postgres for storing logs; I would use a document store (e.g. the ELK stack).
ClickHouse suits storing logs well, especially if a log message has some fixed structure which can be mapped to separate columns. And most logs have some number of predefined fields which are easy to map to columns (e.g. timestamp, IP, request time, etc.). You can store free-form JSON which doesn't have a predefined schema in a string column, but ELK would probably be better in that case.
And then you can query logs with the full power of SQL.
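A sketch of that workflow, with Python's stdlib sqlite3 standing in for ClickHouse (the columns follow the predefined fields mentioned above; the log data is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Fixed log fields mapped to columns; the free-form remainder stays a string.
conn.execute(
    "CREATE TABLE logs (ts TEXT, ip TEXT, request_ms REAL, message TEXT)"
)
conn.executemany(
    "INSERT INTO logs VALUES (?, ?, ?, ?)",
    [
        ("2021-01-01T00:00:00", "10.0.0.1", 12.5, "GET /"),
        ("2021-01-01T00:00:01", "10.0.0.1", 250.0, "GET /slow"),
        ("2021-01-01T00:00:02", "10.0.0.2", 8.0, "GET /"),
    ],
)

# "The full power of SQL": aggregate request time per client IP.
stats = list(conn.execute(
    "SELECT ip, AVG(request_ms) FROM logs GROUP BY ip ORDER BY ip"
))
for ip, avg_ms in stats:
    print(ip, avg_ms)
```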
MongoDB and Postgres have very different strengths and weaknesses. It sounds like they didn't consider their read and write patterns before they decided to go with MongoDB, and then blamed it for not being a good fit for their use case. The article doesn't give any insight at all into the cause of the performance issues, just that they happened. It could've been that they were doing something inefficiently. They also don't give any reasons for the performance increase when switching to Postgres. It could've been anything: for example, doing application-layer joins with Mongo, so that things got faster with a relational database because the joins run in one process without serially fetching data over the network. They also touch on schema on read vs schema on write, but instead of discussing the tradeoffs and use cases of both patterns, they just claim schema on read is bad.
The article doesn't give any insight into the cause of the performance issue because they couldn't work out what that cause was:
> In another instance we noticed degraded performance of our applications and managed to trace it to our MongoDB cluster. However, upon further inspection we were unable to find the actual cause of the problem. No matter what metrics we installed, tools we used or commands we ran we couldn’t find the cause.
Is there a good table charting the unique strengths and weaknesses of databases (or database types), so that people who are looking to build things but don't have the background can choose the right tool for the job?
If not, that may make a popular HN post!
Cassandra, CouchDB or Mongo? Redis? SQLite? Mysql/Maria/Percona or Postgres? Cockroach? Timescale or Influx? Why in the world do all these tools exist?
I learned a framework several years ago and its default db was MongoDB, so I worked with MongoDB for a few years. Then at my job I used PostgreSQL for several months. I will probably never use MongoDB again.
I think postgres is a way more robust solution than mongo, but I've had my worst db experiences having to deal with a postgres db with almost all the business logic in triggers. That was a huge pain in the ass to maintain and understand.
Mongo is definitely a limited solution, but for certain cases it just works pretty well, with less complexity than postgres. At one of my jobs, I set up and maintained a Mongo cluster that did its job without causing any fuss for years.
"I don't know what was wrong" articles like this come up really often with Mongo. My team manages thousands of Mongo nodes and we have issues, but few are directly due to Mongo. Mongo's documentation could be friendlier to the operator/maintainer, however the tools for debugging Mongo tend to be sufficient (metric counters, various status/state dumps, general Linux tooling).
You need to understand databases, folks. There is no way around it. "Porting to Postgres" does not replace understanding database internals and figuring out how to debug bad access patterns and explain why they are bad. Blindly migrating data seems like one of the worst things you could ever do - you're not fixing the underlying problem, which is that you don't understand how databases work. I get it - ain't nobody got time for that - but it will come back and bite you with your shiny new tool too if you aren't careful.
Anyone who brings up something like master-master hopefully realizes they are talking about a completely different problem now, with completely different constraints. Expecting the same access pattern to perform just as well in master-master is wishful thinking - it requires careful design to avoid coordination problems. The meat of the story with databases is: coordination is expensive, and distributed coordination is even more expensive.
I'm starting to realize all the people complaining about data loss in Mongodb just didn't understand the tradeoffs between weak consistency and strong consistency.
There were many questions asking about recent experience with MongoDB. I have some recent experience, so I wanted to share it with everyone.
My background:
2007-2010: Used MySQL to run a social network for 50M+ users. Would send 2B messages on peak days. All powered by MySQL.
2015-2020: Used Mongo to power a top-5k site. Zero downtime and zero data loss in 5 years. Used Postgres for an internal, non-user-facing database.
So I have familiarity with all of these databases at a decent scale. I am going to list the pros and cons of each and why I would use them. Again, a tool is a tool. Just because I love a wrench doesn't mean that I am going to use it instead of a hammer when I need to drive some nails.
MongoDB:
They have come a long way since I started using them in 2012. If your use case is CRUD, you will be fine. There is still a lot of marketing fluff about sharding, multi-document transactions etc. which might not be fully reliable. I stay away from those things. But if you just do CRUD and your data can fit in a single db, you are going to be fine at decent scale.
Following are some reasons I prefer MongoDB. It's possible that Postgres/MySQL has some of these even though I couldn't find them:
1. I don't like to manage schema migrations. During development you constantly add columns and it's a pain to make sure that this column is added to the sql databases on development, testing, staging, prod etc. With mongo, you just add a column and you are done.
2. No downtime on migrations: You can change your server type or mongo version with zero downtime. I've been doing that for 6 years. I couldn't figure out how to do that with postgres (even with Aurora).
3. If your data structure fits a document-and-subdocument schema, mongo is really easy to use. I tried jsonb with postgres and didn't find it as easy to use as mongo. If your data fits this paradigm, you can essentially get atomicity for multiple updates because they are all in a single document.
4. The admin interface of MongoDB Atlas is really developer-friendly. You can easily add read-only replicas. You can easily add replicas in different regions. I shudder at the thought of managing Postgres myself, and even Amazon Aurora is not as easy to admin as Atlas.
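The single-document atomicity in point 3 can be sketched in a few lines, here emulated with a JSON column and Python's stdlib sqlite3 (the cart document and its fields are invented for illustration):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE carts (id INTEGER PRIMARY KEY, doc TEXT)")
conn.execute("INSERT INTO carts VALUES (1, ?)", ('{"items": 2, "total": 20.0}',))

# Both fields change in a single statement, so no reader ever sees a
# half-updated document; a single-document update in MongoDB behaves the same.
conn.execute(
    "UPDATE carts SET doc = json_set(doc, '$.items', 3, '$.total', 30.0) "
    "WHERE id = 1"
)

cart = json.loads(conn.execute("SELECT doc FROM carts WHERE id = 1").fetchone()[0])
print(cart)
```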
At this point I would consider Postgres only in the following case:
If my data structure is like a non-tree graph. In that case I would expect to do multiple complex joins and expect transactional consistency between tables.
If I am building a financial product which directly handles money or money instruments (bank, stock trading) I would definitely not use mongo for that.
> With mongo, you just add a column and you are done.
No, usually you are not done. Your data doesn't have the column. So either you write your code in a way that it can handle the column being optional or you write code to read each document and add the column manually.
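Both options can be sketched in a few lines (plain dicts stand in for documents returned by the driver; the field names are invented):

```python
# Two ways to cope when older documents predate a newly added field.

docs = [
    {"_id": 1, "name": "alice"},                   # old document, no "status"
    {"_id": 2, "name": "bob", "status": "active"}, # new document
]

# Option 1: treat the field as optional at read time, with a default.
def get_status(doc):
    return doc.get("status", "unknown")

assert [get_status(d) for d in docs] == ["unknown", "active"]

# Option 2: backfill, i.e. "read each document and add the column manually"
# (with a real driver this would be a bulk update against the collection).
for doc in docs:
    doc.setdefault("status", "unknown")

assert all("status" in d for d in docs)
```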
I started out with MongoDB at Fit Analytics originally, until we noticed at some point that our data was an inconsistent mess and we had to implement constraints in our back-end code to keep the data consistent. Eventually it became just easier to have an SQL database take care of all those constraints, and we luckily migrated away.
Thanks for raising this good point. I didn't get a chance to elaborate on this in my original post because it was getting too long and it was the end of a long day.
The cost of adding a column is zero in terms of schema migration, so that helps with developer velocity. We used an ORM (mongoose) so that we could add constraints, including default values. But Mongo did help us avoid the friction of adding columns.
Because you didn't put strict validation in place before data entered Mongo, and I guess you didn't use a typed language, which makes such validation a no-brainer and quite natural.
SQL gives you type checking for free, Mongo does not. However, if you have a typed language and proper validation in place, you just have to define those in one place, rather than in both the database and your app (DRY).
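A sketch of that define-once idea in Python, with a dataclass as the single typed schema and a validation step in front of the schemaless store (the names are invented; a real setup would hook this into the driver layer):

```python
from dataclasses import dataclass, fields

# The one place where the schema lives; validation happens before anything
# reaches the (schemaless) store.
@dataclass
class User:
    name: str
    age: int

def validate(doc: dict) -> dict:
    """Check a document against the dataclass before inserting it."""
    allowed = {f.name: f.type for f in fields(User)}
    if set(doc) != set(allowed):
        raise ValueError(f"unexpected fields: {set(doc) ^ set(allowed)}")
    for key, expected in allowed.items():
        if not isinstance(doc[key], expected):
            raise ValueError(f"{key} must be {expected.__name__}")
    return doc

validate({"name": "alice", "age": 30})     # ok
# validate({"name": "bob", "age": "30"})   # would raise: age must be int
```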
This thread is like a piñata that gets pulled out every 6 months or so to take a whack at MongoDB and talk about how everyone should just use PostgreSQL for every database problem, no matter whether it's relational or not.
No-one cares about whether MongoDB is useful or not.
If you're using a replica set, as you should for anything other than development (and even in development you sometimes need a replica set for some features), you just sequentially upgrade each node. It goes down, clients figure out the new topology and continue working.
I'm also running on MongoDB, a top-6k site, and while I wouldn't say there weren't any issues, uptime is wonderful, I sleep well, and I have zero problems with modeling the domain as document collections instead of relational tables. I can easily imagine that it would not be a great fit for many other use cases. I'm not using Atlas as GP is, so it's more work, but I'm trying to be scrappy.
> If I am building a financial product which directly handles money or money instruments (bank, stock trading) I would definitely not use mongo for that.
My friend used to work for Citibank; his team managed high-value transactions (in the billions). I was _stunned_ to hear that they were storing all their data in MongoDB.
I am not sure what the exact use case is. They could be using it for non-critical data warehousing which could easily be recreated.
If you can clearly determine your use case and are confident that mongo will fit it, it might be a good idea.
My concern with mongo is that if you push the feature set, it will start falling apart at the seams.
Even though Postgres/MySQL have some issues which can reduce the velocity of your team, if you absolutely want to be sure about data integrity and want full flexibility for an evolving use case, they might serve you better than mongo.
> During development you constantly add columns and it's a pain to make sure that this column is added to the sql databases on development, testing, staging, prod etc. With mongo, you just add a column and you are done.
I meant that mongo's support for sharding might be marketing fluff. As in, they promise it works, but it might be unreliable.
I have come to trust Mongo with CRUD on a document. The rest of the features might not work as well. Look at the recent Jepsen test for mongo ( https://jepsen.io/analyses/mongodb-4.2.6 )
Just did some POCs with Hasura, a service on top of PG that provides:
* GraphQL interface (I think even a REST one in the latest release)
* authorization (role based)
* API gateway (put other API behind your GraphQL interface)
* a UI for managing the schema
* a way to do migrations
Now GraphQL provides a schema (that includes the types of the values) that can be used to generate client libs in many languages. This effectively gives me a type-safe interface + authorization to my db, while I have not programmed anything yet. Slowly SPAs start to make sense (if there is barely any self-written server-side code running).
While Hasura is nice, it's too much of a black box, too all-in, too much risk, so we decided against it.
2021 is great because of TypeScript, types and decorators (substitute any other typed language with decorators if you don't like TS). If your db is the leading system for your schema, everything else is derived and code-generated, with all the restrictions SQL has. Which is not great if you want to decorate and enhance. Yes, you could use interfaces, but it's not the same: you can't decorate code-generated type definitions, and your architecture wouldn't stay DRY either.
If you treat TS types as the leading schema, you don't need a type-checked db like Postgres anymore; anything which can store data works just fine, and you have all schema definitions in one place, which is not the DB. Good examples enabling such an architecture are TypeORM and TypeGraphQL. However, most people stick to SQL and "my schema must be in my DB, there is no other way to enforce a schema" (but there is!) when you just need a good architecture (types and decorators!) and not another mainstream off-the-shelf solution.
Interesting. My db and its types have whole migration procedures to ensure everything works in all environments; Hasura also does this.
How would that work with just TS? The schema changes, but the documents in my KV store (I assume that's what you mean by "no types on the storage") have not changed accordingly. How do you prevent this obvious trouble?
I've been using hasura in production for about a year now and am very happy with it.
A little recommendation: use their TypeScript metadata SDK instead of the client UI if you want to be able to build your metadata as part of your CI pipeline.
> They also touch on schema on read vs schema on write but instead of talking about the tradeoffs and use cases of both patterns, they just claim schema on read is bad.
- schema on write: You know what future queries you will ask. Use SQL.
- schema on read: You don't know what future queries you will ask. Use object storage.
Of course, in practice, part of your data will be schema on write and part on read.
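The distinction above can be sketched side by side, with Python's stdlib sqlite3 standing in for both kinds of store (the data is invented):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")

# Schema on write: the future query is known, so shape the data for it up front.
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.execute("INSERT INTO orders VALUES (1, 9.99)")
total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]

# Schema on read: store the raw document, interpret it only when a question
# arrives (as you would with blobs in object storage).
conn.execute("CREATE TABLE raw_events (doc TEXT)")
conn.execute(
    "INSERT INTO raw_events VALUES (?)", ('{"order_id": 1, "amount": 9.99}',)
)
amounts = [
    json.loads(doc)["amount"]  # the schema is applied here, at read time
    for (doc,) in conn.execute("SELECT doc FROM raw_events")
]

print(total, sum(amounts))
```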
> Most of my projects still rely on NoSQL DBs (usually S3) because they have really excellent UX and I have plenty of data where I don't need relational properties.
- S3
- Redis/Memcached
- Elasticsearch
aren't really NoSQL databases in the marketing sense, since they fulfill such a specific purpose.
> No-one cares about whether MongoDB is useful or not.

So, not really.
But I agree that this joke is an old one and doesn't really have a place in a constructive discussion on Hacker News.
> This thread is like a piñata that gets pulled out every 6 months or so to take a whack at MongoDB
Kudos for having master-class control of the word. My favorite punching bag is play stores, the root of all evil.

A list of recurring subjects that split opinions about 50-50 in this news group would be hilarious.

In the end it is a tool which fits some scenarios.
How so? All the documentation on upgrading says you need to shut down each instance to then step up to the next minor level.

For instance 3.4 -> 3.6 -> 4.0, etc.
https://docs.mongodb.com/manual/release-notes/4.0-upgrade-st...
> I don't like to manage schema migrations.

I have found liquibase (https://www.liquibase.org/) to be extremely helpful for this.