Three out of five answers so far just question the legitimacy of using NoSQL.
There are definitely use cases for NoSQL, and I clicked on this thread hoping for information and war stories about Cockroach, Mongo, Redis, CouchDB, the current state of NoSQL in Postgres, and a few names I may never have heard of.
Let me derail the conversation away from berating this technology. I happen to have a requirement that needs a dynamically changing DB structure (saving lots of JSON data from a dynamically changeable form). Which NoSQL DB would you recommend for me?
Any pitfalls? I'm primarily looking for self hosted solutions.
Honestly, Postgres is probably my first choice for storing dynamic JSON, and I have used it for exactly that in the past. In particular, I used it to build a simple time series DB for reporting on data from JSON payloads that had the potential to hold arbitrary data.
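A minimal sketch of that approach, for illustration only. It assumes a psycopg2-style DB-API connection (`%s` placeholders) and a hypothetical `form_events` table with a single `jsonb` column; the table, column, and function names are made up here and do not come from any real project.

```python
import json

# Hypothetical table: one JSONB column holds whatever the form submits.
CREATE_TABLE = """
CREATE TABLE IF NOT EXISTS form_events (
    id          bigserial PRIMARY KEY,
    received_at timestamptz NOT NULL DEFAULT now(),
    payload     jsonb NOT NULL
)
"""

INSERT_EVENT = "INSERT INTO form_events (payload) VALUES (%s::jsonb)"

# ->> extracts a top-level key as text; the key name is passed as a parameter.
COUNT_BY_FIELD = """
SELECT payload ->> %s AS field_value, count(*) AS n
FROM form_events
WHERE received_at >= %s
GROUP BY 1
ORDER BY n DESC
"""


def save_event(conn, payload):
    """Store one arbitrary JSON-serializable payload."""
    with conn.cursor() as cur:
        cur.execute(INSERT_EVENT, (json.dumps(payload),))
    conn.commit()


def count_by_field(conn, field, since):
    """Count events per value of one top-level payload key since a timestamp."""
    with conn.cursor() as cur:
        cur.execute(COUNT_BY_FIELD, (field, since))
        return cur.fetchall()
```

Because the payload is schemaless, new form fields need no migration; only the reporting queries have to know which keys to look at.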
The OP has harped quite a bit on connection limit issues, and this is a valid concern, but it is also something you can mitigate with connection pooling. Geographic replication is an issue, and it's one of those things where I'm not really convinced any SQL or NoSQL DB offers a really good solution. For example, if you run MongoDB as a replica set you must take extra steps to ensure your replicas do not suffer from split brain during a network partition. And as another commenter has pointed out, MongoDB's transactions are flawed. I would not trust it for anything transactional beyond single-document transactions in a non-replicated configuration.
DynamoDB is pretty decent for a document DB, but I personally dislike working with it. It requires you to give up many things that RDBMSs like Postgres offer out of the box. For example, if you want to fetch all records you have to do it in a loop, because there are limits on how many records can be fetched in a single request. But of course you can't self-host it.
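To make the "fetch in a loop" point concrete, here is a rough sketch. It assumes `table` behaves like a boto3 DynamoDB `Table` resource, whose `scan` call returns a page of `Items` plus a `LastEvaluatedKey` when more data remains; the helper name `scan_all` is made up for this example.

```python
def scan_all(table, **scan_kwargs):
    """Yield every item from a DynamoDB table, following pagination."""
    kwargs = dict(scan_kwargs)
    while True:
        page = table.scan(**kwargs)
        yield from page.get("Items", [])
        # A missing LastEvaluatedKey means this was the final page.
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return
        # Resume the next request where this page stopped.
        kwargs["ExclusiveStartKey"] = last_key
```

With boto3 installed you would pass something like `boto3.resource("dynamodb").Table("forms")`, where `forms` is a placeholder table name. A full scan reads the whole table, so it gets slow and expensive as the data grows.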
Another hosted option is Datomic. I've heard great things but have never used it, so I can't really comment.
HN can be fairly rubbish at times. People here can get too tied to their tools to even consider that other tools may be useful to others. NoSQL has its uses and I personally love using one.
OP, try https://www.arangodb.com
It is the best NoSQL IMHO. It's multi-model, extremely performant, has fantastic distributed/replication capabilities and good documentation. They even have a hosted offering of it.
Yep. The hosted option is fairly new and they are still finding their feet.
ArangoDB has so much amazing stuff in it: AQL, a query language that looks like code; Foxx, which acts as an integrated web server; and so on.
This probably won't be a very popular opinion, since it's not really a NoSQL database, but if you have a relatively small dataset and not very complex querying needs, you may be able to use Redis as a pretty decent datasource. Another solution may be Firebase Realtime Database, although that will tie you to a single vendor.
While Redis needs to keep the dataset in memory (which is why I added that it kind of depends on the size of your stuff) it does have quite robust persistence features [0] so it's very unlikely you'll ever lose your data even across reboots or crashes if it's configured correctly.
That being said, it's still not really a database engine in itself, and it would also require a slight paradigm change in how you think about your data and how you create your schema, so YMMV. But I've personally used it across a few non-data-heavy projects as the primary datasource and have been quite happy with it. It was also famously used as the primary datasource for a well-known adult website generating 200M pageviews/day even back in 2012 [1] [2], although I don't know if that is still the case.
First determine whether Nosql is really the solution you want. Next once you think nosql is the solution you want, have your experienced old hands slap you a few times.
If that still doesn't convince you, then you may actually need NoSQL: go for Mongo.
What is your use case? SQL RDBMS are generally a sensible default and you should use NoSQL only in places where they cannot be used (this is mostly related to scale requirements that are too much for RDBMS to handle).
I'm looking for the best DB for a serverless backend (cloud functions).
Typically the problem with RDBMS is that it's very expensive to handle thousands of concurrent connections. NoSQL doesn't have that issue. For example FaunaDB is designed for serverless and has no practical connection limits, Mongo Atlas gives you 500 concurrent connections on the free tier [1], etc.
In comparison Postgres on Heroku only gives you 500 connections on the most expensive plans. Even the $50 per month Postgres plan only gives you 50 concurrent connections.
Huh. A lot of us use serverless relational databases now, if by “serverless” you mean “runs on a remote server someone else manages.” AWS RDS, for example.
A pain to distribute geographically? What do you think big enterprises and banks use? Oracle or Mongo? If by “a pain” you mean “not free” then you’re right. Depends on how valuable your data is and how much you care about integrity.
Can you do connection pooling to Postgres from cloud functions?
I know there is a Node driver that does it for MySQL [1] but I've never seen one for Postgres.
[1] https://github.com/jeremydaly/serverless-mysql
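One common workaround, sketched below under stated assumptions, is to keep a single connection in module scope so that warm invocations of the same function instance reuse it instead of opening a new one each time. This is connection reuse, not pooling in the strict sense; with many instances you would still likely want an external pooler such as PgBouncer in front of Postgres. The `connect` callable and the `handler` signature are hypothetical, and the `closed` check assumes a psycopg2-style connection object.

```python
import threading

_lock = threading.Lock()
_conn = None  # module-level: survives across invocations on a warm instance


def get_connection(connect):
    """Return a cached connection, creating it on first use.

    `connect` is any zero-argument callable returning a DB-API connection,
    for example a lambda wrapping psycopg2.connect(...).
    """
    global _conn
    with _lock:
        # psycopg2 connections expose `closed` (0 while open); other drivers may differ.
        if _conn is None or getattr(_conn, "closed", 0):
            _conn = connect()
        return _conn


def handler(event, connect):
    """Hypothetical function entry point that reuses the cached connection."""
    conn = get_connection(connect)
    with conn.cursor() as cur:
        cur.execute("SELECT 1")
        return cur.fetchone()
```

Each function instance then holds at most one connection, so the total is bounded by the number of concurrent instances rather than the number of requests.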
https://twitter.com/jepsen_io/status/1261276984681754625
It looks good but their cloud offering seems quite expensive starting at $0.20 per hour or about $150 per month.
Firestore is better than the RTDB but still very limited compared to say Mongo or Fauna.
[0] https://redis.io/topics/persistence
[1] http://highscalability.com/blog/2012/4/2/youporn-targeting-2...
[2] https://news.ycombinator.com/item?id=3597891
[1] https://docs.atlas.mongodb.com/reference/atlas-limits/