No. Every time I've used mongodb we've ended up regretting it for one reason or another. And migrating to a different database after launch is a huge hassle.
I've done a couple projects where we kicked off with postgres using JSONB columns for early iteration. Then we gradually migrated to normal SQL columns as the product matured and our design decisions crystallized. That gave us basically all the benefits of mongodb but with a very smooth journey toward classical database semantics as we locked down features and scaled.
Back in 2010 when the MongoDB hype was high, the well-known company where I was working at the time decided to build the next version of the product using MongoDB. I was on the analytics team and had to code a whole bunch of intricate map-reduce jobs to extract summary data out of Mongo. I'd repeatedly head to the product team and ask them to explain the edge cases I was seeing in the data and they would not be able to give me an answer because the data was on the third or fourth version of their mentally-stored schema and no one knew anymore. All in all, misery.
I decided to check out the hype back then and started writing tutorials on using PHP with MongoDB. After the third post, I realized that they were all about being anti-relational, and though you could have keys to other records in your record, this led to bringing back records, querying more records, then manually filtering records, rinse and repeat.
IIRC, they've turned around on their anti-relational views and now allow for joins.
I looked at OrientDB a while back, but it fell flat with the lack of features and oddness.
If I had more time, I would really dig into ArangoDB, since they are both of the NoSQL family.
I've recently adopted this practice. All new features use a JSONB (or hstore - I'm experimenting with both) field instead of a "real" one until they're bedded in and stop changing. Then I convert the field + data to a real field with a NOT NULL constraint in one easy migration.
So far so good. Being able to join on JSONB fields right in the SQL is awesome
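For concreteness, here's a minimal sketch of that promotion step, assuming a hypothetical users table with an attrs JSONB column whose 'email' key has settled down:
    BEGIN;
    ALTER TABLE users ADD COLUMN email text;
    UPDATE users SET email = attrs ->> 'email';         -- backfill from the JSONB blob
    ALTER TABLE users ALTER COLUMN email SET NOT NULL;
    UPDATE users SET attrs = attrs - 'email';           -- optionally drop the promoted key
    COMMIT;
On a big table you'd batch the backfill rather than run it in one transaction, but the shape of the migration stays the same.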
I never understood the appeal of the JSON to SQL columns workflow.
At least with the ORM(-ish) tools I worked with, it always felt much more straightforward to just change classes within the application code and automatically generate the respective migration files to be run on the relational database, than having to interact as a human with JSON (for the app as well as for business intelligence and reporting/monitoring).
I feel I have handwritten significantly more schema and data migration code for non-sql databases and JSONB in Postgres than for relational databases in the last 10 years.
Sure, once the database grows bigger, those automatically generated migration files don't work as seamlessly anymore and can be dangerous, but no tool or database magically solves all the problems at scale.
> At least with the ORM(-ish) tools I worked with, it always felt much more straightforward to just change classes within the application code and automatically generate the respective migrations...
There's your answer, no? You've got specific tools and workflows you've designed to work with your database in a specific way. You can't just take one workflow and substitute a piece of another workflow and expect it to always work. Your tools and processes would be as useless for me as mine would be for you.
> I never understood the appeal of the JSON to SQL columns workflow.
The appeal is non-technical.
Writing good migrations, and having tests around them to ensure they don't leave the DB in an inconsistent state if a subset of them fails, requires a good understanding of an RDBMS and the specific product.
You'd be surprised how many engineers don't meet those criteria.
The JSON to SQL columns workflow allows any developer to offload what a RDBMS does into the client/server/bootstrap code.
At that point the DB is like a key-value store and you get to claim you are using Postgresql.
Is that what I would use if I had reliability and performance in mind?
No... but it's way cheaper (and quicker) to find devs who can do this (JSON to SQL columns workflow) than devs who can write well tested, reliable and non hacky migrations.
Often at a startup, reliability and performance have lower priority than getting some features out the door in a day.
The JSON-to-columns approach is a best practice for analytic applications. ClickHouse has a feature called materialized columns that allows you to do this cheaply. You can add new columns that are computed from JSON on existing rows and materialized for new rows.
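For anyone unfamiliar, roughly what that looks like in ClickHouse - a sketch assuming a hypothetical events table that keeps the raw JSON in a String column named raw:
    -- New column computed from the JSON: existing rows evaluate the expression at
    -- read time, while newly written rows store the value physically.
    ALTER TABLE events
        ADD COLUMN user_id UInt64 MATERIALIZED JSONExtractUInt(raw, 'user_id');
    -- Recent ClickHouse versions can also backfill old parts explicitly:
    -- ALTER TABLE events MATERIALIZE COLUMN user_id;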
If we were going to start from scratch today, we'd probably use Postgres. But, realistically, the primary motivation behind that decision would be that Postgres is available on AWS, and that would centralize more of our operations. (DocumentDB is, of course, available. It's not Mongo. I'd be curious to hear from people who actually had Mongo deployments and were able to move to DocumentDB; it's missing so many of MongoDB's APIs that we physically can't - our applications would not run.)
Mongo isn't that bad. It has limitations. You work within the limitations... or you don't. But I really don't think a valid option is "mongodb fuckin sucks m8, shit tier db". We're not going to be migrating terabytes of data and tens of thousands of lines of code when the benefit is tenuous for our business domain.
Should you use MongoDB today? I'll say no, but not for the reasons anyone else is giving. MongoDB's Cloud Provider agreement has decimated the cloud marketplace for the database. Essentially, if you want to run a version released in the past few years (4.2+), you need to be on their first-party Atlas product. Many other parties, especially the big clouds, are on 3.6 (or have compatibility products like DocumentDB/CosmosDB which target 3.6). Atlas is great. It's fairly priced and has a great UX and operations experience. But I don't feel comfortable about there being political reasons why I couldn't change providers if that changes. If you have business requirements which demand, say, data in a specific region, or government-class infra, or specific compliance frameworks, Atlas may not be able to meet them.
> We're not going to be migrating terabytes of data
You may have dramatically less 'real data' than mongo makes you think you do. I migrated one of our mid-sized databases out of mongo and into PG a couple years ago. The reduction in size was massive. One table in particular that was storing a small number of numeric fields per doc went from ~10GB to ~50MB. I wouldn't expect this with all datasets of course, but mongo's document + key storage overhead can be massive in some use cases.
This is (probably) an artifact of Mongo's schema-less nature; when you don't have tables with structure, every document you store has to detail its own schema inline.
In a relational database, you have columns with names and types, and that info is shared by all of the rows.
In Mongo, every field in every document has to carry its own name and type, even if that layout is shared by every other document in the collection.
Mongo's way is more flexible, but it's terrible for storage efficiency.
Unless this has changed in recent years, the BSON format that Mongo uses is more or less JSON optimized for parsing speed and takes more or less as much space as storing your entire database in JSON.
JSON is a great format for simplicity and readability but as a storage format it's hard to come up with one that's more bloated.
Not OP, but I think it's more about the importance of said data, the number of collections to think about, and so on. Regarding your point, I would guess some index changes might have had a significant impact here.
I'll give you a reason not to use Mongo that not many people mention. Schema definition acts as a form of documentation for your database. As someone who has come into a legacy project built on Mongo, it's a nightmare trying to work out the structure of the database, especially as there are redundant copies of some data in different collections.
I actually love the idea of providers that abstract over the major cloud vendors, like Mongo does with Atlas: you can spin up Mongo on any of the three - boom, lock-in concerns gone - and the best part is you're both supporting the project and have the creators on hand for tech support.
Regarding your question about the ability to move to DocumentDB from MongoDB: we (the Countly Analytics team) weren't able to, since several APIs in newer MongoDB releases are still not available in DocumentDB.
> Jepsen evaluated MongoDB version 4.2.6, and found that even at the strongest levels of read and write concern, it failed to preserve snapshot isolation. Instead, Jepsen observed read skew, cyclic information flow, duplicate writes, and internal consistency violations. Weak defaults meant that transactions could lose writes and allow dirty reads, even downgrading requested safety levels at the database and collection level.
Then don't use the defaults? SQL Server used to have an empty password as the default for the sa user, and it was trivial to find servers exposed on the internet with the default password. While part of the blame was MS's, it's always on the person who does the installation to know what they are doing.
Yes. We have applications running on both PostgreSQL and MongoDB and I find that working with MongoDB is just more pleasant. I think it mostly boils down to my preference of document databases as opposed to relational ones. It feels much more natural to me to embed / nest certain properties within a document instead of spreading it across several tables and then joining everything together to get the complete data. MongoDB makes working with these sometimes nested documents easy (I mean, it better) and there's always Aggregation Pipeline when you need it (something that I again find much more pleasant and readable over SQL).
What always irks me is when somebody suggests PostgreSQL's json (or jsonb) types as an alternative to using MongoDB. All it's saying is that the person hasn't really invested a lot of time into MongoDB, because there are things that PostgreSQL simply cannot do over a json type, especially when it comes to data updates. Or it can, but the query is overly complicated and often includes sub-queries just to get the indexes into the arrays you want to update. All of that is simple in MongoDB - not really a surprise, that's exactly what it was made for. The last time I worked with PostgreSQL's json I sometimes ended up just pulling the value out of the column entirely, modifying it in memory and setting it back to the db, because that was either way easier or the only way to do the operation I wanted (needless to say there are only exceptional cases where you can do that safely).
Lastly, if you can easily replace MongoDB with PostgreSQL and its json types or you're missing joins a lot (MongoDB does have left join but it's rarely needed), chances are you haven't really designed your data in a "document" oriented way and there's no reason to use MongoDB in that case.
I'm curious, any chance you remember some of those json/jsonb update hassles?
(Not arguing, just curious - when things get hairy in JSON, I give up on SQL and [1] I write a user-defined function (CREATE FUNCTION) in JS (plv8) or Python (plpython).)
[1] assuming the update code needs to run inside the database, e.g. for performance reasons... otherwise just perform your update in the application, where you presumably have a richer library for manipulating data structures...
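To make the plv8 route concrete - a rough sketch, assuming the plv8 extension is installed and a hypothetical items jsonb array column whose elements carry "value" and "qty" keys (plv8 maps jsonb values to plain JS objects):
    CREATE FUNCTION dec_qty(items jsonb, match text) RETURNS jsonb AS $$
      // items arrives as a regular JS array; mutate it and hand it back as jsonb
      for (var i = 0; i < items.length; i++) {
        if (items[i].value === match) { items[i].qty -= 1; break; }
      }
      return items;
    $$ LANGUAGE plv8;
    -- usage: UPDATE orders SET items = dec_qty(items, 'blue') WHERE items @> '[{"value": "blue"}]';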
I don't remember the specific case (it's been a few years) but I do remember it had something to do with updating an array member. I googled around and found this [0] (the second question), which looks very similar. It's as simple as it gets - find an array member with "value" equal to "blue" and decrease its "qty". In MongoDB you can do that pretty easily and the update should be atomic. The SQL version looks complicated, and it's not even the form you should use (notice the note about the race condition). Then again, maybe there's already a more elegant way to do that in PostgreSQL; I assume the support has improved over the years.
[0] https://dba.stackexchange.com/questions/193390/update-nth-el...
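For reference, roughly what that comparison looks like - a sketch against the same hypothetical orders(id, items jsonb) table, not the exact query from the linked answer:
    -- MongoDB: more or less a one-liner with the positional operator, e.g.
    --   db.orders.updateOne({ "items.value": "blue" }, { $inc: { "items.$.qty": -1 } })
    -- Postgres: find the element's index first, then write it back with jsonb_set.
    UPDATE orders o
    SET    items = jsonb_set(
             o.items,
             ARRAY[s.idx::text, 'qty'],                        -- path to the matched element's qty
             to_jsonb((o.items -> s.idx ->> 'qty')::int - 1)   -- decremented value
           )
    FROM (
      SELECT id, (pos - 1)::int AS idx                         -- jsonb paths are zero-based
      FROM   orders, jsonb_array_elements(items) WITH ORDINALITY AS a(elem, pos)
      WHERE  elem ->> 'value' = 'blue'
    ) s
    WHERE o.id = s.id;
    -- The index is computed in a separate scan from the write, so a concurrent writer that
    -- reshuffles the array in between can make this hit the wrong element - the race
    -- condition mentioned in the linked answer.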
How do you handle the data integrity issues that are present, even at the strongest configurable integrity levels, or is data integrity not an issue for your application?
And if you're using C#, the MongoLinq library makes using Mongo with LINQ just as easy as using EF with an RDBMS. We were able to easily support both in a product just by passing LINQ expressions to different repository classes, and the expressions were translated to either SQL or MongoQuery by the appropriate provider.
Yep, the update operators for MongoDB are really great and not replaceable with Postgres. Now if only MongoDB had a good sharding story, it would be a worthy contender for me.
No. I had a sour experience in 2009 where it ate my data; the devs were rather cavalier about it ("there's a warning on the download page" - I got it through apt), and then it ate my data again when the OOM killer killed its process.
I didn't like the project attitude of a database being so lax with persistence, so I never used it again.
My answer contains a history of my experiences with MongoDB that is pretty similar to yours:
https://stackoverflow.com/a/18269939/123671
I feel like MongoDB now is actually a pretty stable product, simply through time and investment, but I will never trust a company that used our data to beta test for a decade.
That's my attitude as well. RethinkDB, in comparison, had a much better attitude of "reliable first, fast later". Unfortunately, it turned out that when you're a database, it doesn't matter how much data you lose, only how fast you are while losing it.
Yes, it's our main DB. I still like it quite a lot. We use Mongoose as the ODM; it makes adding new stuff so much easier without having to do things like alter table etc. But for our big data stuff we use BigQuery, simply because of cost.
I do like how easy it is to get a mongo instance up and running locally. I found maintenance tasks for mongo are much easier than postgres.
One thing you still need to do is manage indexes for performance; I've had to spend many a day tuning these.
I have come across some rather frustrating issues. For example, a count-documents call is executed as an aggregate call, but it doesn't do a projection using your filters. Say you want to count how many times the name 'hacker' appears: it will do the search against name, then do the $count, but because it doesn't do a projection, it reads in the whole document to do this. That's not good when the property you're searching against has an index - it shouldn't have to read the document at all.
Yes, use it with Atlas for every one of my companies' projects.
- The document model is a no-brainer when working with JS on the front-end. I have JSON from the client, and Dictionaries on the backend (Flask), so it's as easy as dumping into the DB via the pymongo driver. No object relational mapping.
- Can scale up/down physical hardware as needed so we only pay for what we use
- Sharding is painfully easy, with one click
- Support has been incredible
I'd echo most of those sentiments, but not all:
+ Support on Atlas is good
+ Set up process is very streamlined and smooth
+ Modifications (scaling IOPS etc) are all very easy, but ...
- The Atlas web client had a bug that swapped data types (fixed, but still)
- The database starts off fast, but seemed to get quite slow considering how small it was (fitted entirely in RAM)
- The latest Jepsen report suggests it is still very cavalier with data integrity (http://jepsen.io/analyses/mongodb-4.2.6)
My experience has been the same, though I've only used it for a couple of weeks now.
I love their web interface - does anyone know if there are any cloud offerings for PostgreSQL which are similar?
I can't help noticing that the majority - not all, but the majority - of "No." responses here summarize identically: someone used MongoDB a long time ago (between 5 and 11 years) and ran into a problem, so they stopped using it and will never try or re-evaluate it again.
I'm a bit surprised that developers and systems engineers get burned to the point that they disconnect from the daily reality of their occupation, that software is often shaky in its infancy but almost always improves over time.
Here's the thing: it's not just that early Mongo had issues - sure, whatever, that's life. But they argued that things like silently dropping data if your database gets too large were acceptable because it was documented (and this is just one such issue).
The sole, number one requirement I have for my database is that the data I put in is still there when I go to get it back. Failing that and pretending it's not an issue is more than "being shaky"; it's a violation of the trust I put in my data store. It's great they've fixed those issues now, but that cavalier attitude towards the integrity of my data is what makes me hesitant to use it in the future, not the fact that a bug involving data loss existed.
None of this is to say I'd never use it, but it'd be far harder for me to trust it again vs postgres, couch, rethink, cassandra, or any number of other data-stores that took data integrity seriously from the start.
The problem with MongoDB is not that it was shaky in its infancy. It is that the developers made it very clear from the start that they don't care. Speed is everything, everything else is secondary. This changed a bit over time, but the latest Jepsen report (http://jepsen.io/analyses/mongodb-4.2.6) is still a mess, and by now I'm pretty sure that is less "shaky in its infancy" and more a deep-seated design problem that will never be fixed.
You only get one chance to make a first impression - I'll probably stick with it unless I get an excellent reason to change my mind.
How often do you reevaluate technologies that you've had a bad experience with?
How often I re-evaluate applications, you mean? I would say it happens somewhat regularly; if there's something I miss with one I switched from, or if I hear or read something good about an application I stopped using.
I had to rip out a completely schemaless mongodb and replace it with a SQL DB. The data was completely relational, and developing without a schema was a pita.
I asked why mongodb was used in the first place. As far as I could tell, the answer was one resume driven dev.
If a document store made sense for our data then we would’ve made mongodb work. It was less “we got burned” and more “why were we using mongo in the first place?”
If you have a problem with product A and you have 10 other options, you find product B is good enough and start working with it; there is no need to go back to A from time to time to see if it has improved. And watching all 10 products like a horse race and moving to whichever is best at the moment isn't realistic.
I don't think this is limited to devs or databases. You see it sometimes in Firefox/Chrome threads along the lines of "firefox sucks", "when did you last use it?", "4 years ago".
Mongo in particular had marketing plays that were quite deceitful; that particular flavor of distaste can run deeper than some tech issues.
disclaimer: I use mongo in my day to day as a primary DB store.