After all those years, I sometimes feel like the whole ORM thing hasn't made progress or been settled. Which ORMs do you like and what do you like about them? What is the sweet spot? I'm asking this because database I/O has always been my biggest frustration when doing application programming. These days, I do more functional programming and tend to side with the ORM haters but I do occasionally wish I could just abstract out the database more. Anyways, I'd just like to hear HN's thoughts on this.
I believe in most cases the efficiencies that can be gained from using an ORM is quickly offset by a substantial dependency on third-party code, limited control of performance, and inability to express more complex SQL expressions, such as CTE's, lateral expressions, JSONB functions and operators etc.
ORM's also tend to pile on lots of functionality, of which most projects will normally only use a small fraction.
For an in-production system I've been maintaining for the last 10 years, I've recently ripped out the ORM code, replacing it with raw SQL queries, and a bit of DRY glue code. Results: less code, better performing queries, and less dependencies.
[1] https://github.com/jeremyevans/sequel
Good ORM lets you use a small slice of their features without any big performance hit in terms of development or actual. With a nice object model it just seems easier to remember and work with code.
When I need complex SQL I just write a raw query... It is trivial and concise in any ORM to drop down to raw SQL when needed.
I can agree they are no silver bullet, but a good ORM does so many chores for me I can't imagine bothering to do them myself, anymore when the pattern of using objects in my code and storing them in a SQL DB solves the problem at hand.
I'd like to hear more of your thoughts on that.
My experiences with ORMs are less about too much abstraction, and more about them encouraging wrong abstractions. I'm yet to work with an ORM-using project that doesn't employ ORM objects as business model objects. The ones I've worked with always exploded due to the fact that even with ORM, database model doesn't map 1:1 to how the business logic would like to see the world - and the tension of trying to use one set of objects for both purposes at the same time is what messed up the codebase.
These days it seems that even for the most rudimentary programming tasks everybody reaches straightaway for their favorite framework/lib without too much thinking.
I think in a way all these frameworks and libraries have had a negative impact on the general quality of coding. Programmers end up with limited knowledge of how the underlying technologies work: the HTTP protocol, SQL, the DOM...
And because these frameworks tend to grow over time, in an effort to be comprehensive and serve everyone's needs, they introduce more and more dependencies, more and more tooling, more and more code, which you'll need to get to know sooner or later.
Programmers who use said frameworks and libs become more and more "hooked", start using more features than actually needed (do you really need that auto generated REST API for that 3-page dry cleaning website?), and in the process become, well, dumber.
You can of course take this argument to absurdity by claiming that everybody should just write machine code. That's of course not what I mean. I just think that for many programming tasks, programmers would be better off - and no less productive - just by attacking the problem head on, with a good understanding of the technologies involved.
Sometimes a simple handsaw is more efficient and less hassle to use than that fancy sophisticated table saw you got in the corner.
You are right, very little of the ORM gets used in the end and one almost always needs to write custom SQL anyway (models don't always map 100% to app layer). Most importanly, as languages change, evolve or dry up it's easier to find or train new talent with SQL experience vs. a language specific ORM.
Had seen that article a while ago. Saw a few other mentions of it already in this thread, but here are a few I googled for and found:
http://blogs.tedneward.com/post/the-vietnam-of-computer-scie... (the original, maybe)
HN thread:
Object-Relational Mapping is the Vietnam of Computer Science (2006) (codinghorror.com) https://news.ycombinator.com/item?id=7310077
https://blog.jooq.org/2012/05/09/martin-fowler-on-the-vietna...
Some argue that using an ORM means you can switch underlying database technologies on a whim. I think this is an incredibly weak argument. How often do people truly switch database technologies?
I created a small wrapper around the node postgres library to make querying a little easier.
Have a look at https://github.com/joeandaverde/tinypg - It's a no frills library that makes it easy to execute SQL files as prepared statements and pass parameters as keys on objects.
https://gist.github.com/ciconia/57ad430cf437f0548d596a6f0026...
- Avoids mistakes when dealing with writing raw SQL queries (SQL is quite repetitive in practice)
- The declarative nature of classes maps well to types and relationships
- The declarative nature of classes maps out well to tables, even with polymorphism [1]
- Keeping "Models" in an ORM often maps out well to migration utility (Alembic, Django Migrations)
- Object-chaining map very well to queries
- ORM objects can be reused and composed
- They can abstract out intricacies across SQL dialects
- They can potentially make it easier to migrate to different SQL servers if no specialized features were used
- Can help avoid common security vulnerabilities like SQL injections
- When something can't be expressed via ORM relationships, they tend to allow the dev to drop down to raw SQL. In the case of SQLAlchemy, there is a core query language [2], too.
- In the case of Django, QuerySet is used as a standard throughout extensions that power a whole community. Plugins that don't even know each other (e.g. django-filter and django-tables2) can operate on the same django queryset to filter/search and sort/display data.
I mention QuerySet/Django ORM quite a bit in a recent blog post at https://www.git-pull.com/code_explorer/django-vs-flask.html.
[1] http://docs.sqlalchemy.org/en/latest/orm/inheritance.html [2] http://docs.sqlalchemy.org/en/latest/core/
This isn't remotely true; it turns what looks like an in-memory access into a network round trip. Navigating your database using an idiom of lists and member accesses is a fast path to n+1 queries all over the place, or ORM tuned to eagerly fetch the whole world when you only touch a tiny piece of it.
The closer an ORM's API is to a monad, the happier you'll be. Fundamentally, accessing the database is executing remote code; the more you can package up into the query before it goes off and does anything, the better performance you'll see.
IMO trying to shoehorn objects (in the OO sense, with polymorphism, data hiding and behaviour) into a database is wrong-headed. Data hiding in particular is a wrong mental model for thinking about facts in the database, and the more is hidden, the harder it will be to reason about performance and bulk remote operations generally.
That isn't true for e.g. Django's ORM, which lazily evaluates the query and only actually accesses the db after that, with a single query, and filtering done in SQL.
The queries that most tend to be making just aren't that sophisticated. Relationships tend to be basic.
And I haven't even mentioned stuff that'd really, really hard to express/manage in pure SQL like tree/nested information that has to stay balanced [1]. Thanks django-treebeard/mptt and sqlalchemy-orm-tree.
> The closer an ORM's API is to a monad, the happier you'll be.
Developers using ORM's simply aren't caring about ORM's matching a certain construct. They care that models emit correct representations of their schemas and that the data is retrieved "fast enough".
Take it a different way: the best part about ORM's? They're effective 95% of the time, right out the box, so you end up avoiding time that'd be spent over and prematurely optimizing.
> IMO trying to shoehorn objects (in the OO sense, with polymorphism, data hiding and behaviour) into a database is wrong-headed.
Objects in things like SQLAlchemy declarative and Django Models map perfectly to generated SQL, so they also act as a way to generate migrations. It's that accurate. A lot of the relationships project's need expressed tend to be vanilla joins.
> Data hiding in particular is a wrong mental model for thinking about facts in the database, and the more is hidden, the harder it will be to reason about performance and bulk remote operations generally.
ORM's strive to hit a value sweet-spot in terms of code expressiveness, reducing duplication, handling the bread and butter relationships and types. That covers what most developers really need.
Perhaps there are projects out there not fitting to ORM's. Not all projects are sophisticated data mart projects, but even then, a good share of those still go back to simple joins at the end of the day.
And I've even gone as far as trusting heavy-duty stuff like django-mptt, along with plugins that filter and sort. I don't even look at the queries, all I see is they're running performantly. In all these years, SQL queries have never been a bottleneck. Maybe it's because I'm only storing simple stuff.
[1] https://en.wikipedia.org/wiki/Nested_set_model
Also, Django ORM documentation is very clear and you can use Django Debug Toolbar to analyse the raw sql generated.
Deleted Comment
Only if you use naive ways of doing this. In .NET at least your overcomplicated expression can be compiled down to the minimum query needed to pull the bits of data you use.
If you define your schema in your database and derive your data layer from that, you get everything in your list (apart from that thing that totally always happens where you switch your underlying database technology every few months).
But then you don't have your database defined in two places. And if anybody ever does modify the db by hand, your build will break and it will quickly surface itself as an issue at compile time instead of via an obscure error message somewhere 40 levels deep in the call stack.
Compared to something like Hibernate or SQLAlchemy that tries to support everything under the sun and can result in a lot confusion when trying to understand what exactly it's doing.
Deleted Comment
I never understood the ORM hate. Every place I worked we intermingled raw SQL when needed. Hibernate has a way to clear query caches so you can use raw SQL when you need to. You can just write raw SQL exclusively if you want within hibernate so I don't get how you could lose anything :) .
Still, my experience is mostly with Hibernate. It's extremely mature, meaning reliable, feature complete, and only 0-30% slower than raw queries in most cases. It makes adding support for things like multitenency and audited tables a breeze without losing database agnostic behavior. It makes database upgrades a no-brainer 95% of the time too. It has a built in memory caches that help enormously with common queries. Probably the biggest thing is it makes the database strongly typed so it's possible to refactor. Code refactoring in Java is easy but raw stringified SQL is nearly impossible to fix in any language.
I think the biggest counterpoint to ORM is shitty ORM. Things like SQLAlchemy generate downright horrific SQL slowing everything to a crawl and causing deadlocks. Another honest counterpoint to ORM is the learning curve. Everyone is taught SQL but the ORM is more abstract and harder to reason about, not a fun thing to learn.
TBH I think most ORM's are just poorly done. Putting an object abstraction on a relational database is hard. The only ones I've enjoyed for completeness and performance are Hibernate and to some extent Entity Framework. EF being the easiest to use but a bit slower with less features.
I have heard good things about Dapper but never used it. I like the idea of injecting a SQL DSL into the language itself, wish it was more prevalent.
Not my experience... Would you care to explain ? SQLA works pretty well for me, as long as I stay in rather simple queries. For example, data analysis queries are next to impossible to express with SQLA, but that doesn't matter much.
For the simple update queries, it just works fine for me. It also allows for very good control on relationships, etc.
First, the need for data has moved from the server to the client side in API-centric architectures. The backend is often just a dumb HTTP layer on top of a (sometimes relational, sometimes not) data persistence. Heck, I even built apps without server, using PostgREST.
Second, on the client side, we don't need an object-oriented interface to the datastore, because JavaScript. But we do need a powerful declarative query builder, and static data types to build up on. GraphQL provides that.
Third, if the purpose of the ORM was ever to abstract the SQL dialect of the database, GraphQL has resolvers, which turn a GraphQL request into a request for the database of your choice. So it does that job, too.
Last, for the data crunching jobs that you decide to do on the backend, you often need specialized data stores (event store, document store, full-text search index), and ORMs don't address that. And since these jobs are often resource intensive, you need the best performance - therefore falling back to raw SQL in most cases.
ORMs were a good solution for the kind of software architectures that were used for the web in the 00s. Since then, we have different needs, which require different solutions.
Disclaimer: I was the maintainer of a popular open-source PHP ORM [1] for 3 years. I no longer use ORMs.
[1] http://propelorm.org/
To my opinion, it was a bad solution to the wrong problem.
For one, we're not that enamoured with objects anymore (what with functional programming, immutability, etc).
Second, SQL and DDL, being declarative, is both a higher abstraction that (at least) most ORMs, and offers more fine level control to the DB at the same time!
Third, people don't really switch databases that often, for the abstraction between different SQL syntaxes to matter.
But third-party libraries shouldn't assume any specific SQL database, so if you want libraries that can do database things, ORMs are very useful.
"ORM is the Vietnam of Computer Science" (2006)
https://blog.codinghorror.com/object-relational-mapping-is-t...
"ORMs are a thing of the past" (2009) http://codemonkeyism.com/orms/
"Be careful with magical code" (2010) "[...] ORMs and in particular Hibernate [...]" http://codemonkeyism.com/beware-magical-code/
No ORM comes remotely close to being able to solve this kind of problem. The kinds of things I need to consider:
- analyzing select / predicate / order / etc. for tables used and selectively including tables
- forcing query evaluation order by creating a subtable query with conditions, projecting out an id, and rejoining in the next query up
- analyzing predicates and rewriting predicate trees to push predicates down into subtable queries, or convert predicates into joins, or replace references to columns on other tables with denormalized columns
- gathering heuristics about query set size and changing query generation technique
- splitting queries into parallelizable queries and composing result set client-side
An ORM doesn't really help me. Even a library tuned for symbolic manipulation of SQL doesn't help hugely; my life is made much simpler when I can freely extend the SQL AST classes with my own analytic and rewrite methods.
With a suitably tweaked ORM, you can pass platform-specific hints down to the DB. The platform's query optimiser should be (obviously up for debate!) the one that's taking a query and working out the most efficient way of executing it.
Not saying it always will always pick the best execution, but if you're getting to the point of supplanting the optimiser you're starting to replace the database itself.
At which point you're probably beyonds the realms of what an ORM can and should reasonably doing, you're starting to writing your own custom persistence platform.
If you have just a predicate, or a single join, most query planners are just about ok; with multiple tables and predicates touching those tables, there's ever increasing risk it will start the join on the wrong table and produce too many rows somewhere in the plan. Engines like PostgreSQL are unpredictably stupid (but sometimes very smart), while MySQL is predictably stupid (but consistently fast when you keep it simple and avoid specific constructs).
When designing for interactive use, you want to design for a maximum run time on the query. If that means using a query which is often not the fastest, but is never slower than X, it's the query to use. Smart planners are more of a hindrance than a help, because they reduce predictability.
The worst case is when a planner starts with a small row set (planners usually like starting with small row sets) and joins against a big table expecting a similarly small joined set, but ends up with an enormous result set because of the cardinality of the join. Estimating join cardinality is a tough problem for database vendors to solve, and it's critical to figuring out the correct join order - particularly since the number of join strategies grows with the factorial of the number of tables to join, so the planner can't spend much time on it. Meanwhile, the application developer may very well know the designed cardinalities and a join sequence that minimizes the chance of a blowup, the most important thing for interactive use.