Readit News logoReadit News
zzzeek · 2 years ago
ORMs do much more than "write SQL". This is about 40% of the value they add.

As this argument comes up over, and over, and over, and over again, writers of the "bah ORM" club continuously thinking, well I'm not sure, that ORMs are just going to go "poof" one day? I wrote some years back the "SQL is Just As Easy as an ORM Challenge" which demonstrates maybe a few little things that ORMs do for you besides "write SQL", like persisting and loading data between classes and tables that are joined in various very common ways to represent associations between classes:

https://gist.github.com/zzzeek/5f58d007698c4a0c372edd95ab8e0...

this is why whenever someone writes one of these "just write SQL" comments, or wow here a whole blog post! wow. I just shake my head. Because this is not at all what the ORM is really getting you. Plenty of ORMs let you write raw SQL or something very close to it. The SQL is not really the point. It's about the rows and objects, moving the data from the objects to the INSERT statement, moving the data from the rows you SELECTed back to the objects. Not to mention abstraction over all the other messy things the database drivers do like dealing with datatypes and stuff like that.

It looks like in this blog post, they actually implemented their own nano-ORM that stores one row and queries one table. Well great, now scale that approach up and see how much fun it is to write the same boilerplate XYZRepository / XYZPostgresqlRepository code with the same INSERT / SELECT statement over, and over again. I'd sure want to automate all that tedium. I'd want a one-to-many collection too maybe.

You can use SQLAlchemy (which I wrote) and write all the SQL 100% yourself as literal strings, and still use the ORM, and still be using an enormous amount of automation to deal with the database drivers and moving data between your objects and rows. But why would anyone really want to, writing SQL for CRUD is really repetitive and tedious. Computer can do that for you.

marcosdumay · 2 years ago
The one distinguishing feature of an ORM is that it drops SQL and the relational paradigm and places the developer completely within the OOP world. If you drop this, you have merely a database connector library, with much more freedom of behavior than an ORM.

Yes, since they are an notoriously bad abstraction, every ORM will give you an escape hatch for the minority of tasks that it can't abstract at all. That escape hatch is not in any way a defining feature of the system.

Now, about that extra freedom that you get from dropping the requirement that your connector is an ORM... well, neither your data at rest, the presentation to the user, the abstraction for a service client, nor your validation rules benefit from OOP. Proof of that is that OOP interfaces from all of those things precede the current popular ones, and all of them were abandoned due to fundamental dissonance between the model and the data.

The rationale for an ORM is that, even though none of the interfaces you actually want for your data is best done in OOP, somehow OOP is still the best way to integrate them so that you can reuse data and metadata. This thought is not completely without merit, but there is very little empirical motivation for it, and the tiny amount that exists is more than completely explained by the OOP hype that only started to die around a decade ago.

EDIT: Oh I saw you wrote SQL Alchemy! First, thank you for that great piece of software.

Now, SQL Alchemy does provide a lot of useful ways to postpone the object representation or even to map your values into dumb data. My comment up there is on theoretical limitations, but on practice, I do think it's the best option available on Python. (And maybe about the best option afforded, since the language is intrinsically biased into OOP.)

lelanthran · 2 years ago
Okay, you're the expert here, and I'll happily concede that I am not (and apologise in advance if I seem to be disrespectful), but ...

> The SQL is not really the point. It's about the rows and objects, moving the data from the objects to the INSERT statement, moving the data from the rows you SELECTed back to the objects.

I think that that is the problem: mapping a relational dataset to a hierarchical dataset is the digital equivalent of pounding a square peg into a round hole.

I know you've read Ted Neward's "The Vietnam of Computer Science." (it's a short read, so search for it), and he articulated the same thoughts I had each time I had to work with an ORM in a mainstream language.

There is an impedance mismatch of sorts between relational data and hierarchical data. In cases where there is no hierarchy, you can do away with the ORM completely and turn out much easier to read code, for example with `sqlc` and `Go`.[1]

The problem with the ORM is the programming language - one in which hierarchical objects are the idiomatic way to code (all of the mainstream OO languages) is always going to require some wrangling.

> still be using an enormous amount of automation to deal with the database drivers and moving data between your objects and rows.

Unless you're using `sqlc` and Go, where the automation is provided

[1] Now you may argue that sqlc is technically an ORM, but then where do we draw the line for calling something an ORM? Is it the case that any method for "generating boilerplate for mapping relational datasets to in-program structures" is going to be called an ORM? Because to my mind, the result is so different for in-program representation that they aren't the same thing.

zzzeek · 2 years ago
yah I read Neward's thing, and it was one of the main reasons I wrote SQLAlchemy in the first place, because he was just so wrong. It read like he tried to write some object relational thing and it didn't work out, so he goes off and rant rant ORMs are wrong. Kind of proving that post wrong was one of the primary goals of SQLAlchemy, really, where I sought to change the question of "impedance mismatch" and "leaky abstraction" and all that and redefined the ORM / SQL abstraction layer as *automation*, not any attempt to "hide" anything.

I mean, that was really an important point in time when there really werent ORMs that were easy to work with, there was Hibernate in a very early stage and there were overly simplistic things for Perl, so I thought it was important that this "better way" I had in mind could be put out there, before the idea that "yeah let's all avoid ORMs unconditionally" could take hold.

Yeah if your app has one table and two queries, write the SQL and marshall the data yourself. I think everyone should do that approach anyway for awhile so they know what's actually going on. But if you are writing for 1200 tables, it's just not practical. if the Go thing is also automating that and generating SQL / data marshalling boilerplate code for 1200 classes, yes that's kind of ORMish. that's a totally valid way ORMs are written and if I had multiple lives I'd probably write a stored procedure ORM that does something like that too.

dagmx · 2 years ago
I completely agree with you. Every single time someone says: “Just write your own abstraction over an sql generator ”, it eventually devolves into a full blown ORM.

I swear the majority of these opinion posts against ORMs are from people who must have worked in a badly implemented project that left them with a bad experience and they blamed the pattern rather than the technology.

One of the best bits about an ORM is making it consistent for non-db users on the team to simply grok and work with, without creating monstrous and hard to debug joins everywhere. But when badly set up it can lead to a lot of debugging spaghetti. Which is the same as can happen with SQL but I suppose people think that at least there’s one less layer to debug while ignoring the problem is actually how they got where they are and not the technology

I switched a project from manually written sql to sql alchemy on a project that’s used by multiple Oscar nominated films daily for reviews. The SQL version was gross and impossible for the team to manage because it had bloated over the years, with no nice way to detangle the statement generations and joins. SQL Alchemy made it so any one of the technical directors on the team could step in and add new functionality, without serious performance footguns. Instead of me having to clean up bad sql every year (projects would fork per film and merge at the end) to keep performance up, I could trust the ORM to do that for me.

At its worst, it was way too easy for TDs to get the raw SQL to be tens of minutes per review session by structuring their logic incorrectly, but it was so difficult to see. Switching to an ORM meant I could get the performance down to seconds per session and they couldn’t destroy the performance in subtle ways.

joaodlf · 2 years ago
> I swear the majority of these opinion posts against ORMs are from people who must have worked in a badly implemented project that left them with a bad experience and they blamed the pattern rather than the technology.

Fair. But you then proceed to detail a personal experience on the other side of the spectrum: Badly written code, without an ORM, and how it was fixed by introducing an ORM.

I think we can all agree that you can write bad code, with or without an ORM :). Not that this is entirely relevant to my post, I am simply advocating for writing more SQL, not how to write a good object mapper. That's a different beast, and I purposely kept that simple just to illustrate that it is possible to get started without too much pain.

waffletower · 2 years ago
Enough people have had bad experiences with ORMs who have decided to evolve and find alternatives. ORMs are bulky and do not provide value to many modern functional, data-first development paradigms.
hnfong · 2 years ago
> I swear the majority of these opinion posts against ORMs are from people who must have ...

"microservices" - https://news.ycombinator.com/item?id=37125636

That cleared things up, at least for me. I totally agree that if your job is to essentially write a custom, RESTful, fault tolerant, 99.999% uptime accessor for a table or two, basically you're hired to be the ORM, so you don't want to just suggest people to use a ORM framework and effectively make yourself redundant.

jpc0 · 2 years ago
> and still be using an enormous amount of automation to deal with the database drivers and moving data between your objects and rows

I've found I would generally only need the handling of database drivers and query building since I'm already writing a validation layer and to add data marshalling to that is pretty trivial.

Likewise practicing YAGNI, what is the chances I need multiple database drivers for different databases, it's extremely unlikely that I'm going to be chopping and changing between different databases so I'm really only writing that code once.

I would argue to start with writing the basic SQL queries and adding an ORM later when you know you actually need it.

It's much easier to onboard someone into pure python code + a db driver vs having to onboard someone to SQLAlchemy, since it is quite a complex piece of software, necessarily complex for what it is trying to achieve but if you don't need it it's not a good fit.

jacurtis · 2 years ago
> I would argue to start with writing the basic SQL queries and adding an ORM later when you know you actually need it.

I think you missed the point of the parent comment, which is that ORM's are not about writing SQL queries (although they do that). But ORMs are about moving data around, transforming it from rows and columns into meaningful objects in the project's language and data model.

As the parent comment suggested, if you are dying to write your own SQL (which does often happen as queries get more complex and don't fit into ORM language model) then you can write raw SQL but still let the ORM do the heavy lifting so you can take advantage of those features, which is the majority justification for the ORM in the first place.

You are basically suggesting what the original post author is suggesting. The comment above provided a rebuttal to their argument and you replied by suggesting the same thing they originally rebutted to. Hence a circular argument.

ris · 2 years ago
> Likewise practicing YAGNI

This is why I have come to hate YAGNI. Nowadays when it's said what I hear is "I don't understand why I need it (yet)".

joaodlf · 2 years ago
First of all, thank you for SQLAlchemy! If I ever had to make a final choice in how I would interact with a database for a very large project that involves a considerable dev team, I would always bet on SQLAlchemy. Not that I would necessarily like all aspects of it, but when it comes to Python and SQL - “Nobody ever got fired for picking SQLAlchemy.”.

With that out of the way, despite ORMs doing much more than "just writing SQL", it is exactly on that point that I flinch: Most devs should be exposed to SQL. And if your project allows you to build around simple enough abstractions so that you aren't reinventing the wheel, you should definitely be writing SQL. Especially if you don't know SQL yet - which is the growing case of new devs coming into the job market.

You can achieve a lot with SQlAlchemy Core, a tool that I absolutely recommend, but my post is just a simple alternative to get developers to think about their approach. If that results in some devs reconsidering using "full fat" SQLAlchemy and to try SQLAlchemy Core, that's a win for me!

Your gist tries to highlight the difficulty of doing certain things without an ORM. Migrations (as just 1 example) doesn't need to be hard, simple tools like flyway, or migrate (https://github.com/golang-migrate/migrate) achieve a similar result (while also keeping you on the path of writing SQL!). Deep and complex relationships between objects also don't need to be hard - typically people approach this subject with a requirement to be very flexible in the way they want to build queries and objects, but that to me in a sign that maybe they should reconsider their business logic AND reconsider that, just maybe, their project doesn't require all that flexibility, it is fairly straightforward to extend objects and introduce some more complex representations as and when it is needed - will all of this make me write code faster? Absolutely not. That is why you have spent so much time perfecting SQLAlchemy, but then again, I am not advocating for devs to go and replace their usage of ORMs, just presenting an alternative that may or may not fit their needs for a new project + give devs the chance to learn something that the ORM might have taken away.

xp84 · 2 years ago
> writing SQL for CRUD is really repetitive and tedious

Agreed, and this is the primary reason that ORMs are a necessary tool. And possibly that it's easier to train a junior developer to use one than it is to get them to a basic level of proficiency (and security awareness!) in SQL.

That said, I think this is one of those areas where It's Complicated, because inefficient database calls due to ORM usage is one of the primary ways I see applications completely break down. For most types of apps I've seen, which are very read-heavy and which aren't doing intensive writes, roughly all your writes should probably be using the ORM (including such niceties as validations, default scopes, all that nice stuff), and if it's simple enough, your "show" actions (fetch and display one entity) may be fine as well, but every "dashboard" and "index" action (show many entities, basically things with joins) likely need to be written in SQL.

In my experience (Rails), the object instantiation cost is insane, much greater than the actual time talking to the DB, so not only do you need to write SQL, but you need to handle the data that comes back without instantiating ActiveRecord models.

This is much harder work (more specifically, it needs much greater skill and experience, and is easier to make a mess of) versus using the ORM and models, but only on apps with tiny amounts of data per request, or very low request volume, can the ORM be a serious exclusive option for this task. Unless you want to end up like an app I once was asked to help fix, where they were on the $9,000 per month Heroku postgres instance, and since that's the biggest one, they could "scale" no bigger. (Okay, this wasn't their only problem, their main one was not understanding that you don't sprinkle analytics DB writes all over the place because now the simple high-volume "read" pageviews can't be generated using just a read replica).

SPBS · 2 years ago
> The SQL is not really the point. It's about the rows and objects, moving the data from the objects to the INSERT statement, moving the data from the rows you SELECTed back to the objects.

That's easy. It's a database mapping library. Write the query, give it an object and it fills it in or reads from it. You can do this with annotations, or struct tags in the Go world. There's no need to introduce abstractions over SQL joins which I find very off-putting, because the abstractions are never perfect and suddenly you have to learn invented concepts just because you didn't want to write a JOIN in an SQL query but rather have some clever framework introspect some classes and automagically write the JOINs for you.

IKantRead · 2 years ago
To add to what you're saying: It's not like the Object-relational impedance mismatch[0] is some great unknown property of ORMs. Since people have been putting ORMs into production designers of these systems have been well aware that you must always chose a trade-off between a full functional object system and a fully relational one.

And the mismatch has two sides. If one's answer is "just use SQL!" then you're going to have new problems dealing with the mismatch coming from the SQL side of things.

The pre-ORM solution to this was not simply to shove a bunch of SQL in your application logic (though this was done), but to have a much, much more complex set of data layer logic. I'm guessing most people today writing about eschewing ORMs in favor of pure SQL have never used: stored procedures, triggers, cascades etc. I personally do miss some of the features from that era of software, but there's a lot more complexity to the "just write SQL" approach than most people realize.

0. https://en.wikipedia.org/wiki/Object%E2%80%93relational_impe...

BeefWellington · 2 years ago
While I somewhat agree that a lot of these articles are people who just don't actually try/use the full feature set of ORMs, I don't agree with the overall premise you're presenting that they really do more than write SQL for you. The other things they provide are largely just abstractions around how the queried data is returned and some additional metadata tracking of the relationships. Your estimation of those parts being 60% of the value added is probably generally wrong for most users of SQLAlchemy. Not having to consider or use another language syntax while writing your code is probably closer to 70-80% of the value for most people.

Your example gist is essentially just "ORMs excel at this one thing, anything else is worthless" and also effectively hides code ("the benefit of the library!!!") to make it a very much disingenuous comparison.

> But why would anyone really want to, writing SQL for CRUD is really repetitive and tedious. Computer can do that for you.

Performance.

I wanted to love SQLAlchemy but even in some relatively simple things it generates really asinine queries that take insane amounts of time. It's just usually not noticeably insane amounts of time at first. So you query for some data and wind up with a query that returns in 20ms when there's 20 rows in the database but falls apart when there's 200,000.

It's also bad at projecting analytics type queries onto a transactionally-normalized database, but then most ORMs are not great at that.

I had both of these issues enough times that I instead just opted to start handcrafting queries in those cases and using a single "ResultSet" type class that projects as both a SimpleNamespace and a dict.

zzzeek · 2 years ago
Would need to know what versions of SQLAlchemy you spent time with and what exactly were these "asinine" queries you refer towards. SQLAlchemy generates very good queries these days for the information its given. Years ago there was heavy use of large complicated subqueries for many cases and those days are long gone.

Also my 40% / 60% breakdown is based on the SQLAlchemy source code itself regarding what parts of the code deal with generating SQL and what parts deal with all the rest. 40% is likely a large overestimate.

TX81Z · 2 years ago
Agree, my datasets have multiple billions of rows and if I don’t know the details of the query, or have the ability to tune it, it’s utterly insufficient for my needs.

I still fail to see how anybody who actually knows sql and works with “Big Kid” datasets would use an ORM.

ansc · 2 years ago
I’m using SQLAlchemy in my job, and have worked with Python for many years. Never have I seen a good case of someone using SQLAlchemy to hydrate objects from raw SQL queries. I’ll definitely admit — I have not gone out of my way to search for it. It seems that it is a common want to do this kind of 60% benefit ORM you speak of, but it’s definitely unclear to me how to pick those parts together with the daunting (and fantastic) piece SQLAlchemy is.
taeric · 2 years ago
My only caveat to what you are saying, is that I have yet to see anything that successfully scales up to many tables. I can also count on one hand the number of products I have seen that successfully migrated between big databases in a meaningful way without a ton of ancillary rewriting in the process.

That said, I fully agree that the ORM isn't necessarily the problem. I point the blame at over eager data modelling that ignores the OLTP/OLAP divide. Or that ignores that row creation has a different contract than row editing, such that using the same object for both leads to confusion with missing fields. Heck, even different edits have different contracts, such that mapping a single "object" to a database row is often a mistake.

hot_gril · 2 years ago
> writing SQL for CRUD is really repetitive and tedious

If you think of relational DBs as just CRUD machines, an ORM makes total sense, but that's the original mistake.

audunw · 2 years ago
Is it a mistake if that's all you need?

I'm kind of on the side of avoiding ORM unless you have a clear need for it. But I've seen projects where they are very valuable. If you mainly want to register, update and delete a bunch of structured data, it's not a bad idea to put it in a dabatase. And if you do those operations a lot in some part of the application, it's not a bad idea to use an ORM there. For analysis and reports and such I would probably suggest just writing SQL directly though.

TX81Z · 2 years ago
I think that’s part of why I’m having trouble with this framing - it’s treating CRUD as the entire universe of why you’d need to connect to a database.

Some of us do very intense compute in very large datasets, and ORM are not capable in those tasks. At all.

epgui · 2 years ago
> It's about the rows and objects, moving the data from the objects to the INSERT statement, moving the data from the rows you SELECTed back to the objects. Not to mention abstraction

That's actually the problem.

I'm sure you're aware that ORMs have this fundamental problem called the ORM impedance mismatch problem. AFAIK it remains unsolved to this day.

> You can use SQLAlchemy (which I wrote) and write all the SQL 100% yourself as literal strings

This is true, but it's like saying you can disassemble a Boeing 737 to build chopsticks. It's true but completely unnecessary.

gmassman · 2 years ago
Thanks for you perspective, Mike. Completely agree that the interface between data and code should be handled by a single tool. That tool must meet some minimum complexity, because it’s solving a very hard problem! Also just want to say that my team has benefited greatly from your work on SQLAlchemy, and we appreciate you immensely!
habitue · 2 years ago
Honestly, sqlalchemy is such a different breed of ORM, whenever people slander ORMs, I'm imagining they're thinking of like Django ORM or ActiveRecord which are like the duplos to SQLalchemy's Legos.
aforwardslash · 2 years ago
> I'd sure want to automate all that tedium. I'd want a one-to-many collection too maybe.

You don't actually need an ORM for that. And some of us (I know I'm a minority) actually want to have pure data objects. Some of us (talking about myself again) actually design schema-first - as in, you use DDL and your code just consumes the definition. Some of us don't care about code knowing about relationships which are an intrinsic part of the data model.

> You can use SQLAlchemy (which I wrote) and write all the SQL 100% yourself as literal strings, and still use the ORM

SqlAlchemy is an amazing piece of software, and it is relevant becaise it is actually part of the problem. You don't really have an ecosystem of database abstraction libraries in python, as you have in other languages. The options are SqlAlchemy or Django ORM. Golang? You have 50 different libraries that implement 100 ways of doing the same stuff. C#? You have probably a dozen libraries besides Entity framework that provide different level of abstractions? Java? You got low level stuff up to high level ORMS. PHP? The same.

Python? You have SqlAlchemy, Django or roll your own. Both SqlAlchemy and Django can do raw queries with little effort - but if you're doing this, arent't you discarding most of the functionality that comes with it - namely, code maintenance? If you have a big project with a bunch handwritten queries (because you can do it, and quite easily, I may add), aren't you just ignoring any of the advantages of the orm? If, when you edit a model, you need to also check handwritten sql queries, you shouldn't be using an orm in the first place.

And with SqlAlchemy and Django, there is no middle ground. You either are all in, code-first definitions, or you are in a world of pain.

That is why specialized applications tend to be rewritten in other languages. Because optimizing the data layer alone often boasts 10-100x performance increase, when compared to this kind of abstraction.

> writing SQL for CRUD is really repetitive and tedious

It is. But between that and SqlAlchemy - in most languages, there are options. Not in Python. Between orm and no orm, I'll take no orm any day of the week - at least I can understand the queries.

tru1ock · 2 years ago
I picked up some SQL knowledge through osmosis by using ActiveRecord and I do wish there was a better connect between what the ORM did and the end result. There were some tools to see what code generated what queries but it was not that intuitive and in your face like for example how you would have to deep dive frequently in your generated front end javascript and css code.

In other word I think there are some tooling left on the table that can assist in increasing SQL literacy and comprehension.

ckmar · 2 years ago
> The SQL is not really the point. It's about the rows and objects, moving the data from the objects to the INSERT statement, moving the data from the rows you SELECTed back to the objects.

If anyone is looking for this "pure" ORM (no query builder API) in Node there is https://github.com/craigmichaelmartin/pure-orm

TX81Z · 2 years ago
I’m sorry but 90% of the time I encounter somebody who swears by an ORM I’ll figure out the reason they use it is because they didn’t know SQL to start with and didn’t commit to learning a new language.

The number of people who know SQL well and still choose an ORM seems to be very, very low in my experience.

aidos · 2 years ago
Low, maybe, but we’re out there. Sqla just saves me so much time in general. Sometimes it’s easier to drop down a level, but for 99% of my db interactions the code will be significantly shorter using sqla, so I choose to use it.
hintymad · 2 years ago
Do we know what Hibernate did wrong? It used to be very popular among Java engineers but later seemed have become an obscure technology.
akkad33 · 2 years ago
Do you know why? In our company we still use hibernate heavily in our java web app
siva7 · 2 years ago
badass mic drop. i will refer to your comment whenever i see this discussion coming up again and it will come up again as long as people are still learning the art of software development.
rtpg · 2 years ago
If you're going to end up querying all the fields and putting them into a model like this dataclass anyways... Django can do that for you. If you're going to later pick and choose what fields you query on the first load, and defer other data til later.... Django can do that for you. If you're going to have some foreign relations you want to easily query.... Django can do that for you. If you're doing a bunch of joins and are using some custom postgres extensions for certain fields and filtering... Django can help you organize the integration code cleanly.

I totally understand people having issues with Django's ORM due to the query laziness making performance tricky to reason about (since an expression might or might not trigger a query depending on the context). In some specialized cases there are some real performance hits from the model creation. But Django is very good at avoiding weird SQL issues, does a lot of things correctly the first time around, and also includes wonderful things like a migration layer.

You might have a database that is _really_ unamenable to something like an ORM (like if most of your tables don't have ID rows), but I wonder how much of the wisdom around ORMs is due to people being burned by half-baked ORMs over the years.

I am curious as to what a larger codebase with "just SQL queries all over" ends up looking like. I have to imagine they all end up with some (granted, specialized) query builder pattern. But I think my bias is influenced by always working on software where there are just so many columns per table that it would be way too much busywork to not use something.

noirscape · 2 years ago
The main problem I've encountered with complaints surrounding ORMs usually tend to be the result of trying to overfit the ORM in a certain way.

ORMs are, for the most part, good at the CRUD operations - that is to say, they easily translate SELECT, UPDATE, INSERT and DELETE operations between conventional class objects and database rows.

Things they usually aren't very good at are when you start trying to do things that require a lot of optimization - it's very easy to have an ORM accidentally retrieve way more data than you need or to have it access a bunch of foreignkey data too many times (in Django you can thankfully preload the latter by specifying it in a queryset). That's less an issue for basic CRUD, but is an issue if you're doing say, mass calculation and only need one column and none of the foreignkey data for speed reasons.

Basically - an ORM is good but don't let yourself feel suffocated by it. If it's not a good fit for an ORM, then don't do it in the ORM, either use SQL code to do it in the DB server or do a simpler SELECT (in the ORM) and do the complex operation in your regular application before INSERTing it back in the db (if that's a goal for the operation anyway). If it's outside of the CRUD types of DB access already, the extra maintenance overhead you get from having non-ORM database code (if you're doing the SQL approach) in the application would be there anyway, you'd just get a very slow application instead of a hard error, and the latter is easier to troubleshoot (and often fix), while with the former you need to start pulling up profiling tools.

pydry · 2 years ago
>Things they usually aren't very good at are when you start trying to do things that require a lot of optimization

I find this ends up being, like, 1 or 2% of queries. It's also very hard if not impossible to guess which queries will end up in that group.

You're better off building it with the ORM first and breaking out SQL later when you are trying to performance optimize.

There is also a small % of queries which use some feature of your database engine which the ORM won't support.

agumonkey · 2 years ago
I realized too late that objects, as of now, are not capable of synthetizing a new class/type based on joins.
jononor · 2 years ago
The most stupid issues with "active record" type ORMs is the implicit queries on member access, especially in collections - leading to the N queries problem. But in SQLAlchemy one can actually turn that off - that is, make it throw an exception when undeclared table dependencies are attempted to be accessed. This restores sanity. And one gets to keep goodies you mention, plus Alembic migrations (mentioned by another). Also one can write direct SQL too with SQLAlchemy, or use the "core" layer to keep a DSL but avoid ORM.
lijok · 2 years ago
> I am curious as to what a larger codebase with "just SQL queries all over" ends up looking like. I have to imagine they all end up with some (granted, specialized) query builder pattern. But I think my bias is influenced by always working on software where there are just so many columns per table that it would be way too much busywork to not use something.

You end up writing a query per usecase, rather than writing generic queries that can be stitched together however in the business logic

evantbyrne · 2 years ago
The first agency I worked at did this on their Java projects. They should have just used a fully baked ORM. Basically, they ended up creating a massive query layer in the program which contained all the different queries organized into different interfaces. To edit a simple API endpoint you would have to open like 5 different files at a minimum. And because queries were usually tailored to logic in a specific controller, they were not typically reusable. It was always a relief to go back to Django after dealing with that.
baq · 2 years ago
Have this right now. Fortunately no custom query builder. It's enough of a hell as is. Cleanup will take years.
rtpg · 2 years ago
So what happens is that you have the one query that runs for a specific page and fetches the data and the relevant fields? I could definitely see that working for many projects, at least while your objects don't have too many tiny little details to pull out of thhe DB
llanowarelves · 2 years ago
I think it ends up being hybrid, like seen in CQRS and especially with DDD. Toss in "Vertical Slice Architecture" as well. What abstractions you want can decided on a per command or query basis and it feels natural.

Deleted Comment

pjmlp · 2 years ago
That is what stored procedures are for.
rowanseymour · 2 years ago
> issues with Django's ORM due to the query laziness making performance tricky to reason about

It's infuriating that this is still not a thing you can disable (https://code.djangoproject.com/ticket/30874). Pretty much my only gripe with the Django ORM which I'm a huge fan of (and I also write lots of SQL).

Daishiman · 2 years ago
There's a library that solves this for you, thankfully https://github.com/charettes/django-seal.
cushychicken · 2 years ago
I read this title and immediately thought "...but why wouldn't you just use Django?"

Having written the sort of SQL-inline code the author talks about, then refactored the whole thing to use Django: Django's ORM solves waaaay more problems than it creates in this regard.

aforwardslash · 2 years ago
There are plenty of advantages of using a dataclass, being the most obvious the fact that behaves like a pure data object (aka it doesn't have underlying associated resources). Serialization/deserialization of data is dead simple, and a dataclass is a construct you can use as a data object when building 3-tier applications. Having pure data objects also gives way more flexibility when implementing cache strategies.

While this separation isn't common in the Django ecosystem, it is very common in enterprise application design (regardless of usage of an ORM). On complex applications, Django models are often a leaky abstraction (not only because the mentioned resource connection problem, but also issues like for relations they require the inclusion of the target model, it cannot be lazy-loaded; a good example is a nullable foreign key to an optional module that may or may not be loaded), and they actually behave like a variation of the ActiveRecord pattern, that mixes two different scopes - data and operation on data. In many cases this is ok, but in many others this is a problem.

I personally use a repository pattern, coupled with a query builder and something vaguely similar to dataclasses (its a bit more complex in the sense that the data object attribute name can be different from the database field name). It is basically an object mapper with a non-related repository class.

rtpg · 2 years ago
Yeah I can understand wanting to split out the ORM model from a separate class that holds data. I just think absolving oneself of an ORM or query builder entirely for a DB schema that doesn't (glibly) fit on a postcard feels like a good way to generate a lot of busy work.

I somewhat disagree about your point on caching. If you're working with models (that, namely, are 1:1 with DB rows) stale object problems are a reality no matter what, and having the ID be put into a pure data object generates the same issues. But these are things that are not very interesting to discuss outside of specific contexts.

I am a bit of a functional programming nerd, but I've just found that for Python stuff in particular, swimming upstream is its own bug generator relative to writing concise stuff in a very imperative fashion. Using the fat models directly is a part of that calculus for me, but YMMV and every team has different strengths.

philwelch · 2 years ago
> I am curious as to what a larger codebase with "just SQL queries all over" ends up looking like. I have to imagine they all end up with some (granted, specialized) query builder pattern.

One successful pattern I’ve seen treats the database as its own service layer. Service code does not send arbitrary SQL to the database—instead, all of the SQL queries are set as stored procedures in the DB. People sometimes freak out when I say this, so I should clarify that the stored procedures, table schemas and other things of that nature were version controlled and deployed with some minimal build tooling we’d developed in house.

I really liked this pattern. I think anything that you need to talk to across a network should be treated as a service in itself, with a well defined interface. This simplifies testing and monitoring as well. The big risk with an ORM, architecturally, is that you end up treating your database as a sidekick to your service code, or even a dumb data store—some shameful implementation detail your service keeps locked in the basement—when they’re capable of much more than that.

ilovetux · 2 years ago
The only problem I have with using the django ORM is that it relies on the django project structure. While there are ways to use the ORM independently, they are full of hacks and trade-offs.

Granted, this problem goes away if you are building a web app or a REST API, but if I just want an ORM for a command line application, I am using django's management command functionality which is OK, but it doesn't really scale easily.

Too · 2 years ago
Yuck indeed. The way it dictates the order of imports, forcing you to import settings before any models can be imported, is reason enough not to use it. This problem spreads to any of your other files, leaving you in the end with everything depending on being launched in a full Django context. Shame, given it’s otherwise very user friendly.
rglullis · 2 years ago
Add refactoring, migrations and testing to all the reasons you mentioned, and it quickly becomes an adapted case of Greenspun's Tenth Rule for ORMs.
pjmlp · 2 years ago
Migration tooling and SQL testing predate ORMs, it is a matter of actually caring about their existence.
boxed · 2 years ago
This is just reimplementing Djangos ORM, but badly.

ORM queries compose. That's why [Python] programmers prefers them. You can create a QuerySet in Django, and then later add a filter, and then later another filter, and then later take a slice (pagination). This is hugely important for maintainable and composable code.

Another thing that's great about Djangos ORM is that it's THIN. Very thin in fact. The entire implementation is pretty tiny and super easy to read. You can just fork it by copying the entire thing into your DB if you want.

mrj · 2 years ago
This is what I came here to say.

For example, I'm working on an project now that long ago added a "sellable things" store that used plain sql. There are many, many stores like this one, but it did some logic to figure out what items are sellable and return the set. Easy, developer happy, ticket closed.

Some time later, it was needed to have "sellable items of a specific type." Well the "sellable things" store was too much to clone so the developer simply pulled all the sellables and filtered in memory. Hey it's Go so it's fast right?

This continued for a couple years and now I'm joining a project with a p99 of >15s. It would have been a natural fit to return a query set of sellable things and the other callers could further refine it however they wanted. Now I'm looking at a ball of logic that should have been in the database and it's beginning to break down at scale.

This article is just that pattern with syntax sugar. It will lead to sadness.

joaodlf · 2 years ago
> This is just reimplementing Djangos ORM, bud badly.

I guess this is a good thing, as "reimplementing" Django's ORM is the opposite of what I wanted to do here :)

> ORM queries compose. That's why [Python] programmers prefers them. You can create a QuerySet in Django, and then later add a filter, and then later another filter, and then later take a slice (pagination). This is hugely important for maintainable and composable code.

I don't really disagree, but there are many ways to skin a cat. You can absolutely write maintainable code taking this approach. In fact, I can build highly testable, unit, functional, code following a abstraction very similar to this. The idea that "maintainable and composable code" can only be achieved by having a very opinionated approach to interacting with a database, is flimsy. I offer a contrary point of view: With the Django ORM, you are completely locked in to Django. You build around the framework, the framework never bends to your will. My approach is flexible enough to be used in a Django project, a flask project, a non web dev project, any scenario really. I want complete isolation in my business logic, which is what I try to convey just before my conclusion.

boxed · 2 years ago
Djangos ORM isn't highly opinionated. That's just wrong.

> With the Django ORM, you are completely locked in to Django

Another bit of nonsense again. You have a dependency. Sure. Just like you have a dependency on Python. But it's an open source dependency, and the ORM part is a tiny part that you can just copy paste into your own code base if you want.

Also, worrying about being "locked into" something that you depend on is madness. Where does it end? Do you worry about being "locked into" Python? Of course not.

> You build around the framework, the framework never bends to your will.

You don't actually seem to understand Django at all. It's just a few tiny Python libraries grouped together: an ORM, request/response stuff, routing, templates, forms. That's it. You do NOT need to follow the conventions. You can put all your views in urls.py. You can not use urls.py at all.

You do NOT bend to the frameworks will. That's just false. You bend to it by your own accord, don't blame anyone else on your choice.

lelandbatey · 2 years ago
Can you explain a bit more about the Django ORM being very thin and easy to read? It does seem like the Django ORM is thin (from an architecture perspective), but it doesn't seem to be small, it seems to be pretty big. Maybe I'm not understanding it though, so here's what I see:

The "ORM" part of Django seems to be everything in `django.db.models.Model`, which seems to require you to declare your Models as subclasses of the aforementioned class. Looking into that code though, it seems like the implementation supporting all this is around ~20,000 lines of Python: https://github.com/django/django/tree/main/django/db/models

That doesn't strike me as a super lightweight. For comparison, all of Flask (a Python WSGI web app framework, but mostly a 10+ year old project to compare to, and excluding tests) is ~4,000 lines of Python.

Is there a small subsection of the code in `django/db/models/` that is all that's necessary to use the ORM part? Or maybe I'm missing something about the "core" of the ORM?

boxed · 2 years ago
It supports several backends and lots of stuff including migrations. It's thin for what it does.

> For comparison, all of Flask

That's... not a reasonable comparison. Flask does basically nothing. Of course it's small. And it does something totally different so why compare?

sanderjd · 2 years ago
Ha, should have read the comments before I wrote mine. Yep, it's this composability aspect that never seems to have occurred to the authors of this kind of think-piece.

I used to be pretty anti-ORM myself because I loathed the complexity of ActiveRecord (in the Rails world), but then I discovered arel, the nice composable relational query builder underneath, and saw the light. A composable layer of abstraction over SQL is critical in an application. (I still prefer raw SQL for analytical queries.)

pjmlp · 2 years ago
Just like stored procedures compose, like in any procedural programming language.
zknill · 2 years ago
Seems like there's 3 groups of opinions on ORMs:

Firstly (1); "I want to use the ORM for everything (table definitions, indexes, and queries)"

Then second (2), on the other extreme: "I don't want an ORM, I want to do everything myself, all the SQL and reading the data into objects".

Then thirdly (3) the middle ground: "I want the ORM to do the boring reading/writing data between the database and the code's objects".

The problem with ORMs is that they are often designed to do number 1, and are used to do number 3. This means there's often 'magic' in the ORM, when really all someone wanted to do was generate the code to read/write data from the database. In my experience this pushes engineers to adopt number 2.

I'm a big fan of projects like sqlc[1] which will take SQL that you write, and generate the code for reading/writing that data/objects into and out of the database. It gives you number 3 without any of the magic from number 1.

[1] https://sqlc.dev/

tantaman · 2 years ago
I'm in a 4th camp: we should be writing our applications against a relational data model and _not_ marshaling query results into and out of Objects at all.

Elaborations on this approach:

- https://news.ycombinator.com/item?id=34948816

- https://github.com/papers-we-love/papers-we-love/blob/main/d...

- https://riffle.systems/essays/prelude/

specialist · 2 years ago
Nice share, thanks.

I was never more productive than when using Access (and dBase II before that). Why can't we have that?

My theory: Something was lost in the jump from workgroup computing to client/server.

Imagine if Access was rebuilt on top of a client/server stack. That's kind of what Riffle is trying to do.

I've been (slowly) working on the persistence stuff. It (mostly) moots the SQL vs ORM vs query builder slap-fight.

I've got some notions (and POCs) about UI, mediated via HTTP & HTML.

I'd love to have some CRDT-like smarts; learning more is on my TODO list.

I'm still thinking about the "reactive" part. I haven't imagined anything past Access VBA style programming. I'm struggling to envision a FRP-meets-CRUD future perfect world.

pjmlp · 2 years ago
Still on the 4th camp?

Using stored procedures and triggers as much as possible.

tracker1 · 2 years ago
I'm pretty firmly in #2... it's relatively straight forward in a scripting language, and easy enough with something like C# with Dapper. In the end ORMs tend to over-consume, and often poorly. And even when they don't in most cases, they start to in more difficult cases. That doesn't even get into the amount of boilerplate for ORMs. You have to buy in to far more than their query model(s).
karmakaze · 2 years ago
Maybe those are the main/popular groupings. Where I fall is that I want typesafe constructions of queries that match the current schema. The query compositions should follow the SQL-style structure so there's no 'shape-mismatch' composing the query using the library. Some may not consider this to be an ORM (though it does map relations to objects).
waffletower · 2 years ago
There is definitely a fourth category -- "I want to build database queries natively using the paradigms of the language I am developing with, without use of SQL or an intermediary which translates into SQL."
specialist · 2 years ago
> ...this pushes engineers to adopt number 2

Yup. Use of HQL/JPQL is proof that you shouldn't be using an ORM. Just use SQL instead. And once you have some SQL, it's just easier (overall) to do all SQL.

sanderjd · 2 years ago
Well put!
NewEntryHN · 2 years ago
Any serious application beyond the example given in this article will include conditional SQL constructs which go beyond SQL query parameters and will therefore require string formatting to build the SQL.

Think a simple UI switch to sort some result either ascending or descending, which will require you format either an `ASC` or a `DESC` in your SQL string.

The moment you build SQL with string formatting is the moment you're rewriting the SQL formatter from an ORM, meeting plenty of opportunities to shoot yourself in the foot.

nicoburns · 2 years ago
> The moment you build SQL with string formatting is the moment you're rewriting the SQL formatter from an ORM, meeting plenty of opportunities to shoot yourself in the foot.

I used to think this, but at my last company we ended up rewriting all these queries to use conditional string formatting as we found it much more readable. The key was having named parameter binding for that string, so you didn't have to worry about matching up position arguments. That along with JavaScripts template string interpolation actually made the string-formatted version pretty nice to work with.

boxed · 2 years ago
Sounds like just begging for SQL injection attacks.
williamdclt · 2 years ago
There’s a world between a query builder and an ORM. The point of ORMs isn’t to build queries, if that’s the only need might as well just use a query builder which is a lot more lightweight and doesn’t come with all the downsides of orms
masklinn · 2 years ago
The OP literally says to ignore query builders, not just ORMs. When they state “just write SQL” that’s their actual thesis.
coldtea · 2 years ago
What you describe just needs a query builder (e.g. in Java something like jOOQ), not necessarily an ORM.
masklinn · 2 years ago
OK but TFA is not just against ORMs, it’s also against query builders. That’s what GP is replying to.
tracker1 · 2 years ago
Depends on the abstraction... for example .Net's extensions for LINQ are pretty good at this, I haven't generally used the LINQ syntax, but the abstraction for query constructs are pretty good, combined with Entity Framework. Of course, there's a lot that I don't care for and would prefer Dapper. In the end, the general environment of .Net dev being excessively "enterprisey" has kept me at bay the past several years.
dqv · 2 years ago
Just looked at LINQ and it looks like Ecto [0] used a lot of its ideas for inspiration! I haven't used LINQ, but in Ecto, there are so many useful constructs for composing queries. If you get a stinker of a query, you have multiple escape hatches such as fragments [1] or just writing the queries directly as needed [2].

For beginners, the Elixir language constructs can be a little clunky, but once you get it, it's so productive and I miss that productivity when doing more advanced queries in other languages.

[0]: https://hexdocs.pm/ecto/Ecto.Query.html [1]: https://hexdocs.pm/ecto/Ecto.Query.html#module-fragments [2]: https://hexdocs.pm/ecto_sql/Ecto.Adapters.SQL.html#query/4

drdaeman · 2 years ago
IMHO the real proper solution is to have an SQL parser, so you can have your SQL represented as an AST, do some operations on it, then compile it back to a query.

Sadly, I'm not aware of any good solutions to this. SQLAlchemy Core can build an operates on an AST, but it doesn't parse raw SQL into a query (so one has to write their queries in Python, not SQL). Some parser libraries I've seen were able to parse the query but didn't have much in terms of manipulations and compiling it back.

Rudism · 2 years ago
The next logical step after writing the code given in the article is to abstract common boilerplate SQL into a library so you're not spending 50% of your time writing and re-writing basic SQL insert, update, and select statements every time your models need to be updated. At which point all you've done is write your own ORM.

If you want to go full-blown SQL you can use something like PostGraphile, which allows you to create all of your business entities as tables or views and then write all your business logic as SQL stored procedures, which get automatically translated into a GraphQL API for your clients to consume, but once you move beyond basic CRUD operations and reporting it becomes incredibly difficult to work with since there aren't really any good IDEs that help you manage and navigate huge pure-SQL code bases.

If you're really dead set against using a powerful ORM, it's probably still a good idea to find and use a lightweight one--something that handles the tedious CRUD operations between your tables and objects, but lets you break out and write your own raw queries when you need to do something more complex. I think there's a sweet spot between writing every line of SQL your application executes and having an ORM take care of boilerplate for you that will probably be different in every case but will never be 100% at one end or the other.

hot_gril · 2 years ago
The SQL inserts/updates have never felt tedious for me even in large projects, partially owed to careful use of jsonb for big objects where it makes sense (e.g. user settings dicts). Other than that, keeping a tight schema design.
hot_gril · 2 years ago
Of course, jsonb didn't exist until 2014ish. IMO this was a serious gap in SQL before, and it likely spawned the concepts of NoSQL and ORMs to begin with, which may have been the inspiration for jsonb. Hurray for competition.
bbojan · 2 years ago
The article is missing the code for creating the "users" database table. What about indexes? Migrations? Relations to other tables?

I mean you can just write SQL instead of using the ORM if your project consists of a single table with no indexes that will never change, sure.

joaodlf · 2 years ago
When it comes to migrations, I've been fine with https://github.com/golang-migrate/migrate

There are a multitude of extra things to consider, but none of those things are, in my opinion, imperative to having success with SQL in Python. Will it be hard to achieve the same level of convenience that modern ORMs provide? Absolutely. But there is always a cost.

I firmly believe that for most projects (especially in the age of "services"), an approach like this is very much good enough. Also, a great way to onboard new developers and present both SQL and simple abstractions that can be applied to many other areas of building software.

tracker1 · 2 years ago
Agreed, I've seen plenty of what wind up being very byzantine and complex migration strategies over the years, and in the end simple SQL scripts tends to work the best. I will note, that it's sometimes easier to do a DB dump for the likes of sprocs, functions, etc, if you want the "current" database bits to search through.

Deleted Comment

m000 · 2 years ago
And good luck with writing tests for your sql code.
waffletower · 2 years ago
I really dislike SQL, but recognize its importance for many organizations. I also understand that SQL is definitely testable, particularly if managed by environments such as DBT (https://github.com/dbt-labs/dbt-core). Those who arrived here with preference to python will note that dbt is largely implemented in python, adds Jinja macros and iterative forms to SQL, and adds code testing capabilities. No ORM required whatsoever.
jeltz · 2 years ago
Why would the be an issue? I have written plenty of tests for SQL code an it is no harder than writing tests for e.g. Ruby or Python code. Especially if you have an ORM involved.
somat · 2 years ago
sometimes the idea is that the database lives it's own life outside the application. Probably not the case here, but under that viewpoint the application is just one of perhaps many that access the data and as such creating tables, indexes, migrations and relations are none of it's business.
m000 · 2 years ago
But it is the application's business. You may not be altering the database schema from your application, but you still need to make sure that its code is in-sync with it.

This means that you will need extra tooling, and if you're DIYing you will need to write it yourself.

promiseofbeans · 2 years ago
> ... Python dot not have anything in the standard library that supports database interaction, this has always been a problem for the community to solve.

Python has built-in support for SQLite in the standard library: https://docs.python.org/3/library/sqlite3.html

uranusjr · 2 years ago
Python also has the DBAPI specification, which defines what interface a library must support to be considered a database driver. The author claiming Go’s sql package encourages writing SQL directly while Python doesn’t really seems a bit awkward.
dikei · 2 years ago
Also, DB-API 2.0 is a standard that's followed by most database drivers, similar to what JDBC is for Java, though not as strictly enforced.
reportgunner · 2 years ago
sqlite-utils[0] is a great library for working with SQLite

[0] https://sqlite-utils.datasette.io/en/stable/