If you're doing OLAP, you probably want dimensions, measures and operators that operate on time aggregations and shifts. You want rollups and drill downs along multiple axes, with subtotals and probably pivots.
SQL isn't wholly adequate for this: it's hard work to get the SQL right, and if there are joins involved it's easy to accidentally fan out and start double counting.
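To make the fan-out concrete, here's a tiny TypeScript sketch (with made-up data) of what happens when you join a one-to-many child table onto a parent and then sum a parent-level measure: the measure gets repeated once per child row.

```typescript
// Hypothetical data: one order with two shipment rows.
const orders = [{ id: 1, total: 100 }];
const shipments = [
  { orderId: 1, carrier: "ups" },
  { orderId: 1, carrier: "fedex" },
];

// A naive inner join fans the order row out to one row per shipment.
const joined = orders.flatMap((o) =>
  shipments
    .filter((s) => s.orderId === o.id)
    .map((s) => ({ ...o, carrier: s.carrier }))
);

// Summing order.total over the joined rows double counts: 200, not 100.
const naiveSum = joined.reduce((acc, r) => acc + r.total, 0);

// The fix is to aggregate at the right grain before (or instead of) joining.
const correctSum = orders.reduce((acc, o) => acc + o.total, 0);

console.log(naiveSum, correctSum); // 200 100
```

The same thing happens in SQL whenever a measure lives at a coarser grain than the join result, which is exactly why it's so easy to double count.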
If you ask me, you want an analytic model of the data designed around measures and dimensions, with an anointed time dimension, and a way of expressing higher-level queries such that the model automatically aggregates depending on which dimensions you leave out, and gives you options to sort, pivot, filter, etc. dynamically.
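A toy TypeScript sketch of that kind of model, with invented table and column names: measures and dimensions are declared once, and the generated SQL rolls up automatically over whatever dimensions the caller leaves out.

```typescript
// Toy semantic model: measures and dimensions declared once (names invented).
type Model = {
  table: string;
  dimensions: Record<string, string>; // alias -> column expression
  measures: Record<string, string>; // alias -> aggregate expression
};

const sales: Model = {
  table: "sales",
  dimensions: { region: "region", day: "toDate(ts)" },
  measures: { revenue: "sum(amount)", orders: "count()" },
};

// Leaving a dimension out automatically rolls the measures up over it.
function query(m: Model, dims: string[], meas: string[]): string {
  const dimExprs = dims.map((d) => `${m.dimensions[d]} AS ${d}`);
  const measExprs = meas.map((x) => `${m.measures[x]} AS ${x}`);
  const select = [...dimExprs, ...measExprs].join(", ");
  const groupBy = dims.length ? ` GROUP BY ${dims.join(", ")}` : "";
  return `SELECT ${select} FROM ${m.table}${groupBy}`;
}

// Keep only `region`, and the measures aggregate over days automatically.
console.log(query(sales, ["region"], ["revenue"]));
// SELECT region AS region, sum(amount) AS revenue FROM sales GROUP BY region
```

A real version would also need the anointed time dimension (for shifts and period-over-period comparisons), join awareness, and subtotal/pivot support, but the core idea is that the aggregation falls out of which dimensions you select.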
This doesn't look like entities, really, but it is a model between you and the SQL.
From my quick (not detailed) read of the article, Moose looks too low-level to be a useful abstraction sitting in the same logical place that ORMs do for OLTP databases.
Very much agree with you. At this point the abstraction is too low-level to be considered a proper ORM (or whatever the acronym should be for OLAP), and we're progressively working our way up to the right level. I love the idea of operating at the dimensions/measures level. Hoping we address this concern in the next couple of releases. Really appreciate the feedback!
I fully agree. This is why I created a SQL query engine for my own needs a few years ago; it is now open-sourced. It is called SquashQL (https://github.com/squashql/squashql). It provides a "simple" API (available in TypeScript and Java for now) to perform "complex" and "dynamic" SQL queries such as pivot queries, standard queries with rollups or partial rollups with the ability to hide totals and subtotals, time-series and hierarchical comparisons, bucketing, drilling across, and more.
> If you’ve got your OLAP schemas as objects in your application code
I guess I have a wildly different interpretation of typical OLAP scenarios. To me the acronym mostly means "reporting". And in 99% of cases where the business desires a new report, the ideal views or type systems have not been anticipated. In these cases (most of them), I can't imagine a faster way to give the business an answer than just writing some SQL.
I agree with that being fastest, but not cheapest.
In my experience these one-off reports are very brittle. The app ends up making schema changes that break these one-off reports, and you usually don't find out until it goes to production.
I’ve dealt with the maintenance nightmare before. At current gig we’re exploring solutions, curious what a robust pipeline looks like in 2025.
The ORM piece is interesting: we use ActiveRecord and Ruby, and schema changes that would accidentally break things within the app get caught by the unit test suite. I would love a way to bring OLAP reports in similarly, to test at CI time.
I mean, if you're relying on tests to catch schema changes... then test your SQL reports? This doesn't seem like an amazingly cool solution, but if that's the one you're already using...
I'm one of the Moose maintainers, and yes, most OLAP use cases fall into data warehousing categories where exposing the database to analysts and letting them run loose with SQL is viable. We're seeing more and more that OLAP is becoming a core part of the application stack, for user and agent-facing analytics. There, we see a lot more appetite for building on the analytical stack the way we build on the transactional one.
Without doubt, the majority of the market for OLAP today is still internal warehousing & BI. But the market for using OLAP behind features inside user-facing B2C/B2B apps has been kicking off for quite a few years now. Big consumer apps like Stripe, Uber, Shopify... pretty much every B2B SaaS with a usage/metrics dashboard... they're usually punting queries off to an OLAP store to populate those stats/charts. That's where something like this might come in handy; I can't imagine it being used for general internal reporting (in its current form, anyway).
The way my company uses Clickhouse is basically that we have one giant flat table, and have written our own abstraction layer on top of it based around "entities" which are functions of data in the underlying table, potentially adding in some window functions or joins. Pretty much every query we write with Clickhouse tacks on a big "Group By All" at the end of it, because we are always trying to squash down the number of rows and aggregate as aggressively as possible.
I imagine we're not alone in this type of abstraction layer, and some type-safety would be very welcome there. I tried to build our system on top of Kysely (https://kysely.dev/) but the Clickhouse extension was not far along enough to make sense for our use-case. As such, we basically had to build our own parser that compiles down to sql, but there are many type-error edge cases, especially when we're joining in against data from S3 that could be CSV, Parquet, etc.
Side note: one of the things I love most about ClickHouse is how easy it is to combine data from multiple sources other than just the source database at query time. I imagine this makes the problem of building an ORM much harder as well, since you could need to build type checking / an ORM against SQL queries to external databases, rather than against the source table itself.
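The "entities over one giant flat table, always squashed with GROUP BY ALL" layer described above could be sketched roughly like this in TypeScript. The entity, table, and column names here are invented, and a real implementation would carry per-column types rather than raw SQL fragments.

```typescript
// Hypothetical sketch of the "entity as a function of the flat table" idea:
// each entity is a set of SQL expressions over one big table, and every
// query ends in GROUP BY ALL so the result aggregates as hard as possible.
type Entity = { select: string[]; from: string };

const sessionEntity: Entity = {
  select: ["user_id", "min(ts) AS started_at", "count() AS events"],
  from: "events_flat",
};

function compile(e: Entity, where?: string): string {
  const filter = where ? ` WHERE ${where}` : "";
  // ClickHouse's GROUP BY ALL groups by every non-aggregate select column,
  // so adding/removing a dimension never requires editing the GROUP BY.
  return `SELECT ${e.select.join(", ")} FROM ${e.from}${filter} GROUP BY ALL`;
}

console.log(compile(sessionEntity, "ts > now() - INTERVAL 1 DAY"));
```

The hard part, as noted above, is typing this: once the `from` can be an S3 table function over CSV or Parquet, there's no schema to reflect against, and the type checker has to trust whatever you declare.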
I agree with this as well. I started my career at the height of ORMs. Most software developers were only learning the ORM APIs (which of course all differed significantly) and very few were learning SQL outside of the bare basics.
ORMs, like all abstractions, are leaky. But I would argue that, because of the ubiquity and utility of SQL itself, they are an especially leaky one, and eventually you are going to need to work around them.
After switching to just using SQL in all situations, I found my life got a lot simpler. Performance also improved, as most ORMs (ActiveRecord in particular) are not very well implemented from a performance standpoint, even for very simple use cases.
I cannot recommend strongly enough that people skip the ORM entirely.
Is the proposition in the OP not pretty much what you're suggesting in your blog? They're currently not using the query-builder syntax; instead it's pretty much "improving on SQL as strings" with a bunch of the other ORM-like benefits (type safety, autocomplete, etc.).
Perhaps saying "ORM" is a bit of a misnomer, but they're discussing the DX ergonomics of an ORM and acknowledging the exact challenges you describe
Yeah, it's funny they even mention ORMs while at the same time offering something that has nothing to do with ORMs at all. Yes, many ORM libraries offer additional tools like migrations and a query builder, but that's not the point of an ORM: an ORM maps relational data to your OOP data structures. They've misused the term entirely, which is kinda surprising.
How do you convert your type-safe native objects to and from the database in a reusable way? If you do anything in a reusable way, you're 95% of the way to an ORM. Or do you just accept that you get back random dictionaries from the database and don't care about type-safety?
You write INSERT and SELECT statements for the object types you want to persist.
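For illustration, a minimal hand-rolled version of that in TypeScript, with an invented `users` table: one typed function per statement, plus a single place where row shapes coming back from the driver are checked.

```typescript
// A minimal hand-rolled mapping layer, no ORM: one typed function per
// statement. Table and column names are invented for illustration.
type User = { id: number; email: string };

// INSERT as a parameterized statement; the caller hands it to the driver.
function insertUser(u: User): { sql: string; params: unknown[] } {
  return {
    sql: "INSERT INTO users (id, email) VALUES (?, ?)",
    params: [u.id, u.email],
  };
}

// SELECT results are validated at the one point where SQL meets objects,
// so "random dictionaries from the database" never escape this function.
function rowToUser(row: Record<string, unknown>): User {
  if (typeof row.id !== "number" || typeof row.email !== "string") {
    throw new Error("unexpected row shape from users query");
  }
  return { id: row.id, email: row.email };
}

const u = rowToUser({ id: 1, email: "a@example.com" });
console.log(u.email); // a@example.com
```

This is arguably "95% of the way to an ORM" as the parent says, but the mapping stays explicit and per-query rather than per-entity-graph, which is the part people object to.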
What is your concern re: random types popping up? SQLite springs to mind as a prime offender due to not enforcing column types OOTB, but most dialects have rather strong typing.
If we’re talking about mapping UUIDs and datetimes from their DB representations to types defined by the language stdlib, that’s usually the responsibility of the DB driver, no?
This really depends on the quality of the ORM. I used to write Java software, and Hibernate with QueryDSL saved me probably _months_ of typing. And I dare say, produced much nicer-looking code.
And for most of the code, the performance overheads were negligible. C# with LINQ is even better: it provides strong type checking for the queries and often has almost zero overhead.
I'm using Go now, and I don't even want to touch any of the available ORMs because they all suck, compared to the state-of-the-art in Java circa 20 years ago.
Not quite a first-class citizen, but you might like sqlx. At least it embraces the idea that writing SQL directly is in fact a good idea, and it helps you do so safely.
LINQ? Just throwing it out there; obviously not everybody can or wants to run a C#/.NET stack, but Entity Framework (Core) is about as close as you can get to the kind of tight language integration Perl has with regexes. I think Ruby on Rails gets there too, but I'm not a RoR guy, so I can't comment.
> Schemas as application code means you get version control, PR review and type-safe changes.
> A query builder that feels like SQL and lets you write "real" ClickHouse queries with IDE autocompletion and compile-time checking.
> Local development and CI should mirror production so you can preview schema changes before they apply to prod.
I believe this is what dbt set out to accomplish. They came at the problem from the point of view of a data transformation language, essentially a pseudo-type-checked SQL for analytical engines with some additional batteries included (i.e. macros), but the motivation was similar. I've always felt that what has held dbt back from more mainstream adoption by the dev community is that they've prioritized data transformation over data access from the application layer, i.e. business intelligence tools over a web app.
Moosestack looks interesting; will definitely check it out.
Just expose a way to explore hierarchies and measures, give me an ability to generate "alternate hierarchies" then let me run MDX and leave me alone! ORM and OLAP don't belong in the same sentence together.
The reasoning behind "yes, it would help" is in building data tools for people. So you load up your Parquet files with data, ingest them into your platform, it uses ClickHouse (or some OLAP engine) for tabulation of the data, and the platform presents a UI that allows the data engineer to select which fields to use, etc.
This can only be achieved by utilizing some sort of type system, whether that's reflecting on the tables, codegen on the fly, or writing custom adapters for each structure. All of which can be greatly simplified with an ORM.
It's not going to help much with bespoke report asks from the business though.
When I read the title, my brain immediately jumped to a slightly different idea. With OLAP, I often find it annoying to figure out the joins from the FK/PK relationships, so I was imagining a tool that automatically follows the links for you. A bit like how an ORM gives you autocomplete, but without the user having to manually enter the schema.
And I wanted it to emit the raw SQL because that's generally what I want for olap.
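A rough TypeScript sketch of that idea, with an invented FK catalog (a real tool would read it from information_schema) and only a straight-line chain of joins; real schemas would need a proper graph search over the FK edges.

```typescript
// Sketch of auto-deriving joins from FK metadata instead of hand-typing
// them. The FK catalog below is invented; a real tool would read it from
// information_schema / pg_catalog.
type Fk = { fromTable: string; fromCol: string; toTable: string; toCol: string };

const fks: Fk[] = [
  { fromTable: "order_items", fromCol: "order_id", toTable: "orders", toCol: "id" },
  { fromTable: "orders", fromCol: "customer_id", toTable: "customers", toCol: "id" },
];

// Follow FK links from a base table to a target table and emit raw SQL.
// (Assumes a linear FK chain; cycles/branches would need graph search.)
function joinPath(base: string, target: string): string {
  const clauses: string[] = [];
  let current = base;
  while (current !== target) {
    const fk = fks.find((f) => f.fromTable === current);
    if (!fk) throw new Error(`no FK path from ${base} to ${target}`);
    clauses.push(
      `JOIN ${fk.toTable} ON ${fk.fromTable}.${fk.fromCol} = ${fk.toTable}.${fk.toCol}`
    );
    current = fk.toTable;
  }
  return `FROM ${base} ${clauses.join(" ")}`;
}

console.log(joinPath("order_items", "customers"));
```

The appeal for OLAP is exactly the point above: the output is plain SQL you can paste anywhere, not an opaque query object.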
Surely there is a way to run a raw query in Rails/ActiveRecord and use it in a smoke test?
No one needs an ORM: https://dev.to/cies/the-case-against-orms-5bh4
The article opens with "ORMs have proven to be useful for many developers" -- I believe the opposite is true.
The devil is of course in the details, but it's a nice dream.
https://docs.rs/sqlx/latest/sqlx/
PL/pgSQL (or PL/SQL in Oracle) and variants.
So I had a go at building it. If anyone's interested, a very rough demo/prototype is here: https://www.robinlinacre.com/vite_live_pg_orm/
Load in the demo Northwind schema and click some tables/columns to see the generated joins