PRQL: Pipelined Relational Query Language

For me the examples on the website https://prql-lang.org/ are the biggest selling point for PRQL, in particular the SQL it generates. It looks clean, straightforward, something I would've written myself.

In general, I like this slightly more careful take on modern database development. 10-15 years people would start a brand new database like Mongo, or Riak, or Influx, or whatever, and would try to convince application developers to select it for new projects. But recently we started getting more conservative options like EdgeDB, TimescaleDB, or even PRQL which all expect us to run Postgres with some addons and / or query preprocessors. Tech like this is so much easier to adopt!

I'm really liking what Edge folks are doing with schemas and migrations, but I do find PRQL syntax much more intuitive. My application code is littered with data transformation pipelines already: all these map / filter chains in TYpeScript, iterators in Rust, enumerables in Ruby, streams in Java, LINQ in .net, Rx in dozens of languages etc. etc. So the concept is very, very familiar. Getting nice SQL queries out of PRQL that I can store, inspect later, see the query plans, add indices where necessary, is just great. It's such a good tool!

OJFord · 2 years ago

To me it seems quite nice, but really just trivially different from SQL - like if Ruby was 'friendlier syntax that transpiles to Python', meh? You'd use whichever you happened to learn first and not bother with the other. (That's often true even though it's more than that of course.)

The examples arbitrarily make SQL look more verbose:

    SELECT
      id,
      first_name,
      age
    FROM
      employees
    ORDER BY
      age
    LIMIT
      10

Yes! Of course I'd rather:

    from employees
    select {id, first_name, age}
    sort age
    take 10

..but wait, actually the SQL could've been:

    select id, first_name, age
    from employees
    order by age
    limit 10

and it's more verbose by a character or two... (no braces, but 'order by' vs 'sort')

viraptor · 2 years ago

It's not the tiny changes in the syntax/order that matter here. It's that if you want to limit the columns or do something else with them afterwards, in PRQL you append that text to the end. In SQL you'd have to wrap it around your original query, it inject joins in the middle. It doesn't read well anymore, because "how" you're doing things eclipses "what" you're doing.

For a simple select, this just doesn't matter. But you also wouldn't bother with PRQL for simple selects.

andrewl-hn · 2 years ago

I use CTEs, window functions, and groupings all the time when I write reporting queries. These things tend to be much more verbose in raw SQL, and ORMs / Query Builders either do not support some of these features at all or do very poor job (like, force me to put raw SQL substrings in my code), or force to write DSLs that are even more verbose than raw SQL. Look at corresponding PRQL samples, and you may see an appeal.

Also, I agree, no one should write SQL like this - screaming keywords, superficial newlines, etc. I don't think this style made sense ever, even in 1970s.

hn_throwaway_99 · 2 years ago

I agree that I think the difficulty with the uptake will be "it's really just less annoying SQL", and it's hard to overtake a technology that's so ubiquitous when your fixes are really "nice to haves" vs something truly transformational.

That said, it's not the succinctness that's an improvement, it's that the pipelined nature of PRQL really maps much better to how people should think about queries, and also how the server executes them! Something as trivially simple as putting the from clause first means I can now get much better typeahead/autocomplete support from dev tools. Heck, I already do this now in a slightly more annoying manner: I write "select * from some_table ..." first, then I go and type out the actual columns I want because once the from clause is correct my IDE will autocomplete/show errors for possible columns.

ravi-delia · 2 years ago

That's a really simple example though. I think the real pitch for PRQL is that the syntax is way more regular- the group operator just runs a normal pipeline on each group and multiple derivations with different operations between them don't need to all be plotted out in advance, just as an example. SQL suffers because it's not really composable, especially once you get outside of specific modern versions.

jtsuken · 2 years ago

Being able to pick the source (i.e. the FROM clause) first is useful in itself, as you then get the benefit of sensible auto-complete suggestions.

jatins · 2 years ago

Even though a small example doesn't highlight the advantages well, I still prefer the PRQL syntax in this because that reads like map/filter code I'd write in some programming language.

So the mental model seems easier to me

krick · 2 years ago

"Order by" is a much better name, by the way. I'm permanently slightly annoyed by the fact that in programming it's a custom to call ordering "sorting" for some unimaginable reason.

t8sr · 2 years ago

I don't get that - to me the examples are much less readable than SQL and I don't understand why I should want to use this. Like, yes, you can reorder the query sections, which seems to be everyone's complaint about SQL, but then you also have multiple types of brackets, colons and other syntax for no reason, all while not really accomplishing anything SQL doesn't already do.

What's the attraction?

snthpy · 2 years ago

If you're happy with SQL then there isn't much point.

For the folks building and supporting PRQL, SQL just has a few too many warts and the popularity of tools like Pandas, dplyr, Polars, LINQ, ... shows that for analytical work we often like to work with our data differently. Other frameworks and languages feel that we should throw out Relational Algebra as well but we feel that's like throwing the baby out with the bathwater.

PRQL's core tenets are that Relational Algebra is foundational and a great abstraction but what's needed is a modern, ergonomic language that allows us to work with data the way most of us conceive of it - as pipelines of transformations.

Personally I feel much more productive in PRQL. I can start with a dataset and append one transformation at a time, writing from top to bottom as I go along without having to jump back and forth between the SELECT and the WHERE and the GROUP BY clause etc... .

Also, if I want to take out a step, I can just comment out that line and the rest of the pipeline usually still works. This might seem like a minor thing but in my experience it's those kind of ergonomics that make the difference in actual day to day work rather than fancy theoretical features you only use once in a blue moon. It's therefore worth noting that this was an intentional design decision. You try and take out some steps from your SQL query and see how well the rest of your query holds up.

hn_throwaway_99 · 2 years ago

The attraction is that, especially for more complicated queries with complex joins, subqueries, aggregates, etc., that the structure of PRQL much more closely matches the "English", mental-model of the query. I can just read PRQL linearly, from top to bottom, and at each point I know what is being "pipelined" into the next phase of the query.

With SQL I have to look all over the place. And it's not just that FROM should come before SELECT, it's that if I'm doing, say, an aggregation with a group by, normal SQL doesn't really have anything to make me think about the ungrouped rows, and then merge them together to get the aggregate values. With PRQL I can just go top to bottom, and for me it's much easier to reason about (i.e. first get all the rows, then group by some specific columns, then take aggregates, etc.)

And I say this as someone who spends about half my days in SQL at present.

0cf8612b2e1e · 2 years ago

The attraction is something that was designed after decades of usage and PL research. Consistency of syntax is a big one for me. A favorite example of mine:

  SELECT substring('PostgreSQL' from 8 for 3);
  SELECT substring('PostgreSQL', 8, 3); -- PostgreSQL-only syntax
  SELECT trim(both from 'yxSQLxx', 'xyz');
  SELECT extract(day from timestamp '2001-02-16 20:38:40');

Taken from: https://www.edgedb.com/blog/we-can-do-better-than-sql

Maybe if SQL would give me that monumental ask of trailing commas, perhaps I would hate it less.

slt2021 · 2 years ago

I think main difference is how PRQL translates into query execution plan, with SQL you need to read entirety of query to get rough understanding of order of operations (especially if you join two-three tables and have some nested and lateral queries).

with PRQL I see that the order will be explicit and set up by developer, so any footguns will be evident.

things like predicate push down, optimization fence, variable rewrite, etc are not needed to be relied upon, because pipeline of query is more explicit in PRQL

also since it is new lang, it can be naturally extended into ETL type data pipelines

also because PRQL can be translated into query execution plan - it can be converted into non-SQL languages like MongoDB or pandas / spark / etc, eliminating SQL altogether for distributed nosql engines

dlisboa · 2 years ago

Maybe you didn't see the more complex examples, but some of the SQL queries it abstracts are very unintuitive to write by hand. PRQL turns it into something more readable and closer to the expected mental model.

For people who can write error-free and complex portable SQL queries by heart this may not be really interesting. From experience, however, that is not a skill most developers have.

the_mitsuhiko · 2 years ago

I don't use PRQL but I absolutely get the appeal but specifically on the readability part, some things that are easy in PRQL are just awful in SQL.

From the website for instance this is a nightmare to do in SQL:

    from employees group role (sort join_date take 1)

ravi-delia · 2 years ago

I mean if you can look at the "expressions" example on the homepage and say the SQL is more readable than PRQL, then more power to you. Hell, actually more power to you, that's extremely impressive. But I think for many people, especially programmers more used to parsing brackets and nesting than pseudo-english, PRQL is easier in the places where you're trying to compose.

andrewl-hn · 2 years ago

Oh, I'm a big fan of raw SQL, too. There's very little out there that can bring me same joy as a well-written 30-50 lines-long sql query! Like, I can spend an hour or two on it, but when it runs and produces the results in the desired form, it's sooo rewarding! And realizing that producing similar results in my main language (be it Rust or TypeScript, or whatever) would take me 3-4 days instead (and it would run much longer) makes me appreciate SQL programming more and more. In terms of "usefulness per minute spend" nothing comes close to SQL!

However, the syntax can be a bit clunky from time to time. I'm very fortunate that I use Postgres only in the past 5-7 years, so CTEs, and things like Timescale or JSONB, or PlV8 / PlRust for stored procedures are often on the menu. Yet, simple things still require repeating fragments of code from time to time, and complex grouping or window functions code often looks too convoluted.

And as I wrote in my GP comment: I (and many other developers) already use pipelines for data manipulation in code - beyond databases. Say, we got data from an API endpoint, or we mixing data from several data sources. SQL is familiar because I see it often, PRQL is familiar because I use similar pipelining concepts often in my no-SQL code. Would I use PRQL for super simple queries? Probably not. Would I be upset if one of my coworkers used it for simple queries? No, why would I? Would I try implementing complex reporting query using PRQL? Yeah, I would. Partially because I suspect I would get to my data in desired form quicker with pipelines, and partially because I know: if I get stuck I would convert my intermediary query to SQL and pick up from there.

For me PRQL looks better then most DSLs for ORMs or Query Builders in most languages. Adding a new tool to a project may be annoying, but depending on a project I'd rather pay this price once if it makes my less SQL-savvy team mates more comfortable. "Yet another syntax" is obviously the main point against, but from the application developer perspective ORMs / Query Builders often force one to learn "yet another syntax" anyway. "Don't use ORM / Builder" is an often voiced opinion and yet in practice we work with them all the time, too.

So, I view PRQL not as an alternative to SQL but as a third database communication dialect in addition to SQL and my ORM / Query Builder DSLs.

spullara · 2 years ago

it seems like any SQL parser could let you put FROM first and solve a lot of the annoyances of SQL

Deleted Comment

If the main complaint people have about SQL is that you can't swap SELECT, FROM and WHERE, then that's pretty good for a language designed in the 70s.

This, by contrast, looks like it has a bunch of random line noise for syntax. Why on earth should I like this:

`join side:left p=positions (p.id==employees.employee_id)`

better than this:

`LEFT JOIN positions AS p ON p.id = employees.employee_id` ?

crote · 2 years ago

The main issue with SQL is that you are stuck in a very strict way of writing things, which does not clearly match to how I think. The top-down way of writing PRQL where each step is simply a transformation of the previous one makes way more sense to me. SQL is something I'd need a reference manual for, PRQL is simply writing down what I want the query to do.

I do agree that PRQL's join syntax is extremely bad, though. They should've stuck to explicit "left join"-like keywords, and the alias & join column shorthand could be done better.

appplication · 2 years ago

Despite it not at all being the right tool for most jobs, this is why I actually enjoy writing spark. It reads top down, and is “sql-ish” for most commands. Enough that you don’t need to go out of your way to learn it if you already know SQL, you just think in terms of serial transformations.

jdmichal · 2 years ago

I've had two real gripes with SQL. The rest of it has been, as you said, pretty good.

Complaint 1: Not being able to use selected columns later in the same select.

    SELECT
        gnarly_calculation AS some_value,
        some_value * 2 AS some_value_doubled

Instead:

    SELECT
        subquery.*,
        some_value * 2 AS some_value_doubled
    FROM (
        gnarly_calculation AS some_value
    ) AS subquery

Complaint 2: Not being able to specify all columns except. This combines with the above, where I have to pull some intermediate calculations forward from a subquery, but I don't need them in the final output. So I have to then enumerate all the output columns that I actually want, instead of being able to say something like `* EXCEPT some_value`.

zX41ZdbW · 2 years ago

Both complaints are resolved by ClickHouse.

This video also covers many other advantages of ClickHouse's SQL dialect: https://www.youtube.com/watch?v=zhrOYQpgvkk

Some may find your first complaint questionable... But I specifically designed ClickHouse SQL to allow aliases to be used and referenced in every part of SQL query.

AtNightWeCode · 2 years ago

Don't use *. It is a common source for all kinds of problems. Many query langs does not support it at all for that reason.

snthpy · 2 years ago

PRQL fixes this.

(Disclaimer: I'm a PRQL contributor.)

lofatdairy · 2 years ago

Maybe it's my familiarity with R and the tidyverse paradigm, but I think in general this paradigm and syntax is pretty readable. The example you chose is perhaps a case where brackets and commas go a long way, but that aside it's really not too bad in context of the rest of the language. `:` seems to consistently be treated as keyword arguments, and `=` for aliasing.

kbenson · 2 years ago

Because now the type of join is an argument on the "join" operation, and join is the first word of that statement making it more obvious what the operation actually is? I also prefer foo(bar) over "bar to foo()" as well, which seems like the equivalent in a more general purpose function call in a language example.

xchkr1337 · 2 years ago

The problem with SQL is the same as the problem with C-style variable declarations. It sounds slightly better than the alternatives when you say the code out loud, but in reality it causes problems with readability and parsing/processing the code