Readit News logoReadit News
andrewl-hn · 2 years ago
For me the examples on the website https://prql-lang.org/ are the biggest selling point for PRQL, in particular the SQL it generates. It looks clean, straightforward, something I would've written myself.

In general, I like this slightly more careful take on modern database development. 10-15 years people would start a brand new database like Mongo, or Riak, or Influx, or whatever, and would try to convince application developers to select it for new projects. But recently we started getting more conservative options like EdgeDB, TimescaleDB, or even PRQL which all expect us to run Postgres with some addons and / or query preprocessors. Tech like this is so much easier to adopt!

I'm really liking what Edge folks are doing with schemas and migrations, but I do find PRQL syntax much more intuitive. My application code is littered with data transformation pipelines already: all these map / filter chains in TYpeScript, iterators in Rust, enumerables in Ruby, streams in Java, LINQ in .net, Rx in dozens of languages etc. etc. So the concept is very, very familiar. Getting nice SQL queries out of PRQL that I can store, inspect later, see the query plans, add indices where necessary, is just great. It's such a good tool!

OJFord · 2 years ago
To me it seems quite nice, but really just trivially different from SQL - like if Ruby was 'friendlier syntax that transpiles to Python', meh? You'd use whichever you happened to learn first and not bother with the other. (That's often true even though it's more than that of course.)

The examples arbitrarily make SQL look more verbose:

    SELECT
      id,
      first_name,
      age
    FROM
      employees
    ORDER BY
      age
    LIMIT
      10
Yes! Of course I'd rather:

    from employees
    select {id, first_name, age}
    sort age
    take 10
..but wait, actually the SQL could've been:

    select id, first_name, age
    from employees
    order by age
    limit 10
and it's more verbose by a character or two... (no braces, but 'order by' vs 'sort')

viraptor · 2 years ago
It's not the tiny changes in the syntax/order that matter here. It's that if you want to limit the columns or do something else with them afterwards, in PRQL you append that text to the end. In SQL you'd have to wrap it around your original query, it inject joins in the middle. It doesn't read well anymore, because "how" you're doing things eclipses "what" you're doing.

For a simple select, this just doesn't matter. But you also wouldn't bother with PRQL for simple selects.

andrewl-hn · 2 years ago
I use CTEs, window functions, and groupings all the time when I write reporting queries. These things tend to be much more verbose in raw SQL, and ORMs / Query Builders either do not support some of these features at all or do very poor job (like, force me to put raw SQL substrings in my code), or force to write DSLs that are even more verbose than raw SQL. Look at corresponding PRQL samples, and you may see an appeal.

Also, I agree, no one should write SQL like this - screaming keywords, superficial newlines, etc. I don't think this style made sense ever, even in 1970s.

hn_throwaway_99 · 2 years ago
I agree that I think the difficulty with the uptake will be "it's really just less annoying SQL", and it's hard to overtake a technology that's so ubiquitous when your fixes are really "nice to haves" vs something truly transformational.

That said, it's not the succinctness that's an improvement, it's that the pipelined nature of PRQL really maps much better to how people should think about queries, and also how the server executes them! Something as trivially simple as putting the from clause first means I can now get much better typeahead/autocomplete support from dev tools. Heck, I already do this now in a slightly more annoying manner: I write "select * from some_table ..." first, then I go and type out the actual columns I want because once the from clause is correct my IDE will autocomplete/show errors for possible columns.

ravi-delia · 2 years ago
That's a really simple example though. I think the real pitch for PRQL is that the syntax is way more regular- the group operator just runs a normal pipeline on each group and multiple derivations with different operations between them don't need to all be plotted out in advance, just as an example. SQL suffers because it's not really composable, especially once you get outside of specific modern versions.
jtsuken · 2 years ago
Being able to pick the source (i.e. the FROM clause) first is useful in itself, as you then get the benefit of sensible auto-complete suggestions.
jatins · 2 years ago
Even though a small example doesn't highlight the advantages well, I still prefer the PRQL syntax in this because that reads like map/filter code I'd write in some programming language.

So the mental model seems easier to me

krick · 2 years ago
"Order by" is a much better name, by the way. I'm permanently slightly annoyed by the fact that in programming it's a custom to call ordering "sorting" for some unimaginable reason.
t8sr · 2 years ago
I don't get that - to me the examples are much less readable than SQL and I don't understand why I should want to use this. Like, yes, you can reorder the query sections, which seems to be everyone's complaint about SQL, but then you also have multiple types of brackets, colons and other syntax for no reason, all while not really accomplishing anything SQL doesn't already do.

What's the attraction?

snthpy · 2 years ago
If you're happy with SQL then there isn't much point.

For the folks building and supporting PRQL, SQL just has a few too many warts and the popularity of tools like Pandas, dplyr, Polars, LINQ, ... shows that for analytical work we often like to work with our data differently. Other frameworks and languages feel that we should throw out Relational Algebra as well but we feel that's like throwing the baby out with the bathwater.

PRQL's core tenets are that Relational Algebra is foundational and a great abstraction but what's needed is a modern, ergonomic language that allows us to work with data the way most of us conceive of it - as pipelines of transformations.

Personally I feel much more productive in PRQL. I can start with a dataset and append one transformation at a time, writing from top to bottom as I go along without having to jump back and forth between the SELECT and the WHERE and the GROUP BY clause etc... .

Also, if I want to take out a step, I can just comment out that line and the rest of the pipeline usually still works. This might seem like a minor thing but in my experience it's those kind of ergonomics that make the difference in actual day to day work rather than fancy theoretical features you only use once in a blue moon. It's therefore worth noting that this was an intentional design decision. You try and take out some steps from your SQL query and see how well the rest of your query holds up.

hn_throwaway_99 · 2 years ago
The attraction is that, especially for more complicated queries with complex joins, subqueries, aggregates, etc., that the structure of PRQL much more closely matches the "English", mental-model of the query. I can just read PRQL linearly, from top to bottom, and at each point I know what is being "pipelined" into the next phase of the query.

With SQL I have to look all over the place. And it's not just that FROM should come before SELECT, it's that if I'm doing, say, an aggregation with a group by, normal SQL doesn't really have anything to make me think about the ungrouped rows, and then merge them together to get the aggregate values. With PRQL I can just go top to bottom, and for me it's much easier to reason about (i.e. first get all the rows, then group by some specific columns, then take aggregates, etc.)

And I say this as someone who spends about half my days in SQL at present.

0cf8612b2e1e · 2 years ago
The attraction is something that was designed after decades of usage and PL research. Consistency of syntax is a big one for me. A favorite example of mine:

  SELECT substring('PostgreSQL' from 8 for 3);
  SELECT substring('PostgreSQL', 8, 3); -- PostgreSQL-only syntax
  SELECT trim(both from 'yxSQLxx', 'xyz');
  SELECT extract(day from timestamp '2001-02-16 20:38:40');
Taken from: https://www.edgedb.com/blog/we-can-do-better-than-sql

Maybe if SQL would give me that monumental ask of trailing commas, perhaps I would hate it less.

slt2021 · 2 years ago
I think main difference is how PRQL translates into query execution plan, with SQL you need to read entirety of query to get rough understanding of order of operations (especially if you join two-three tables and have some nested and lateral queries).

with PRQL I see that the order will be explicit and set up by developer, so any footguns will be evident.

things like predicate push down, optimization fence, variable rewrite, etc are not needed to be relied upon, because pipeline of query is more explicit in PRQL

also since it is new lang, it can be naturally extended into ETL type data pipelines

also because PRQL can be translated into query execution plan - it can be converted into non-SQL languages like MongoDB or pandas / spark / etc, eliminating SQL altogether for distributed nosql engines

dlisboa · 2 years ago
Maybe you didn't see the more complex examples, but some of the SQL queries it abstracts are very unintuitive to write by hand. PRQL turns it into something more readable and closer to the expected mental model.

For people who can write error-free and complex portable SQL queries by heart this may not be really interesting. From experience, however, that is not a skill most developers have.

the_mitsuhiko · 2 years ago
I don't use PRQL but I absolutely get the appeal but specifically on the readability part, some things that are easy in PRQL are just awful in SQL.

From the website for instance this is a nightmare to do in SQL:

    from employees group role (sort join_date take 1)

ravi-delia · 2 years ago
I mean if you can look at the "expressions" example on the homepage and say the SQL is more readable than PRQL, then more power to you. Hell, actually more power to you, that's extremely impressive. But I think for many people, especially programmers more used to parsing brackets and nesting than pseudo-english, PRQL is easier in the places where you're trying to compose.
andrewl-hn · 2 years ago
Oh, I'm a big fan of raw SQL, too. There's very little out there that can bring me same joy as a well-written 30-50 lines-long sql query! Like, I can spend an hour or two on it, but when it runs and produces the results in the desired form, it's sooo rewarding! And realizing that producing similar results in my main language (be it Rust or TypeScript, or whatever) would take me 3-4 days instead (and it would run much longer) makes me appreciate SQL programming more and more. In terms of "usefulness per minute spend" nothing comes close to SQL!

However, the syntax can be a bit clunky from time to time. I'm very fortunate that I use Postgres only in the past 5-7 years, so CTEs, and things like Timescale or JSONB, or PlV8 / PlRust for stored procedures are often on the menu. Yet, simple things still require repeating fragments of code from time to time, and complex grouping or window functions code often looks too convoluted.

And as I wrote in my GP comment: I (and many other developers) already use pipelines for data manipulation in code - beyond databases. Say, we got data from an API endpoint, or we mixing data from several data sources. SQL is familiar because I see it often, PRQL is familiar because I use similar pipelining concepts often in my no-SQL code. Would I use PRQL for super simple queries? Probably not. Would I be upset if one of my coworkers used it for simple queries? No, why would I? Would I try implementing complex reporting query using PRQL? Yeah, I would. Partially because I suspect I would get to my data in desired form quicker with pipelines, and partially because I know: if I get stuck I would convert my intermediary query to SQL and pick up from there.

For me PRQL looks better then most DSLs for ORMs or Query Builders in most languages. Adding a new tool to a project may be annoying, but depending on a project I'd rather pay this price once if it makes my less SQL-savvy team mates more comfortable. "Yet another syntax" is obviously the main point against, but from the application developer perspective ORMs / Query Builders often force one to learn "yet another syntax" anyway. "Don't use ORM / Builder" is an often voiced opinion and yet in practice we work with them all the time, too.

So, I view PRQL not as an alternative to SQL but as a third database communication dialect in addition to SQL and my ORM / Query Builder DSLs.

spullara · 2 years ago
it seems like any SQL parser could let you put FROM first and solve a lot of the annoyances of SQL

Deleted Comment

zX41ZdbW · 2 years ago
We have recently merged PRQL support into ClickHouse: https://github.com/ClickHouse/ClickHouse/pull/50686

It's currently more like an experiment - I'm not sure if it will be usable or useful. There are some concerns about Rust, although minor: https://github.com/ClickHouse/ClickHouse/issues/52053#issuec...

snthpy · 2 years ago
This is awesome! Thank you!

Would that mean it would also be able to use it in clickhouse-local?

zX41ZdbW · 2 years ago
Yes:

    clickhouse-local --dialect prql
or:

    $ clickhouse-local

    :) SET dialect = 'prql'

0cf8612b2e1e · 2 years ago
I really want this to take off and become a first party supported language for Postgres. Yes, yes, relational algebra is beautiful and all that jazz. SQL is a garbage, first pass design that should have been replaced decades ago. Even Codd has complaints about it. It is amazing what he invented, but we have learned a lot about PL design since then.
aerzen · 2 years ago
People really like to associate relational algebra with SQL, probably because they learned them one alongside another. But SQL is really terrible relational language - it breaks a few core concepts in different places, like relations being unordered sets, that you can ORDER BY. This bubbles up as relations losing ordering after being wrapped into a subquery, which is really unexpected.

PRQL has a data model very similar to the relational one. The only big difference is that relations are ordered - they are defined as arrays of tuples. So let's hope that PRQL gets to be known as "the relational language Mk II"

cmrdporcupine · 2 years ago
Seems like you'd have to break quite a few relational model concepts in order to retain compatibility with SQL, no?

Ordered tuples like you say, but also duplicate tuples. And null values.

snthpy · 2 years ago
Not quite what you're asking for but DuckDB has both PRQL [1] and Postgres [2] extensions, so you could probably query your Postgres database with PRQL from there.

There's also a DBeaver plugin [3] which we still need to document better and simplify the usage of but you could potentially also use that to query Postgres with PRQL.

Finally there is pyprql [4] with which you could query Postgres from a Jupyter notebook.

[1]: https://github.com/ywelsch/duckdb-prql

[2]: https://duckdb.org/docs/extensions/postgres_scanner.html

[3]: https://github.com/PRQL/prql/issues/1643

[4]: https://github.com/PRQL/pyprql

(Disclaimer: I'm a PRQL contributor.)

stevage · 2 years ago
By first party I assume they mean using psql and postgres's own tooling.
vkazanov · 2 years ago
Relational algebra is still great, it is the sql that made an unbelievable mess out of this beautiful idea.
globular-toast · 2 years ago
Same. I first learnt SQL 20 years ago as a teenager and even back then I remember thinking how odd and, quite literally, backwards the language was. Back then I probably thought I just didn't understand enough to see why it had to be so. Now I know there is no reason. SQL has become more like a natural language. There's no arguing against it, you have to speak it even if it doesn't make sense. But it's not a natural language and we can do better.
ako · 2 years ago
It’s optimized for reading and use: the first time you see a query written by another developer, the interesting bit is what data it returns (the select part). Once you know that, you may be interested in the how, where does that data come from? Same for functions or methods, you first see the signature, input and output structures, before you see the implementation.
burcs · 2 years ago
Yeah, I have never liked that you choose what you are querying before you select the source. I think the formatting here is so much more intuitive.
t8sr · 2 years ago
If the main complaint people have about SQL is that you can't swap SELECT, FROM and WHERE, then that's pretty good for a language designed in the 70s.

This, by contrast, looks like it has a bunch of random line noise for syntax. Why on earth should I like this:

`join side:left p=positions (p.id==employees.employee_id)`

better than this:

`LEFT JOIN positions AS p ON p.id = employees.employee_id` ?

crote · 2 years ago
The main issue with SQL is that you are stuck in a very strict way of writing things, which does not clearly match to how I think. The top-down way of writing PRQL where each step is simply a transformation of the previous one makes way more sense to me. SQL is something I'd need a reference manual for, PRQL is simply writing down what I want the query to do.

I do agree that PRQL's join syntax is extremely bad, though. They should've stuck to explicit "left join"-like keywords, and the alias & join column shorthand could be done better.

appplication · 2 years ago
Despite it not at all being the right tool for most jobs, this is why I actually enjoy writing spark. It reads top down, and is “sql-ish” for most commands. Enough that you don’t need to go out of your way to learn it if you already know SQL, you just think in terms of serial transformations.
jdmichal · 2 years ago
I've had two real gripes with SQL. The rest of it has been, as you said, pretty good.

Complaint 1: Not being able to use selected columns later in the same select.

    SELECT
        gnarly_calculation AS some_value,
        some_value * 2 AS some_value_doubled
Instead:

    SELECT
        subquery.*,
        some_value * 2 AS some_value_doubled
    FROM (
        gnarly_calculation AS some_value
    ) AS subquery
Complaint 2: Not being able to specify all columns except. This combines with the above, where I have to pull some intermediate calculations forward from a subquery, but I don't need them in the final output. So I have to then enumerate all the output columns that I actually want, instead of being able to say something like `* EXCEPT some_value`.

zX41ZdbW · 2 years ago
Both complaints are resolved by ClickHouse.

This video also covers many other advantages of ClickHouse's SQL dialect: https://www.youtube.com/watch?v=zhrOYQpgvkk

Some may find your first complaint questionable... But I specifically designed ClickHouse SQL to allow aliases to be used and referenced in every part of SQL query.

AtNightWeCode · 2 years ago
Don't use *. It is a common source for all kinds of problems. Many query langs does not support it at all for that reason.
snthpy · 2 years ago
PRQL fixes this.

(Disclaimer: I'm a PRQL contributor.)

lofatdairy · 2 years ago
Maybe it's my familiarity with R and the tidyverse paradigm, but I think in general this paradigm and syntax is pretty readable. The example you chose is perhaps a case where brackets and commas go a long way, but that aside it's really not too bad in context of the rest of the language. `:` seems to consistently be treated as keyword arguments, and `=` for aliasing.
kbenson · 2 years ago
Because now the type of join is an argument on the "join" operation, and join is the first word of that statement making it more obvious what the operation actually is? I also prefer foo(bar) over "bar to foo()" as well, which seems like the equivalent in a more general purpose function call in a language example.
xchkr1337 · 2 years ago
The problem with SQL is the same as the problem with C-style variable declarations. It sounds slightly better than the alternatives when you say the code out loud, but in reality it causes problems with readability and parsing/processing the code
snthpy · 2 years ago
This is well timed as PRQL 0.9 was just released a few hours ago. Have a look at the release notes here:

https://github.com/PRQL/prql/releases/tag/0.9.0

There is a rather large breaking change in the syntax from `[]` to `{}` for tuples. This is because initially it seemed like these were lists but over time we've realised that they are actually tuples. As per the release notes, freeing up `[]` clears the way to start supporting arrays soon.

Disclaimer: I'm a PRQL contributor.

Taikonerd · 2 years ago
The limitation of PRQL is that it only does SELECTs, by design. If you want to insert/update/delete data, you're back to SQL.

That means that your team's data scientist might give you a query written in PRQL, but if you want to actually incorporate it into the data pipeline, you'll need to translate it into SQL.

I wish that PRQL would support at least a limited ability to insert -- for example, maybe just the case of inserting into a new temp table. No update or ON CONFLICT logic to worry about. It could look like this:

    from tracks
    filter artist == "Bob Marley"
    save bob_marley_tmp

10000truths · 2 years ago
I don't see why PRQL can't support data mutation. Just have an insert/update/delete operator that must go at the end of a pipeline, and which takes a table name as an argument. An SQL query like this:

  UPDATE counters SET value = value + 1 WHERE name LIKE 'prefix.%'
Could then be written in PRQL as something like this:

  from counters
  filter startswith(name, 'prefix.')
  derive {
    new_value = value + 1
  }
  select {
    name, new_value
  }
  update counters

BrentOzar · 2 years ago
This feels way less intuitive than SQL:

UPDATE counters SET value = value + 1 WHERE name LIKE 'prefix.%'

cpursley · 2 years ago
Yep, these is where I lost interest.
smartmic · 2 years ago
This must not be missing here then: "I don't want to learn your garbage query language" [1]

[1] https://erikbern.com/2018/08/30/i-dont-want-to-learn-your-ga...

maximilianroos · 2 years ago
Big fan of this post! We link to it from PRQL website.

Our goal for PRQL is a great language to integrate with & build on. So we don't have N languages for N databases. Because PRQL will always be open-source, and won't ever have a commercial product, we think that's much more feasible than a DB-specific or product-specific language.

(PRQL dev here)

phlakaton · 2 years ago
As evidenced from the poster's ORM rant, two languages for the same DB seem just as offensive to them as different languages for different DBs. I suspect you're still going to have an uphill battle converting them, unless they were hit with a blinding shaft of sunlight on the road to the SQL History Museum or something.
samwillis · 2 years ago
Previous Show HN: https://news.ycombinator.com/item?id=31897430

Show HN: PRQL 0.2 – a better SQL - 378 points by maximilianroos on June 27, 2022, 161 comments

Original post when it was conceived: https://news.ycombinator.com/item?id=30060784

PRQL – A proposal for a better SQL - 650 points by maximilianroos on Jan 24, 2022, 295 comments