I like it, it's readable, unlike some SQL alternatives I've seen it doesn't make me feel like I'm dumb and don't understand what a query even is.
I can't decide if it would be better or worse if it stuck more closely to SQL keywords. You use "from" and "select", but not "where", "order by", "group by". There's some danger of it being in an uncanny valley of SQLish, but I'm pretty sure I'd prefer just using those terms verbatim (including the space in "order by"... that style is less common in modern languages but it's not really that much harder to parse).
I'd like to see more examples of joins and composing SQL. Does this language make it easier to make more general SQL queries? Can I take two queries and squash them together in a reliable way? I feel like I end up with a lot of theme and variation in my queries, often involving optional filters.
I might even like a notion of encapsulation that could help this query language when it's embedded in other languages. Like if I could say, in the language itself, that a query has certain unbound variables (and not just ? or other placeholders). This language seems like it would be better for generating than SQL, and sometimes generation is just necessary (like in any application that supports data exploration), but for most common cases I'd hope to avoid that. Defining inputs and then making whole filter sections or other statements conditional on those inputs would help here.
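For the "unbound inputs with optional filters" case, the workaround today is assembling a parameterized query in the host language. A minimal sketch against SQLite (table and column names are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, country TEXT, salary REAL)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)",
                 [("ann", "USA", 100.0), ("bob", "USA", 50.0), ("cho", "DE", 80.0)])

def find_employees(country=None, min_salary=None):
    # Each optional input conditionally contributes a WHERE clause;
    # values always travel as bind parameters, never string interpolation.
    clauses, params = ["1 = 1"], []
    if country is not None:
        clauses.append("country = ?")
        params.append(country)
    if min_salary is not None:
        clauses.append("salary >= ?")
        params.append(min_salary)
    sql = "SELECT name FROM employees WHERE " + " AND ".join(clauses)
    return sorted(row[0] for row in conn.execute(sql, params))
```

The appeal of declaring those inputs in the query language itself is precisely that this host-language boilerplate (and its foot-guns) would disappear.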
Yup, I like a lot of things about the way this looks. In particular, I like how friendly this looks for things like autocomplete (pretty annoying to need to type practically the entire SQL query only to go back and fix up the columns in order to get autocomplete to work).
Specific things I'd like to see:
How do you handle column ambiguity? In the examples, they show a join of positions to employees on employee_id == id. But what happens when you have two columns with the same name that you are joining on (like employee_id to employee_id in some mapping table)?
Subqueries are pretty important in what I do, so what do those look like? (Perhaps covered by the "thinking about CTEs" section.)
How about opportunities for optimization hints? In T-SQL you can hint at which index the optimizer should prefer for a specific query.
Common SQL patterns would also be interesting. Like, how would you do keyset pagination?
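(For reference, keyset pagination in plain SQL is usually a row-value comparison against the last row of the previous page. A sketch against SQLite with made-up columns; the same shape works in Postgres and MySQL 8+:)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, created_at TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?)",
                 [(i, "2022-01-%02d" % i) for i in range(1, 8)])

# Keyset (seek) pagination: instead of OFFSET, seek past the last row of
# the previous page using a row-value comparison on the ordering key.
PAGE = """
    SELECT id, created_at FROM events
    WHERE (created_at, id) > (?, ?)
    ORDER BY created_at, id
    LIMIT 3
"""
page1 = conn.execute(PAGE, ("", 0)).fetchall()   # first page: seek past nothing
last_created, last_id = page1[-1][1], page1[-1][0]
page2 = conn.execute(PAGE, (last_created, last_id)).fetchall()
```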
Edit: Also, I'd like a discussion about null. SQL null handling rules are terrible. I understand them, I work with them, but at the same time, they are so different from other languages' concept of "null" that they are easy to trip over.
> SQL null handling rules are terrible. I understand them, I work with them, but at
> the same time, they are so different from other languages' concept of "null" that
> they are easy to trip over.
Could you elaborate? I'm really only versed in the MySQL accent, but I don't find anything unusual or unexpected about NULLs in MySQL. If there are any pitfalls that I should be aware of, I'd love to know about them here before my users start complaining about bugs.
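Not the parent, but the classics are three-valued logic leaking into comparisons and `NOT IN`. Demonstrated here against SQLite; MySQL behaves the same way:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (None,)])

def scalar(sql):
    return conn.execute(sql).fetchone()[0]

# 1. `x = NULL` is never true: comparison with NULL yields NULL, not false.
assert scalar("SELECT count(*) FROM t WHERE x = NULL") == 0
# 2. NOT IN with a NULL in the list silently matches *nothing*.
assert scalar("SELECT count(*) FROM t WHERE x NOT IN (2, NULL)") == 0
# 3. count(*) counts rows, count(x) skips NULLs.
assert scalar("SELECT count(*) FROM t") == 3
assert scalar("SELECT count(x) FROM t") == 2
# 4. Aggregates skip NULLs too: avg is 1.5, not 1.0.
assert scalar("SELECT avg(x) FROM t") == 1.5
```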
I think supporting variables and functions already solves most of my composability gripes with SQL.
Another problem that I have with composing SQL is that large queries quickly become unreadable, and error messages are also often not terribly helpful. I think having a more expressive type system would help with the error messages.
Do you have any plans on adding a type system to PRQL?
> Can I take two queries and squash them together in a reliable way? I feel like I end up with a lot of theme and variation in my queries, often involving optional filters.
That's essentially what SQL views do. Each view is a query and then you can treat it like a table and filter/join on it.
Of course, then the problem becomes whether or not the query planner can see through the view to the underlying tables to optimize correctly.
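A minimal sketch of the view-as-composition pattern, against SQLite with made-up names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, country TEXT, salary REAL)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)",
                 [("ann", "USA", 100.0), ("bob", "DE", 80.0)])

# The shared base query becomes a view...
conn.execute("""
    CREATE VIEW usa_employees AS
    SELECT * FROM employees WHERE country = 'USA'
""")
# ...and each variation layers its own filters on top of it.
rows = conn.execute("SELECT name FROM usa_employees WHERE salary > 50").fetchall()
```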
> I can't decide if it would be better or worse if it stuck more closely to SQL keywords. You use "from" and "select", but not "where", "order by", "group by". There's some danger of it being in an uncanny valley of SQLish, but I'm pretty sure I'd prefer just using those terms verbatim (including the space in "order by"... that style is less common in modern languages but it's not really that much harder to parse)
I agree 100% here. As a SQL veteran, it would make the transition a lot easier if you used common SQL keywords like group by, order by, limit, etc.
e.g.
from employees
where country = "USA"
derive [
    gross_salary: salary + payroll_tax,
    gross_cost: gross_salary + benefits_cost
]
where gross_cost > 0
group by:[title, country] [
    average salary,
    sum salary,
    average gross_salary,
    sum gross_salary,
    average gross_cost,
    sum_gross_cost: sum gross_cost,
    count,
]
order by:sum_gross_cost
where count > 200
limit 20
I like the flow direction compared to standard SQL. SQL is supposed to read like a sentence I suppose but I have many times looked at it and really wanted things to be in a more logical order.
My main suggestion would be to be a bit less terse and introduce a bit firmer formatting. I'm not a huge fan of the term "split" and feel like jazzing that up to "split over" or even just reviving "group by" would improve readability. Additionally, the aliasing could use work; I'd suggest reversing the assignment to be something closer to `use salary + payroll_tax as gross_salary`. In terms of firm formatting, unless I'm missing something there isn't any reason to allow a filter statement before any aliases - so you can force two fixed positions for filter clauses, which would make it always legal to reference aliases in filters.
On the brief topic of form vs. flexibility. SQL is a thing that, when complex, is written by many people over the course of its lifetime - removing the ability to make bad decisions is better than enabling the ability to write simple things even simpler - those silly do-nothing queries like `SELECT * FROM customer WHERE deleted='f'` are written once[1] in a moment's time and never inspected again. The complex queries are what you want to optimize for.
1. If they even are - with ORMs available a lot of those dead simple queries just end up being done through an ORM.
> On the brief topic of form vs. flexibility. SQL is a thing that, when complex, is written by many people over the course of its lifetime - removing the ability to make bad decisions is better than enabling the ability to write simple things even simpler
Hallelujah! But, to your footnote, this is a major reason why I despise ORMs. In my mind they make writing simple code slightly easier, but they make complicated SQL statements, especially when you get some weird issue under load and you're trying to debug why your DB is falling over, a ton more difficult and you spend so much time just battling your ORM.
On ORMs, the best use I see of them is for “transparent” queries that you don’t define.
Like fetching a record by id, or a single record and all of its related properties. Or a list of all the records in a table matching a simple filter.
That’s 98% of what we do against the DB, and I’m all for having it basically invisible.
Then let’s just bypass the ORM altogether the minute we think about joining or grouping things together. There are libs in most languages that help sanitize queries, so it’s not really difficult.
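Agreed; even bare DB-API placeholders get you most of the safety without an ORM in the way. A minimal sketch (names made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

# A hostile value is harmless when it travels as a bind parameter
# rather than being spliced into the SQL string.
evil = "Robert'); DROP TABLE users;--"
conn.execute("INSERT INTO users VALUES (?, ?)", (1, evil))
row = conn.execute("SELECT id FROM users WHERE name = ?", (evil,)).fetchone()
```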
I like the flow direction specifically for intellisense/autocomplete. I'm sure it would be easier to provide hints when the table name is known immediately.
Now this is actually nice, unlike the other suggestion posted today[1].
[1]: https://news.ycombinator.com/item?id=30053860
Maybe I'm just too used to non-standard extensions of our database but the SQL example could, at least for our db, be rewritten as
SELECT TOP 20
    title,
    country,
    AVG(salary) AS average_salary,
    SUM(salary) AS sum_salary,
    AVG(gross_salary) AS average_gross_salary,
    SUM(gross_salary) AS sum_gross_salary,
    AVG(gross_cost) AS average_gross_cost,
    SUM(gross_cost) AS sum_gross_cost,
    COUNT(*) AS count
FROM (
    SELECT
        title,
        country,
        salary,
        (salary + payroll_tax) AS gross_salary,
        (salary + payroll_tax + healthcare_cost) AS gross_cost
    FROM employees
    WHERE country = 'USA'
) emp
WHERE gross_cost > 0
GROUP BY title, country
HAVING COUNT(*) > 200
ORDER BY sum_gross_cost
This cuts down the repetition a lot, and can also help the optimizer in certain cases. Could do another nesting to get rid of the HAVING if needed.
Still, think the PRQL looks very nice, especially with a "let" keyword as mentioned in another thread here.
with usa_employees as (
    SELECT
        title,
        country,
        salary,
        (salary + payroll_tax) AS gross_salary,
        (salary + payroll_tax + healthcare_cost) AS gross_cost
    FROM employees
    WHERE country = 'USA'
        AND (salary + payroll_tax + healthcare_cost) > 0
)
select
    title,
    country,
    AVG(salary) AS average_salary,
    SUM(salary) AS sum_salary,
    AVG(gross_salary) AS average_gross_salary,
    SUM(gross_salary) AS sum_gross_salary,
    AVG(gross_cost) AS average_gross_cost,
    SUM(gross_cost) AS sum_gross_cost,
    COUNT(*) as emp_count
from usa_employees
group by title, country
having count(*) > 200
order by sum_gross_cost
limit 3
Readability is pretty similar to PRQL. It would really help in SQL if you could refer to column aliases so you don't have to repeat the expression.
In some cases for removing repeating (intermediate) calculations, I generally find it easier to use a lateral join (in postgres), like
select
    title,
    country,
    avg(salary) as average_salary,
    sum(salary) as sum_salary,
    avg(gross_salary) as average_gross_salary,
    sum(gross_salary) as sum_gross_salary,
    avg(gross_cost) as average_gross_cost,
    sum(gross_cost) as sum_gross_cost,
    count(*) as emp_count
from
    employees,
    lateral (select
        (salary + payroll_tax) as gross_salary,
        (salary + payroll_tax + healthcare_cost) as gross_cost
    ) employee_ext
where
    country = 'USA'
    and gross_cost > 0
group by title, country
having count(*) > 200
order by sum_gross_cost
limit 3;
Column aliases would have saved me hundreds of hours over the course of my career. Sorely missing from standard SQL, and would make the need for PRQL less acute.
Snowflake lets you refer to column aliases, and it's great!
There's the slight issue of shadowing of table column names, which they resolve by preferring columns to aliases if both are named the same. So sometimes my aliases end up prefixed with underscores, but that's not a big deal.
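For what it's worth, a few engines besides Snowflake already accept output-column aliases in GROUP BY and ORDER BY (MySQL and SQLite among them). A quick check against SQLite with made-up data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (dept TEXT, salary REAL)")
conn.executemany("INSERT INTO emp VALUES (?, ?)",
                 [("eng", 10.0), ("eng", 20.0), ("ops", 5.0)])

# `total` is a select-list alias, reused in ORDER BY instead of
# repeating SUM(salary) -- exactly the repetition lamented above.
rows = conn.execute("""
    SELECT dept, SUM(salary) AS total
    FROM emp
    GROUP BY dept
    ORDER BY total DESC
""").fetchall()
```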
Not all database systems can optimize queries well over CTE boundaries. I believe this is still true for PostgreSQL (no longer true, see below -- it was true a few years ago). So there's a potential performance hit for (the otherwise excellent advice of) writing with CTE's.
In Microsoft SQL cross apply can be used for this in even more situations and with less repetition:
select top(20)
    title,
    country,
    ...
    avg(gross_salary) as average_gross_salary,
    ...
from employees
cross apply (select
    gross_salary = employees.salary + employees.payroll_tax, -- or "as .."
    gross_cost = ...
) v -- some name required, but you don't need to use it if column names are unique
where ...
Very cool! A couple questions/suggestions off the top of my head:
1. Did you consider using a keyword like `let` for column declarations, e.g. `let gross_salary = salary + payroll_tax` instead of just `gross_salary = salary + payroll_tax`? It's nice to be able to scan for keywords along the left side of the window, even if it's a bit more verbose.
2. How does it handle the pattern where you create two moderately complex CTEs or subqueries (maybe aggregated to different levels of granularity) and then join them to each other? I always found that pattern awkward to deal with in dplyr - you have to either assign one of the "subquery" results to a separate dataframe or parenthesize that logic in the middle of a bigger pipeline. Maybe table-returning functions would be a clean way to handle this?
> Did you consider using a keyword like `let` for column declarations
Yeah, the current design for that is not nice. Good point re the keyword scanning. I actually listed `let` as an option in the notes section. Kusto uses `extend`; dplyr uses `mutate`; pandas uses `assign`.
> 2. How does it handle the pattern where you create two moderately complex CTEs or subqueries (maybe aggregated to different levels of granularity) and then join them to each other? I always found that pattern awkward to deal with in dplyr - you have to either assign one of the "subquery" results to a separate dataframe or parenthesize that logic in the middle of a bigger pipeline. Maybe table-returning functions would be a clean way to handle this?
I don't have an example on the Readme, but I was thinking of something like (toy example):
table newest_employees = (
    from employees
    sort tenure
    take 50
)

from newest_employees
join salary [id]
select [name, salary]
Or were you thinking something more sophisticated? I'm keen to get difficult examples!
There, each variable can be referenced by downstream steps. Generally, the prior step is referenced. Without table variables, your language implicitly pipes the most recent one. With table references, you can explicitly pipe any prior one. That way, you can reference multiple prior steps for a join step.
I haven't thought through that fully, so there may be gotchas in compiling such an approach down to SQL, but you can already do something similar in SQL CTEs anyway, so it should probably work.
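My hand translation of the toy example above into the CTE it would presumably compile to (a guess, not actual compiler output; `take 50` shrunk to 2 to fit the toy data), runnable against SQLite:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER, name TEXT, tenure INTEGER)")
conn.execute("CREATE TABLE salary (id INTEGER, salary REAL)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)",
                 [(1, "ann", 1), (2, "bob", 2), (3, "cho", 3)])
conn.executemany("INSERT INTO salary VALUES (?, ?)",
                 [(1, 10.0), (2, 20.0), (3, 30.0)])

# `table newest_employees = (...)` becomes a named CTE; the trailing
# pipeline becomes the main SELECT that joins against it.
rows = conn.execute("""
    WITH newest_employees AS (
        SELECT * FROM employees ORDER BY tenure LIMIT 2
    )
    SELECT name, salary
    FROM newest_employees
    JOIN salary USING (id)
""").fetchall()
```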
Maybe you'd like to check FunSQL.jl, my library for compositional construction of SQL queries. It also follows an algebraic approach and covers many analytical features of SQL, including aggregates/window functions, recursive queries, and correlated subqueries/lateral joins. One thing where it differs from dplyr and similar packages is how it separates aggregation from grouping (by modeling GROUP BY with a universal aggregate function).
I like the explicit pipelining idea, seems much easier to reason about. Some comments:
I found the "# `|` can be used rather than newlines." a bit odd. So when using let, you're only transforming one column? I think the example would look weird with returns instead of |.
Depending on your intended target, it might help adoption if you stay closer to the naming conventions of that target. If you're targeting mainstream Java/Python/C#/Javascript etc. then functions need parentheses, "take 20" may be worse than slice, etc.
I think annotating microversions would get tiresome fast. I think the right way to think of this is that you put in a single version number like 1, and then only ever change that if you need to do backwards-compatible changes that cannot be handled by clever hacks in the runtime.
Also I think you should try writing one or more native wrappers in your intended target languages to make sure it's easy to interface between the two, even if it means you'd have to use dots in that language.
I could imagine an end game where the ergonomics were so good that a database like Postgres ends up with a native PRQL frontend. Not sure you're there yet, though. IMHO SQL as a query language suffers from a) sometimes really bad ergonomics, b) it's hard to wrap in another programming language (also ergonomics), c) it has far too many concepts - it's not orthogonal.
> I think annotating microversions would get tiresome fast. I think the right way to think of this is that you put in a single version number like 1, and then only ever change that if you need to do backwards-compatible changes that cannot be handled by clever hacks in the runtime.
Thanks, good idea - I just changed this to remove the microversions. If we use SemVer, then before `1` we'd hold versions compatible within each 0.x, and after that within each major version.
IMO you are at the forefront of where query languages need to and will go.
Some programmers like you see that SQL ordering is backwards to human thinking, except in the simplest cases. But many people with practice and sunk costs in their SQL expertise will be resistant. The resistance usually wins the day.
But sometimes, a useful tool gets created by one person, and a rift is created in that resistance. Think John Resig creating jQuery, leading to LINQ and many other similar patterns. You could be that person for database query languages, but how do you ensure that?
Maybe imagine what made jQuery easy to adopt and indispensable for programmers: easy availability as a simple .js download; solved the problem of DOM differences between browsers. Good luck to you, and thanks for sharing.
Re LINQ to SQL: the difference is that LINQ works by mapping objects to tables and .NET primitives to SQL types, often producing really poor queries as a result.
This is a nice idea, especially given all the work people have done recently to make in-language querying nicer (Spark comes to mind).
My only gripe is the 'auto-generated' column names for aggregates. This seems like a recipe for disaster - what if there is already a column (as there almost certainly will be) named "sum_gross_cost"? The behavior also just seems rather unexpected and implicit. My suggestion would be simple syntax that lets you optionally give a name to a particular aggregate column:
...
filter gross_cost > 0
aggregate by:[title, country] [
    average salary,
    sum gross_salary,
    average gross_cost,
    let sum_gc = sum gross_cost,
    count,
]
sort sum_gc
While it might seem a little uglier, it seems much more sustainable in the long run. If this is really too gross, I'd advocate some token other than underscore that is reserved for aggregation variables; perhaps `sum@gross_cost` or `sum#gross_cost`.
I'm quite opposed to the idea "from should be first".
I want to understand what exactly the query returns, not the implementation detail of the source of this data (that can later be changed).
Literally first example from page - I have no idea what is being returned:
from employees
filter country = "USA" # Each line transforms the previous result.
let gross_salary = salary + payroll_tax # This _adds_ a column / variable.
let gross_cost = gross_salary + benefits_cost # Variables can use other variables.
filter gross_cost > 0
aggregate by:[title, country] [ # `by` are the columns to group by.
    average salary,          # These are the calcs to run on the groups.
    sum salary,
    average gross_salary,
    sum gross_salary,
    average gross_cost,
    sum gross_cost,
    count,
]
sort sum_gross_cost # Uses the auto-generated column name.
filter count > 200
take 20
Of course, similar things are happening in SQL too, with CTEs becoming more widespread and the "real" list of columns hidden somewhere inside, but it's still parseable.
Quick, what is this query about? What's ironic is that I think you have it backwards: the columns are the implementation detail, not the table. The table is the context: you can't change that without having to change everything else. But columns are the last step, the selection after the filters, joins, etc. They can be changed at any time without affecting the logic.
This is... an odd choice. I'd assume I'm not looking at a query without the context to know why I would want those columns.
And the autocomplete story is backwards. Often I know what columns I want, but I'm not clear what table I need to get them from. Such that, if you made a smarter suggestion in the from that only includes tables that have the columns, I'd be much happier.
It's a query asking for the id, name, and author fields. Very straightforward; I have no idea how this is confusing.
> The table is the context: you can't change that without having to change everything else.
Except even in the provided single-table example this isn't true - you're getting subselected/CTEd results. No functional joins are demonstrated unfortunately.
For example:
from employees
left_join positions [id=employee_id]
...is equivalent to...
SELECT * FROM employees LEFT JOIN positions ON id = employee_id
No data is selected from positions in either example, and it's unclear on why we're joining that table (other than just for the heck of it). It's not a workable example.
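For what it's worth, the workable version of that join aliases each side and selects qualified columns. A sketch with made-up columns, runnable against SQLite:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE positions (employee_id INTEGER, title TEXT)")
conn.execute("INSERT INTO employees VALUES (1, 'ann')")
conn.execute("INSERT INTO positions VALUES (1, 'dev')")

# Alias each table and select qualified columns, so it's explicit
# which side every output column comes from.
rows = conn.execute("""
    SELECT e.name, p.title
    FROM employees e
    LEFT JOIN positions p ON e.id = p.employee_id
""").fetchall()
```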
The big advantage of "from first" like we have in Kusto KQL (a query language we use at Microsoft) is that it provides much better autocomplete (if I write the `from`, it can easily autocomplete the projection).
If you want an interesting example of how a query language built for developer experience and autocompletion looks, definitely check it out!
That's interesting because it also explains why I was going to say I do like having from first. When trying to reason about a query, I mentally go through the following:
1. What tables are being pulled from? This speaks to the potential domain of the query.
2. What data is being selected (I can now know what is or isn't being pulled from the aforementioned tables...)
3. What operations, aggregations, groupings, etc. are being performed to work on the pulled data
Of course from vs. select ordering is completely arguable, but my thinking process seems to follow that of the autocomplete: in other words, my cognitive load when looking at the select statement is lessened when I know what the columns are being selected from.
It also follows (at least to me) the mental process of writing the query. First look at the tables, then decide what columns, then decide what functions to apply.
I said it in a sibling, but I feel this is somewhat missed. Auto complete that simply lists the tables is easier if from is first. But... Auto complete that helps me know what tables can give me my requested columns works the other direction.
I think it's quite a common convention in engineering - not just software - that the input to a process "goes in the top and out the bottom". We humans read top->bottom (regardless of left/right/vertical, I don't know any languages that write bottom up). Conventional voltage in circuit diagrams usually flow top to bottom. Gravity loads in schematics flow top to bottom. Chemical pathways are usually written top to bottom. And of course functions take arguments up top and return at the bottom, maybe with some short circuits. I think the only counter example of note is distillation columns.
Where is the data coming from? Employees table. What's coming out? 20 rows of sum_gross_cost.
What could improve this is function signatures. It's kind of nice to have the whole abstraction up top...like an abstract.
I agree that the columns of the results should be more obvious. But I am a proponent of "from should be first". I have never written a SQL query without thinking about the contents of a table or its relations. If it was my way, I would describe where the data I'm pulling from, then describe any filters/joins, then describe the columns that I'm interested in (last).
It's a fair sentiment, but it can be handled without losing directional flow and composability, some of the bigger advantages of reworking SQL.
One idea would be along the lines of a function prototype: a declaration, up front, about the columns and types that a query is expected to return. It's a good place to put documentation, it's redundant information which should protect against mistakes but not so redundant that it would be too taxing - the author should know what the query returns. The prototype would only be used for validation of column names and types.
Another idea would be requiring the last element in a query to be a projection, a bit like the return statement in a function body: here's what I'm returning out of the grand set of symbols available (e.g. via various joins) in scope from previous operations in the flow.
Without the `let` I would imagine having trouble reading it as well, I'm not sure if that would go away with familiarity but my instinct is that it's a useful addition.
This feels like an English-language thing. In English we tend to put our adjectives first; it feels natural: "Where is my red, round ball?", rather than some other languages (like German) where you put the subject first - the equivalent of "Where is my ball, red & round?"
While it inherently feels unnatural, I do agree with the others here that the context is actually easier to understand once you're over the initial discomfort.
_from_ is one of the most important pieces of context about the data being returned. It provides the type information. Columns you select are just properties of that type.
In SQL, where _from_ is placed at the end, we are essentially writing the equivalent of 'property.object', e.g. name.person, age.person.
Both CTEs and this idea address the same problem: poor readability of complex SQL queries. Compared to CTEs, the author takes the idea to split the complex query into parts to the next level.
To your point - a solid IDE will show you what's being processed at each line (or returned, if the cursor is on the last line) - in an autocomplete window or a side panel.
Thanks.
I just fleshed out composing CTEs, which is a small step towards the broader goal of making composition easier: https://github.com/max-sixty/prql/commit/dc68fcaaceef26cc078...
Let me know if you have a good case of the sort of composition you find difficult in SQL (either here or in an issue). Thank you!
And I agree with you on both the assignments and `split` being a bit awkward. Kusto just uses `by`, WDYT?
I just look at something like this and I immediately know what's going on. If it's nested subqueries, it always takes me much longer.
The DB we use supports those, I just learned about them too late so keep forgetting they exist :(
> It would really help in SQL if you could refer to column aliases so you don't have to repeat the expression.
The DB we use supports that, so in your CTE you could write
We do that all the time, which will be a pain now that we're migrating to a different DB server which doesn't.
I opened an issue here: https://github.com/max-sixty/prql/issues/2
I don't have an example on the Readme, but I was thinking of something like (toy example):
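A sketch, where `table` names an intermediate result that later steps can join (the `table` keyword and the join shorthand are hypothetical):

```
table newest_employees = (
  from employees
  sort tenure
  take 50
)

table average_salaries = (
  from salaries
  aggregate by:country [country_salary: average salary]
)

from newest_employees
join average_salaries [country]
select [name, salary, country_salary]
```

Each named table compiles down to a CTE, and the final pipeline joins the two.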
Or were you thinking of something more sophisticated? I'm keen to get difficult examples!

Edit: formatting
There, each variable can be referenced by downstream steps. Generally, the prior step is referenced. Without table variables, your language implicitly pipes the most recent one. With table references, you can explicitly pipe any prior one. That way, you can reference multiple prior steps for a join step.
I haven't thought through that fully, so there may be gotchas in compiling such an approach down to SQL, but you can already do something similar in SQL CTEs anyway, so it should probably work.
Let me know any feedback — as you can see it's still at the proposal stage. If it gains some traction I'll write an implementation.
I guess the biggest difference between FunSQL (and similarly dbplyr) and PRQL is that the former needs a Julia (or R) runtime to run.
I really respect the library and am keen to see how it develops.
I found the "# `|` can be used rather than newlines." a bit odd. So when using let, you're only transforming one column? I think the example would look weird with returns instead of |.
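For reference, my reading of the proposal is that these two forms are equivalent (syntax as I understand it, so details may be off):

```
from employees | filter country = "USA" | take 20
```

```
from employees
filter country = "USA"
take 20
```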
Depending on your intended target, it might help adoption if you stay closer to the naming conventions of that target. If you're targeting mainstream Java/Python/C#/Javascript etc. then functions need parentheses, "take 20" may be worse than slice, etc.
I think annotating microversions would get tiresome fast. I think the right way to think of this is that you put in a single version number like 1, and then only ever change that if you need to make backwards-incompatible changes that cannot be handled by clever hacks in the runtime.
Also I think you should try writing one or more native wrappers in your intended target languages to make sure it's easy to interface between the two, even if it means you'd have to use dots in that language.
I could imagine an end game where the ergonomics were so good that a database like Postgres ends up with a native PRQL frontend. Not sure you're there yet, though. IMHO SQL as a query language suffers from a) sometimes really bad ergonomics, b) it's hard to wrap in another programming language (also ergonomics), c) it has far too many concepts - it's not orthogonal.
Thanks, good idea - I just changed this to remove the microversions. If we use SemVer, then before `1.0` we'd hold versions compatible within each 0.X, and afterwards within each major version X.
Also want "OFFSET 20" from PG.
Some programmers like you see that SQL ordering is backwards to human thinking, except in the simplest cases. But many people with practice and sunk costs in their SQL expertise will be resistant. The resistance usually wins the day.
But sometimes, a useful tool gets created by one person, and a rift is created in that resistance. Think John Resig creating jQuery, leading to LINQ and many other similar patterns. You could be that person for database query languages, but how do you ensure that?
Maybe imagine what made jQuery easy to adopt and indispensable for programmers: easy availability as a simple .js download; solved the problem of DOM differences between browsers. Good luck to you, and thanks for sharing.
And then dump the queries via https://stackoverflow.com/questions/1412863/how-do-i-view-th... or https://www.linqpad.net/ ?
My only gripe is the 'auto-generated' column names for aggregates. This seems like a recipe for disaster - what if there is already a column (as there almost certainly will be) named "sum_gross_cost"? The behavior also just seems rather unexpected and implicit. My suggestion would be simple syntax that lets you optionally give a name to a particular aggregate column:
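Something like this (hypothetical naming syntax, mirroring the aggregate form from the README):

```
aggregate by:[title, country] [
  total_cost: sum gross_cost,   # explicit name, no auto-generated sum_gross_cost
  average salary                # unnamed: falls back to the auto name
]
```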
While it might seem a little uglier, it seems much more sustainable in the long run. If this is really too gross, I'd advocate some token other than underscore that is reserved for aggregation variables; perhaps `sum@gross_cost` or `sum#gross_cost`.

For what it's worth, a similar problem already exists with SQL. Something simple like
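(taking PostgreSQL as the dialect:)

```sql
SELECT count(*) FROM my_table;
```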
automatically aliases the column to `count`, even if `my_table` has a column called `count`.

In practice, I don't think this is a major problem.
I'm not sure whether we should force naming? When I'm writing a query often I'm fine with something auto-generated when starting out.
I want to understand what exactly the query returns, not the implementation detail of the source of this data (that can later be changed).
Literally first example from page - I have no idea what is being returned:
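For concreteness, that first example is roughly this (reconstructed from memory, so details may differ):

```
from employees
filter country = "USA"
derive [
  gross_salary: salary + payroll_tax,
  gross_cost:   gross_salary + benefits_cost
]
aggregate by:[title, country] [
  average salary,
  sum gross_cost
]
sort sum_gross_cost
filter count > 200
take 20
```

Nothing up top tells you which columns come out at the end; you have to trace the whole pipeline.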
of course, similar things are happening to SQL too, with CTEs becoming more widespread and the "real" list of columns hidden somewhere inside, but it's still parseable.

And the autocomplete story is backwards. Often I know what columns I want, but I'm not clear what table I need to get them from. Such that, if you make the suggestions in the from smarter, to only include tables that have the columns, I'd be much happier.
> The table is the context: you can't change that without having to change everything else.
Except even in the provided single-table example this isn't true - you're getting subselected/CTEd results. No functional joins are demonstrated unfortunately.
For example:
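The join example looks roughly like this (reconstructed; the join condition matches the README's employee_id = id):

```
from employees
join positions [employee_id=id]
filter country = "USA"
select [name, salary]
```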
No data is selected from positions in either example, and it's unclear why we're joining that table (other than just for the heck of it). It's not a workable example.

If you want an interesting example of how a query language built for developer experience and autocompletion looks, definitely check it out!
1. What tables are being pulled from? This speaks to the potential domain of the query.
2. What data is being selected? (I can now know what is or isn't being pulled from the aforementioned tables.)
3. What operations, aggregations, groupings, etc. are being performed to work on the pulled data?
Of course, from-vs-select ordering is completely arguable, but my thinking process seems to follow that of the autocomplete - in other words, the cognitive load of looking at the select statement is lessened when I know from what tables the columns are being selected.
It also follows (at least to me) the mental process of writing the query. First look at the tables, then decide what columns, then decide what functions to apply.
The language should be right for human understanding, not automated mad-lib generation.
Where is the data coming from? Employees table. What's coming out? 20 rows of sum_gross_cost.
What could improve this is function signatures. It's kind of nice to have the whole abstraction up top...like an abstract.
For example:
One idea would be along the lines of a function prototype: a declaration, up front, about the columns and types that a query is expected to return. It's a good place to put documentation, it's redundant information which should protect against mistakes but not so redundant that it would be too taxing - the author should know what the query returns. The prototype would only be used for validation of column names and types.
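A sketch of what such a prototype might look like (entirely invented syntax, just to illustrate the idea):

```
# Invented syntax: declare the expected output shape up front.
# Used only to validate the result's column names and types.
returns [title: text, country: text, sum_gross_cost: float]

from employees
aggregate by:[title, country] [sum_gross_cost: sum gross_cost]
```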
Another idea would be requiring the last element in a query to be a projection, a bit like the return statement in a function body: here's what I'm returning out of the grand set of symbols available (e.g. via various joins) in scope from previous operations in the flow.
Reading the comment however, it would seem that `let` adds columns which are implicitly returned in the order they are defined.
I do see benefits in this, and can imagine pitfalls. Hard to judge without kicking the tires.
Update: It's quite possible we saw different syntax!
https://news.ycombinator.com/item?id=30063266
Without the `let` I would imagine having trouble reading it as well, I'm not sure if that would go away with familiarity but my instinct is that it's a useful addition.
While it inherently feels unnatural, I do agree with the others here that the context is actually easier to understand once you're over the initial discomfort.
In SQL, where _from_ comes after the column list, we are essentially writing the equivalent of 'property.object'; e.g. name.person, age.person.
To your point - a solid IDE will show you what's being processed at each line (or returned, if the cursor is on the last line) - in an autocomplete window or a side panel.