Readit News
DaiPlusPlus · 3 years ago
SQL does have a significant drawback w.r.t. how databases are used today (imo): a SELECT query can only return a single resultset of uniform tuples. If you want to query a database for heterogeneous types with differing multiplicity (i.e. an object-graph), you either have to use a separate SELECT query for each object-class, or use JOINs, which run into the Cartesian Explosion problem[1] and produce redundant output data due to the multiplicity mismatch. SQL JOINs also lack the ability to error out early if a JOIN matches an unexpected number of rows.

And there are often problems when using multiple SELECT queries in a batched statement: you can't re-use existing CTE queries; not all client libraries support multiple result-sets; and it's essentially impossible to return metadata associated with a resultset (T-SQL and TDS don't even support named result sets...), which means you can't opportunistically skip or omit a SELECT query in a batch because your client reader won't know how to parse/interpret an out-of-order resultset. Most importantly, you need to be careful w.r.t. transactions, otherwise you'll run into concurrency issues if data changes between SELECT queries in the same batch.

[1] https://learn.microsoft.com/en-us/ef/core/performance/effici... and https://learn.microsoft.com/en-us/ef/core/querying/single-sp...

nicoburns · 3 years ago
This is no longer true in databases that have JSON support (which is most of them these days). You can aggregate the result of a subselect into a single column (the underlying data doesn't have to be stored as JSON, you can convert as part of the query).
erikpukinskis · 3 years ago
> You can aggregate the result of a subselect into a single column (the underlying data doesn't have to be stored as JSON, you can convert as part of the query)

For anyone (like me) who is not quite able to visualize this, here is an example:

    SELECT json_agg(trips)
    FROM (
        SELECT
            json_agg(
                json_build_object(
                    'recorded_at', created_at,
                    'latitude', latitude,
                    'longitude', longitude
                )
            ) AS trips
        FROM data_tracks
        GROUP BY trip_log_id
    ) s
From StackOverflow user S-man https://stackoverflow.com/a/53087015
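For comparison, here is a runnable approximation of the same pattern using SQLite's JSON1 functions, where json_group_array and json_object play the roles of Postgres's json_agg and json_build_object; the schema and sample data are invented for illustration:

```python
# Approximation of the Postgres json_agg example above, using SQLite's
# JSON1 functions. The json() wrapper keeps the inner per-trip arrays
# from being double-encoded as text when re-aggregated.
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE data_tracks (trip_log_id INT, created_at TEXT,
                              latitude REAL, longitude REAL);
    INSERT INTO data_tracks VALUES
        (1, '2023-01-01', 10.0, 20.0),
        (1, '2023-01-02', 11.0, 21.0),
        (2, '2023-01-01', 30.0, 40.0);
""")

row = conn.execute("""
    SELECT json_group_array(json(trips))
    FROM (
        SELECT json_group_array(
                   json_object('recorded_at', created_at,
                               'latitude', latitude,
                               'longitude', longitude)
               ) AS trips
        FROM data_tracks
        GROUP BY trip_log_id
    ) s
""").fetchone()

trip_groups = json.loads(row[0])  # one inner array per trip_log_id
```

The whole object-graph comes back as one JSON value in a single cell, which is exactly the "escape hatch" from the uniform-tuples constraint discussed above.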

tomnipotent · 3 years ago
Absolutely. JSON serialization is so well optimized in most RDBMSes now that it puts very little additional CPU load on the database, and duplicating data in the output for large graph-based results is a small price to pay for breaking out of set-based constraints.
mborch · 3 years ago
JSON has a pretty limited choice of data types, though. Want better precision for numbers? Use a string. Want dates? Use a string. Etc.
yencabulator · 3 years ago
The crap mini-language of Postgres functions to manipulate JSON makes otherwise reasonable queries unreadable, though.
masklinn · 3 years ago
That further increases the complexity of the queries, and then you start hitting weird corner-cases like Postgres's difficulty (inability?) to convert a JSON array of JSON text elements to an array of text.
paulddraper · 3 years ago
But now you have to work in...shudder...JSON.

It's nice being able to use datetimes, 64-bit ints, binary, etc.

Deleted Comment

DaiPlusPlus · 3 years ago
> This is no longer true in database that have JSON support

Doing that means losing foreign-key referential integrity...

gemstones · 3 years ago
Yup, this right here. This aspect of SQL is overwhelmingly why I insist on ORMs, too. Any efficiency gains you get by having a senior dev write raw SQL for a complex query are immediately negated by a junior turning what an ORM would write as a single query into three DB calls. All because SQL insists on a flat result set you have to turn into a nested collection yourself, without an ORM doing it for you with eager loading.
nine_k · 3 years ago
From my experience, ORMs in the hands of junior developers who happen to not yet know SQL are a disaster. However hard the ORMs may try, the code ends up making a ton of small queries instead of one efficient query, and fetching a ton of unused columns. The developers then end up doing joins manually in application code, some distance away from the original queries.

ORMs also tend to sneak "live" objects into unexpected places, triggering surprise DB accesses.

Not that ORMs are completely useless. We just need to stop pretending that there can be a smooth and performant automatic mapping between relational tables living in a DBMS and Business Objects living in some idealized world without storage limitations, where any connection between them is like following a pointer in RAM.

Most ORMs provide tools to write composable, reusable queries and parts thereof. These are the best parts.
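The "ton of small queries" failure mode is the classic N+1 problem, which can be sketched in a few lines; the schema and names below are invented, and a plain query counter stands in for what an ORM would emit:

```python
# Sketch of the N+1 pattern: an ORM-style loop issues one query per
# parent row, where a single JOIN would fetch the same data in one trip.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books (id INTEGER PRIMARY KEY, author_id INT, title TEXT);
    INSERT INTO authors VALUES (1, 'Ann'), (2, 'Bo'), (3, 'Cy');
    INSERT INTO books VALUES (1, 1, 'A1'), (2, 1, 'A2'), (3, 2, 'B1');
""")

queries = 0

def run(sql, args=()):
    global queries
    queries += 1
    return conn.execute(sql, args).fetchall()

# N+1: one query for the authors, then one more per author.
authors = run("SELECT id, name FROM authors")
for author_id, _name in authors:
    run("SELECT title FROM books WHERE author_id = ?", (author_id,))
n_plus_1 = queries  # 1 + len(authors)

# The single-query alternative: one LEFT JOIN.
queries = 0
run("""SELECT a.id, a.name, b.title
       FROM authors a LEFT JOIN books b ON b.author_id = a.id""")
single = queries
```

With three parents the loop costs four round trips against the join's one, and the gap grows linearly with the data.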

eastbound · 3 years ago
I've had an excellent, best-in-class Linux developer, albeit a junior one, tell me that we need Kafka because our SQL requests took 3 seconds.

It should have been a single INSERT, but it went through an ORM that multiplied it. The only upside of Kafka is not having the ORM…

preseinger · 3 years ago
"raw sql" is table stakes, core competency for every software engineer

ORMs 100% always do it worse

tlarkworthy · 3 years ago
I keep staring at the output of sqlalchemy's "select in" eager loader, trying to figure out if it has managed to pipeline the follow-up queries into a single DB round trip for the problem of tree-shaped data. Still don't know; maybe someone does.

https://docs.sqlalchemy.org/en/14/orm/loading_relationships....
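For what it's worth, the selectin strategy can be sketched by hand: one query for the parents, then a single follow-up query using IN to batch all of their children, i.e. two round trips per relationship level rather than one per parent. This is an illustration of the idea against an invented schema, not SQLAlchemy's actual emitted SQL:

```python
# Hand-rolled sketch of "selectin" eager loading: a second query with
# IN (...) batches all children in one extra round trip.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE nodes (id INTEGER PRIMARY KEY, parent_id INT);
    INSERT INTO nodes VALUES (1, NULL), (2, NULL), (3, 1), (4, 1), (5, 2);
""")

# Round trip 1: load the root rows.
roots = [r[0] for r in conn.execute(
    "SELECT id FROM nodes WHERE parent_id IS NULL")]

# Round trip 2: load ALL of their children with a single IN query.
placeholders = ",".join("?" * len(roots))
children = conn.execute(
    f"SELECT id, parent_id FROM nodes WHERE parent_id IN ({placeholders})",
    roots).fetchall()

# Stitch children under their parents in application code.
tree = {root: [] for root in roots}
for child_id, parent_id in children:
    tree[parent_id].append(child_id)
```

For tree-shaped data the pattern repeats per depth level, so it is one extra round trip per level, not a single pipelined round trip overall.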

gozzoo · 3 years ago
Isn't your comment judging a fish by its ability to climb a tree?
post-it · 3 years ago
If the task is to climb a tree, it's reasonable to not hire a fish.
Psyladine · 3 years ago
You're asking people whose mental model of a problem is a FOR loop; of course they don't find set-based operations intuitive, efficient, or ever necessary.
zapov · 3 years ago
Those are problems in MS SQL, certainly not in object-relational DBs such as Postgres or Oracle. And it's rather sad that instead of embracing that, we ended up with JSON as a poor man's replacement for such advanced usages. I guess non-portability across DBs certainly doesn't help. I tried showing that 10 years ago: https://github.com/ngs-doo/revenj/ but it just resulted in confusion.
nurettin · 3 years ago
You don't really need multiple result sets: you can use lateral joins (CROSS APPLY in Microsoft-speak) to bring in any unrelated data and DISTINCT the result. And FancyQL doesn't provide any alternatives, so it makes no sense to even bring this up.
jrumbut · 3 years ago
I think of SQL as one of the few good things we have in software development, so like the author I consider it best to try to do as much in SQL as possible.

It's not too uncommon that I run into code in other languages where I just don't understand what it does, or that I write code myself that behaves in ways that surprise me. That almost never happens in SQL.

Even a big hairball of a query just takes time to figure out (unless a database-specific function with odd behavior is used).

nickpeterson · 3 years ago
I tell new developers that SQL is one of those few things in our field you get to keep forever.

That JavaScript framework that takes a year to understand will no longer be used in 7 years. SQL is going to be here forever, and learning it is useful for your whole career.

Other common entries on this list of forever tools: regular expressions, emacs, bash/shell scripting, excel, probably more I’m forgetting.

Devs always push back on the suggestion to learn something like Excel (actually learning it, not just clicking around), but when they first start they really underestimate how often it's the tool the business speaks, and the medium the business feels comfortable using to give feedback on technical questions.

sp33der89 · 3 years ago
> that JavaScript framework that takes a year to understand will no longer be used in 7 years.

Surely there is a lot more nuance to wanting a nicer database query language than comparing it to JS frontend practices.

SQL is something that'll stick for a long time, but I support attempts from people that don't want to let SQL be the endgame.

Of course don't go around deploying highly experimental shiny things on production! :P

YeGoblynQueenne · 3 years ago
>> Other common entries on this list of forever tools: regular expressions, emacs, bash/shell scripting, excel, probably more I’m forgetting.

Oh yes, many more: Java, JCL & COBOL, Windows Forms, PhP, Wordpress, Joomla, Drupal, Perl, Visual Basic, IIS, SSIS, Ruby on Rails, and so on and so forth.

Maintaining enterprise software is a special circle of hell reserved for developers. Just sayin'.

waboremo · 3 years ago
The problem is that a lot of developers consider things forever tools when they are not forever tools and not even close. SQL is one of the forever tools, but even in your short list I would absolutely strike down two of them as "forever tools". They're more like "my favorites that I've invested decades into".
cratermoon · 3 years ago
> how often it’s the tool the business speaks and feels comfortable giving feedback on technical questions in.

Yes, but as often as not business people end up using Excel as a glorified text editor with built-in tabular formatting. Some of the spreadsheets I've been handed by project stakeholders would make 90s-era pre-CSS HTML using tables for formatting look clean and simple by comparison.

After writing applications in "4GL" sql tools early in my career, I've spent the entirety of the rest of my career trying to avoid writing SQL or having anything to do with the inner guts of those gigantic global variables known as relational databases.

thr0waway001 · 3 years ago
> That JavaScript framework that takes a year to understand will no longer be used in 7 years. SQL is going to be here forever, and learning it is useful for your whole career.

Nothing seems more ephemeral than JavaScript framework du jour.

lofaszvanitt · 3 years ago
> I tell new developers that SQL is one of those few things in our field you get to keep forever.

I'm not so sure anymore. Look at what is happening with CSS... every simple thing gets killed and replaced with something complex, so you achieve less with much more effort.
LunicLynx · 3 years ago
The main issue with SQL is that it is the wrong way around, which eliminates all tooling support.

You need to state what you want (select a, b, c) before you tell it where to get it from (from). And no tooling can predict that.

So switching this, moving FROM and JOINs in front of SELECT, might be everything needed to fix SQL.

mabbo · 3 years ago
I agree with you and think this backwards model leads to developers having a poor mental model of what they are doing.

Step 1: build the dataset you want (FROM and JOIN) with all columns.

Step 2: filter out the rows you don't want (WHERE).

Step 3: choose which columns/values you want (SELECT).

Maybe it's just me, but this model makes so much more sense! I'm sure not every developer thinks the same way, but it sure does seem more logical to me.
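The three steps above can be written as a literal pipeline over plain Python data (sample rows invented); this mirrors SQL's logical evaluation order, which really does run FROM/JOIN, then WHERE, then SELECT, despite the reversed written order:

```python
# SQL's logical evaluation order, step by step, over invented data.
rows = [
    {"name": "alice", "age": 31, "city": "Oslo"},
    {"name": "bob",   "age": 17, "city": "Lima"},
    {"name": "ann",   "age": 45, "city": "Rome"},
]

# Step 1 (FROM): build the dataset with all columns.
dataset = rows

# Step 2 (WHERE): filter out the rows you don't want.
filtered = [r for r in dataset if r["name"].startswith("a")]

# Step 3 (SELECT): project the columns/values you want.
result = [(r["name"], r["city"]) for r in filtered]
```

Each step consumes the previous one's output, which is exactly why FROM-first syntax would let tooling autocomplete the later steps.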

Deleted Comment

tester756 · 3 years ago
C#'s LINQ (query syntax, not methods) got it right

    var result = from s in stringList where s.Contains("Tutorials") select s;

ledgerdev · 3 years ago
I recall Anders saying on some podcast that they did that so they could provide auto-complete. The funny thing is that I used it for a couple of years, but I almost never use the LINQ syntax any more, and I'm not sure why.
DaiPlusPlus · 3 years ago
or just `var result = stringList.Where( s => s.Contains("Tutorials") )`

I can't stand the non-extension-method Linq syntax: the _only_ place where it offers a readability improvement over ext-methods is using `join` - but I hardly ever do that in Linq anyway.

Also, in both my code and yours, `result` will be a lazy-evaluated `IEnumerable<String>` which may be undesirable - which means it's probably a good idea to use `.ToList()` to materialize it - which means having to use ext-methods anyway - and mixing both syntaxes in the same expression is aesthetically atrocious.

turboponyy · 3 years ago
The work was already done for them, as LINQ is just do-notation from Haskell.
jmartrican · 3 years ago
I'm so jealous that Java does not have its own LINQ.
christophilus · 3 years ago
LINQ and proper reflection are what I miss most from C#. (I’m all Typescript at the moment.)
Luctct · 3 years ago
Tcl dicts do something very similar, and Tcl is very well integrated with SQLite.
mcdonje · 3 years ago
I get around this when hand coding by doing a quick 'SELECT * FROM', adding my joins, then going back and filling in the fields using intellisense. Intellisense doesn't know what columns are available until the FROM clause is handled.

To me, this is a slight annoyance with an easy workaround. If the SQL spec gets updated to allow switching the clauses around, I'll be pleased. But I'm not about to change languages over it.

cfiggers · 3 years ago
That's exactly what I do too. 'SELECT * FROM', then write the rest of the query, then double back and replace the * at the end.

Is it ideal? No. But it's muscle memory now, so...

¯\_(ツ)_/¯

WuxiFingerHold · 3 years ago
Oh, that's a great idea. What editor/UI do you use? Since I got divorced from JetBrains, I miss DataGrip...
ako · 3 years ago
Always start with the end in mind: first what your goal is, then how to achieve it.

Also, I don't actually see the problem, because you never write a query in a linear way. You usually start with "select * from table limit 10", look at the columns and data available, and then start refining. By then the table is known, so code completion works. It wouldn't help much to write it table-first.

idoubtit · 3 years ago
> Usually start with "select * from table limit 10", look at the columns and data available, and then start refining.

An experienced person won't do that. For any moderately complex SQL query, before writing it I already have in mind the several joins I'll need, since I usually know the tables and FKs I'm working with. It's like following the edges of a graph, all in my head. But I don't know all the fields of these tables, so I rarely write their names from memory. So I have to write "SELECT 1 FROM …" and then go back to that "1" once my FROM is complete. That's not the end of the world, but it does smell.

LunicLynx · 3 years ago
Exactly. You could actually drop the `select` in this case and just say `from table limit 10`; don't state what you don't need.

What you describe is learned behavior to get along with a design flaw. SQL won't change, so no reason to worry.

My point is: people keep creating new versions of it because it is not as `easy` to work with as it could be.

danielheath · 3 years ago
This is also my primary issue with vi.

d3w (`d`elete `3 w`ords) cannot be highlighted / indicated in any way ahead of time. If the motion specifier came first, it could be.

mmh0000 · 3 years ago
I know you specifically mentioned 'vi', but I'm going to blindly assume you meant 'vim'. In which case, as others have already pointed out, you can use visual mode to get exactly that behavior.

A good mindset to have in regards to Vim is, "Vim can do anything, even make you coffee". The trick with Vim is actually figuring out /how/ to do it.

I highly recommend you start with the Vimcasts[1] video series. They're short, five-minute videos, each covering a specific piece of Vim functionality. They're straight to the point, and the author provides sample code for all videos.

[1] http://vimcasts.org/episodes/page/8/

Olreich · 3 years ago
`3w` is a command all its own. There’s an implicit move command baked in. Moving gets pretty clunky if you don’t have first order movement (adding a specifier or something to clear selection). You can mimic this by mapping a single button to <C-v> and disabling all movement commands in normal mode. It’s rough, but may be learnable.

I think something that might work better is adding a “commit” signal to operations. So you type `d3w` and the editor highlights the next three words with a strike through or red or whatnot. Then you can hit enter to commit the delete or escape to cancel it (cursor resets to start position, highlight goes away).

Fissionary · 3 years ago
You might like kakoune (https://github.com/mawww/kakoune), which does exactly that: first you select the range (which can even be disjoint, e.g. all words matching a regex), then you operate on it. By default, the selected range is the character under cursor, and multiple cursors work out of the box.

It's also generally lean and follows the Unix philosophy, e.g. by using shell script, pipes, and built-in Unix utilities to do complex operations, rather than inventing a new language (vimscript) for it.

(Not affiliated with the creator, but kakoune has been my daily driver for years now.)

wryanzimmerman · 3 years ago
If you want to highlight something, use `v` for "visual" mode
xyzzy_plugh · 3 years ago
v3wd?
n_e · 3 years ago
In most tooling, you can write SELECT FROM table, then go back to the select list and have autocomplete work.

The situation could certainly be better, but at least this works today.

madsbuch · 3 years ago
This is definitely something that makes tooling harder. But I think that is because the tooling is conceived wrongly.

Consider `SELECT 0 AS some_num`. This does not have a FROM clause.

While this example seems contrived, there are plenty of queries where the FROM clause is not just a listing of tables, especially as you move into larger SQL projects.
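The point is easy to verify: a bare SELECT with no FROM clause at all is a complete, valid query, shown here against SQLite:

```python
# A SELECT with no FROM clause is a complete query in its own right,
# so FROM-first tooling would need a special case for it.
import sqlite3

conn = sqlite3.connect(":memory:")
row = conn.execute("SELECT 0 AS some_num, 1 + 1 AS two").fetchone()
```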

mrjin · 3 years ago
Almost exactly what I was going to say, then I saw your comment. To me that's the biggest problem. What I have been doing to work around this is to just write SELECT *, then finish the FROM, JOINs, etc., then go back and fix the result columns.
XorNot · 3 years ago
    FROM users SELECT name, id, location WHERE name LIKE 'a%';

You know I think you're right.

frogulis · 3 years ago
Go further: `FROM users WHERE name LIKE 'a%' SELECT name, id, location` - after all, you can WHERE on things that aren't projected by SELECT
ellisv · 3 years ago
Eh. I’m not convinced by this. I may know what I want to query as often or more than knowing where it comes from.

I suppose it could be nice if the user could specify clauses in an arbitrary order but it’d certainly add complexity.

I don’t find it difficult to jump around a bit from clause to clause while writing a query. In fact, it’s incredibly rare to write a query straight through and have it do what you want it to do.

user3939382 · 3 years ago
There was an article on HN about this maybe 6 months ago. I’m in agreement.
6510 · 3 years ago
Oh well, if we are going to be like that:

    FROM users WHERE name LIKE 'a%' SELECT name, id, location

(I should refresh the page before posting)

blipvert · 3 years ago
I might be wrong, but there’s nothing to stop a front end UI from accepting this syntax and then writing out canonical SQL to do the actual query, is there?
wood_spirit · 3 years ago
Yep, simply supporting FROM … SELECT … WHERE … would make SQL much more autocompletable.
smitty1e · 3 years ago
The query is just a requirements statement.

After parsing and analysis steps, the system is going to do what it does with the statement, no?

The syntax is for the user, not the system.

psnehanshu · 3 years ago
Sure, but the parent is speaking about how it is difficult for tooling to present with options when it doesn't know which table to select from.
yencabulator · 3 years ago
Since you express that opinion, I trust you're already aware of this, but just in case: https://prql-lang.org/

Deleted Comment

iLoveOncall · 3 years ago
This is a strength, not a weakness. It forces you to select only what you actually need, and then state where it comes from.

This is paramount for performance.

If autocompletion is your issue, just get a better client, it's perfectly possible to autocomplete field names even before specifying database or table name.

daxvena · 3 years ago
I think you're missing what they're trying to say. You can still select what you actually need in a different order, but changing the order gives more immediate feedback from autocomplete. The order has pretty much zero impact on the performance of a query. A parser would still have to read out the whole query, and it's not expensive to unravel into a more efficient implementation if it would really help.

The problem is that SQL forces you to think about what to select before you even say where you're selecting from. There's nothing a client can do to recommend columns if it doesn't know where you're selecting from. It's pretty cumbersome to have to SELECT * FROM x and then go back and erase the * to actually get auto-completions.

It basically forces you to tell it what you want before you even know what the options are.

croes · 3 years ago
>eliminates all tooling support.

That's an exaggeration. Usually I just start with Select * and build all the necessary joins. For that the tooling support works without problems and when I select the columns in the end it works too.

camgunz · 3 years ago
Just start with `SELECT *` and sculpt w/ autocomplete afterwards as you like.
Etheryte · 3 years ago
While I agree with your point, this isn't really as big of a stopper for tooling as you make it seem. Many programming languages use something akin to import foo from bar and their tooling is just fine.
zaxomi · 3 years ago
If I could change one thing with sql, this would be it.

It makes more sense to have the select last.

jojobas · 3 years ago
Ain't broke.
kristiandupont · 3 years ago
GP clearly shows why it is. I think SQL is great but their point is spot on.
galaxyLogic · 3 years ago
The problem with SQL is that it is not a (very) composable language.

The documentation for EdgeDB goes into some detail about that and shows an alternative, better language for data queries.

https://www.edgedb.com/showcase/edgeql

To understand why SQL is bad, you must first be shown something better, and EdgeDB seems to be such a language: better and more composable.

fbdab103 · 3 years ago
I like PRQL [0]. It fixes a lot of the SQL warts and compiles down to regular SQL (think Typescript to JS).

The syntax example on the PRQL homepage

  from invoices
  filter invoice_date >= @1970-01-16
  derive [
    transaction_fees = 0.8,
    income = total - transaction_fees
  ]
  filter income > 1
  group customer_id (
    aggregate [
      average total,
      sum_income = sum income,
      ct = count,
    ]
  )
  sort [-sum_income]
  take 10
  join c=customers [==customer_id]
  derive name = f"{c.last_name}, {c.first_name}"
  select [
    c.customer_id, name, sum_income
  ]
Trailing commas, select at the bottom, filter is both a WHERE and HAVING replacement, easy filtering of created columns, etc

[0] https://prql-lang.org/

ttfkam · 3 years ago
I'd much rather debug an SQL query with CTEs, which is what that is approximating.

The notion that SQL is not composable is either a lie or simply repeated by folks with only a cursory knowledge of SQL from 20 years ago.

Views, set-returning functions, CTEs, and more: all examples of composability in SQL.

Then of course there's the issue of security where folks tend to put all of their access constraints into their middleware AFTER the data has already been returned over the wire in bulk. Take a moment to consider role-based access control and row-level security policies. Instead of trying to track down every possible spot where a JOIN could have crept in past the code reviews (you code review your DDL and DML, right?), you set your GRANTs, REVOKEs, and POLICYs at the points in your data model that need them; restricted data never makes it into intermediate result sets let alone the final result set and the wire.

Don't misunderstand me. GRANT, REVOKE, and POLICY can be a real PITA, but that's because *security* is a PITA, not the SQL syntax to enforce it. Anyone who tells you their app solves your data security problems in the app tier with a point and click is a lying salesman.

scottcodie · 3 years ago
SQL, in the context of a single query, struggles with composability. But looking slightly outside the scope of single queries, at assignment statements and view/materialization DAGs, you start getting some of that composition back. I think SQRL is a good example of SQL with a bunch of composable feature bolt-ons.

https://www.datasqrl.com/blog/sqrl-high-level-data-language-...

nalgeon · 3 years ago
SQL is infinitely composable. Each SELECT returns a relation that another SELECT can query (or combine with another relation using set operators like UNION etc).
default-kramer · 3 years ago
I suppose it's infinitely composable in that very limited dimension, but SQL's critics are asking for composability in other dimensions. For example, say I have a pretty long query of some Order table. Now I want the exact same query, but it should start from the OrderArchive table instead. How do you do this? Dynamic SQL? The world has lambasted Javascript for much less, but somehow the fact that so many tasks require dynamic SQL (or copy-paste) is considered acceptable.

For the record, I make heavy use of SQL because relational databases are awesome and SQL is the least bad option I've found so far. But the many shortcomings of SQL still annoy me.
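The Order/OrderArchive complaint is concrete: bind parameters only cover values, never identifiers, so reusing a query against a different table means building the SQL string yourself. A hedged sketch (table names and schema invented), with a whitelist guarding the interpolation:

```python
# Table names cannot be bind parameters, so switching tables requires
# "dynamic SQL" (string interpolation). A whitelist keeps it safe.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INT, total REAL);
    CREATE TABLE orders_archive (id INT, total REAL);
    INSERT INTO orders VALUES (1, 10.0);
    INSERT INTO orders_archive VALUES (2, 99.0), (3, 1.0);
""")

ALLOWED_TABLES = {"orders", "orders_archive"}

def big_totals(table, threshold):
    if table not in ALLOWED_TABLES:
        raise ValueError(f"unknown table: {table}")
    # threshold is a real bind parameter; the table name cannot be one.
    sql = f"SELECT id FROM {table} WHERE total > ?"
    return [r[0] for r in conn.execute(sql, (threshold,))]
```

The whitelist is what keeps the f-string from becoming an injection vector, and it is exactly the kind of manual scaffolding the comment is lamenting.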

interleave · 3 years ago
> The problem with SQL is that it is not a (very) composable language.

I thought the same until a few weeks ago. Then we used the WITH operator for pre-processing and giving things human-readable names.

That helped us manage complexity. The final SELECT statement was very easy to reason about.

Not sure if this is a best (or worst) practice but it helped us ship it.
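A minimal sketch of that WITH pattern (schema and names invented), runnable against SQLite, where each CTE gets a human-readable name and the final SELECT stays simple:

```python
# CTEs as named building blocks: each WITH clause pre-processes data
# under a readable name, and the final SELECT reads almost like prose.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount REAL);
    INSERT INTO sales VALUES ('north', 10), ('north', 30), ('south', 5);
""")

top = conn.execute("""
    WITH totals_per_region AS (
        SELECT region, SUM(amount) AS total
        FROM sales
        GROUP BY region
    ),
    big_regions AS (
        SELECT region, total FROM totals_per_region WHERE total > 20
    )
    SELECT region, total FROM big_regions ORDER BY total DESC
""").fetchall()
```

Later CTEs can reference earlier ones, which is what gives the within-query composability being discussed.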

IanCal · 3 years ago
Malloy is another thing to check out in this space

https://www.malloydata.dev/

cratermoon · 3 years ago
Interestingly, EdgeDB is exactly the unnamed tool that the article criticizes.
OliverJones · 3 years ago
As an experienced (===old) developer, I have learned that data long outlasts the programs that access it. The lifetime of data is measured in decades, but programs last for years. Most SQL-based RDBMS teams have figured out workable version migration paths allowing old data to run on newer servers. Because this kind of migration is a very common and economically valuable operation, the vendors make sure it works correctly.

Sometimes a project, especially a greenfield project, looks like it will benefit from more recently invented data storage and query tech than your grandmother's SQL. That's always possible. And as developers we hope for, and work for, continued progress. But consider what may happen when the project succeeds.

If you're still on the project, you'll wake up one day and realize your oldest data is 20 years old. What happens if your storage and query engines are also 20 years old, because they didn't succeed to the extent needed to pay for maintenance and upgrades? You'll be in the software equivalent of the century-old subway system where you have to make all your replacement parts yourself, or get gouged by vendors that can't spread their costs among many customers.

Build for the ages, not for the moment!

RyanHamilton · 3 years ago
Show me an elegant SQL version of the queries in this article: https://www.timestored.com/b/kdb-qsql-query-vs-sql/ Particularly when you are trying to run queries where order matters, e.g. the top 3 posters by topic on HN, you will find SQL much more annoying. Fundamentally, SQL is based on the concept of tuples/sets, which have no order, so there's no way to avoid it being messy. What you want is a database based on the concept of an ordered list; suddenly, what is complex in standard set-based SQL becomes easy in almost any other language.

My second big complaint would be that SQL isn't really a programming language. Parts have been bolted on by various vendors, or they now let you run Python/Java on the SQL server, but considering how heavy SQL already is, having a full-blown language may actually be less cognitive load than learning all the sub-variations of vendor implementations.
iLoveOncall · 3 years ago
It's easy to cherry pick. I guarantee you there are a lot more queries that are easier to write in SQL than in your favorite FancySQL (or even worse, NoSQL) variation.
dan-robertson · 3 years ago
But one doesn't write arbitrary queries. It is very easy to end up frequently wanting to write the kinds of analytic queries described in the GP rather than the kinds of things which SQL expresses better (which you fail to describe).

People pay a lot of money for kdb so clearly they see some value in it despite the lack of sql.

rak1507 · 3 years ago
There aren't. Feel free to try to come up with something. 'SQL' is pretty feature-light (without vendor-specific extensions). qSQL is really great; it's a shame it's locked behind a proprietary language.
dan-robertson · 3 years ago
I don’t really find the examples convincing. Like, I get that sql could maybe be written in a slightly less horrid way but I would prefer something a lot less horrid.

I think I’m much more motivated by analytics queries than the kinds of thing in this example though. I find sql is poorly suited in this case because it is verbose and written backwards, and often requires many layers of subqueries. That said, one can usually still express queries in SQL that other systems do not allow.

For these kinds of queries I think there are just better ways to express them. Another issue with sql is that has some quite strange semantics.[1]

An example query I wrote yesterday is:

  select group, min, max, (max-min)/1e9 range
  from
    (select group, min(size) min, max(size) max
     from
       (select time, instance, sum(size) size, regexp_replace(name,…) group
        from X
        group by regexp_replace(name,…), time, instance)
     group by group)
  order by range desc 
  limit 10
Which is neither pleasant to write nor iterate on interactively.

With something like dplyr instead:

  X %>% mutate(group=regexp_replace(name,…))
    %>% group_by(group,time,instance)
    %>% summarize(size=sum(size))
    %>% group_by(group)
    %>% summarize(min=min(size),max=max(size),range=(min-max)/1e9)
    %>% arrange(-range)
    %>% head(n=10)
And that can be built up interactively pretty easily by adding onto the end of the pipeline.

I would also note that, due to sql being painful, the query is not exactly the one I wanted and instead I would have wanted something better capturing the change over time, but the thought of doing that in SQL seemed too unpleasant.

An example of an actual query language that tries to be better for analytics: https://prql-lang.org/

[1] from someone who spent a lot of time working on databases and sql: https://www.scattered-thoughts.net/writing/against-sql and just on semantics: https://www.scattered-thoughts.net/writing/select-wat-from-s...

nalgeon · 3 years ago
I think the problem here is with the query, not the language. You can immediately improve its maintainability and readability by using CTEs.

https://antonz.org/cte/

rjbwork · 3 years ago
This is exactly the comment I was going to leave. If I get past one subquery, I refactor to a CTE.
dan-robertson · 3 years ago
Maintainability is not relevant to ad-hoc analytics queries. That may be dealt with once the correct query has been determined from sufficient iteration.
markisus · 3 years ago
The author’s second example is extremely unconvincing for me. Why would I want to be forced to use a SELECT expression to calculate a mean? Relational algebra is a great abstraction. SQL, however, seems to be poorly thought out and ad hoc. It’s just the first implementation of a relational algebra language that worked. But why should we be stuck with it forever?
reese_john · 3 years ago
You could probably use window functions

Ex:

  min(sum(size)) over(partition by group) min,
  max(sum(size)) over(partition by group) max
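That suggestion can be made runnable in SQLite (which has supported window functions since 3.25), with the per-group aggregation placed in a subquery and `grp` standing in for the reserved word `group`; the schema and data are invented:

```python
# Window-function version of the min/max-of-a-sum pattern: compute the
# per-(grp, instance) sums once, then take min/max OVER each grp
# partition, avoiding one extra level of GROUP BY nesting.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE x (grp TEXT, instance TEXT, size INT);
    INSERT INTO x VALUES ('a', 'i1', 1), ('a', 'i1', 2),
                         ('a', 'i2', 10), ('b', 'i1', 5);
""")

rows = conn.execute("""
    SELECT DISTINCT grp,
           min(total) OVER (PARTITION BY grp) AS min_sum,
           max(total) OVER (PARTITION BY grp) AS max_sum
    FROM (SELECT grp, instance, SUM(size) AS total
          FROM x GROUP BY grp, instance)
    ORDER BY grp
""").fetchall()
```

DISTINCT collapses the per-row window output back to one row per group, which is the main ergonomic wart of this approach.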

throwaway2990 · 3 years ago
You can't min/max a summed group? The sum would be at the same grouping level as the min/max…
dan-robertson · 3 years ago
I corrected my example (the inner group was meant to be by more columns)
croes · 3 years ago
Why do you need the outermost query?
dan-robertson · 3 years ago
I think I had originally written something like select min(size) min, max(size) max, max-min range, but that didn’t work as the newly introduced names ‘weren’t in scope’ and I didn’t want to type those aggregations out again. You’re right that it could have been avoided.
bob1029 · 3 years ago
SQL is only ever as good as the schema relative to the business or problem domain. The focus on the syntax of the language was always a mystery to me. It's a domain-specific language. It's up to you to make it not suck.

If you are forced to work with a schema that is poorly-aligned with the logical reality it intends to represent, you would definitely walk away with a bad taste in your mouth. Hacking around bad normalization is 99% of what makes SQL suck for me.

If you ever get a chance to design the whole thing yourself from zero, you should almost always insist on one big database/schema and routinely review the table structure with the business owners before you actually go to prod.

The moment you start doing things like putting data for service A into database A and service B into database B, you lose a lot of power. Sometimes this is required, but most of the time it's an org-chart alignment meme. There are ways to join these separate databases, but it starts to fall down pretty quickly. The true magic of SQL is having all of those dimensions in one place at one moment in time so you can put a pin in anything without complex distributed transactions.

Jenk · 3 years ago
The "domain" in the DSL of SQL is "relational data" not "your business domain"
bob1029 · 3 years ago
What does this "relational data" (hopefully) represent?