Select * from cloud - Readit News

Since starting my career, the tech stack I use has changed numerous times. Technologies have come, gone, evolved. But except for a brief period in 2010 where we thought we could kill it, SQL is the one technology that has stayed around. It might be the only TECHNICAL knowledge I learned in the first year of my career that is still relavent. You're probably not making the wrong decision if you decide to build on top of SQL.

strbean · 3 years ago

> except for a brief period in 2010 where we thought we could kill it

Unfortunately, we never actually tried to kill SQL. We tried to kill relational databases, with the death of SQL as a side effect. This proved to be a stupid idea - non-relational data stores have value of course, but so do relational databases.

SQL, as you point out, is 48 years old. It shows. The ergonomics are awful. It uses natural-language inspired syntax that has fallen completely out of favor (with good reason imho). The ordering of fields and tables precludes the use of autocomplete. It uses orthogonal syntax for identical constructions, i.e.

    INSERT ... (col1, col2, ...) VALUES (val1, val2, ...)

vs.

    UPDATE ... col1=val1, col2=val2, ...

And worst of all, it comes with such a woefully inadequate standard that half of the universally supported features are expressed in a completely different fashion for each vendor.

It is very common to argue that its longevity is due to its excellence as a language. I think it's longevity is due to its broad target audience, which includes plenty of people in less technical roles who are high on the org chart. Tech workers are happy to jump on new technology that improves ergonomics (or shiny new toys that are strictly more interesting because they are new). Executives who's primary way of interacting with the tech stack is querying a read only replica are much less open to learning a new language.

levkk · 3 years ago

SQL expresses relational algebra [0] which is the language defining data relationships and manipulation. It hasn't changed in 50 years for the same reason why B-Trees and sorting algorithms haven't: it's foundational to computer science.

Database-specific "SQL extensions" are in my experience just administration commands, e.g. `VACUUM` or `CREATE EXTENSION` in Postgres. They help operate the DB, but have little to do with the actual data manipulation.

Killing SQL is like trying to kill computer science: you better come up with something better than a synonym.

[0] https://en.wikipedia.org/wiki/Relational_algebra

lelanthran · 3 years ago

> It uses orthogonal syntax for identical constructions, i.e.

Regarding your complaint ... those aren't identical constructions.

Insert has to have the syntax it does because it lets you insert multiple rows all with different values in a single statement:

    INSERT ... (C1, C2) VALUES (v1, v2), (v3, v4), (v5, v6)

Update doesn't allow that. The syntax for each of the above makes sense for the specific intention of each statement.

dragonwriter · 3 years ago

> Unfortunately, we never actually tried to kill SQL. We tried to kill relational databases, with the death of SQL as a side effect.

I don’t know who “we” is, but, before NoSQL tried to replace RDBMS(well, starting before, these things overlapped it), there were efforts to provide alternatives to SQL for RDBMSs, some examples:

Query-by-Example. https://en.wikipedia.org/wiki/Query_by_Example

the D class of data languages. https://en.wikipedia.org/wiki/D_(data_language_specification...

mfashby · 3 years ago

No mention of PRQL yet, it's an effort to tackle some of these shortcomings

https://prql-lang.org/

sanderjd · 3 years ago

I think your last sentence gets at the crux of it. I think SQL is truly great for that use case of analytical queries over a read only database / data warehouse / parquet / whatever (ie. "OLAP" use cases). I think it's less great for the "OLTP" use cases. It still works fine, which is why it sticks around, but other approaches are more convenient for doing lots of smaller inserts, updates, and lookups and fewer table scans and aggregations.

omegalulw · 3 years ago

You are selling SQL way short. SQL has stuck around because, compared to its complexity, it lets you do incredibly powerful things.

jayd16 · 3 years ago

I also feel like part of the staying power of SQL is because it's the lingua franca of the database. I wonder if it's time to build and popularize some lower level language that could be targeted by ORMs and others to speak to DB engines. If we had a more flexible assembly language we might get more freedom at the programmer language level.

matwood · 3 years ago

I've been using SQL for ~25 years and don't think the ergonomics are nearly as bad as people say. In your example the insert is done that way for good reason.

I know people hate listing the fields before the tables, but when I deal with data that's how I tend to think. I need X, Y, Z, now where do I get it from and do I need to filter.

From a vendor standpoint, it can take time getting new standards implemented, but it's definitely not half. One of the big issues here is that MySQL was and still is in some ways woefully inadequate. It's shortcomings are also often make people think they are relational database shortcomings in general.

Aeolun · 3 years ago

> The ergonomics are awful. It uses natural-language inspired syntax that has fallen completely out of favor

I think SQL is so popular and has staying power for the same reason that our ridiculous gregorian calendar has staying power.

ako · 3 years ago

The reason we can't replace SQL is that it is the lingua franca between many client tools and data sources. From programming languages, to Analytics tools, to ETL tools to bulk data transfer tools, everything knows how to define SQL queries. On the datastorage side, almost everything is standardizing on SQL as its API: RDBMSes, but also NoSQL storage (etc, hadoop + hive, etc), data virtualization layers (denodo, aws athena, steam pipe).

You can't change this by providing a better query language in one of the datasource products, or by providing a better ORM. Because of the network effects it will be very hard to come up with something better that is universally adopted. Many have tried though but it's always limited to a small stack: MDX, DAX, Linq, Dplyr, Graphql, etc.

There may be opportunities to replace SQL as soon as we need something that moves away from relational algebra. Currently you see al lot of adoption on graph storage and graph query languages in the Data Fabric space, as users need to build queries reasoning about relationships between different datasets in the enterprise.

The other reason Data Fabrics could offer an opportunity here is that they're basically adding an layer of abstraction between all data sources and data consumers, and they the possibility to translate SQL into something else, e.g. graphql.

ahallock · 3 years ago

> SQL, as you point out, is 48 years old. It shows.

That's irrelevant and a fallacy. Plenty of amazing things are old and have stood the test of time. Vim and SQL have empowered me my entire career.

denysonique · 3 years ago

> It uses orthogonal syntax for identical constructions

There is an alternative INSERT syntax matching the UPDATE SET one:

  INSERT INTO table SET col1='val1', col2='val2';

xwolfi · 3 years ago

I disagree, the people high on the org chart are clamouring for cloud-based low structure stores on microsoft azure cloud maintained in India, while we warn them that we can migrate at a giant cost for a net loss in efficiency.

SQL and dbs are fine, they work, we could discuss beautifying sql for sure, but frankly we shouldnt solving data structuring problems that were already solved 50 years ago: we have real business to support instead.

aabbcc1241 · 3 years ago

The ergonomics issue of SQL can be workaround by transpiler.

I do like SQL the syntax but it's hard to discard the ecosystem around SQL.

For instance, I made better-sql [1] recently which generate SQL from a language similar to GraphQL and EdgeDB query language.

[1] https://better-sql.surge.sh

emmelaich · 3 years ago

I hope that those two syntaxes could be unified in some future version.

Also, surely allowing FROM and SELECT clauses to be swapped in order shouldn't stress out any parser too much.

mastersummoner · 3 years ago

When I was first learning to code, I once told someone that working with SQL felt like trying to wrestle with a mainframe, and I stand by that assertion.

codegeek · 3 years ago

To my primitive brain, NoSQL always sounded like "I am going to makeup data model and integrity on the fly and let the user figure out how to handle it". No Thanks. But may be I don't get it. GraphQL honestly gives me the same vibes. Happy to be corrected.

wyager · 3 years ago

I have plenty of beef with SQL, but your insert versus update example doesn't make sense to me. That's a perfectly reasonable difference; insert is always a dense operation while update is usually a sparse operation.

doctor_eval · 3 years ago

You can always do

    update t set (x, y) = (n, m);

At least you can in more recent versions of Postgres, and I think it’s standard SQL.

duxup · 3 years ago

Hey man if SQL is so great why are people always trying to kill it??!?!?

tiffanyh · 3 years ago

Declarative vs Imperative Language.

A huge benefit of SQL is that it's a declarative language (you describe what the output should be); whereas with imperative languages like C/Python/Ruby you have to describe how to generate the output with specific instructions and procedures.

jayd16 · 3 years ago

There's plenty of improvement to be had while still keeping it declarative.

jooz · 3 years ago

I consider SQL imperative. You say exactly how to do what you want, not what do you want.

If we take as example a table with a single row:

In imperative (SQL):

INSERT INTO TABLE1 (name, wage) VALUES ('John','50k');

Then one day John gets a raise:

UPDATE TABLE TABLE1 SET wage = '60k' where name = 'John'

In declarative (pseudo code in yaml):

- table: TABLE1 - name: John wage: 50k

Then one day John gets a raise:

- table: TABLE1

  - name: John

    wage: 60k

runeks · 3 years ago

Is there a formal specification of what “what” and “how” means in this context? I’m familiar with your description, but it appears subjective to me because the phrases “what the output should be” and “how to generate the output” are poorly defined.

411111111111111 · 3 years ago

You might notice that the only way to create a declarative language is to use an Imperative language.

Declarative languages are basically a DSL, which (hopefully) translate the desired steps into efficient instructions. Nonetheless, your cpu will execute imperative code at the end.

SQL is an example of a very well established and generally well done declarative language, but that doesn't mean that declarative languages are inherently better.

munificent · 3 years ago

I'm sure this will ruffle some feathres, but the technical skill that has been the most persistently valuable across the past 25 years of my career is object-oriented programming and design. It's not the only paradigm I use, but being able to organize large programs in terms of OOP has been incredibly valuable and hugely productive for me across several languages and domains.

kortex · 3 years ago

On the contrary, I think that's a self-fulfilling perception. If assets are already organized as objects, then it'll continue to work well with oop. But if you take the approach from the get-go to use data and FP approaches, things can be quite nice. Examples:

- interface/trait based programming / structural subtyping (widely used in go/rust, increasingly in TS and python)

- terraform

- react

- aws lambda / cloud functions

- flow-based data processing (the whole of deep learning, spark/hadoop)

- and of course, anything declarative DSL based (SQL, jq and friends)

So I would counter that the more valuable skill is "how do I solve problems in terms of applying and composing transforms to data"

To clarify, since everyone has their own definition of OOP, and of the four pillars, Abstraction, Polymorphism, aren't at all unique to OOP, and Encapsulation is just Abstraction: the defining features of OOP are inheritance and poking-and-prodding-state into opaque objects. Inheritance is subsumed by interfaces / structural subtyping, and poking at state is contrasted with reactor patterns, event sourcing, persistent data structures, etc.

Oop really shines at the middlin-low level, in languages without a lifetime (state for things like IO resourcese, at the GUI widget level, and the (micro)service level, which is more like the original smalltalk sort of objects, in which case inheritance isn't a think.

Jweb_Guru · 3 years ago

Everyone has a different experience! Object oriented programming and design have had approximately no impact on my own career, while SQL knowledge has been indispensable. I don't think they're really in competition.

pmarreck · 3 years ago

Tell me you've never worked on large FP codebases (as a point of comparison) without telling me you've never worked on large FP codebases ;)

Scarbutt · 3 years ago

I can see how can that be the case if one never works with products that rely heavily on having a database.

avgcorrection · 3 years ago

Just a slow-to-die 90’s fad in the end.

rubyist5eva · 3 years ago

I feel my abilities as a developer really took off when I took the time to really learn SQL and take advantage of the features that my database offers. For example, learning to use window functions felt like that galaxy brain meme where you become one with the universe.

Being able to take 50 lines of ruby code (manual (anti-)joins and aggregates, result partitioning, etc...) and replace it with a couple lines of sql that is much faster and less buggy is a life changing experience.

The only other time I had such a dramatic shift in the way that I look at building applications is when I learned Scala (functional programming).

johnthescott · 3 years ago

> replace it with a couple lines of sql that is much faster and less buggy is a life changing experience.

you have snatched the pea.

irrational · 3 years ago

I’ve been using HTML since the mid 1990s. That is probably the oldest technology that I still use on a regular basis. SQL would be a close second (or maybe third - I probably started learning sql around 2001 and I can’t remember when JavaScript came out. Surely before that.)

jrm4 · 3 years ago

As above has noted, Both HTML and SQL are effectively declarative, hence their staying power.

that_guy_iain · 3 years ago

Honestly, mines has mostly stayed the same. I've played with various techs but I mostly work the same underlying tech. Some items get added but very rarely does something get replaced or removed. The tech gets improved and new features added .

The last time I can remember something getting replaced is vargrant for docker. And honestly, I think that's a mistake for many companies. Before that it was git for svn.

The only way I can see tech coming and going is you're always an early adoptor and need to replace things that failed. If you build on stable tech, that stuff stays around for decades even if it was terrible in the first place. A good example is PHP. It was terrible at first still around and improving year on year.

marcrosoft · 3 years ago

Can you elaborate on why vagrant to docker is a mistake? I find the docker workflow infinitely better and faster to iterate for making dev environments.

arjvik · 3 years ago

Why do you think Vagrant->Docker was a mistake?

quickthrower2 · 3 years ago

The problem I see here is building SQL on top of the cloud, which is like building SQL on top of a database where the tables are always changing - not just adding columns but the meaning of the columns and rows, or even the existence of the table.

I think this is a good idea, but the cloud doesn't seem to care too much about backwards compatibility. If you use MySQL/PostgreSQL/SQL Server, you can be fairly sure in 20 years time your SQL will still work :-).

The need for this library is more an indictment on "the cloud" than anything. I am mostly an Azure user so AWS/Google might be better but man it changes it's UI and CLI interfaces alot - way too much.

papito · 3 years ago

SQL is a technology that engineers should know in an out, and while in early 2000's you could not move an inch without knowing it, I find that the new generation of devs really considers database calls a freebie - hidden behind abstractions and ORMs. It's just a line of code, right?

This leads to highly non-performant software since, unless we have bent the laws of physics in the last 15 years, your data access layer is the most expensive one.

But we are crushing LeetCode, yeah?

rsync · 3 years ago

"Since starting my career, the tech stack I use has changed numerous times. Technologies have come, gone, evolved. But except for a brief period in 2010 where we thought we could kill it, SQL is the one technology that has stayed around."

Ethernet comes to mind when I think of technologies that just can't be killed ...

matwood · 3 years ago

> It might be the only TECHNICAL knowledge I learned in the first year of my career that is still relavent.

Agree. IIRC, I wrote my first SQL is 1996 and wrote some today. While not perfect, it’s amazing at what it does. I suggest reading Joe Celko to really up your SQL skills.

johnthescott · 3 years ago

yep.

and i wrote my first quel in 1984 at BLI. and sql still won the war.

api · 3 years ago

That's not random. SQL is basically math. The syntax is a little old school and arcane (but not terrible) but the principles behind it are set theory and logic.

randomdata · 3 years ago

> That's not random.

It's kind of random. There were plenty of competing languages in use until the 90s. Even our beloved Postgres used QUEL for its first decade of life. But Oracle took the dominant lead and its competition either disappeared or started adding SQL support to be compatible with it.

Had a different butterfly flapped its wings all those years ago, things could have turned out very different.

> SQL is basically math.

All programming languages are basically math.

> but the principles behind it are set theory

Thing is, one of the challenges with SQL is that it doesn't adhere to set theory. This leads to some unintuitive traps that could have been avoided had it carefully stuck to the relational algebra.

Such deviation may have been a reasonable tradeoff to help implementations, perhaps, but from a pure language perspective it does make it more terrible than it should have been.

sleiben · 3 years ago

Reminds me of Git.

irrational · 3 years ago

Eh, I’ve seen various things like Git come and go. I spent the majority of my career in the days before git existed and we used things like SVN. I wouldn’t be surprised to see Git supplanted. I would be surprised to see SQL supplanted.

swalsh · 3 years ago

Git is a bit newer, and there have been many attempts to solve the same problem it solves, but it solves that specific problem VERY well... it'll be a long time before it's dethroned in my opinion.

shepherdjerred · 3 years ago

Git is surprisingly only 17 years old while SQL is 48 years. Not that you’re wrong, it’s just interesting how comparatively newer it is

TremendousJudge · 3 years ago

SQL is at least 30 years older than Git. Git has yet to prove its longevity in the same way.

Excited to see Steampipe shared here - thanks kiyanwang! I'm a lead on the project, so sharing some quick info below and happy to answer any questions.

Steampipe is open source [1] and uses Postgres foreign data wrappers under the hood [2]. We have 84+ plugins to SQL query AWS, GitHub, Slack, HN, etc [3]. Mods (written in HCL) provide dashboards as code and automated security & compliance benchmarks [3]. We'd love your help & feedback!

1 - https://github.com/turbot/steampipe 2 - https://steampipe.io/docs/develop/overview 3 - https://hub.steampipe.io/

davesque · 3 years ago

Hey Nathan. Can you comment on some of the known performance pitfalls of steampipe? I'm not super familiar with the Postgres foreign data wrappers API. I assume steampipe inherits many of its technical limitations from this API.

Having done some work in this space, I'm aware that it's no small thing to compile a high-level SQL statement that describes some analytics task to be executed on a dataset into a low-level program that will efficiently perform that task. This is especially true in the context of big data and distributed analytics. Also true if you're trying to blend different data sources that themselves might not have efficient query engines.

Would love to use this tool, but just curious about some of the details of its implementation.

nathanwallace · 3 years ago

The premise of virtual tables (Postgres FDW) is to not store the data but instead query it from the original source. So, the primary performance challenge is API throttling. Steampipe optimizes these API calls with smart caching, massive parallelization through goroutines and calling the minimum set of APIs to hydrate the exact columns requested. For example, "select instance_id from aws_ec2_instance" will do 2 API calls get 100 instances in 2 pages, while "select instance_id, tags from aws_ec2_instance" would do 2 calls (instance paging) + 100 tag API calls (one per instance). We've also implemented support for qualifiers (i.e. where clauses) so API calls can be reduced even further - e.g. get 1 EC2 instance without pagination etc.

The Postgres planner is not really optimized for foreign tables, but we can give it hints to indicate optimal paths. We've gradually ironed out many cases here in our FDW implementation particularly for joins etc.

If you can tolerate my Aussie accent, I explain many of the details in this Yow Data talk - https://www.youtube.com/watch?v=2BNzIU5SFaw

cube2222 · 3 years ago

To add somewhat of a counterpoint to the other response, I've tried the Steampipe CSV plugin and got 50x slower performance vs OctoSQL[0], which is itself 5x slower than something like DataFusion[1]. The CSV plugin doesn't contact any external API's so it should be a good benchmark of the plugin architecture, though it might just not be optimized yet.

That said, I don't imagine this ever being a bottleneck for the main use case of Steampipe - in that case I think the APIs themselves will always be the limiting part. But it does - potentially - speak to what you can expect if you'd like to extend your usage of Steampipe to more than just DevOps data.

I've used the benchmark available in the OctoSQL README.

[0]: https://github.com/cube2222/octosql

[1]: https://github.com/apache/arrow-datafusion

Disclaimer: author of OctoSQL

klysm · 3 years ago

Love to see Postgres FDW used. It’s a really powerful and imo not utilized as much as it could be.

cube2222 · 3 years ago

Hey! Steampipe looks great and I think the architecture you chose is very smart. Some feedback from myself:

I've tried setting up steampipe with metabase to do some dashboarding. However, I’ve found that it mostly exposes "configuration" parameters, so to say. I couldn't find dynamic info, like S3 bucket size or autoscaling group instance count.

Have I done something backwards or not noticed a table, or is that a design decision of some sort? That was half a year ago, so things might've changed since then, too.

nathanwallace · 3 years ago

Steampipe mostly exposes underlying APIs or data through SQL. So, "select * from aws_s3_bucket" will return bucket names, tags, policies, etc all together. But, bucket size is not a direct API so doesn't have a direct table.

In some high value cases we've added tables to simplify / abstract / normalize data - for example AWS IAM policies - https://steampipe.io/blog/normalizing-aws-iam-policies-for-a...

This example query returns the number of instances attached to an autoscaling group - https://hub.steampipe.io/plugins/turbot/aws/tables/aws_ec2_a...

BTW, we recently published a Metabase integration guide - https://steampipe.io/docs/cloud/integrations/metabase

petercooper · 3 years ago

It's a very cool project!

It might just be a coincidence, but an hour before this HN post, I discovered it way back in our queue of things to review for https://golangweekly.com/ and featured it in today's issue. Hopefully kiyanwang is one of our readers :-D

nathanwallace · 3 years ago

Awesome - thanks for the shout out in golangweekly!

aaronharnly · 3 years ago

Congratulations! This looks incredibly powerful and I'm excited to check it out.

Although this is pitched primarily as a "live" query tool, it feels like we could get the most value out of combining this with our existing warehouse, ELT, and BI toolchain.

Do you see people trying to do this, and any advice on approaches? For example, do folks perform joins in the BI layer? (Notably, Looker doesn't do that well.) Or do people just do bulk queries to basically treat your plugins as a Fivetran/Stitch competitor?

nathanwallace · 3 years ago

While it's extensible, the primary use case for Steampipe so far is to query DevOps data (e.g. AWS, GitHub, Kubernetes, Terraform files, etc) - so often they are working within the Steampipe DB itself and combining with CSV data etc.

But, because it's just Postgres, it can be integrated into many different data infrastructure strategies. Many users query Steampipe from standard BI tools, others use it to extract data into S3, it has also been connected with BigQuery - https://briansuk.medium.com/connecting-steampipe-with-google...

As opposed to a lake or a warehouse, we think of it as a "Data Rainbow" - structured, ephemeral queries on live API data. Because it doesn't have to import the data it works uniquely well for small, wide data and joining large data sets (e.g. search, querying logs). I spoke about this in detail at the Yow Data conference - https://www.youtube.com/watch?v=2BNzIU5SFaw

ezekg · 3 years ago

This is awesome. Makes me want to write a plugin for my SaaS, just because it looks fun.

nathanwallace · 3 years ago

I'm biased, but writing plugins is really fun - and then you can create mods / dashboards for your users as well! Please give it a go - we have a guide [1], 84+ open source plugins to copy [2], many authors in our Slack community [3] and we're happy to get your plugin added to the Steampipe Hub [4].

1 - https://steampipe.io/docs/develop/writing-plugins 2 - https://github.com/topics/steampipe-plugin 3 - https://steampipe.io/community/join 4 - https://hub.steampipe.io/plugins

breck · 3 years ago

This is amazing. Can't believe I hadn't seen it before. Nice job

VectorLock · 3 years ago

Are people usually setting up centralized shared instances of Steampipe or is it more of a "run on everyone's laptop" deployment preferred?

nathanwallace · 3 years ago

We see users running Steampipe on their desktop, in pipelines (e.g. to scan Terraform files) and also as a central service. See "service mode" in the CLI for a Postgres endpoint and a dashboard web server [1] or try docker / kubernetes [2]. We also have a cloud hosted option with dashboard hosting, multi-user support, etc [3].

1 - https://steampipe.io/docs/managing/service 2 - https://steampipe.io/docs/managing/containers 3 - https://steampipe.io/docs/cloud/overview

rmetzler · 3 years ago

My guess is, that it makes sense to get regular reports, e.g. weekly. But you also want to experiment and develop queries. So probably both. Not sure if there is something like a notebook for steampipe.

ComodoHacker · 3 years ago

What about joins, are they supported? Can't see them in the examples.

The real power of SQL is locked until you can join different data sources.

judell · 3 years ago

Here is one of our favorite examples: https://steampipe.io/blog/use-shodan-to-test-aws-public-ip

nathanwallace · 3 years ago

Yes - joins work across tables and schemas! This (toy) example connects IAM user records with Slack user data:

    select u.name, s.id, s.display_name
    from aws_iam_user as u, slack_user as s
    where u.name = s.email

chousuke · 3 years ago

Steampipe is pretty much "just" PostgreSQL with a foreign data wrapper and some fancy extras on top. The data is in tables from the database's perspective, so pretty much everything you can do with PostgreSQL, you can do with steampipe, including creating your own tables, views, functions and whatnot.

Lorin · 3 years ago

FYI hiring page requires login to notion

nathanwallace · 3 years ago

hmmm ... I just tested in Incognito mode and it's working for me. Could you please try again? We are hiring for multiple roles!