PRQL as a DuckDB Extension

The post-modern data stack is going to be PRQL + DuckDB + Prefect, and it's going to be much smaller and cheaper for most analytics.

bonchicbongenre · 2 years ago

I'm with you at least 2/3 of the way. My preferred stack is PRQL + DuckDB + Dagster. I evaluated the space for work at my current company (was originally only DE, handling ingests from ~300 sources across various systems, on order of ~1k downstream tables in dbt + hundreds of dashboards + a handful of business-critical data+app features; now leading a small team).

I came away ranking dagster first, prefect second, everything else not close. IMO dagster wins fundamentally for data engineers bc it picks the right core abstraction (software defined assets) and builds everything else around that. Prefect for me is best for general non-data-specific orchestration as a nearly transparent layer around existing scripts.

Ofc to each their own based on their usecase.

esafak · 2 years ago

Are you sure prefect is better than flyte? How so?

https://neptune.ai/blog/best-workflow-and-pipeline-orchestra...

cced · 2 years ago

Thoughts on dbt?

nerdponx · 2 years ago

Great idea, kind of a chaotic mess in practice. Better than nothing by far, but the industry I think will be eager to receive an improved alternative.

The problem with any tool like Dbt that abstracts over differences in databases is that a huge amount of work goes into building "adapters" to support the various details and quirks of each supported database. That ends up being a substantial technological moat which inhibits the growth of competitor systems. Another option is to do what Datasette did and focus on supporting one specific database, gradually expanding to a second database after years of demand for it.
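To illustrate the adapter burden, here is a minimal Python sketch of a single quirk: row limiting. SQL Server spells it `SELECT TOP n`, while Postgres and DuckDB use a trailing `LIMIT n`. The class and function names are hypothetical and are not taken from dbt; a real adapter covers many more differences (types, quoting, DDL, transactions).

```python
class LimitAdapter:
    """Default adapter (hypothetical): dialects with a trailing LIMIT clause."""

    def first_rows(self, table: str, n: int) -> str:
        return f"SELECT * FROM {table} LIMIT {n}"


class MsSqlAdapter(LimitAdapter):
    """SQL Server has no LIMIT clause; it uses TOP right after SELECT."""

    def first_rows(self, table: str, n: int) -> str:
        return f"SELECT TOP {n} * FROM {table}"


# One adapter per supported database; every new database needs a new entry.
ADAPTERS = {
    "postgres": LimitAdapter(),
    "duckdb": LimitAdapter(),
    "mssql": MsSqlAdapter(),
}


def first_rows(dialect: str, table: str, n: int) -> str:
    """Render 'first n rows of table' for the given dialect."""
    try:
        adapter = ADAPTERS[dialect]
    except KeyError:
        raise ValueError(f"no adapter for dialect {dialect!r}") from None
    return adapter.first_rows(table, n)


print(first_rows("duckdb", "events", 5))
print(first_rows("mssql", "events", 5))
```

Multiply this by every feature and every database and the moat described above becomes visible.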

Can someone tell me why PRQL is better? I went here: https://github.com/PRQL/prql

It looks nice, but what are its strengths compared to SQL?

snthpy · 2 years ago

Have a look at the online playground: https://prql-lang.org/playground/

It starts you off with a very well documented example. Try commenting out one of the lines and watch how the SQL on the RHS changes.

Each line is a separate transformation and follows a logical flow from top to bottom. IMHO it combines the best of SQL and pipelined DSLs like dplyr, LINQ, Kusto, to name just a few. An advantage over things like Pandas is that it still generates SQL, so you can take your compute to where your data is (Cloud DWH) and benefit from query optimisation, whereas Pandas has to download all your data first and then follows an eager execution model which doesn't benefit from query optimisation. Polars fixes a lot of that but still has the data transfer problem, and it is not universal, whereas PRQL can compile to different dialects of SQL like DuckDB, Postgres, BigQuery, MS SQL Server, ...
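To make the "each line is a separate transformation" idea concrete, here is a toy sketch in Python. It is not PRQL's compiler and does not use PRQL's API; the `Pipeline` class and its method names are hypothetical, chosen only to mirror the top-to-bottom flow. Each step wraps the previous step as a subquery, and the generated SQL is run against an in-memory SQLite database from the standard library.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Pipeline:
    """Toy pipelined query builder (hypothetical, not PRQL's real API)."""

    sql: str

    @staticmethod
    def from_(table: str) -> "Pipeline":
        return Pipeline(f"SELECT * FROM {table}")

    def _wrap(self, select: str, tail: str = "") -> "Pipeline":
        # Each step treats the previous step's result as a subquery.
        return Pipeline(f"SELECT {select} FROM ({self.sql}) AS t{tail}")

    def filter(self, cond: str) -> "Pipeline":
        return self._wrap("*", f" WHERE {cond}")

    def derive(self, name: str, expr: str) -> "Pipeline":
        return self._wrap(f"*, {expr} AS {name}")

    def aggregate(self, by: str, name: str, expr: str) -> "Pipeline":
        return self._wrap(f"{by}, {expr} AS {name}", f" GROUP BY {by}")

    def sort(self, key: str) -> "Pipeline":
        return self._wrap("*", f" ORDER BY {key}")


# Reads top to bottom, one transformation per line.
query = (
    Pipeline.from_("employees")
    .filter("salary > 50")
    .derive("total", "salary + bonus")
    .aggregate("dept", "avg_total", "AVG(total)")
    .sort("dept")
)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (dept TEXT, salary REAL, bonus REAL)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("eng", 100, 10), ("eng", 80, 0), ("ops", 60, 5), ("ops", 40, 5)],
)
rows = conn.execute(query.sql).fetchall()
print(query.sql)
print(rows)
```

Commenting out one of the chained lines changes the generated SQL, much like the playground experiment described above. The nested subqueries are a shortcut of this sketch, and the database is still free to optimise the whole statement.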

For a more in-depth overview, watch this:

PRQL: A Modern Language for Data Transformation (11 min) https://youtu.be/t4-f9vjq2lc?si=Qx3k5oAq5A9THU0G

Disclaimer: I'm a PRQL contributor and the presenter of that talk.

klysm · 2 years ago

it sounds minor, but having `from` before `select` means you can get autocomplete

laerus · 2 years ago

Well, you can always type "select from some_table" and then move the cursor back after select and have auto completion. For example Jetbrains DataGrip supports this.

henrydark · 2 years ago

I get the sentiment, but personally I can easily imagine myself writing an autocompleter that would work fine with select before from. (I don't write much sql so I don't)

Just to clarify, my point is that when we do write sql most of us start by writing the from part, and even if we didn't I can just offer all columns from all tables I know about with some heuristic for their order when autocompleting in the select part.
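A rough Python sketch of both cases discussed here: when the table is already known (the `from` part is written), suggestions come from that table alone; when it is not, every column of every known table is offered, qualified by table name. The schema, the `complete` function, and the ordering heuristic (alphabetical by column, then table) are assumptions made for illustration.

```python
from typing import Optional

# Hypothetical schema: table name -> column names.
SCHEMA = {
    "orders": ["id", "customer_id", "total"],
    "customers": ["id", "name", "country"],
}


def complete(prefix: str, table: Optional[str] = None) -> list:
    """Suggest column names that start with `prefix`."""
    if table is not None:
        # `from` is known: only that table's columns are candidates.
        return sorted(c for c in SCHEMA.get(table, []) if c.startswith(prefix))
    # `from` is unknown: offer columns from every known table.
    candidates = [
        (col, tbl)
        for tbl, cols in SCHEMA.items()
        for col in cols
        if col.startswith(prefix)
    ]
    # Assumed heuristic: order by column name, then by table name.
    return [f"{tbl}.{col}" for col, tbl in sorted(candidates)]


print(complete("c", table="orders"))
print(complete("c"))
```

With the table known the list is short and unambiguous; without it the list grows with the size of the schema, which is where the ordering heuristic has to do the work.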

> Scrapscript is best understood through a few perspectives:
> “it’s JSON with types and functions and hashed references”
> “it’s tiny Haskell with extreme syntactic consistency”
> “it’s a language with a weird IPFS thing”

frakt0x90 · 2 years ago

I love the idea of PRQL and having a DuckDB extension will make it a lot easier to play with since I'm already using that for my hobby data science projects.

Can you share your use of this? I’m interested in using DuckDB and wanted a boost from people with more experience.

Currently playing with postgres and dbt.

teworks · 2 years ago

I replaced a ton of pandas code with an embedded DuckDB for internal data application and got a massive performance boost and (arguably) cleaner code.

datadrivenangel · 2 years ago

You can replace local postgres with DuckDB and it should be even faster for data analysis.

"Querying DuckDB with PRQL" by Learn Data with Mark (6 min) https://youtu.be/Rdohw424DA4?si=jWydM-LMo2c3ulrh

kthejoker2 · 2 years ago

Disclaimer: I work at Databricks ... so hopefully my subsequent opinion is worth even more.

PRQL and DuckDB are amazing. I've been building a Power BI clone - think low code ETL, semantic layer, dashboards + chatbot - on top of Databricks SQL with PRQL as transformation engine and DuckDB as caching / aggregation layer. (Also huge shoutout to Toby and sqlglot.)

So easy to generate things programmatically, PRQL is the perfect map to low code GUIs, Arrow in and out, easy to scale ...

Big fan. Every analytics engineer should try them.

dang · 2 years ago

PRQL: Pipelined Relational Query Language - https://news.ycombinator.com/item?id=36866861 - July 2023 (209 comments)

PRQL: a simple, powerful, pipelined SQL replacement - https://news.ycombinator.com/item?id=34181319 - Dec 2022 (215 comments)

Show HN: PRQL 0.2 – a better SQL - https://news.ycombinator.com/item?id=31897430 - June 2022 (159 comments)

PRQL – A proposal for a better SQL - https://news.ycombinator.com/item?id=30060784 - Jan 2022 (292 comments)

didip · 2 years ago

oulipo · 2 years ago

Nice! Perhaps you could distribute an NPM package with the WASM extension too? Otherwise the alternative is to use the WASM PRQL compiler in the browser to generate queries for duckdb-wasm.

carlopi · 2 years ago

Here it is: https://carlopi.github.io/duckdb-wasm-unsigned////#queries=v...

That is, the duckdb-wasm web shell with the PRQL extension loaded.

amcaskill · 2 years ago

I'm quite excited about this, and would also love to have it distributed as an NPM package.

I work on an OSS web framework for reporting/ decision support applications (https://github.com/evidence-dev/evidence), and we use WASM duckDB as our query engine. Several folks have asked for PRQL support, and this looks like it could be a pretty seamless way to add it.

I added it using prql-js

tacone · 2 years ago

This is probably stupid but... would scrapscript[1] be a good base for a DSL aimed at replacing SQL?

[1]: https://scrapscript.org/

edit: added link

hintymad · 2 years ago

Maybe, but at least it is not immediately clear to me why scrapscript is better than SQL.

The homepage of scrapscript says:

It later says: "Scrapscript solves the software sharability problem."

None of these statements addresses the commonly recognized shortcomings of SQL.

In contrast, PRQL's value proposition is clear: it improves the readability and composability of SQL by offering a linear order of transformation through a pipelined structure. In addition, it is based on relational algebra so database professionals can still apply the theoretical framework that they understand and trust to master the language.

Hello, thank you for your answer. There's apparently more in scrapscript than what's written on its homepage. I didn't really mean to compare it to PRQL, but at the same time it probably offers some of the propositions of PRQL: extreme composability, functional programming (so: pipelines) and some more.

surprisetalk · 2 years ago

Scrapscript author here. I'm a huge fan of PRQL and indeed hope to implement a DSL called "scrounge" later this year. The DSL will be nice to have for codegen and type-safety interop inside of scrapscript, but there's no way it would feel as nice as using PRQL directly.