Pql, a pipelined query language that compiles to SQL

What's the reason to go for this over PRQL?

caust1c · 2 years ago

The simple answer is that it's too distinct from where we're trying to meet our users.

We're not anti-PRQL, but our users (folks in security) coming from Kusto, Splunk, SumoLogic, LogScale, and others have expressed that they have a preference for this syntax over PRQL syntax.

I wouldn't be surprised if we end up supporting both and letting folks choose the one they're most happy with using.

memset · 2 years ago

The big advantage of this project in my opinion is that it will pass functions it doesn’t recognize to the underlying DB.

With prql, if they don’t support your favorite operator then you’re out of luck.
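To make the pass-through idea concrete, here's a toy sketch in Python. It is not pql's actual code: `KNOWN_FUNCTIONS` and `translate_call` are hypothetical names, and the two mappings are invented for illustration. Names the translator knows get rewritten; anything else is emitted verbatim so the database can resolve it.

```python
# Toy sketch of "pass unknown functions through to the DB".
# KNOWN_FUNCTIONS and translate_call are hypothetical, not part of pql,
# and the mappings below are made up for illustration.
KNOWN_FUNCTIONS = {
    "tolower": "lower",
    "toupper": "upper",
}


def translate_call(name: str, args: list[str]) -> str:
    """Render a function call as SQL text.

    Known names are rewritten; unknown names are emitted unchanged,
    leaving it to the underlying database to accept or reject them.
    """
    sql_name = KNOWN_FUNCTIONS.get(name, name)
    return f"{sql_name}({', '.join(args)})"


print(translate_call("tolower", ["user_email"]))  # known: rewritten
print(translate_call("toIntervalDay", ["1"]))     # unknown: passed through
```

The trade-off is that a typo in a function name only surfaces as a database error, not a compile error.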

beeskip · 2 years ago

https://prql-lang.org/book/reference/syntax/s-strings.html

IshKebab · 2 years ago

Looks like PRQL doesn't have a Go library so I guess they just really wanted something in Go?

I would guess they didn't wrap the main PRQL library (which is written in Rust) because Go code is a lot easier to deal with when it's pure Go. And they probably didn't just write a Go version of PRQL because that would be a mountain of work.

Still I think that's a mistake. PRQL is a far more mature project and has things like IDE support and an online playground which they are never going to do...

Better just to bite the bullet and wrap the Rust library.

pkolaczk · 2 years ago

The problem isn’t binding to Rust. Python can do it, Zig can do it, C can do it, even JS can do it. It is Go that doesn’t integrate well with anything that isn’t Go.

tstack · 2 years ago

> Looks like PRQL doesn't have a Go library so I guess they just really wanted something in Go?

There are some C bindings, and the example in the README shows integration with Go:

https://github.com/PRQL/prql/tree/main/prqlc/bindings/prqlc-...

zhiboz · 2 years ago

I had the exact same question when I saw the post.

We're developing TQL (Tenzir Query Language, "tea-quel") that is very similar to PQL: https://docs.tenzir.com/pipelines

Also a pipeline language, PRQL-inspired, but differing in that (i) TQL supports multiple data types between operators, both unstructured blocks of bytes and structured data frames as Arrow record batches, (ii) TQL is multi-schema, i.e., a single pipeline can have different "tables", as if you're processing semi-structured JSON, and (iii) TQL has support for batch and stream processing, with a light-weight indexed storage layer on top of Parquet/Feather files for historical workloads and a streaming executor. We're in the middle of getting TQL v2 [@] out of the door with support for expressions and more advanced control flow, e.g., match-case statements. There's a blog post [#] about the core design of the engine as well.

While it's a general-purpose ETL tool, we're primarily targeting operational security use cases where people today use Splunk, Sentinel/ADX, Elastic, etc. So some operators are very security'ish, like Sigma, YARA, or Velociraptor.

Comparison:

    users
    | where eventTime > minus(now(), toIntervalDay(1))
    | project user_id, user_email

vs TQL:

    export
    where eventTime > now() - 1d
    select user_id, user_email

[@] https://github.com/tenzir/tenzir/blob/64ef997d736e9416e859bf...

[#] https://docs.tenzir.com/blog/five-design-principles-for-buil...
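For anyone wondering how a pipeline like the first snippet in the comparison ends up as SQL, here is a deliberately naive Python sketch. It is not how pql or TQL work internally: `compile_pipeline` is a hypothetical helper that handles only `where` and `project`, and it splits on `|` without understanding string literals.

```python
def compile_pipeline(query: str) -> str:
    """Compile a tiny `table | where ... | project ...` pipeline to SQL.

    Hypothetical toy: only `where` and `project` are supported, and the
    query is split on "|" naively, so a "|" inside a string literal breaks it.
    """
    stages = [stage.strip() for stage in query.split("|")]
    table, columns, filters = stages[0], "*", []
    for stage in stages[1:]:
        op, _, rest = stage.partition(" ")
        if op == "where":
            filters.append(rest.strip())
        elif op == "project":
            columns = rest.strip()
        else:
            raise ValueError(f"unsupported operator: {op}")
    sql = f"SELECT {columns} FROM {table}"
    if filters:
        # Multiple `where` stages are combined with AND.
        sql += " WHERE " + " AND ".join(f"({f})" for f in filters)
    return sql


query = """
users
| where eventTime > minus(now(), toIntervalDay(1))
| project user_id, user_email
"""
print(compile_pipeline(query))
```

A real compiler would parse each stage into an AST and track how stage order changes meaning (for example, a `where` after a `project`), which this sketch ignores.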

This is really great! Maybe I'll incorporate this into my own software (scratchdata/scratchdb)

Question: it looks like you wrote the parser by hand. How did you decide that was the right approach? I myself am new to parsers and am working on implementing the PostgREST syntax in Go using PEG to translate to ClickHouse, which is to say, a similar mission to this project. Would love to learn how you approached this problem!

seer · 2 years ago

I also wrote a parser (in TypeScript) for Postgres (https://github.com/ivank/potygen), and it turned out to be quite the educational experience. I learned _a lot_ about the intricacies of SQL, and how to build parsers in general.

Turned out in webdev there are a lot of instances where you actually want a parser - legacy places where they used to save things in plain text for example, and I started seeing the pattern everywhere.

Where I would have reached for some monstrosity of a regex to solve this, now I just whip out a recursive descent parser and call it a day. It takes a surprisingly small amount of code! (https://github.com/dmaevsky/rd-parse)
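To give a sense of how little code that is, here's a minimal recursive descent sketch in Python for integer arithmetic. It's a generic illustration, not taken from potygen or rd-parse; the grammar and all the names (`tokenize`, `Parser`, `evaluate`) are my own. Each grammar rule becomes one method, and precedence falls out of which rule calls which.

```python
import re

# One token is an integer or a single operator/parenthesis.
TOKEN = re.compile(r"\s*(\d+|[-+*/()])")


def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):
        # expr := term (('+' | '-') term)*
        value = self.term()
        while self.peek() in ("+", "-"):
            op = self.take()
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        # term := factor (('*' | '/') factor)*
        value = self.factor()
        while self.peek() in ("*", "/"):
            op = self.take()
            rhs = self.factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def factor(self):
        # factor := NUMBER | '(' expr ')'
        token = self.take()
        if token == "(":
            value = self.expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return value
        if token is None or not token.isdigit():
            raise SyntaxError(f"unexpected token {token!r}")
        return int(token)


def evaluate(text):
    parser = Parser(tokenize(text))
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError("trailing input")
    return value


print(evaluate("1 + 2 * (3 - 1)"))
```

For a query language you'd build an AST in each method instead of computing a value, but the shape stays the same.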

brikym · 2 years ago

It looks a lot like Kusto Query Language. Here is a Kusto query:

    StormEvents
    | where StartTime between (datetime(2007-01-01) .. datetime(2007-12-31))
        and DamageCrops > 0
    | summarize EventCount = count() by bin(StartTime, 7d)

Edit: yes, it was indeed inspired by Kusto, as they mention in the GitHub README: https://github.com/runreveal/pql

nerdponx · 2 years ago

Is that what AWS CloudWatch Insights uses?

njpatel · 2 years ago

We use KQL for Axiom too (well, a version of it). It's a great query language and very flexible for unstructured + structured data.

theragra · 2 years ago

Azure appinsights logs

darcien · 2 years ago

This is actually pretty awesome! I use KQL every few days for reading some logs from Azure App Insight. The syntax is pretty nice and you can make pretty complex stuff out of it. But that's it, I can't use KQL anywhere else outside Azure. With this, I can show off my KQL-fu to my teammates and surprise them with how fast you can write KQL-like syntax compared to SQL.

crooked-v · 2 years ago

SuaveSteve · 2 years ago

Looks similar to PRQL[0].

Neither PRQL nor Pql seem to be able to do anything outside of SELECT like Preql[1] can.

I propose we call all attempts at transpiling to SQL "quels".

[0] https://prql-lang.org/ [1] https://github.com/erezsh/Preql

mavam · 2 years ago

RedShift1 · 2 years ago

InfluxDB tried to do this with InfluxQL but abandoned it, and are now back to SQL. The biggest problem I had with it when I tried it was that it was simply too slow: queries were on average 6x slower than their SQL equivalents. I think a language like this is just too hard to optimize well.

loic-sharma · 2 years ago

This is incorrect. It was their query engine that was hard to optimize, not the language. InfluxDB has been working on a new query engine based off Apache DataFusion to fix this.

If you squint, this query language is very similar to Polars, which is state-of-the-art for performance. I expect Pql could be as performant with sufficient investment.

The real problem is that creating a new query language is a ton of work. You need to create tooling, language servers, integrate with notebooks, etc… If you use SQL you get all of this for free.

Reubensson · 2 years ago

I don't think they have abandoned InfluxQL. They are still supporting it in InfluxDB 3 as far as I know. But they are abandoning Flux, which was a mess and a pain to use.

helloericsf · 2 years ago

Wow, so many query languages, right? Do we really need another one? What's the story behind that decision? Cheers.

brettv2 · 2 years ago

This is answered on their blog:

https://blog.runreveal.com/introducing-pql/

Cheers, mate! The blog cleared up a chunk of my question and the chat here gave me a better grasp of why it's over PRQL.

654wak654 · 2 years ago

Reminds me of this classic: https://xkcd.com/927/