This is really great! Maybe I'll incorporate this into my own software (scratchdata/scratchdb)
Question: it looks like you wrote the parser by hand. How did you decide that was the right approach? I'm new to parsers myself and am working on implementing the PostgREST syntax in Go, using PEG to translate to ClickHouse, which is to say, a similar mission to this project. Would love to learn how you approached this problem!
I also wrote a parser (in TypeScript) for Postgres (https://github.com/ivank/potygen), and it turned out to be quite the educational experience. I learned _a lot_ about the intricacies of SQL, and about how to build parsers in general.
It turns out that in webdev there are a lot of instances where you actually want a parser (legacy places where they used to save things in plain text, for example), and I started seeing the pattern everywhere.
Where I would once have reached for some monstrosity of a regex, now I just whip out a recursive descent parser and call it a day. It takes a surprisingly small amount of code! (https://github.com/dmaevsky/rd-parse)
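To make the "surprisingly small amount of code" point concrete, here is a minimal sketch of a recursive descent parser that evaluates arithmetic expressions. It is written in Python rather than with rd-parse, and the grammar and the names (`tokenize`, `Parser`, `evaluate`) are assumptions made up for illustration, not taken from any of the projects linked above. Each grammar rule becomes one method, and operator precedence falls out of which rule calls which.

```python
import re

NUMBER = re.compile(r"\d+(?:\.\d+)?")


def tokenize(text):
    """Turn the input into a list of floats and single-character operators."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        match = NUMBER.match(text, pos)
        if match:
            tokens.append(float(match.group()))
            pos = match.end()
        elif text[pos] in "+-*/()":
            tokens.append(text[pos])
            pos += 1
        else:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
    return tokens


class Parser:
    """One method per grammar rule; lower-precedence rules call higher ones."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):
        # expr := term (("+" | "-") term)*
        value = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            right = self.term()
            value = value + right if op == "+" else value - right
        return value

    def term(self):
        # term := factor (("*" | "/") factor)*
        value = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            right = self.factor()
            value = value * right if op == "*" else value / right
        return value

    def factor(self):
        # factor := NUMBER | "-" factor | "(" expr ")"
        token = self.advance()
        if isinstance(token, float):
            return token
        if token == "-":
            return -self.factor()
        if token == "(":
            value = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return value
        raise SyntaxError(f"unexpected token {token!r}")


def evaluate(text):
    """Parse and evaluate the whole input, rejecting trailing tokens."""
    parser = Parser(tokenize(text))
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected trailing token {parser.peek()!r}")
    return value
```

For example, `evaluate("(1 + 2) * 3")` walks `expr`, `term`, and `factor` in turn and returns `9.0`. A parser for a real query language would build a syntax tree instead of computing a value directly, but the shape of the code stays the same.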
It looks a lot like Kusto query language. Here is a kusto query:
StormEvents
| where StartTime between (datetime(2007-01-01) .. datetime(2007-12-31))
and DamageCrops > 0
| summarize EventCount = count() by bin(StartTime, 7d)
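For readers who have not seen a pipe-style language compiled to SQL, the toy sketch below shows the general idea: each `|` stage contributes one clause of a single SELECT statement. This is not how Pql or Kusto is implemented. The function name `pipeline_to_sql`, the three supported operators, and the `State` column are assumptions chosen for illustration, and functions such as `bin()` or `between` from the query above are deliberately left out.

```python
def pipeline_to_sql(query):
    """Translate a tiny pipe-style query into a SQL string (toy subset)."""
    # Naive split: a "|" inside a string literal would break this sketch.
    source, *stages = [part.strip() for part in query.split("|")]
    columns, conditions, group_by, limit = "*", [], None, None

    for stage in stages:
        operator, _, rest = stage.partition(" ")
        rest = rest.strip()
        if operator == "where":
            if group_by is not None:
                # Filtering after aggregation would need HAVING or a subquery.
                raise ValueError("where after summarize is not handled here")
            conditions.append(rest)
        elif operator == "summarize":
            aggregate, _, by = rest.partition(" by ")
            if not by.strip():
                raise ValueError("this sketch requires a 'by' clause")
            alias, _, expression = aggregate.partition("=")
            group_by = by.strip()
            columns = f"{group_by}, {expression.strip()} AS {alias.strip()}"
        elif operator == "take":
            limit = int(rest)
        else:
            raise ValueError(f"unsupported operator: {operator!r}")

    sql = f"SELECT {columns} FROM {source}"
    if conditions:
        sql += " WHERE " + " AND ".join(f"({c})" for c in conditions)
    if group_by is not None:
        sql += f" GROUP BY {group_by}"
    if limit is not None:
        sql += f" LIMIT {limit}"
    return sql


print(pipeline_to_sql(
    "StormEvents | where DamageCrops > 0 "
    "| summarize Total = sum(DamageCrops) by State | take 10"
))
```

A real compiler would parse each stage properly and emit subqueries or CTEs when stages cannot be folded into one SELECT; string splitting is only enough for a demonstration.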
This is actually pretty awesome! I use KQL every few days for reading some logs from Azure App Insight. The syntax is pretty nice and you can make pretty complex stuff out of it. But that's it, I can't use KQL anywhere else outside Azure. With this, I can show off my KQL-fu to my teammates and surprise them with how fast you can write KQL-like syntax compared to SQL.
The simple answer is that it's too distinct from where we're trying to meet our users.
We're not anti-PRQL, but our users (folks in security) coming from Kusto, Splunk, SumoLogic, LogScale, and others have expressed that they have a preference for this syntax over PRQL syntax.
I wouldn't be surprised if we end up supporting both and letting folks choose the one they're most happy with using.
Looks like PRQL doesn't have a Go library so I guess they just really wanted something in Go?
I would guess they didn't wrap the main PRQL library (which is written in Rust) because Go code is a lot easier to deal with when it's pure Go. And they probably didn't just write a Go version of PRQL because that would be a mountain of work.
Still I think that's a mistake. PRQL is a far more mature project and has things like IDE support and an online playground which they are never going to do...
Better just to bite the bullet and wrap the Rust library.
The problem isn’t binding to Rust. Python can do it, Zig can do it, C can do it, even JS can do it. It is Go which doesn’t integrate well with anything that isn’t Go.
Also a pipeline language, PRQL-inspired, but differing in that (i) TQL supports multiple data types between operators, both unstructured blocks of bytes and structured data frames as Arrow record batches, (ii) TQL is multi-schema, i.e., a single pipeline can have different "tables", as if you're processing semi-structured JSON, and (iii) TQL has support for batch and stream processing, with a light-weight indexed storage layer on top of Parquet/Feather files for historical workloads and a streaming executor.
We're in the middle of getting TQL v2 [@] out of the door with support for expressions and more advanced control flow, e.g., match-case statements. There's a blog post [#] about the core design of the engine as well.
While it's a general-purpose ETL tool, we're primarily targeting operational security use cases where people today use Splunk, Sentinel/ADX, Elastic, etc. So some operators are very security'ish, like Sigma, YARA, or Velociraptor.
InfluxDB tried to do this with InfluxQL but abandoned it, and are now back to SQL. The biggest problem I had with it when I tried it was that it was simply too slow: queries were on average 6x slower than their SQL equivalents. I think a language like this is just too hard to optimize well.
This is incorrect. It was their query engine that was hard to optimize, not the language. InfluxDB has been working on a new query engine based off Apache DataFusion to fix this.
If you squint, this query language is very similar to Polars, which is state-of-the-art for performance. I expect Pql could be as performant with sufficient investment.
The real problem is that creating a new query language is a ton of work. You need to create tooling, language servers, integrate with notebooks, etc… If you use SQL you get all of this for free.
I don't think they have abandoned InfluxQL. They are still supporting it in InfluxDB 3 as far as I know. But they are abandoning Flux, which was a mess and a pain to use.
With PRQL, if they don’t support your favorite operator then you’re out of luck.
There are some C bindings, and the example in the README shows integration with Go:
https://github.com/PRQL/prql/tree/main/prqlc/bindings/prqlc-...
Neither PRQL[0] nor Pql seems to be able to do anything outside of SELECT like Preql[1] can.
I propose we call all attempts at transpiling to SQL "quels".
[0] https://prql-lang.org/ [1] https://github.com/erezsh/Preql
Comparison:
vs TQL: [@] https://github.com/tenzir/tenzir/blob/64ef997d736e9416e859bf... [#] https://docs.tenzir.com/blog/five-design-principles-for-buil...
https://blog.runreveal.com/introducing-pql/