I like the idea of using select/put/delete (SQL-style syntax) to query non-RDB data storage. It sort of raises the question of, could there be 1 universal language to query relational databases, text file storage (JSON, CSV, etc), and anything else?
Or put another way, is there any data storage format that couldn’t be queried by SQL?
That's basically SQL. Many SQL systems have lots of built in connectivity to various data sources.
DuckDB is a good example of a (literally) serverless SQL-based tool for data processing. It is designed to be able to treat the common data serialization formats as though they are tables in a schema [1], and you can export to many of the same formats. With extensions, you can also connect to relational databases as foreign tables.
This connectivity is a big reason it has built a pretty avid following in the data science world.
> Or put another way, is there any data storage format that couldn’t be queried by SQL?
Is your SQL Turing-complete? If yes, then it could query anything. Whether or not you'd like the experience is another thing.
Queries are programs. Querying data from a fixed schema is easy. Hell, you could make a "universal query language" by just bolting together this dasel with SQL and Cypher, so you'd use the relevant facet when querying a specific data source. The real problem starts when your query structure isn't fixed - where what data you need depends on what the data says. When you're dealing with indirection. Once you start doing joins or conditionals or `foo[bar['baz']] if bar.hasProperty('baz') else 42` kind of indirection, you quickly land in the Turing tarpit[0] - whatever your query language is, some shapes of data will be super painful for it to deal with. Painful, but still possible.
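As a concrete illustration, here's roughly that `foo[bar['baz']]` indirection expressed in jq (a sketch; the data is made up):

```shell
echo '{"foo":{"x":7},"bar":{"baz":"x"}}' \
  | jq 'if (.bar | has("baz")) then .foo[.bar.baz] else 42 end'
# -> 7  (falls back to 42 when bar has no "baz" key)
```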
> It sort of raises the question of, could there be 1 universal language to query relational databases, text file storage (json, csv, etc), and anything else.
Sure there could be -- any Turing-complete language (which SQL is) can query anything.
But the reason we have different programming languages* is because they have different affordances and make it easy to express certain things at the cost of being less convenient for other things. Thus APL/Prolog/Lisp/C/Python can all coexist.
SQL is great for relational databases, but it's like commuting to work in a tank when it comes to key-value stores.
* and of course because programmers love building tools, and a language is the ultimate tool.
> could there be 1 universal language to query relational databases, text file storage (json, csv, etc), and anything else.
> Or put another way, is there any data storage format that couldn’t be queried by SQL?
We created PLDB.io (a Programming Language DataBase) and have studied nearly every language ever created and thought about this question a lot.
Yes, there could be 1 language to query everything, but there will always be a better DSL more relevant for particular kinds of data than others. It's sort of like how with a magnifying glass you can magnify anything, but if you want to look at bacteria you're going to want a microscope (and you wouldn't want a microscope to study an elephant).
Now it may turn out that there is 1 universal syntax that works best for everything (I'm sure people can guess what I would say), but I can't think of a case where you wouldn't want to have a DSL with semantics evolved to match a particular domain.
There are a lot of differences between storage formats. It would be incredibly difficult to create a universal query language. It would need to either a) change the storage formats so much that they're not really following their original standard, or b) create so many different versions of the query language that it's not really one standard.
Off the top of my head: SQL can't do lists as values and doesn't have simple key-value storage; JSON doesn't have tables or primary/foreign keys, and can have nested data.
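That said, SQLite's JSON functions can bridge part of the gap, storing a list as a JSON value and exploding it on demand (a sketch; the table and data are made up):

```shell
sqlite3 :memory: "
CREATE TABLE t(k TEXT, v TEXT);
INSERT INTO t VALUES ('a', '[1,2,3]');
SELECT value FROM t, json_each(t.v) WHERE k = 'a';"
# prints 1, 2, 3 on separate lines
```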
> Or put another way, is there any data storage format that couldn’t be queried by SQL?
Depends on how keen you are on pure SQL. For example, Postgres and SQLite have JSON extensions, but they also enhance the syntax for it. Similar can be done for all other formats too, but this means you need to learn special syntax and be aware of the storage format for every query. This is far off from a real universal language.
From what I understand SQL is or at least can be made Turing complete so in that sense you should be able to query any data store using it. However, that doesn’t mean it will be efficient to do so.
I suspect for most data structures you could construct an index to make querying faster. But think about querying something like a linked list: it is not going to be too efficient without an index but you should still be able to write an engine that will do so.
If you have something like a collection of arbitrary JSON objects without a set structure you should still be able to express what you are trying to do with SQL because Turing completeness means it can examine the object structure as well as contents before deciding what to do with it. But your SQL would look more like procedural code than you might be used to.
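For instance, SQLite's JSON functions can branch on the shape of each element before deciding what to extract (a sketch with made-up data):

```shell
sqlite3 :memory: "
SELECT CASE json_type(value)
         WHEN 'object' THEN json_extract(value, '\$.name')
         ELSE 'scalar: ' || value
       END
FROM json_each('[{\"name\":\"a\"}, 42]');"
# -> a
# -> scalar: 42
```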
> It sort of raises the question of, could there be 1 universal language to query relational databases...
Even if SQL and/or another query language could be Turing-complete, that doesn't mean that you can have 1 universal language to perform all possible queries in an efficient way. In basic computer science terms that means that your data structure is linked with the queries, and efficiency you want to achieve, and ad-hoc changes should be created for specific problems.
Tree and graph structures can be queried using SQL (with more or less difficulty depending on how you have chosen to encode and index them), but it's not a particularly simple and straightforward language to use for such a task.
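For the tree case, the usual tool is a recursive CTE; a minimal sketch in SQLite over a made-up parent-pointer table:

```shell
sqlite3 :memory: "
CREATE TABLE node(id INTEGER, parent INTEGER);
INSERT INTO node VALUES (1,NULL),(2,1),(3,1),(4,2);
WITH RECURSIVE sub(id) AS (
  SELECT 1
  UNION ALL
  SELECT n.id FROM node n JOIN sub ON n.parent = sub.id
)
SELECT id FROM sub ORDER BY id;"
# -> 1, 2, 3, 4  (the subtree rooted at node 1)
```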
interesting! that's a mathematically provable statement? or do you mean, most data can be represented as a graph? (I'm not asking antagonistically; I'm genuinely interested in the statement you made)
If entries can be relations themselves it is not possible afaik. For example
User | Telephone Numbers
-----+------------------
A | 123, 456 <- not atomic; more than 1 number (i.e. a set)
B | 789
Now there are academic operators to convert to and from a purely relational system, but I don't think they are implemented/in the standard. I forgot what they are called, however.
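One standard workaround is to normalize the set-valued column into one row per value and rebuild the sets with aggregation when needed; a sketch in SQLite (table and names made up):

```shell
sqlite3 :memory: "
CREATE TABLE phone(user_name TEXT, number TEXT);
INSERT INTO phone VALUES ('A','123'),('A','456'),('B','789');
SELECT user_name, group_concat(number)
FROM phone GROUP BY user_name ORDER BY user_name;"
# one row per user, numbers re-aggregated (e.g. A|123,456);
# note group_concat's element order is technically unspecified
```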
In general you don't want a universal query language. Depending on the shape of the data you want different things to be easily expressible. You can, for example, express queries on tree-shaped data with SQL (see the XPath accelerator), but it is quite cumbersome and its meaning is lost to the reader. I.e.: it's fine when computer-generated, but there is too much noise for a human to read/write themselves.
I'd be glad to be proven wrong here, but as time has shown, there is no one size fits all for programming languages. The requirements for different applications just vary too much.
I'm abusing `git config -f $PATH` as a poor man's portable INI parser/modifier in shell scripts, works pretty well but imposes some Git-specific syntax limitations.
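For anyone curious, the trick looks roughly like this (`settings.ini` and the key names are made up):

```shell
# git config treats any INI-ish file as a [section] key = value store
git config -f settings.ini db.host localhost    # set (creates the file)
git config -f settings.ini db.host              # get -> prints: localhost
git config -f settings.ini --unset db.host      # delete the key again
```

The Git-specific limitations: keys must be `section.key` shaped, and values are rewritten in Git's own formatting.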
jq is such an underrated tool: it is a fully Turing-complete functional language in and of itself. It opens up an ocean of possibilities in combination with scripting languages (Python has bindings), and it is ungodly fast, chewing through tons of JSON like it's nothing.
It is a bit perl-ish, but being pure and functional it is a little easier to reason about when you have to revisit your queries.
PS I am certainly bookmarking your tool as well =]
There is also amusing project jqjq that implements jq in jq itself that I love to point folks at to show how expressive the language is: https://github.com/wader/jqjq
Neat; seems about every quarter or so one of these types of tools is highlighted here.
Awaiting all the responses from people to show off or list what tool they've landed on to support their specific use cases; I always learn a lot from these.
Personally I think this is a problem better solved by fixing the shell. There are a few alt shells out there now: Nushell, Elvish, plus the one I help maintain, Murex (https://murex.rocks).
I’m obviously going to be biased here, but it’s definitely worth your time checking out some alt shells.
I did not know about Murex. This actually looks pretty cool! Or at least a definite improvement. How is it regarding using it as a scripting language in lieu of Bash?
The last shell I was intrigued by was es-shell which, despite being old, is still being updated and uses functional semantics while still looking like a shell language. I had ChatGPT generate a comparison of all these with Bash (take with a grain of salt, I already had to correct at least one thing):
I'm a bit confused as to the use case. Is it just a way to interact with json/yaml style documents as if they were a structured database, but from the command line? Kind of an in-between for those moments you don't want to write a quick script to batch modify files?
It looks really well done; I think I'm just failing to see how this is more beneficial than just opening a single file in the editor and making changes, or writing a quick functional script so you have the history of the changes that were made to a batch of files.
If someone could explain how I could (and why I should) add a new tool to my digital toolbelt, I'd greatly appreciate it.
I use jq for this kind of thing several times a week. It’s great for piped data - things like running curl to fetch JSON, then piping it through to reformat it in different ways.
Here’s a jq expression I used recently to turn a complete GitHub Issues thread into a single Markdown document:
curl -s "https://api.github.com/repos/simonw/shot-scraper/issues/1/comments" \
| jq -r '.[] | "## Comment by \(.user.login) on \(.created_at)\n\n\(.body)\n"'
I use this pattern a lot. Data often comes in slightly the wrong shape - being able to fix that with a one-liner terminal command is really useful.
PowerShell[0]'s built-in Microsoft.PowerShell.Utility[1] module has `ConvertFrom-...` and `ConvertTo-...` commands for JSON, CSV, a version of XML (CliXML), and custom key/value pairs (StringData), turning them into objects which can then be manipulated. Combined with IO cmdlets from the built-in Microsoft.PowerShell.Management[2] module such as `Get-Content` and `Set-Content`, a fair chunk of flat-file storage can be made mutable with PowerShell.
I used yq last week to scan through all the Java projects (i.e. Maven pom.xml files) within our org to check which ones inherit from the corporate pom.
For things that are mostly shell scripts and things in a similar family (Ansible playbooks, deployment pipelines etc.) and where you need to modify a structured file quickly, it's usually much faster to use the DSL provided by the tool than calling out to various scripts to extract or modify a single JSON key.
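e.g. a one-liner to bump a single key with jq rather than shelling out to a full scripting language (the file name and key are made up):

```shell
printf '{"name":"app","version":"1.0.0"}' > app.json
jq '.version = "2.0.0"' app.json > app.json.tmp && mv app.json.tmp app.json
jq -r '.version' app.json    # -> 2.0.0
```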
People often say that they'd prefer to write their shell scripts in Python or even Go these days, but the problem there is that the elements of structured programming make the overall steps difficult to follow. Typically, in use cases adjacent to shell scripts, the point is to be able to see what a script is doing without any abstraction.
the in-between mode that you mention but seem to dismiss is the way most traditional unixheads work with data most of the time: from the command line
editor? when i pull up emacs, 50% of the time it's to write emacs macros, and i do that because shell scripts don't easily go backward in the stream. (something rarely mentioned about teco was that it was a stream editor that would chew its way forward through files; you didn't need the memory to keep it all in core, and it could go backward within understandable limits)
writing an actual shellscript is only for when it's really hairy, you are going to be repeating it and/or you need the types of error handling that cloud up the clarity of the commandline
the commandline does provide rudimentary "records" in the saved history
one benefit (idk if it applies here) is if the select/put/delete statements didn’t require loading the data in memory; so you could query massive data files with limited RAM and not have to solve that problem yourself for each data storage format you’re working with
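jq has a mode along these lines: `--stream` emits [path, value] events instead of materializing the whole document, so huge files can be filtered with bounded memory. A tiny sketch:

```shell
# select(length == 2) keeps only the leaf events ([path, value] pairs)
echo '{"a":1,"b":[2,3]}' | jq -c --stream 'select(length == 2)'
# [["a"],1]
# [["b",0],2]
# [["b",1],3]
```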
I maintain dasel on Debian [1], and it's such a nice tool! I don't use most of its advanced capabilities, but being able to convert from/to any of those formats with a single tool is very useful.
Cool project -- but we need a standardized/spec'd query language in order to realize the goals in the "one tool to rule them all" section of this readme.
I have a hard time internalizing the jq query syntax, and am not overly excited to invest in learning all the quirks when it's not based on a widely-adopted open standard. Maybe `JMESPath` could be the way forward.
Sometimes `gron` can be a pretty great alternative approach, depending on your use case. At least it is very intuitive and plays nicely with other tools.
Ultimately JSON, TOML, YAML, XML, and properties files are all tree structures, and XPath-type syntax should roughly apply to them all, along with the hundred-odd "path expression" languages (Java has SpEL, Velocity, JSP EL, OGNL, and probably dozens of others).
XPath, although it had some clunky artifacts from XML (like namespaces, which were part of why we moved away from XML... ugh), was basically the apex of expression/path/navigation capabilities. It would be really nice to see XPath ported to a general nav language that is supported by all programming environments and handles all the relevant formats.
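Even today plain xmllint gets you a fair bit of that for XML (a sketch; the document is made up):

```shell
echo '<config><db host="db1"/><cache host="redis1"/></config>' \
  | xmllint --xpath 'string(//db/@host)' -
# -> db1
```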
I still like Xidel[0] for this reason; it may be a little older, but for a CLI scraper a lot of data transformations needs can be satisfied with Xpath/XQuery.
Speaking of trees, gron/ungron is an amazing transformer that allows one to use any query tool on the leaves of the tree and then turn the flattened structure back into a document (json).
I'd love to see gron/ungron implemented for all tree structures.
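In the meantime, a rough gron-style flattening can be faked in jq for JSON at least (a sketch; real gron also emits the intermediate `json.a = {};` lines and has an ungron inverse):

```shell
echo '{"a":{"b":[1,2]}}' | jq -r '
  paths(scalars) as $p
  | "json.\($p | map(tostring) | join(".")) = \(getpath($p))"'
# json.a.b.0 = 1
# json.a.b.1 = 2
```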
[1] https://duckdb.org/docs/data/overview
[2] https://duckdb.org/docs/extensions/json#json-importexport
[3] https://duckdb.org/docs/extensions/postgres
--
[0] - https://en.wikipedia.org/wiki/Turing_tarpit
I would really like to find a good workflow for idempotent modifications to INI files, but haven't stumbled across one yet.
https://gist.github.com/pmarreck/b7bd1c270cb77005205bf91f80c...
[0]: https://microsoft.com/powershell
[1]: https://learn.microsoft.com/powershell/module/microsoft.powe...
[2]: https://learn.microsoft.com/powershell/module/microsoft.powe...
It's exactly a quick functional script.
[1]: https://tracker.debian.org/pkg/dasel
[0]: https://github.com/benibela/xidel
https://github.com/tomnomnom/gron
https://github.com/dbohdan/structured-text-tools
In fact it's already on it 6 times.
Sometimes we don’t actually want to parse YAML, we just want to mutate it without needing to model the underlying objects.
Being able to select, replace, and add data in an existing YAML document is a huge win for automation.
I wrote about this a little in https://www.bbkane.com/blog/go-project-notes/#scripting-chan... and it's really helped me keep my GitHub workflows and various config files in sync across project repos