Readit News logoReadit News
grose · 3 years ago
Datalog is great for representing authorization rules. Check out Biscuits, which are auth tokens with Datalog embedded in them. This article is what made it 'click' for me: https://www.clever-cloud.com/blog/engineering/2021/04/15/bis...

I actually thought that Datalog was so cool that I went to learn Prolog and it completely changed the way I think about programming. Highly recommend trying out logic programming if you haven't before.

burakemir · 3 years ago
Agree but also want to point out that people usually have a narrow view on "logic programming". Datalog can also be understood with out the top-down evaluation / resolution that is typically associated with prolog, which is why it is known to database researchers and in finite model theory. Prolog is great, but bottom up techniques to evaluate datalog are awesome, too, and would arguably also qualify as logic programming. It is rare to see this acknowledged.
YeGoblynQueenne · 3 years ago
Acknowledged, by whom? As far as I know people in the logic programming community have no trouble recognising datalog as a logic programming language, be it evaluated bottom-up or not (you can still evaluate a datalog program top-down, by resolution, as if it were a Prolog program without "compound" terms) (a.k.a. functions).

Indeed, I get the feeling that the primacy of Prolog as the logic programming language has waned. If you look at back issues of ICLP* proceedings you'll find plenty of work that is nothing to do with resolution- namely, Answer Set Programming, which is wildly popular.

I tend to think it's mainly the database community that kind of ignores the logic-programming nature of datalog.

Oh and btw, when I talk about datalog, I mean definite clauses without functions, not the aberrant syntax in the article above. I'll never understand why people do that.

________________

* ICLP is the International Conference on Logic Programming.

wslh · 3 years ago
In the last few months the mention of Datalog has increased, I wondered how it differed from graph databases and found a clear answer in SO [1]. I am not an incumbent but found graph databases and clause approaches interesting.

[1] https://stackoverflow.com/questions/29192927/a-graph-db-vs-a... (2015)

refset · 3 years ago
XTDB, which is mentioned in the post, is subtly different from the other Clojure-based Datalog systems in this respect, because its Datalog engine executes in terms of multi-way joins using a "Worst-Case Optimal Join" implementation that is ideal for graph processing (vs. a tree of binary hash joins). Therefore, based on statistics and query planning heuristics, it will often perform graph pattern matching before resolving the logic/horn clauses. (source: I work on the XTDB team)
eternalban · 3 years ago
Interesting architecture:

https://raw.githubusercontent.com/xtdb/xtdb/master/docs/conc...

Btw, is that 'RocksDB or ?' for the local store current or other storage engines can get plugged in?

p.s. this is datomic's architecture for comparison.

https://docs.datomic.com/on-prem/images/clientarch_orig.svg

felixyz · 3 years ago
I did an interview [1] with Kevin Feeney, one of the founders (no longer active) of TerminusDb, which goes into some depth about the difference between RDF stores and (property) graph databases, where the former is more closely aligned with datalog and logic programming. There are links to a really excellent series of blog posts by Kevin on this topic in the show notes.

[1] https://thesearch.space/episodes/5-kevin-feeney-on-terminusd...

forks · 3 years ago
I love The Search Space. Waiting patiently for new episodes!
westurner · 3 years ago
With RDF* and SPARQL* ("RDF-star" and "SPARQL-star") how are triple (or quad) stores still distinct from property graphs?

RDFS and SHACL (and OWL) are optional in a triple store, which expects the subject and predicate to be string URIs, and there is an object datatype and optional language:

  (?s ?p ?o <datatype> [lang])

  (?subject:URI, ?predicate:URI, ?object:datatype, object_datatype, [object_language])
RDFS introduces rdfs:domain and rdfs:range type restrictions for Properties, and rdfs:Class and rdfs:subClassOf.

`a` means `rdf:type`; which does not require RDFS:

  ("#xyz", a,        "https://schema.org/Thing")
  ("#xyz", rdf:type, "https://schema.org/Thing")
Quad stores have a graph_id string URI "?g" for Named Graphs:

  (?g ?s ?p ?o)

  ("https://example.org/ns/graphs/0", "#xyz", a, "https://schema.org/Thing")

  ("https://example.org/ns/graphs/1", "#xyz", a, "https://schema.org/ScholarlyArticle")
There's a W3C CG (Community Group) revising very many of the W3C Linked Data specs to support RDF-star: https://www.w3.org/groups/wg/rdf-star

Looks like they ended up needing to update basically most of the current specs: https://www.w3.org/groups/wg/rdf-star/tools

"RDF-star and SPARQL-star" (Draft Community Group Report; 08 December 2022) https://w3c.github.io/rdf-star/cg-spec/editors_draft.html

GH topics: rdf-star, rdfstar: https://github.com/topics/rdf-star, https://github.com/topics/rdfstar

pyDatalog does datalog with SQLAlchemy and e.g. just the SQLite database: https://github.com/pcarbonn/pyDatalog ; and it is apparently superseded by IDP-Z3: https://gitlab.com/krr/IDP-Z3/

From https://twitter.com/westurner/status/1000516851984723968 :

> A feature comparison of SQL w/ EAV, SPARQL/SPARUL, [SPARQL12 SPARQL-star, [T-SPARQL, SPARQLMT,]], Cypher, Gremlin, GraphQL, and Datalog would be a useful resource for evaluating graph query languages.

> I'd probably use unstructured text search to identify the relevant resources first.

noduerme · 3 years ago
That's a really neat example of something I'm not familiar with. Going up a tree from child to parent is often the heaviest part of dealing with regular datasets, and usually requires a mix of queries and application logic. The idea of flattening the data along some pattern like that is of course always possible in a relational db, but it's not usually efficient, especially not for heavy writing. Lateral joins and window partitions can help. But this seems like an interesting approach to removing the app code completely.
flyingsilverfin · 3 years ago
I work on TypeDB (https://vaticle.com/typedb), and it sits somewhere at this intersection. The exposed query language has elements of both logic programming constructs and graph-like structures. Both amount to a kind of "constraint" programming.
rapnie · 3 years ago
A quick peek shows it seems along similar lines as TerminusDB sorta kinda, but they have WOQL [0]. At this time I start to worry again about all the different kinds and flavours of query languages that are emerging.

[0] https://en.wikipedia.org/wiki/TerminusDB#Query_language

felixyz · 3 years ago
I really like TypeDB! Haven't been able to use it for anything serious yet, but have a couple of project brewing where it might fit :)
cmrdporcupine · 3 years ago
You might be interested in https://relational.ai/

Treats graph edges as binary relations ("graph normal form"), has a Datalog-ish language. Built for managing large interconnected knowledge sets in a declarative way.

I recommend this talk: https://www.youtube.com/watch?v=WRHy7M30mM4

felixyz · 3 years ago
Great project, not open source alas. This is another great talk about RelationalAI (and its precursor), highlighting how using powerful databases can simplify complex applications: https://www.hytradboi.com/2022/experience-report-building-en...
muattiyah · 3 years ago
ICYMI, there's an excellent interactive introduction to `datalog` that's referenced in the article's references.[0]

Last time I used `datalog` was years ago, I was developing an internal interactive tool that was used to compare different approaches to solving a certain problem at my employer. I used `datascript`[1] by way of clojurescript to store all experiment data and then interrogated the `datascript` DB via `datalog`. This is something I always remember fondly.

[0] https://www.learndatalogtoday.org/ [1] https://github.com/tonsky/datascript

pavlov · 3 years ago
As mentioned in the article, Datomic is a database that uses Datalog as its query language:

https://docs.datomic.com/on-prem/query/query.html#why-datalo...

(Some ten years ago worked at a startup that used Datomic. It seemed to work great, although the only queries I ever needed to add to the system were simple copy-paste hacks of existing ones, so I never got to dive into Datalog.)

dmitriid · 3 years ago
Datascript is the open source analog for Clojure, ClojureScript and JS: https://github.com/tonsky/datascript
simongray · 3 years ago
There are many open source alternatives in Clojure using this query language: https://github.com/simongray/clojure-graph-resources#datalog
mpenet · 3 years ago
'ish.

datahike would be the closest to datomic in terms of features/implementation (support for as-of, transactor etc).

Then in terms of maturity I think the choice is between xtdb and datascript, both are very solid/maintained but they are also vastly different.

samuell · 3 years ago
A bit related, just stumbled upon Flix, a functional JVM language with Datalog contraints and (somewhat?) Go-like concurrency:

https://flix.dev

HN Thread from 8 months ago: https://news.ycombinator.com/item?id=31448889

refset · 3 years ago
Flix definitely looks interesting! For comparison, I ported the "Datalog Enriched with Lattice Semantics" example from that homepage to XTDB's (Clojure) Datalog after I saw it posted on HN originally: https://gist.github.com/refset/21b3fc1dec9a6928943073809e133...
ianpurton · 3 years ago
So I struggled with this.

I guess the intention is to be better than SQL but then I was left with "under which circumstances?".

With that question in mind I didn't feel the article addressed the issue.

The author might do better to think in terms of "what burning problem are we trying to fix and how did we fix it".

ekidd · 3 years ago
> I guess the intention is to be better than SQL but then I was left with "under which circumstances?"

Excellent question.

Two of the most common use cases for databases are "transactional processing" (manipulating small numbers of rows in real time) and "analytical processing" (querying enormous numbers of rows, typically in a read-only fashion).

SQL is generally fine for transactional workloads.

But analytical queries sometimes involve multi-page queries, with lots of JOINs and CTEs. And these queries are often automatically generated.

And once you start writing actual multi-page "programs" in SQL, you may decide that it's a fairly clunky and miserable programming language. What Datalog typically buys you is a way to cleanly decompose large queries into "subroutines." And it offers a simpler syntax for many kinds of complex JOINs.

Unfortunately, there isn't really a standard dialect of Datalog, or even a particular dialect with mainstream traction. So choosing Datalog is a bit of a tradeoff: does it buy you enough, for your use case, that it's worth being a bit outside the mainstream? Maybe! But I'd love to see something like Logica gain more traction: https://logica.dev/

sirwhinesalot · 3 years ago
That's a good point, to be honest I'm not sure there's a particularly good answer to that beyond "it's a much better underlying model". SQL is an ugly bastardization of the relational model, whereas Datalog is a clean application of the logical model to the relational setting.

One thing that's really nice about Datalog is that it puts the focus on modelling relations between data instead of "tables" which often end up with a jumble of data and then need to be normalized to work properly. It pushes you into good structures by default, which is a really good property to have. It's also much much simpler than SQL, no need to think in terms of inner joins and outer joins and whatever else, it is all relations.

It also easily produces derived data from existing data, without needing any kind of procedural process, and is highly composable.

It's not so much that it's solving a burning problem than SQL, it's just better than SQL...

(note: the clojure syntax used in that page is much more confusing than the native Datalog syntax IMO)

zh217 · 3 years ago
Another problem with Clojure-based Datalog is performance. Yet another is you are pretty much tied to the Clojure ecosystem. And I don't really like the Clojure-fused Datalog syntax either. These pains spurred me to write my own: https://docs.cozodb.org/en/latest/ (FOSS), and so far I'm satisified with my own work.
crabbone · 3 years ago
I'm not the author, and I don't know how author would answer this question, but some things are really on the surface:

1. Modularity. In Datalog it's very easy to name a common (sub-)query and reuse it in multiple queries. You can have stored procedures / functions in SQL, but the syntax is very different implementation to implementation and users are typically afraid of using the feature (the SQL/PSM does not go far enough to define what users would normally want). Also, SQL/PSM is an imperative extension of otherwise declarative language. The two just don't work well together.

2. Structures other than primitives and tables (you get lists, vectors, hash-maps and sets, but you can also make your own). A lot of practical SQL extensions also offer some of these structures, but they aren't in the standard.

3. Operations on primitive types fall into the same category / work in the same way as operations on complex types. I.e. select / update / insert / delete in SQL only apply to tables, but strings or numbers don't work in the same way. It's more uniform in Datalog (not 100%, but still better).

conor-23 · 3 years ago
A researchy perspective: Datalog was invented to extend relational algebra with recursion. Since it started out as an academic tool, people have been studying recursion-specific optimizations you can do for decades so it is extremely well suited to recursive use-cases e.g. iterative graph algorithms. Using Datalog for network algorithms won the thesis award in databases almost 20 years ago https://boonloo.cis.upenn.edu/papers/boon_interview.pdf .
tejtm · 3 years ago
This is the answer I subscribe to. CTEs and recursive CTEs are SQL's answer to a limitation of plain relational algebra; no loops.

CTEs are a great and most welcome addition to SQL but they are a bolt-on patch as compared with Datalog where it is a core feature.

felixyz · 3 years ago
> Datalog was invented to extend relational algebra with recursion.

I'm not sure that is exactly right. Do you have a reference? (Not trying to put you on the spot, I'm just curious to learn the history!)

mmcdermott · 3 years ago
> I guess the intention is to be better than SQL but then I was left with "under which circumstances?".

The way I see it, SQL's great strength as a query language is selection and projection whereas Datalog's great strength is inference.

It takes a shift in mindset to take advantage of inference. I've started using Datalog when analyzing unfamiliar codebases. So I might set up something like:

    writes_to(microservice1, db1).
    writes_to(microservice2, microservice1).
    writes_to(microservice3, microservice2).

    depends_on(X, Y) :- writes_to(X, Z), depends_on(Z, Y).
    depends_on(X, Y) :- writes_to(X, Y).
Allowing you to query something like:

    depends_on(microservice3, X)
and get back db1, microservice1 and microservice2. Of course, you can ultimately do this in SQL as well, but it's far more compact in Datalog.

Which brings me to the second half of the motivation - the principal advantage to the logical database model is the open world assumption. The relational model in practice tends to be fairly buttoned down - data is expected to suit a particular schema (which has its own advantages as a transactional system). This makes it easy to extend a logical database and ask more questions without changing the fact formats already specified.

Of course, I can ultimately do all the things that feel natural in Datalog in SQL. I can work through queries like the one above by building out the data model and writing recursive CTEs. It's about strengths, not possibilities which I suppose brings me back to why SQL 'won'. A disproportionate amount of code is written for transactional line-of-business systems. It's really not hard to see why the validation layer and transactional focus would win in those areas.

yaantc · 3 years ago
In 2021 Google introduced Logica, a Datalog variant. In their introduction blog they contrast it with SQL, so this may answer your question:

https://opensource.googleblog.com/2021/04/logica-organizing-...

In short, it's easier to compose (and decompose), which helps with complex queries that can be assembled from simpler, independently tested parts.

No personal experience in this, I just remembered that blog contrasting Datalog and SQL.

dgb23 · 3 years ago
One of the biggest advantages is re-use, extensibility and being robust to change, because you define things bottom up. Think of aggregates as "hashmaps with typed keys" rather than "structs with places in them".

Aggregates are implicit groupings, not explicit slots. This wards you from risk of change and speeds up your feedback loop of making changes and experimenting with ideas. Plus you can compose new aggregate variants out of the same primitives (attributes) which is a fairly typical use-case.

Another is query composition. I'm more familiar with SQL and I'm somewhat comfortable with writing nested queries. But we all know how both query builders and similar are often hard to re-use and compose. With these datalog dialects, composition, refactoring and extraction of logical units is much more straight forward, even as a beginner. Queries are "flat", and consist clauses that can be moved around and composed. Think of having more associative and commutative operations.

On top of these, the mentioned technologies (in the article) are temporal and support time travel out of the box. That's something you can do on top of SQL of course but it's quite powerful to have it in-built as a primary feature. Whenever you have an application that does things like (non-exclusive), reporting, undo, revisions etc. You might want that.

anon291 · 3 years ago
Sql doesn't compose at all. You can't take two sql programs and combine them with a single operator and get a new program.

In datalog you can. Either by using the comma or semicolon operator.

crabbone · 3 years ago
You are thinking about the declarative part. SQL/STP, the standard that's the basis for definition of the imperative part of SQL (stored procedures) can compose in a similar way (eg. you can call a stored procedure from a stored procedure), but the standard doesn't go far enough, and so every practical database has its own extensions which deters people from using stored procedures because they'd end up with non-portable code.
felixyz · 3 years ago
> then I was left with "under which circumstances?"

Foremost, Datalog has great compositionality, which is one of SQL:s big weaknesses (it was designed with completely different goals). Also, things like recursive queries are completely natural in Datalog, whereas in SQL they are bolted on with recursive common table expressions.

refset · 3 years ago
> Datalog has great compositionality, which is one of SQL's big weaknesses

There's a good write-up on this aspect (amongst others) here: https://www.scattered-thoughts.net/writing/against-sql/

> recursive queries are completely natural in Datalog, whereas in SQL they are bolted on with recursive common table expressions

And similarly: https://github.com/frankmcsherry/blog/blob/master/posts/2022...

YeGoblynQueenne · 3 years ago
>> The author might do better to think in terms of "what burning problem are we trying to fix and how did we fix it".

I have no idea who first came up with the name "datalog", and what exactly was their motivation (I've looked, but couldn't find the original reference to datalog), but the burning problem that datalog does fix is Prolog's semi-decidability, or in other words, its tendency to enter infinite recursions.

One half of the reason for that is Prolog's "compound terms" (known as functions, in First-Order Logic terminology) and the other half is Prolog's use of depth-first search for evaluation. Functions, along with logic variables, mean that a predicate's Herbrand base (its set of logical atoms) can be expanded forever: if f(x) is a term, f(f(x)) is a term, f(f(f(x))) is a term... and so on, ad infinitum. The depth-first search evaluation just gets stuck in left-recursions when the first body literal of a clause is a recursive literal: p(x):- p(x), q(x) loops forever, in Prolog with depth-first search.

Datalog is Prolog without functions other than constants, and it can be evaluated "bottom up", both of which overcome Prolog's semi-decidability (but eliminate its completeness).

But of course this is not a very clear motivation for its use as a database language, only its use as an alternative to Prolog.

More to the point, there are many datalog variants. Some that accept negation, some that do not, some that allow recursion, some that do not, and so on. There's a lot of information on different datalogs, seen from the point of view of database research in this free ebook on databases:

http://webdam.inria.fr/Alice/

See chapters 12 - 15.

And see these lecture notes for a more logic-programm-y discussion of fixpoint semantics of definite logic programs:

https://www.doc.ic.ac.uk/~mjs/teaching/KnowledgeRep491/Fixpo...

falsissime · 3 years ago
> but the burning problem that datalog does fix is Prolog's semi-decidability, or in other words, its tendency to enter infinite recursions.

Prolog's built-in search is not semidecidable. https://en.wikipedia.org/wiki/Decidability_(logic)#Semidecid...

As you observe, Prolog gets stuck in left-recursions. But iterative deepening for Prolog is semidecidable (among other fair methods). This indeed is restricted to the pure, monotonic subset of (full) Prolog together with all pure, monotonic extensions.

bambataa · 3 years ago
I had the exact same reaction.

I did a bit of reading about Datalog and I get that it’s useful in things like static analysis where you get the fixed point analysis for free, but apart from that I am not really sure why I’d use it.

maweki · 3 years ago
The main reasons I'd say are the very intuitive JOINs and queries. Like you give some partial relation and you get every "fit". So as long as you mostly do equalities in JOIN and WHERE, the queries will be very intuitive and obvious to the layperson.

If you work with VIEWs or CTEs a lot, then Datalog is your friend, as every derived relationship is just a VIEW. If you like to encapsulate and reuse queries, then Datalog is ideal. We had plenty discussions on HN about SQL code reuse. Not an issue in Datalog.

And lastly, there are many ways to compile Datalog programs to "executables" that take maybe a few CSV-files as input and give you the results. I'm not saying that this would be always faster than loading the CSVs into SQLite and running SQL against the data, but a lot of work has been done on Datalog query optimization and compilers can emit very efficient code.

z5h · 3 years ago
I’ve been using Prolog a bunch recently, and also embedded and extended MicroKanren in a project. Something I came to appreciate was that Prolog’s depth-first search, and Kanren’s lazy stream approach are good with memory even when generating/searching through infinite solutions. It is my understanding that Datalog, on the other hand, will iteratively expand a set of data. Isn’t this a problem?
YeGoblynQueenne · 3 years ago
"Iteratively expand a set of data"? I'm not sure what you mean here. I think you are probably talking about the "bottom up" evaluation strategy of datalog, right? That's where datalog is evaluated by a so-called TP-operator, which derives the set of all logical consequences a datalog program by calculating its least fixed point. That's the same as the Least Herbrand Model (LHM) of the program, or, in other words, the set of atoms entailed by the program (atoms in the logical sense, of atomic formulae, not in the Prolog sense of constants).

That's the same thing that Prolog does, calculate the LHM of a logic program, but the difference is that datalog programs have finite LHMs, because they don't have functions that can be self-instantiated for ever ( f(x), f(f(x)), f(f(f(x))), ... ) and the bottom-up evaluation, that goes from the "body" to the "head" of a clause, avoids infinite left-recursions.

Prolog, evaluated top-down (clauses are "picked apart" head-first) can get stuck in infinite left-recursions, so datalog's finiteness, and its decidability under TP, is a big gain in efficiency, as anyone who has had to kill a Prolog console session because of an infinite loop will know.

Also, it is not widely recognised but I am the author of the dumbest and most inefficient TP Operator implementation in existence. Obviously I hang my head in shame and will not link to my code. I understand however that there are optimisations that one can perform that make bottom-up execution efficient, and even quite fast. Unfortunately, I don't know what they are :P

Note that Prolog can also be evaluated without fear of left-recursions, by SLG-Resolution (a.k.a. "tabling", a.k.a. memoization) but there is still the danger of infinite right recursions. Prolog is semi-decidable, because it is Turing-complete. Datalog is decidable, sacrificing completeness for, well, efficiency.

So, in short, it's not a problem if you consider the alternative, but of course there are trade-offs, always. It's like growing old, vs. dying young.

(I hope all this is not completely irrelevant to your question).

anon291 · 3 years ago
Whatever language this is... This is not datalog. This looks like a particular implementation of datalog in closure.

Actual datalog looks like prolog.

cmrdporcupine · 3 years ago
In reality, people are using "datalog" for a genre of datastore concepts based around horne clauses, or, basically relations + implicit joins. Datalog as a subset or dialect of prolog is only one variant of this. And Datomic has made an sexpr-syntaxed variant built around binary relations popular. To the point where some people in this thread can't seem to tell the two apart.

I am more interested in the general category of relational data model + logic programming than I am in any purity about Datalog in particular. In particular I'm very excited by "data/knowledge + behaviour sitting in a tree, k-i-s-s-i-n-g"

bogomipz · 3 years ago
Thanks. Is Datomic also a dialect of Prolog then?
anon291 · 3 years ago
Yeah I know, I'm just being pedantic.