Graphviz has its own foundational graph library that's not used by any other project. It has its good and bad points.
Based on that experience, we had our very own bout of second-system syndrome.
We decided our graph library should be modular, type safe, and efficient. (These properties came up in the comments here, too.) This is probably just a variation of "good, fast, cheap - pick any two."
By modular, we mean we want to write collections of graph algorithm libraries that are developed and even compiled independently.
By type safe, we mean we want to detect programming errors during compilation, or link time at the latest. We don't want programs to throw runtime errors like "your node does not have a color attribute".
By efficient, we mean that accessing an attribute of a graph is as cheap as a field in a C struct. (So, we don't want to carry around an external hash table, or do a lot of string conversions, for instance.)
You can argue about whether these things are worth the price or even make sense, but that's what we wanted. We had some famous C++ creators in our lab, so we figured we could get help, and we were willing to give C++ another chance.
Gordon Woodhull, who had been an intern and kept working for us, is a brilliant programmer, and wrote an implementation of this kind of graph library in templated C++. It's even published at https://www.dynagraph.org/ via sourceforge. The rest of us were not really sure we could ever understand how the code worked, so we had a code review with said famous C++ inventors. There were a lot of screens of code, and chinstroking, and the greybeards pronounced "That would probably work." We knew we might have gone over the complexity cliff. (Let's not even talk about compile-time template errors, where one error fills an entire screen with details that only a... C++ inventor could love.) It was our fault, not anyone else's, and Gordon kept plugging away and even made all the dynamic graph layout stuff work, in Microsoft OLE, too. In hindsight it was probably our own Project Xanadu. While we got lost in this, a lot of things like Gephi (Java) and NetworkX and NetworKit (Python) happened.
Also, John Ellson, a very talented software engineer who had written parts of Graphviz, revitalized the main effort.
I use graphviz dot syntax, parsing with networkx, to plan expensive tool execution, and the graph structure lets me automatically parallelize.
As someone who has done a lot of work with graphs, I've been asked "why don't programming languages have a built-in graph data type?" a million times.
I'm thrilled I'll be able to point folks to a much more in-depth analysis like the one here, instead of saying some variation of "it's really hard to do it well" and having them just take my word for it.
> "why don't programming languages have a built-in graph data type?"
What I find a little funny about that question is that people miss the fact that there isn't even a tree data structure in most languages. Most languages have static arrays, dynamic arrays, linked lists, and... that's it as far as structural types go. Everything else (BSTs, hashtables, etc.) is some semantic abstraction that hides some capabilities of the underlying structure, not a purely structural representation.
Typed functional languages like Haskell (data)/ML (type) do have a built-in way to define new tree types, and so does Rust (enum). It's one of the biggest things I miss when I'm not using these languages, especially when combined with some metaprogramming (deriving/derive) to get some functions defined for these new types very quickly.
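For comparison, here is a rough (hypothetical) Python 3.10+ imitation of that kind of tree declaration; it is far clumsier than Haskell's data or Rust's enum, which is the point:

    from __future__ import annotations
    from dataclasses import dataclass

    @dataclass
    class Leaf:
        pass

    @dataclass
    class Node:
        left: Tree
        value: int
        right: Tree

    # A poor man's sum type: in Haskell/Rust this is one data/enum declaration.
    Tree = Leaf | Node

    def size(t: Tree) -> int:
        match t:
            case Leaf():
                return 0
            case Node(left, _, right):
                return 1 + size(left) + size(right)

    print(size(Node(Leaf(), 1, Node(Leaf(), 2, Leaf()))))  # prints 2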
I used to think that, since graphs are such a broad data structure that can be represented in different ways depending on requirements, it just made more sense to implement them at a domain-ish level (the article mentions this in the "There are too many implementation choices" section).
Then I saw Petgraph [0], which was the first time I had really looked at a generic graph library. It's very interesting, but I have still implemented graphs at a domain level.
[0] https://github.com/petgraph/petgraph
I had an opposite experience. I was doing some new-to-me graph work in tcl and had just assumed the tcl standard library wouldn't have any graph algorithms. Come to find out I was wrong, and ended up saving myself a bunch of time not having to re-invent the wheel:
https://core.tcl-lang.org/tcllib/doc/trunk/embedded/md/tclli...
More importantly, there are a lot of tradeoffs.
Virtually every language offers a hash map. You can roll your own to outperform it in an individual circumstance, but the default works pretty well.
You can't really do that with a graph. Maybe if you offered a bunch of graph types.
---
PS. Bit of trivia: Java's HashMap is a bit different from almost every other language in that it lets you tune the load factor.
> You can't really do that with a graph. Maybe if you offered a bunch of graph types.
And so why isn't this the solution?
Most languages support both hash map (fast lookup) and balanced tree (ordered entries) primitives, even though they both implement the "associative map" container type.
Can't we have 2, 3, 5, or even 8 different graph types?
It's also a bit different in that every insertion entails at least one allocation, and every lookup entails pointer-chasing through at least 3 dereferences. Java's lack of proper value types really kneecaps its HashMap implementation in many practical performance scenarios...
I'm not sure that's particularly unusual. For example, C++ supports this too:
https://en.cppreference.com/w/cpp/container/unordered_map/ma...
Why not? Relations model graphs really well, and you could have a relation datastructure in your language (or its standard library).
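As a toy sketch of that relation idea (illustrative code, not any particular library): a directed graph as a set of pairs, with reachability computed as a relational fixpoint.

    # A directed graph as a relation, i.e. a set of (u, v) pairs.
    edges = {("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")}

    def reachable(rel):
        # Transitive closure: repeatedly compose with the base relation
        # until no new pairs appear.
        closure = set(rel)
        while True:
            step = {(u, w) for (u, v) in closure for (x, w) in rel if v == x}
            if step <= closure:
                return closure
            closure |= step

    print(("a", "d") in reachable(edges))  # True: a -> b -> c -> d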
This is a super naive take, but I would consider the pointer to be the native graph type. What is wanted is not a graph type but the tooling to traverse graphs.
Generalized pointers (RAM addresses, disk storage locations, network locations, etc.) would be the general way to implement explicit literal graphs.
Other graphs, that are partly or wholly computed or recomputed as needed from other relationships, could be considered "implicit" graphs and can be implemented many other ways.
The fact that graphs can be any combination of literally or dependently defined, statically or dynamically defined, would add even more complexity to any truly general graph library.
Hmm, I think you're taking the title of the article a little too literally, and focusing on just the "data type" aspect. Yeah, the article itself indirectly agrees with you that pointers are often how graphs are implemented: "Graphs are a generalization of linked lists, binary trees, and hash tables," which are data structures often implemented with pointers.
Yeah, the article is really saying what you're saying, that building a graph library ("tooling") is very complicated.
But, maybe I'm misunderstanding what you're saying??
You can use a pointer when building a graph data structure. You can also use numeric indices into arrays. Or you can store your graph in an entirely different way.
You are right that the data representation itself is only one small part of a data structure, the operations on that representation are also really important.
It's a bit like saying that a byte is a native number type, providing comparison to zero and addition as the only operations, and asking you to figure out the rest yourself.
I mean, it's technically sufficient, but I'd not call it proper support on a language level.
I actually brought this hot take up in my conversation with Hillel -- something along the lines of "void * * is the canonical native graph representation in C."
I think it's because you have to think it through once graphs get too big for one machine. A lot of those so-called "NoSQL" databases are really meant to act as graph databases (think DynamoDB), storing the graph as an adjacency list. I've had success with Gremlin and JanusGraph, but it's a really messy space. It's not a programming language problem in my opinion, but a distributed systems one.
Python has one, as mentioned, but as a non-CS person I never encountered any problem in my programming life that so fundamentally called for graphs that I needed one, nor did I have so much fascination for graphs that it was a tool I wanted to force onto my problems.
I guess it is just that there are many problems that can be solved incredibly well without graphs and not that many where graphs outshine everything else so clearly that people would use them.
That being said, convince me of the opposite and I am happy.
What kinds of projects did you build? I'd risk saying that it's more likely you just didn't spot where graphs could be useful, or you implemented a graph not recognising it as such.
Even if you work in pure webdev - the least CS-requiring field of all programming - the DOM you work with is a tree/graph structure, and a sitemap is a graph.
From my experience working with non-CS programmers, they tend to struggle with finding solutions to problems that would be solved instantly with graph structures. Or they write suboptimal code because they end up doing BFS where they should do DFS, or brute force when they could use a specific graph algorithm.
Some of those interpretations can involve graphs, but that's not necessarily intrinsic to the problem nor the solution.
I think this is because a graph is neither a data structure nor a data type. It is really an abstraction.
Fundamentally, all I need to define a graph is a set of vertices v \in V and a function Neighbors(v). And that really is all that is needed for the most foundational set of graph algorithms.
Everything else is case-by-case constraints. Does A->B imply B->A? Is the node set partitionable with certain constraints? Are there colors? Labels?
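For instance, breadth-first search can be written against nothing but a neighbors function; a minimal Python sketch (the names are made up):

    from collections import deque

    # BFS needs only a start vertex and Neighbors(v) -- no concrete
    # representation of V or E.
    def bfs(start, neighbors):
        seen = {start}
        queue = deque([start])
        order = []
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in neighbors(v):
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        return order

    adj = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
    print(bfs("a", adj.__getitem__))  # ['a', 'b', 'c', 'd']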
To make things even more general I can go up one level and consider the hypergraph. In which case I just have a set of vertices, and a set of sets of vertices; the (non-hyper) graph is simply a special case. This can be represented in a myriad of different ways depending on what you are interested in.
An alternative way to think about it, perhaps from the database perspective, is that it's a query optimization and indexing problem. Depending on what questions you want to ask about the graph, there will be different ways to represent the graph to answer the question better. Just like there is not one way to represent the abstraction called "Table", there is not one way to do "Graph" either. It really depends on the questions you care about.
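A toy illustration of that indexing point (made-up names): the same edge list materialized as two different "indexes", one per query.

    # Out-neighbors for forward traversal, in-neighbors for
    # "who points at me?" queries -- same graph, two representations.
    edges = [("a", "b"), ("a", "c"), ("b", "c")]

    out_adj, in_adj = {}, {}
    for u, v in edges:
        out_adj.setdefault(u, []).append(v)
        in_adj.setdefault(v, []).append(u)

    print(out_adj.get("a", []))  # ['b', 'c']
    print(in_adj.get("c", []))   # ['a', 'b']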
Yes, graphs are ubiquitous because they are so abstract. They live on the same level of abstraction as pure numbers. There are useful "numerical" libraries that exist, and by analogy I think you could say there are also useful "graphical" libraries that exist. But we don't really have "number" libraries, and we don't really have "graph" libraries, because those concepts are a bit too abstract to write APIs against.
it's true that numbers are very abstract, which is what makes it so easy to design apis for them
the python runtime includes four built-in number types (small integer, arbitrary-precision integer, float, and complex) and the python standard library includes two more number types (decimal and fractions), and one of the most popular non-standard libraries for python is numpy, which provides some other kinds of numbers such as single-precision floats, vectors, and matrices. other systems like pari/gp have number libraries that provide other kinds of numbers, such as p-adic numbers and galois field elements
the only programming languages i've ever used that didn't have 'number' libraries were esoteric languages like brainfuck and the lambda calculus
Even that is severely overconstrained. It doesn't allow multiple edges to the same neighbor!
Maybe the naming would be a little weird for that use case, but they didn't specify what the output of `Neighbors(v)` is; I don't see any reason why it couldn't return a multiset or a relation (w, c) where `c` is the number of connections between `v` and `w`.
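A quick sketch of that suggestion, using collections.Counter as the multiset (illustrative only):

    from collections import Counter

    # Neighbors(v) as a multiset: Counter maps each neighbor w to the
    # number of parallel edges v -> w.
    edges = {
        "a": Counter({"b": 2, "c": 1}),  # two parallel edges a -> b
        "b": Counter({"a": 1}),
        "c": Counter(),
    }

    def neighbors(v):
        return edges[v]

    for w, count in neighbors("a").items():
        print(f"a -> {w} (x{count})")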
Well to be fair, that constraint is also part of the mathematical definition of a graph, where the set of edges E is a binary relation over vertices V (i.e., a subset of V x V). You'd need either a multiset or a labelled relation (i.e., a subset of V x L x V for some set of labels L) to overcome that.
> I just have a set of vertices
> and a set of sets of vertices
Sounds kind of like a file system to me. Files are the vertices. Directories are the nestable sets of vertices.
1. For simple and small graph problems, a simple vector-of-vectors adjacency list is easy enough to code up (sketched below).
2. For complex and huge graph problems, the only way to get performant solutions is to tailor the graph implementation to the specific details of the problem to be solved.
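A minimal sketch of point 1, with Python lists standing in for vectors:

    # Vertices are 0..n-1, adjacency is a list of lists.
    n = 5
    adj = [[] for _ in range(n)]

    def add_edge(u, v):  # undirected
        adj[u].append(v)
        adj[v].append(u)

    add_edge(0, 1)
    add_edge(1, 2)
    add_edge(3, 4)

    # degree of each vertex
    print([len(nbrs) for nbrs in adj])  # [1, 2, 1, 1, 1]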
And it's hard to see what kind of language support would help, other than just having a super-smart compiler which could analyze the code and determine whether an adjacency list, matrix, 3d array, etc. was the best way to implement it. That's the kind of optimization which we won't see in compilers for a while.
It's another instance of the phenomenon which Stroustrup noticed: we are really good at code sharing of small things like vectors, and of large things like operating systems. It's the middle-sized problems we are bad at.
> we are really good at code sharing of small things like vectors, and of large things like operating systems. It's the middle-sized problems we are bad at.
Interesting. But I am not sure we are good at sharing small things - every programming language has its own implementation of vectors. Within one language ecosystem, the API of a vector is small, and that's probably what makes it easy to share.
For operating systems, the API is relatively small compared to the internal complexity of the OS. This is also true for libraries for numerical problems, which are also easily shared. But the more you want to customize things (e.g. share a complicated data structure), the more this complicates the API and inhibits sharing.
So it seems to me this is determined by the surface area (relative size of the API) of the thing being shared.
Well, we could always be better at sharing small things; but recall, the comment was made by Bjarne Stroustrup, and he probably thought that he had pretty much nailed the vector by that time :-)
The point of the OP is a bit broader than that: for something like a vector, we have at least figured out some language features which would help a programmer make an efficient and generic implementation. Templates are not great, but at least they are something.
For graphs, we don't even have that. What kind of built-in graph support would work for pathfinding in a video game, or the internet, or a social networking graph a la facebook, or a routing graph for a 100-million-transistor chip....
We are getting better at abstraction all the time, but to abstract across all these kinds of applications is something which eludes us. It's really hard to see how you could give a programmer anything which would actually save him some time.
> every programming language has its own implementation of vectors
Many programming languages have more than one implementation of vectors. Turns out you want tiny vectors stored on the stack, and big vectors stored on the heap...
> And it's hard to see what kind of language support would help, other than just having a super-smart compiler which could analyze the code and determine whether an adjacency list, matrix, 3d array, etc. was the best way to implement it. That's the kind of optimization which we won't see in compilers for a while.
I’m not so sure? Looking at an algorithm against an abstract graph type, then filling in the implementation to optimize for the particular algorithm seems right in the wheelhouse of code-specialized LLM’s.
My experience with Cypher is that the query optimizer doesn't know enough about the graph to pick up on trivial optimizations. This can't be fixed without a way to tell the optimizer about those properties, and even just designing a language to tell the optimizer those things is difficult.
Just an LLM looking at your query isn't going to cut it. It will need to take your actual data into account.
Good point. The game has really changed in terms of what kinds of programs we can write now. Perhaps it's too pessimistic not to expect these sorts of optimizing compilers soon.
Sounds like a good research opportunity to me.
Electric Clojure uses Clojure itself (s-expressions) as a graph authoring syntax, using a macro to reify dataflow through a reactive client/server system (here the use case is full stack user interfaces but the idea generalizes) https://github.com/hyperfiddle/electric (I'm the founder).
IMO, the answer to the question "Where are all the graph types?" is: the graph authoring DSL needs to express scope, control flow and abstraction, which essentially makes it isomorphic to a programming language, freed of its evaluation model. In Python and Typescript, embedding a complete programming language is something that's rather hard to do!
Also see my blog post "Four problems preventing visual flowchart programming from expressing web applications" https://www.dustingetz.com/#/page/four%20problems%20preventi...
I think the post mostly answers the question "why are graph _algorithms_ not better supported in programming languages", with a focus that is much more on "big data" graph processing than graph support in general.
I think if you look at graph support in general you are also looking at wider questions, like "why are OGMs (Object Graph Mappers) not as popular as ORMs" and "why is JSON so prevalent while RDF (or another low-level graph serialization) isn't"?
And I think in the end it comes down to historic reasons (RDF emerged a bit too early, never evolved, and accrued an ecosystem of horrible academic standards and implementations), and to graphs having just a smidge more inherent complexity in implementation and learning curve, which just doesn't scale well across many developers.
------
I also wouldn't put too much weight on the "Graph Querying Language" part of the article. It sadly reads like exactly the marketing copy you would read from Neo4J or SPARQL enthusiasts that haven't tried building a product on top of it.
> The main difference between all GQLs and SQL is that the “joins” (relationships) are first-class entities.
Joins are first-class entities in SQL. They even have their own keyword (hint: it starts with J and ends with OIN) ;)
If you also go to the lower levels of any graph query language and look at its query plans, you'll notice that there isn't any meaningful difference from the ones you'll find for an SQL-based query. The standardization of GQL[0] as an SQL extension is evidence for that.
> In SPARQL relationships are just edges, making the same query easy.
SPARQL is easy if you need to do exact path traversals. If you try to do anything sophisticated with it (like you would in the backend of a webapp), you'll quickly run into footguns like joins with unbound values, where you accidentally join your whole result set away.
[0]: https://en.wikipedia.org/wiki/Graph_Query_Language
> Joins are first-class entities in SQL. They even have their own keyword (hint: it starts with J and ends with OIN) ;)
Having its own keyword is pretty strong evidence that something isn't first-class (e.g. typeclasses are not first-class in Haskell; control flow is not first-class in most programming languages).
I think OP is using “first class” as in “an explicit semantic affordance”, and you seem to be using “first class” as in “a supported operand” (or similar, feel free to correct me if I’m misunderstanding). In which case both points are right, albeit orthogonal.
JOINs, and joins in RECURSIVE queries at that, are the heart of "graph" databases, so SQL RDBMSes generally do it fine, but... without syntactic shortcuts. A graph QL is all about adding those syntactic shortcuts.
Graph drawing tools are also very underwhelming. They work pretty well for small graphs, but once you have something like 500 nodes or more, their output becomes completely incomprehensible or very difficult to look at. They miss the ability to automatically organize those graphs into hierarchical structures and to provide a nice interface for exploring them; we are used to everything around us having some kind of hierarchy. I think that is the same kind of problem that will need to be solved in order to have a generic graph data type. It would also need to be implemented at the compiler level, where those generic graph algorithms are adapted to the generated hierarchy of structures; and if you add a theorem prover that can check that a certain subgraph will always have a certain structure, you can statically generate those procedures, while for the other graphs those methods would be generated dynamically at runtime.
So whoever solves the problem for generic graph drawing will have the ability or the insight to implement this too.
It's hard.
Graphviz-like generic graph-drawing library. More options, more control:
https://eclipse.dev/elk/
Experiments by the same team responsible for the development of ELK, at Kiel University:
https://github.com/kieler/KLighD
Kieler project wiki:
https://rtsys.informatik.uni-kiel.de/confluence/display/KIEL...
Constraint-based graph drawing libraries:
https://www.adaptagrams.org/
JS implementation:
https://ialab.it.monash.edu/webcola/
Some cool stuff:
HOLA: Human-like Orthogonal Network Layout
https://ialab.it.monash.edu/~dwyer/papers/hola2015.pdf
Confluent Graphs demos: makes edges more readable.
https://www.aviz.fr/~bbach/confluentgraphs/
Stress-Minimizing Orthogonal Layout of Data Flow Diagrams with Ports
https://arxiv.org/pdf/1408.4626.pdf
Improved Optimal and Approximate Power Graph Compression for Clearer Visualisation of Dense Graphs
https://arxiv.org/pdf/1311.6996v1.pdf
Some algorithms do better at this than others, but "make a good diagram of a graph" is an intelligence-complete problem in the general case. Two people might render structurally-identical graphs in very different ways, to emphasize different aspects of the data. This is in fact a similar problem to the "generic graph algorithm" and "generic graph data structure" problems.
Graphs straddle the line between code and data. For instance, any given program has a call graph, so in a real sense, the "generic graph algorithm" is just computation.
Making things planar, or almost planar with few crossings and nice clustering of related nodes, is usually hard past a couple dozen nodes :(
> we are used to everything around us having some kind of hierarchy
I think the problem is more that we are used to the illusion/delusion that everything is hierarchical. The problem we then encounter with graph drawing is that it has to try to reconcile the fact that things in practice are rarely really hierarchical, and it's hard to draw those lines of where the hierarchies are with mathematical rigor. And that problem gets worse and worse the fewer properties you are allowed to assume about the underlying graph structure (connectedness, cyclic/acyclic, sparse/dense).
In practice, when you want to build a UI that interacts with graphs, it's often feasible to determine/impose one or two levels of meta-hierarchy with which you can do clustering (which reduces the layout-destroying impact of hairball nodes and improves rendering performance by reducing node count) and layout with fCOSE (Cytoscape.js has an implementation of that).
The core observation "there are too many implementation choices" is not quite right. True, the author mentions four, and there are further variations. In practice, a library can:
1. Implement all suitable graph representations.
2. Implement algorithms tailored to the representation(s) that offer the highest performance.
3. Provide transformations from one representation to another. This is O(#representations), trivial to implement and trivial to use (see the sketch after this list). Quite fair workload for both maintainers and users.
4. Bonus, provide import / export transformations from / to common standard library datatypes and idioms.
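As a sketch of what point 3 can look like (the function names are made up): converting between an adjacency dict and a dense 0/1 matrix.

    # Adjacency dict <-> dense 0/1 matrix. O(V^2) memory for the matrix,
    # but trivial to write and to use.
    def to_matrix(adj):
        nodes = sorted(adj)
        index = {v: i for i, v in enumerate(nodes)}
        m = [[0] * len(nodes) for _ in nodes]
        for v, ws in adj.items():
            for w in ws:
                m[index[v]][index[w]] = 1
        return nodes, m

    def to_adj(nodes, m):
        return {v: [nodes[j] for j, bit in enumerate(row) if bit]
                for v, row in zip(nodes, m)}

    nodes, m = to_matrix({"a": ["b"], "b": ["a", "c"], "c": []})
    assert to_adj(nodes, m) == {"a": ["b"], "b": ["a", "c"], "c": []}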
Memory and transformations are cheap; 99% of use-cases would likely find the overhead of transforming data, both in RAM and CPU, negligible.
Edit: "the harsh truth of working at Google is that in the end you are moving protobufs from one place to another." -- https://news.ycombinator.com/item?id=20132880
Sounds like the makings of a huge library that I’m not sure I’d even use in my work. I use graphs heavily in my work, and my experience matches the people the author interviewed.
We always end up reimplementing graphs because:
- Performance matters, and no off-the-shelf graph library I’ve seen can take advantage of many of the regularities in our particular data set. (We have an append-only DAG which we can internally run-length encode, because almost all nodes just have an edge pointing to the last added item; a sketch follows this list.)
- I haven’t seen any generic graph library which supports the specific queries I need to make on my graphs. The big one is a subgraph diffing function.
- Writing something custom just isn’t much work anyway! Graphs are way simpler to reimplement than btrees. You can have a simple graph implementation in tens of lines. Our highly optimised library - with all the supporting algorithms - is still only a few hundred lines of code.
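A hypothetical sketch of that run-length idea (not the actual implementation described above): in an append-only DAG where almost every new node points at the previous node, store only the exceptions, so whole runs of "default" nodes cost nothing extra.

    class AppendOnlyDag:
        def __init__(self):
            self.n = 0            # nodes are 0..n-1 in append order
            self.exceptions = {}  # node -> parent list, when not [node - 1]

        def append(self, parents=None):
            v = self.n
            self.n += 1
            default = [v - 1] if v > 0 else []
            if parents is not None and parents != default:
                self.exceptions[v] = list(parents)
            return v

        def parents(self, v):
            if v in self.exceptions:
                return self.exceptions[v]
            return [v - 1] if v > 0 else []

    g = AppendOnlyDag()
    for _ in range(4):
        g.append()                # the common, run-length-encodable case
    g.append(parents=[0, 3])      # a rare merge node, stored explicitly
    print(g.parents(2), g.parents(4))  # [1] [0, 3]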
I think it would be handy to have ways to export the data into some standard format. But eh. I think pulling a library in for our use case would add more problems than it would solve.
What do you mean by “subgraph diffing”? I work with graphs a lot and use SQL almost all the time. Sometimes I need to compute connected components with python.