Readit News logoReadit News
graphviz · 2 years ago
Graphviz has its own foundation graph library, that's not used by any other project. It has its good and bad points.

Based on that experience, we had our very own second-system-syndrome experience.

We decided our graph library should be modular, type safe, and efficient. (These properties came up in the comments here, too.) This is probably just a variation of "good, fast, cheap - pick any two."

By modular, want to write collections of graph algorithm libraries that are developed and even compiled independently.

By type safe, we mean we want to detect programming errors during compilation, or link time at the latest. We don't want programs to throw runtime errors like "your node does not have a color attribute".

By efficient, we mean that accessing an attribute of a graph is as cheap as a field in a C struct. (So, we don't want to carry around external hash table, or do a lot of string conversions, for instance.)

You can argue about whether these things are worth the price or even make sense, but that's what we wanted. We had some famous C++ creators in our lab, so we figured we could get help, and we were willing to give C++ another chance.

Gordon Woodhull, who had been an intern and kept working for us, is a brilliant programmer, and wrote an implementation of this kind of graph library working in templated C++. It's even published at https://www.dynagraph.org/ via sourceforge. The rest of us were not really sure we could ever understand how the code worked, so we had a code review with said famous C++ inventors. There were a lot of screens of code, and chinstroking, and greybeards pronounced "That would probably work." We knew we might have gone over the complexity cliff. (Let's not even talk about compile-time template errors, where one error fills an entire screen with details that only a... C++ inventor could love.) It was our fault, not anyone else's, and Gordon kept plugging away and even made all the dynamic graph layout stuff work, in Microsoft OLE, too. In hindsight it was probably our own Project Xanadu. While we got lost in this, a lot of things like Gephi (Java) and NetworkX and NetworKit (python) happened.

Also, John Ellson, a very talented software engineer who had written parts of Graphviz, revitalized the main effort.

samsquire · 2 years ago
Thank you for graphviz.

I use graphviz dot syntax, parsing with networkx to plan expensive tool execution and the graph structure lets me automatically paralellize.

Everdred2dx · 2 years ago
I love comments like this. Thank you for sharing.

Deleted Comment

transitionnel · 2 years ago
That all sounds fantastic!
obi1kenobi · 2 years ago
As someone who did a lot of work with graphs, "why don't programming languages have a built-in graph data type?" is a question I’ve been asked a million times.

I'm thrilled I'll be able to point folks to a much more in-depth analysis like the one here, instead of saying some variation of "it's really hard to do it well" and having them just take my word for it.

dataflow · 2 years ago
> "why don't programming languages have a built-in graph data type?"

What I find a little funny about that question is that people miss the fact that there isn't even a tree data structure in most languages. Most languages have static arrays, dynamic arrays, linked lists, and... that's it as far as structural types go. Everything else (BSTs, hashtables, etc.) is some semantic abstraction that hides some capabilities of the underlying structure, not a purely structural representation.

maxiepoo · 2 years ago
Typed functional languages like Haskell (data)/ML (type) do have a built-in way to define new tree types, and so does Rust (enum). It's one of the biggest things I miss when I'm not using these languages, especially when combined with some metaprogramming (deriving/derive) to get some functions defined for these new types very quickly.
jjice · 2 years ago
I used to think that since graphs are such a broad data structure that can be represented in different ways depending on requirements that it just made more sense to implement them at a domain-ish level (the article mentions this in the "There are too many implementation choices" section).

Then I saw Petgraph [0] which is the first time I had really looked at a generic graph library. It's very interesting, but I still have implemented graphs at a domain level.

[0] https://github.com/petgraph/petgraph

nickpsecurity · 2 years ago
You might say graphs are an abstract concept, solving our problems requires specialized graphs, and so we just use or build the kind we need.
TachyonicBytes · 2 years ago
Why wouldn't an abstract `Graph` type and specific implementations of that work? Like Java has for `Set` and various implementations?
bloaf · 2 years ago
I had an opposite experience. I was doing some new-to-me graph work in tcl and had just assumed the tcl standard library wouldn't have any graph algorithms. Come to find out I was wrong, and ended up saving myself a bunch of time not having to re-invent the wheel:

https://core.tcl-lang.org/tcllib/doc/trunk/embedded/md/tclli...

paulddraper · 2 years ago
> "it's really hard to do it well"

More importantly, there are a lot of tradeoffs.

Virtually every language offers a hash map. You can roll your own to outperform in an individual circumstance, the default works pretty well.

You can't really do that with a graph. Maybe if you offered a bunch of graph types.

---

PS. Bit of trivia: Java's HashMap is a bit different from almost every other language in that it lets you tune the load factor.

adastra22 · 2 years ago
> You can't really do that with a graph. Maybe if you offered a bunch of graph types.

And so why isn't this the solution?

Most languages support both hash map (fast lookup) and balanced tree (ordered entries) primitives, even though they both implement the "associative map" container type.

Can't we have 2, 3, 5, or even 8 different graph types?

sltkr · 2 years ago
> Java's HashMap is a bit different from almost every other language in that it lets you tune the load factor.

I'm not sure that's particularly unusual. For example, C++ supports this too:

https://en.cppreference.com/w/cpp/container/unordered_map/ma...

swiftcoder · 2 years ago
It's also a bit different in that every insertion entails at least one allocation, and every lookup entails pointer-chasing through at least 3 dereferences. Java's lack of proper value types really kneecaps their hashmap implementation in many practical performance scenarios...
karmakaze · 2 years ago
It's also different in that each hash bucket can use a Red-Black tree when there's many entries.
eru · 2 years ago
> You can't really do that with a graph.

Why not? Relations model graphs really well, and you could have a relation datastructure in your language (or its standard library).

somat · 2 years ago
This is a super naive take. but I would consider the pointer the be the native graph type. What is wanted is not a graph type but the tooling to traverse graphs.
Nevermark · 2 years ago
Generalized pointers (RAM addresses, disk storage locations, network locations, etc.) would be the general way to implement explicit literal graphs.

Other graphs, that are partly or wholly computed or recomputed as needed from other relationships, could be considered "implicit" graphs and can be implemented many other ways.

The fact that graphs can be any combinations of literally or dependently defined, static or dynamic defined, would add even more complexity to any truly general graph library.

cwillu · 2 years ago
Period should probably be a comma; I read this comment a couple times before it made sense, rather than calling the grand parent super naive.
alanbernstein · 2 years ago
Maybe a pointer is better described as equivalent to a half-edge?
computerdork · 2 years ago
Hmm, think you're taking the title of the article a little too literally, and focusing on just the "data type" aspect. Yeah, the article itself indirectly agrees with you that pointers are often how graphs are implemented: "Graphs are a generalization of linked lists, binary trees, and hash tables," which are data-structures often implemented by pointers.

Yeah, the article is really saying what you're saying, that building a graph library ("tooling") is very complicated.

But, maybe I'm misunderstanding what you're saying??

eru · 2 years ago
You can use a pointer when building a graph data structure. You can also use numeric indices into arrays. Or you can store your graph in an entirely different way.

You are right that the data representation itself is only one small part of a data structure, the operations on that representation are also really important.

nine_k · 2 years ago
It's a bit like saying that a byte is a native number type, providing comparison to zero and addition as the only operations, and asking to figure the rest yourself.

I mean, it's technically sufficient, but I'd not call it proper support on a language level.

boothby · 2 years ago
I actually brought this hot take up in my conversation with Hillel -- something along the lines of "void * * is the canonical native graph representation in C."
sayrer · 2 years ago
I think it's because you have to think it through when they get too big for one machine. A lot of those so-called "NoSQL" databases are really meant to represent graph databases (think DynamoDB) in an adjacency list. I've had success with Gremlin and JanusGraph, but it's a really messy space. It's not a programming language problem in my opinion, but a distributed systems one.
atoav · 2 years ago
Puthon has one, as mentioned, but as a non CS-person I never encountered any problem in my programming-life that so fundamentally called for graphs that I needed one, not did I had so much fascination for graphs that it was a tool I wanted to force onto my problems.

I guess it is just that there are many problems that can be solved incredibly well without graphs and not that many where graphs outshine everything else so clearly that people would use them.

That being said, convince me of the opposite and I am happy.

kolinko · 2 years ago
What kinds of projects did you build? I'd risk saying that it's more likely you just didn't spot where graphs could be useful, or you implemented a graph not recognising it as such.

Even if you work with pure webdev - the least CS-requiring field of all programming, if you work with DOM, it's a tree/graph structure, or if you work with a sitemap - it's a graph.

From my experience working with non-cs programmers, they tend to struggle with finding solutions to problems that would be solved instantly with graph structures. Or they write suboptimal code because they end up doing BFS, where they should DFS, or brute force when they could use a specific graph algorithm.

eru · 2 years ago
Two people can look at the same solution to the same problem, and interpret and analyse it differently.

Some of those interpretations can involve graphs, but that's not necessarily intrinsic to the problem nor solution.

ylow · 2 years ago
I think this is because a graph is not a data-structure nor a data-type. It is really an abstraction.

Fundamentally, all I need to define a graph is a set of vertices v \in V and function Neighbors(v). And that really is all is needed for the most foundational set of graph algorithms.

Everything else are case-by-case constraints. Does A->B imply B->A? is the node set partitionable with certain constraints? Are there colors? labels?

To make things even more general I can go up one level and consider the hypergraph. In which case I just have a set of vertices, and a set of sets of vertices. This can be represented in a myriad of different ways depending on what you are interested in. Of which (non-hyper) graph is simply a special case.

An alternative way to think about it perhaps from the database perspective, is that its a query optimization and indexing problem. Depending on what questions you want to ask about the graph, there will be different ways to represent the graph to answer the question better. Just like there is not one way to represent the abstraction called "Table", there is not one way to do "Graph" either. It really depends on the questions you care about.

gloftus · 2 years ago
Yes, graphs are ubiquitous because they are so abstract. They live on the same level of abstraction as pure numbers. There are useful "numerical" libraries that exist, and by analogy I think you could say there are also useful "graphical" libraries that exist. But we don't really have "number" libraries, and we don't really have "graph" libraries, because those concepts are a bit too abstract to write APIs against.
kragen · 2 years ago
it's true that numbers are very abstract, which is what makes it so easy to design apis for them

the python runtime includes four built-in number types (small integer, arbitrary-precision integer, float, and complex) and the python standard library includes two more number types (decimal and fractions), and one of the most popular non-standard libraries for python is numpy, which provides some other kinds of numbers such as single-precision floats, vectors, and matrices. other systems like pari/gp have number libraries that provide other kinds of numbers, such as p-adic numbers and galois field elements

the only programming languages i've ever used that didn't have 'number' libraries were esoteric languages like brainfuck and the lambda calculus

dataflow · 2 years ago
> Fundamentally, all I need to define a graph is a set of vertices v \in V and function Neighbors(v).

Even that is severely overconstrained. It doesn't allow multiple edges to the same neighbor!

saghm · 2 years ago
Maybe the naming would be a little weird for that use case, but they didn't specify what the output of `Neighbors(v)` is; I don't see any reason why it couldn't return a multiset or a relation (w, c) where `c` is the number of connections between `v` and `w`
lou1306 · 2 years ago
Well to be fair, that constraint is also part of the mathematical definition of a graph, where the set of edges E is a binary relation over vertices V (i.e., a subset of V x V). You'd need either a multiset or a labelled relation (i.e., a subset of V x L x V for some set of labels L) to overcome that.
matheusmoreira · 2 years ago
> hypergraph

> I just have a set of vertices

> and a set of sets of vertices

Sounds kind of like a file system to me. Files are the vertices. Directories are the nestable sets of vertices.

rhelz · 2 years ago
Ya, the central obstacle is that:

1. for simple and small graph problems, a simple vector-of-vectors adjacency list is easy enough to code up.

2. For complex and huge graph problems, the only way to get performant solutions is to tailor the graph implementation to the specific details of the problem to be solved.

And its hard to see what kind of language support would help, other than just having a super-smart compiler which could analyze the code and determine whether an adjacency list, matrix, 3d array, etc was the best way to implement it. That's the kind of optimization which we won't see in compilers for a while.

It's another instance of the phenomenon which Strousroup noticed: we are really good at code sharing of small things like vectors, and of large things like operating systems. Its the middle-sized problems we are bad at.

js8 · 2 years ago
> we are really good at code sharing of small things like vectors, and of large things like operating systems. Its the middle-sized problems we are bad at.

Interesting. But I am not sure we are good at sharing small things - every programming language has its own implementation of vectors. Within one language ecosystem, the API of a vector is small, and that's probably what makes it easy to share.

For operating systems, the API is relatively small compared to the internal complexity of the OS. This is also true for libraries for numerical problems, which are also easily shared. But the more you want to customize things (e.g. share a complicated data structure), this complicates the API and inhibits sharing.

So it seems to this is determined by the surface area (relative size of the API) of the thing being shared.

rhelz · 2 years ago
Well, we could always be better at sharing small things; but recall, the comment was made by Bjarne Stroustrup, and he probably thought that he had pretty much nailed the vector by that time :-)

The point of the OP is a bit broader than that: for something like a vector, we have at least figured out some language features which would help a programmer make an efficient and generic implementation. Templates are not great, but at least they are something.

For graphs, we don't even have that. What kind of built-in graph support would work for graphs which would work for pathfinding in a video game, or the internet, or a social networking graph a la facebook, or a routing graph routing a 100 million transistor chip....

We are getting better at abstraction all the time, but to abstract across all these kinds of applications is something which eludes us. Its really hard to see how you could give a programmer anything which would actually save him some time.

swiftcoder · 2 years ago
> every programming language has its own implementation of vectors

Many programming languages have more than one implementations of vectors. Turns out you want tiny vectors stored on the stack, and big vectors stored on the heap...

twoodfin · 2 years ago
And its hard to see what kind of language support would help, other than just having a super-smart compiler which could analyze the code and determine whether an adjacency list, matrix, 3d array, etc was the best way to implement it. That's the kind of optimization which we won't see in compilers for a while.

I’m not so sure? Looking at an algorithm against an abstract graph type, then filling in the implementation to optimize for the particular algorithm seems right in the wheelhouse of code-specialized LLM’s.

rocqua · 2 years ago
My experience with cipher is that the query optimizer doesn't know enough about the graph to pick up on trivial optimizations. This can't be fixed without a way to tell the optimizer about those properties, and even just dreiging a language to tell the optimizer those things is difficult.

Just an LLM looking at your query isn't going to cut it. It will need to take your actual data into account.

rhelz · 2 years ago
Good point. The game has really changed in terms of what kinds of programs we can write now. Perhaps it's too pessimistic to not expect these sorts of optimizing compilers soon.

Sounds like a good research opportunity to me.

dustingetz · 2 years ago
Electric Clojure uses Clojure itself (s-expressions) as a graph authoring syntax, using a macro to reify dataflow through a reactive client/server system (here the use case is full stack user interfaces but the idea generalizes) https://github.com/hyperfiddle/electric (I'm the founder).

IMO, the answer to the question "Where are all the graph types?" is: the graph authoring DSL needs to express scope, control flow and abstraction, which essentially makes it isomorphic to a programming language, freed of its evaluation model. In Python and Typescript, embedding a complete programming language is something that's rather hard to do!

Also see my blog post "Four problems preventing visual flowchart programming from expressing web applications" https://www.dustingetz.com/#/page/four%20problems%20preventi...

hobofan · 2 years ago
I think the post mostly answers the questions "why are graph _algorithms_ not better supported in programming languages", with a focus that is much more on "big data" graph processing than graph support in general.

I think if you look at graph support in general you are also looking at wider questions, like "why are OGMs (Object Graph Mappers) not as popular as ORMs" and "why is JSON so prevalent while RDF (or another low-level graph serialization) isn't"?

And I think in the end it comes down to historic reasons (RDF emerged a bit too early and never evolved and accrued an ecosystem of horrible academic standards and implementations), and a graphs having just a smidge more of inherent complexity in implementation and learning curve that just doesn't scale well across many developers.

------

I also wouldn't put too much weight on the "Graph Querying Language" part of the article. It sadly reads like exactly the marketing copy you would read from Neo4J or SPARQL enthusiasts that haven't tried building a product on top of it.

> The main difference between all GQLs and SQL is that the “joins” (relationships) are first-class entities.

Joins are first-class entities in SQL. They even have their own keyword (hint: it starts with J and ends with OIN) ;)

If you also go to the lower levels of any graph query language and look at it's query plans you'll notice that there isn't any meaningful difference to that of one you'll find in an SQL based query. The standardization of GQL[0] as an SQL extension is evidence for that.

> In SPARQL relationships are just edges, making the same query easy.

SPARQL is easy if you need to do exact path traversals. If you try to do anything sophisticated with it (like you would in the backend of a webapp), you'll quickly run into footguns like joins with unbound values and you accidently join your whole result set away.

[0]: https://en.wikipedia.org/wiki/Graph_Query_Language

lmm · 2 years ago
> Joins are first-class entities in SQL. They even have their own keyword (hint: it starts with J and ends with OIN) ;)

Having its own keyword is pretty strong evidence that something isn't first-class (e.g. typeclasses are not first-class in Haskell; control flow is not first-class in most programming languages).

eyelidlessness · 2 years ago
I think OP is using “first class” as in “an explicit semantic affordance”, and you seem to be using “first class” as in “a supported operand” (or similar, feel free to correct me if I’m misunderstanding). In which case both points are right, albeit orthogonal.
cryptonector · 2 years ago
JOINs, and joins in RECURSIVE queries at that, are the heart of "graph" databases, so SQL RDBMSes generally do it fine, but... without syntactic shortcuts. A graph QL is all about adding those syntactic shortcuts.
criloz2 · 2 years ago
Graph drawing tools are also very underwhelming, they work pretty good for small graphs until you have something like 500 nodes or more, then eventually their output becomes complete incompressible or very difficult to look at it, they miss the ability to automatically organize those graph in hierarchical structures and provide a nice interface to explore them, we are used that everything around us have some kind of hierarchy, I think that is the same kind of problem that will need to be solved in order to have a generic graph data type, also this kind of thing will need to be implemented at the compiler level where those graph generic algos will be adapted to the generated hierarchy of structures, and if you add a theorem prover that can check that certain subgraph will always have certain structures you can statically generated those procedures and for the other super graphs those methods will be generated dynamically at runtime.

So whoever solve the problem for generic graph drawing will have the ability or the insight to implement this too.

kjqgqkejbfefn · 2 years ago
>Graph drawing tools

It's hard

Graphviz-like generic graph-drawing library. More options, more control.

https://eclipse.dev/elk/

Experiments by the same team responsible for the development of ELK, at Kiel University

https://github.com/kieler/KLighD

Kieler project wiki

https://rtsys.informatik.uni-kiel.de/confluence/display/KIEL...

Constraint-based graph drawing libraries

https://www.adaptagrams.org/

JS implementation

https://ialab.it.monash.edu/webcola/

Some cool stuff:

HOLA: Human-like Orthogonal Network Layout

https://ialab.it.monash.edu/~dwyer/papers/hola2015.pdf

Confluent Graphs demos: makes edges more readable.

https://www.aviz.fr/~bbach/confluentgraphs/

Stress-Minimizing Orthogonal Layout of Data Flow Diagrams with Ports

https://arxiv.org/pdf/1408.4626.pdf

Improved Optimal and Approximate Power Graph Compression for Clearer Visualisation of Dense Graphs

https://arxiv.org/pdf/1311.6996v1.pdf

samatman · 2 years ago
Some algorithms do better at this than others, but "make a good diagram of a graph" is an intelligence-complete problem in the general case. Two people might render structurally-identical graphs in very different ways, to emphasize different aspects of the data. This is in fact a similar problem to the "generic graph algorithm" and "generic graph data structure" problems.

Graphs straddle the line between code and data. For instance, any given program has a call graph, so in a real sense, the "generic graph algorithm" is just computation.

nine_k · 2 years ago
Ideal things are often tree-like. Real-world structures are usually DAGs if they are nice and well-behaved.

Making things planar, or almost planar with few crossings and nice clustering of related nodes, is usually hard past a couple dozen nodes :(

hobofan · 2 years ago
> we are used that everything around us have some kind of hierarchy

I think the problem is more that we are used to the illusion/delusion that everything is hierarchical. The problem that we then encouter is with graph drawing is that it has to try and reconcile the fact that things in practice are rarely really hierarchical, and it's hard to draw those lines of where the hierarchies are with mathematical rigor. And that problem gets worse and worse the less properties you are allowed to assume about the underlying graph structure (connectedness, cyclic/acyclic, sparse/dense).

In practice when you want build a UI that interacts with graphs it's often feasible to determine/impose one or two levels of meta-hierarchy with which you can do clustering (allows for reducing layout destroying impact of hairball nodes + improves rendering performance by reducing node count) and layout with fCOSE (Cytoscape.js has an implementation of that).

swagasaurus-rex · 2 years ago
The illustrations I've seen of neural networks really highlights the sheet incomprehensibility of visualizing large graphs
fatherzine · 2 years ago
What an awesome article. Kudos to the author!

On the core observation "there are too many implementation choices", that is not quite right. True, the author mentions 4, and there are further variations. In practice, a library can:

1. Implement all suitable graph representations.

2. Implement algorithms tailored to the representation(s) that offer the highest performance.

3. Provide transformations from one representation to another. This is O(#representations), trivial to implement and trivial to use. Quite fair workload for both maintainers and users.

4. Bonus, provide import / export transformations from / to common standard library datatypes and idioms.

Memory and transformations are cheap, 99% of use-cases would likely find the overhead of transforming data, both in RAM and CPU, negligible.

Edit: "the harsh truth of working at Google is that in the end you are moving protobufs from one place to another." -- https://news.ycombinator.com/item?id=20132880

josephg · 2 years ago
Sounds like the makings of a huge library that I’m not sure I’d even use in my work. I use graphs heavily in my work, and my experience matches the people the author interviewed.

We always end up reimplementing graphs because:

- Performance matters, and no off the shelf graph library I’ve seen can take advantage of many of the regularities in our particular data set. (We have an append-only DAG which we can internally run-length encode because almost all nodes just have an edge pointing to the last added item).

- I haven’t seen any generic graph library which supports the specific queries I need to make on my graphs. The big one is a subgraph diffing function.

- Writing something custom just isn’t much work anyway! Graphs are way simpler to reimplement than btrees. You can have a simple graph implementation in tens of lines. Our highly optimised library - with all the supporting algorithms - is still only a few hundred lines of code.

I think it would be handy to have ways to export the data into some standard format. But eh. I think pulling a library in for our use case would add more problems than it would solve.

therobots927 · 2 years ago
What do you mean by “subgraph diffing”? I work with graphs a lot and use SQL almost all the time. Sometimes I need to compute connected components with python.