Ask HN: Visualizing software designs, especially of large systems (if at all)?

For a small but complicated project I got thrown into a while ago, the only way for me to understand it was to print out all the source directly, vertically tape together the pages for a single file, and then lay them all out on a huge table. Then I took multicolored markers and started physically drawing out the call chains. I then understood the system, and also found an enraging bug: the system widely used the variables "blah_name" and "blah_id", including in many functions' parameters. Except, in one case, blah_id was passed in as blah_name and thenceforth became known as blah_name.

I don't know if an automated visualization system is possible, but you'll have to understand the whole thing before doing so. Pen and paper was the most expedient solution for me at the time.

DANK_YACHT · 3 years ago

I use pen and paper as well, but rather than print out all the source code, I write down the call stack. A calls B calls C, etc. along with the line numbers of the call. Much easier than printing out the source and you still have the IDE niceties like go to definition, find in source, etc.

ASalazarMX · 3 years ago

This reminded me of when I had to maintain dozens of old 10,000-line COBOL programs as a junior programmer. I felt so lost I made a program that would print only the names of data structures and functions. Seeing the source summarized in a handful of pages, and being able to highlight and draw on it, helped me a lot. Digital has flexibility, but sometimes paper works best.
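A rough idea of what such an outline printer could look like today, sketched for Python source with the standard `ast` module rather than for COBOL. The `outline` function and the `SAMPLE` text are made up for illustration; the original program is not described in any more detail than the comment above.

```python
import ast


def outline(source: str) -> list[str]:
    """Return one indented line per class or function definition, in source order."""
    lines: list[str] = []

    def visit(node: ast.AST, depth: int) -> None:
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                lines.append(f"{'  ' * depth}def {child.name} (line {child.lineno})")
                visit(child, depth + 1)
            elif isinstance(child, ast.ClassDef):
                lines.append(f"{'  ' * depth}class {child.name} (line {child.lineno})")
                visit(child, depth + 1)
            else:
                # Keep descending so definitions nested in if/try blocks are found.
                visit(child, depth)

    visit(ast.parse(source), 0)
    return lines


# Hypothetical input, only here to show the shape of the output.
SAMPLE = '''
class Account:
    def deposit(self, amount):
        pass

def main():
    pass
'''

if __name__ == "__main__":
    for line in outline(SAMPLE):
        print(line)
```

The line numbers are kept on purpose, so a printed outline can still be matched back to the full source.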

er_d0s · 3 years ago

Hah! I did the exact same thing for ages on paper and eventually evolved the system to manage my workload and context switching… I still use it a lot for going deep while debugging/understanding code. I ended up making it into an app when I broke my wrist and could still type but couldn’t hold a pen. I can’t remember if there’s rules about self promotion in comments here but it’s up at journalist mode dot com

_dain_ · 3 years ago

I've done this too, taped a bunch of impenetrable code to the wall and scribbled on it with pen to figure out wtf was happening. I propose that this be called the "Pepe Silvia" debugging method since it looks like a crazy conspiracy chart. Eventually you'll figure out why nobody is getting their mail ...

https://www.youtube.com/watch?v=_nTpsv9PNqo

jimpudar · 3 years ago

I also used to do this when working on a big convoluted system. I had a conference room near my desk with all the walls completely covered in code. A big pack of multicolored highlighters is key.

I remember a whole bunch of light bulb moments when I showed other developers the "big picture". It's an awesome technique when you're forced to work on spaghetti!

zach_miller · 3 years ago

Sounds like something the type system should have caught!

sixstringtheory · 3 years ago

Lucky you if you work with people who see the value in a language with good type checking or that doesn’t just use strings for everything.
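For what it's worth, here is a minimal sketch of how distinct wrapper types could catch the blah_id / blah_name mix-up described at the top of the thread. `BlahId`, `BlahName` and `describe` are hypothetical names, and the underlying `int` / `str` types are assumptions; the original system's language and types are unknown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlahId:
    value: int  # assumed to be numeric, purely for illustration


@dataclass(frozen=True)
class BlahName:
    value: str


def describe(blah_id: BlahId, blah_name: BlahName) -> str:
    # A static checker such as mypy would reject swapped arguments before
    # the code runs; this guard does the same job at runtime.
    if not isinstance(blah_id, BlahId) or not isinstance(blah_name, BlahName):
        raise TypeError("blah_id and blah_name look swapped or unwrapped")
    return f"{blah_name.value} (#{blah_id.value})"


if __name__ == "__main__":
    print(describe(BlahId(7), BlahName("widget")))
```

With bare strings for both values, neither the checker nor the runtime has anything to distinguish them by, which is the situation the parent comment is complaining about.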

growwrkr6 · 3 years ago

Language support varies but if it’s possible, why not generate an AST and count references, bubble up most common, etc?

Could do similar with bash text mangling tools, but language native would probably be best.

I dunno, just a thought in an EOD fog. I don’t own a printer these days, so I guess I’d need an alternative.

Tell computer to observe self and report back.
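A minimal sketch of the "generate an AST and count references" idea, assuming Python source and the standard `ast` module. `count_references` and `SAMPLE` are illustrative names; only plain name reads are counted, so method calls through attributes (like `.read()`) are ignored.

```python
import ast
from collections import Counter


def count_references(source: str) -> Counter:
    """Count how often each plain name is read (not assigned) in the source."""
    counts: Counter = Counter()
    for node in ast.walk(ast.parse(source)):
        # ast.Load marks a name being read; assignments use ast.Store.
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load):
            counts[node.id] += 1
    return counts


# Hypothetical input, only here to show the idea.
SAMPLE = '''
def load(path):
    return open(path).read()

def main():
    text = load("a.txt")
    more = load("b.txt")
    print(len(text) + len(more))
'''

if __name__ == "__main__":
    # "Bubble up most common": the most referenced names come first.
    for name, n in count_references(SAMPLE).most_common(5):
        print(f"{n:3d}  {name}")
```

On a real codebase you would run this per file and merge the counters; names that dominate the merged count are a reasonable place to start reading.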

brailsafe · 3 years ago

First principles! I've totally done this. Especially in a large pub/sub oriented frontend codebase where it's really hard to map out where any given data could have come from

This tool is good for making simple UML diagrams and even lets you do it with simplified syntax

https://plantuml.com/

I'd say the big problem in visualizing big systems is that you can't usefully do it in one graph. For instance, I worked on a system that had 2000+ database tables; a diagram of that which shows everything is going to take up a long wall. (This can be useful, but it is a big commitment.)

A useful tool is going to let you make meaningful diagrams that show the subset of entities that are part of a story. I went to an art show of Mark Lombardi's works

https://en.wikipedia.org/wiki/Mark_Lombardi

who (before he was murdered) drew elaborate diagrams of conspiracies. One thing they showed was drafts that he made in the progress of creating his visualizations and he would sometimes make 40 or more of them. He would start out with a "hairball" that was disorganized and gradually figure out how to lay the diagram out in a way that made the meaning obvious.

rdubs333 · 3 years ago

Good stuff. I am doing similar things but have not been murdered yet!

mattdeboard · 3 years ago

gl;dd ;)

NicoJuicy · 3 years ago

PlantUML with C4 and the additional cloud icons, is what I use

tra3 · 3 years ago

I had to do a double take, there was a similar question a couple of weeks ago [0].

I was blown away by the idea behind C4 when I first saw the presentation. I think what's missing is the tooling. I use C4 PlantUML to document my architecture designs. What I'd really love, though, is a Google Maps style interface where I can zoom in or out of the current level I'm at. That'd be a game changer. Then you can really describe and understand the system.

The original presentation, in fact, used the google maps interface to illustrate the idea where you're first looking at a continent, then you zoom in to the city and finally the street level.

If you are using C4 right now, how do you compose the various levels of architecture and navigate around them?

[0] https://news.ycombinator.com/item?id=31370268

chaostheory · 3 years ago

> I'd say the big problem in visualizing big systems is that you can't usefully do it in one graph.

That’s because it’s flat

Imo this changes the game

https://noda.io/

Imagine being able to make a fully 3D graph inside a space the size of a large building

_dain_ · 3 years ago

wikipedia page says suicide

bornfreddy · 3 years ago

We will probably never know if it was a suicide or he was "suicided". Given his works one has to wonder... Did one of them hit too close to the truth?

Weidenwalker · 3 years ago

My friend and I have been working on https://www.codeatlas.dev in our spare time, which is a tool that creates pretty (2D!) visualisations of codebases, while providing additional insights via overlays (e.g. commit density, programming language). For example here's the Kubernetes codebase visualised using codeatlas: https://www.codeatlas.dev/repo/kubernetes/kubernetes.

At the moment, codeatlas is only a static gallery, but we're currently about 1-2 weekends away from releasing a Github action that deploys this diagram on github pages for your own repos - if you're interested, feel free to watch this repo: https://github.com/codeatlasHQ/codebase-visualizer-action

john-tells-all · 3 years ago

Very interesting! I'm convinced humans looking at code plots can see things that computers can't. An extension of your idea is to show how code changes over time. Sections that don't change much = "backbone" of system, probably bug-free. New code, or code that changes a lot = "sketchy", might have bugs. Alternatively, show code colored by "quality" i.e. complexity.

Here's my take: https://github.com/johntellsall/shotglass#demo-flask-a-small...

Huh, I hadn't thought about it that way - you're right, infrequent changes could indeed be a good proxy for stability! (or for "dead-and-forgotten" :D)

Complexity is an interesting measure too - I'm currently not sure how we'd model this, but this could definitely help codeowners understand which parts of their codebase are currently difficult for people to wrap their heads around. Or whether there are any complex parts with only a single contributor, without whom the project would be left with a serious knowledge gap.

Once this can run as part of a CI pipeline and thus lives directly in the repo, I'd also love to add an overlay with the output of the testsuite to see which parts of the codebase aren't covered by tests! Or the output of a profiler, to see which functions are actually called the most.

bloopernova · 3 years ago

That is really sweet, I love it! You've both done a really fantastic job.

Thanks - we're really excited to finally get some feedback on this! :)

InvOfSmallC · 3 years ago

Will it be a paid product?

Hmm - at some point we'll have to think about how to fund further development, but the current plan for the github action is for it to be open-source (under a BSL-like license) and free to use!

diegof79 · 3 years ago

What you are looking for is called "Program Understanding". If you Google for it you'll find a bunch of research papers on the topic.

For some reason, tools related to program understanding are not widely adopted by IDEs.

A while ago I used a tool for Java that was based on the Object-Oriented Metrics[0] book by Michele Lanza. But, that tool was discontinued and it doesn't exist anymore[1].

If you are interested in that topic take a look at Moose[2], and dig a little bit in the research papers. (honestly I tried Moose a few times, but I wasn't very comfortable with it).

For TypeScript projects, the TS compiler API is extremely powerful and easy to use. You can use that to extract information and analyze the code relationships (Graphviz is your friend here :) ).

[0]: https://link.springer.com/book/10.1007/3-540-39538-5

[1]: https://web.archive.org/web/20150428173717/http://www.intooi...

[2]: https://moosetechnology.org/

marktangotango · 3 years ago

There was also a lot of work in this area in the 90s leading up to the y2k (non)event. Mostly back then it revolved around cobol, which was understandable given y2k impact was in many cases related to legacy cobol systems. A lot of architecture visualization and recovering business requirements from code. I did some work on diagramming mainframe JCL files for example.

jbreckmckye · 3 years ago

The challenge is that there are different ways of "mapping" software.

You could map the way programs fit into machines, and the networks between them. This would be the topology.

You can map the way services call upon one another with requests. This is the service graph.

You can map how systems interact over events or shared resources. You could say this is the logical graph.

The problem happens when you try and graph them all at once. It's the same as trying to draw a real map, with all the services, bus routes, railways, shops and administrative regions superimposed on one image. It's very busy.

So I use separate maps.

Tools are another matter. Personally I use Mermaid for graphs. I also have my own tools that create SVG visualisations using DAGre. This can be helpful for interactive visualisations where you can click into different nodes and explore more detail.

My system uses CloudFormation templates and our in house deployment DSLs to figure out the "topology", then lets the users see the different superimposed "graphs" as they see fit.
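The "separate maps" idea can be sketched roughly like this: keep one edge list tagged by kind, and render only one kind at a time as Mermaid text. Everything here (the `Edge` type, the node names, `to_mermaid`) is a made-up illustration, not the tooling described above.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    source: str
    target: str
    kind: str  # "topology", "service" or "logical"


# Hypothetical system, only here to show the filtering.
EDGES = [
    Edge("web", "host_a", "topology"),
    Edge("api", "host_b", "topology"),
    Edge("web", "api", "service"),
    Edge("api", "billing", "service"),
    Edge("billing", "invoice_events", "logical"),
]


def to_mermaid(edges: list[Edge], kind: str) -> str:
    """Render only the edges of one kind as a left-to-right Mermaid graph."""
    lines = ["graph LR"]
    for edge in edges:
        if edge.kind == kind:
            lines.append(f"    {edge.source} --> {edge.target}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_mermaid(EDGES, "service"))
```

The same node can show up on several maps; what changes per map is which relationships get drawn.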

aetherlord · 3 years ago

I'm a fan of https://www.ilograph.com/. I've only used it for a few small things, but the author has good samples, including a diagram of ilograph itself - https://app.ilograph.com/demo.ilograph.Ilograph/Request.

dtjohnnymonkey · 3 years ago

I use Ilograph pretty heavily both for documenting existing systems and designing new systems. The paradigm of “everything has context” makes diagrams much easier to understand.

I have even used it to render infrastructure diagrams of actual production systems (clusters, load balancers, etc)

This looks very cool. It looks similar to the C4 model, where you can have nested components of arbitrary depth ("containers" in C4 parlance).

rswail · 3 years ago

This looks interesting :) Not sure about YAML, been burnt with OpenAPI, but it looks good.

gbuk2013 · 3 years ago

I like using https://c4model.com/ - the Level 2 diagram is particularly useful.

I use https://mermaid-js.github.io/mermaid/#/ for the diagram itself because Github natively supports it in markdown files, so you can revision control the diagram. I managed to get reasonably close to the C4 diagrams minus a few features that mermaid does not support.

neuronexmachina · 3 years ago

There's a recent PR which looks pretty promising for C4 in Mermaid: https://github.com/mermaid-js/mermaid/pull/3038

No automated tool will come close to having a 5 minute conversation with the main designer and having him draw you a diagram freehand on the back of a napkin. This is a social and organizational and communication problem, not technical.

If that can't be done, there are some interesting things you can try. A lot of the suggestions in the thread are "top down" methods; you can get a lot of value out of "bottom up" visualizations too. Things like:

- Histograms of which lines / functions get called the most, or have the most time spent in them

- Which lines / functions / files get changed the most in the git history

- CPU flamegraphs

- Plain old print-debugging

In over-architected systems it can be difficult to figure out where the real "meat" of the code is, as opposed to the endless layers of configuration and wrappers and interfaces and indirection. UML diagrams may not help, or even be deceiving, but a stack trace never is.
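The git-history item in the list above can be sketched in a few lines, assuming `git` is on the PATH. `count_changes` and `git_churn` are illustrative names; renamed files are counted under each name they had, which may or may not be what you want.

```python
import subprocess
from collections import Counter


def count_changes(log_text: str) -> Counter:
    """Count how many commits touched each path in `git log --name-only` output."""
    return Counter(line.strip() for line in log_text.splitlines() if line.strip())


def git_churn(repo: str = ".") -> Counter:
    # --pretty=format: suppresses the commit headers, so only file names remain.
    log_text = subprocess.run(
        ["git", "-C", repo, "log", "--name-only", "--pretty=format:"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return count_changes(log_text)


if __name__ == "__main__":
    # Files with the most commits first; often where the real "meat" is.
    for path, n in git_churn().most_common(20):
        print(f"{n:5d}  {path}")
```

Parsing is kept separate from the `git` call so the counting can be tried on saved log output without a repository at hand.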

buescher · 3 years ago

The way to capture the content of a conversation like that, so that more than one person at a time can easily benefit from it, is in a theory of operation document with appropriate illustrations. Simple block diagrams will take you far before you need to specify something in the level of detail that the UML supports.

I like the range of approaches in the Architecture of Open Source Applications books: http://aosabook.org/en/index.html