The problem with most people's code is that it's full of unnecessary complexity and creates a ton of work. I swear at least 90% of projects from 'top' companies, by 'top' engineers, are full of unnecessary complexity which slows everything down significantly. They literally need a team of 20+ engineers to do work which could have been done more effectively by 1 good engineer.
Based on modern metrics for code quality, almost nobody will realize that they're looking at bad code. I've seen a lot of horrible codebases which looked pretty good superficially; good linting, consistent naming, functional programming, static typing, etc... But architecturally, it's just shockingly bad; it's designed such that you need to refactor the code constantly; there is no clear business layer; business logic traverses all components including all the supposedly generic ones.
With bad code, any business requirement change requires a deep refactoring... And people will be like "so glad we use TypeScript so that I don't accidentally forget to update a reference across 20 different files required as part of this refactoring" - Newsflash: Your tiny business requirement change requires you to update 20+ files because your code sucks! Sure TypeScript helps in this case, but type safety should be the least of your concerns. If code is well architected, complex abstractions don't generally end up stretching across more than one or two files.
There's a reason we say "leaky abstraction" - if a complex abstraction leaks through many file boundaries, it's an abstraction and it's leaky!
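To make the point above concrete, here's a minimal sketch (in TypeScript, since the thread mentions it) of a business rule kept behind one module boundary so a requirement change touches one file instead of twenty. The names (`LineItem`, `priceFor`, `invoiceTotal`) and the discount rule are purely illustrative, not from any real codebase:

```typescript
// Hypothetical sketch: the bulk-discount rule lives entirely in this module.
// A business change ("raise the bulk threshold from 10 to 20") edits only
// these two constants - the rule never leaks into caller files.

interface LineItem {
  unitPrice: number;
  quantity: number;
}

const BULK_THRESHOLD = 10;
const BULK_DISCOUNT = 0.9;

// All discount knowledge is encapsulated here; callers only see a price.
function priceFor(item: LineItem): number {
  const raw = item.unitPrice * item.quantity;
  return item.quantity >= BULK_THRESHOLD ? raw * BULK_DISCOUNT : raw;
}

// Callers never inspect quantity or thresholds themselves, so the
// abstraction cannot leak across file boundaries.
function invoiceTotal(items: LineItem[]): number {
  return items.reduce((sum, item) => sum + priceFor(item), 0);
}

console.log(invoiceTotal([{ unitPrice: 5, quantity: 10 }])); // 45 (bulk)
console.log(invoiceTotal([{ unitPrice: 5, quantity: 2 }]));  // 10 (no bulk)
```

The leaky version of this is every caller doing its own `quantity >= 10` check; that's when one requirement change fans out across the codebase.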
I fully agree with your sentiment, and it also drives me crazy sometimes.
I wonder if the main problem was all the min-maxing interview patterns that rewarded algorithm problem solvers from the 2010s onwards.
People applied for software engineering jobs because they wanted to play with tech, not because they wanted to solve product problems (which should have a direct correlation with revenue impact).
Then you have the ego-boosting blog post era, where everyone wanted to explain how they used Kafka and DDD and functional programming to solve a problem. If you start reading some of those posts, you'll realize that the underlying problem was actually not well understood (especially the big picture).
This led developers down a wild goose chase (willingly), where they end up burning through engineering time which arguably could be better spent understanding the domain.
This is not the case for everyone, but the exceptions are few.
It makes me wonder if the incentives are misaligned, and engineering contributing to revenue ends up not translating to hard cash, promos and bonuses.
In this new AI era, you can see the craftsman-style devs going full luddite mode, IMO due to what I've mentioned above. As a craftsman-style dev myself, I can only set up the same async job queue pattern so many times. I'm actually enjoying the rubber-ducking with the AI more and more, mostly for digging into the domain and potential approaches for simplification (or even product refinement).
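For readers who haven't set up "the same async job queue pattern" themselves, here is a hedged minimal sketch of what's meant: an in-memory queue with a fixed concurrency limit. The `JobQueue` name and API are illustrative; production versions typically add persistence, retries, and backpressure:

```typescript
// Minimal in-memory async job queue with bounded concurrency.
// Illustrative only - real-world queues need persistence and retry logic.

type Job = () => Promise<void>;

class JobQueue {
  private queue: Job[] = [];
  private running = 0;

  constructor(private concurrency: number) {}

  push(job: Job): void {
    this.queue.push(job);
    this.drain();
  }

  private drain(): void {
    // Start jobs until the concurrency limit is hit or the queue is empty.
    while (this.running < this.concurrency && this.queue.length > 0) {
      const job = this.queue.shift()!;
      this.running++;
      job().finally(() => {
        this.running--;
        this.drain(); // a slot freed up: pick up the next queued job
      });
    }
  }
}

// Usage: at most 2 jobs run at once; the other 3 wait their turn.
const queue = new JobQueue(2);
let completed = 0;
for (let i = 0; i < 5; i++) {
  queue.push(async () => {
    completed++;
  });
}
```

The whole pattern is about thirty lines, which is exactly why writing it for the Nth time stops being interesting.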
It's infuriating to think about interviews as a likely cause for this complexity bloat because I made so many comments online about this exact problem with big tech interview processes and people would usually acknowledge the problem but no company ever fixed it! The only people who didn't think there was a problem, unironically, were those who were very good at fast puzzle-solving.
Painful for me because I excel at architecture. My puzzle-solving skills are actually good too, but unfortunately, not under time constraints! Sometimes I feel like there's been an industry-wide conspiracy against the software architect archetype!
I remember, from when I first learned coding at a young age, I wanted to be a software architect, and I was shocked to learn that this skill was rarely appreciated in the industry. I became convinced that the software developer role had become a kind of 'bullshit job' to meet the needs of the reserve bank's job-creation agenda.
I suppose the silver lining is that at least now LLMs have a bias towards puzzle-solving and so lead most codebases astray... This increases my value as a software architect or 'craftsman' in your words.
I think you make a good argument there. You can extrapolate it to almost every aspect of society. From the moment you start school, everything is geared towards measuring thinking speed... We've been using thinking speed as the definition of intelligence... You know who else besides high-IQ individuals is good at thinking fast? LLMs!
It's kind of interesting and fitting though that the AI agents we invented have the same biases as the humans at the top of our organizations!
I feel like the whole "there is only one kind of intelligence" belief which was pervasive in big tech has been thoroughly debunked by now.
> If code is well architected, complex abstractions don't generally end up stretching across more than one or two files.
This is a naive metric since it's satisfied by putting the entire code base into a single file.
Part of the reason that business requirement changes to modern web dev code bases require changes to so many files is because web devs are encouraged to restrict the scope of any one file as much as possible.
I can't tell if you're complaining about that specifically or if you think it's possible to have both a highly modularized code base & still restrict business requirement changes to only a couple files.
If the latter, then I'd love to know guidelines for doing so.
I said 90% in my comment but that's from my professional experience which is probably biased towards complex projects where maintainability is more important.
This is an interesting approach. I think, in a way, it mirrors what I do. Having contracted for much of my career, I’ve had to get up to speed on a number of codebases quickly. When I have a choice of how to do this, I find a recently closed issue and try to write a unit test for it. If nothing else, you learn where the tests live, assuming they exist, and how much of a safety net you have if you start hacking away at things.
Once I know how to add tests and run them (which is a really good way to deal with the codebase setup problem mentioned in the article, because a lot of onboarding docs only get you to the codebase running without all the plumbing you need), I feel like I can get by without a full understanding of the code, as I can throw in a couple of tests to prove what I want to get to and then hope the tests or CI or hooks prevent me from doing A Bad Thing. Not perfect, and it varies depending on how well the project is built and maintained, but if I can break things easily, people are probably used to things breaking, and then I have an avenue to my first meaningful contribution: making things break less.
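The "write a test for a recently closed issue" move above can be sketched in a few lines. Everything here is hypothetical - the issue number, the `cartTotal` function, and the bug are invented to show the shape of such a regression test:

```typescript
// Hedged sketch: pin the fix for a recently closed (invented) issue #1234,
// "cart total throws on empty carts". cartTotal stands in for whatever
// function the real issue touched.

function cartTotal(prices: number[]): number {
  // The (hypothetical) fix: seed reduce with 0 so empty arrays don't throw
  // "Reduce of empty array with no initial value".
  return prices.reduce((sum, p) => sum + p, 0);
}

// Regression test for issue #1234: empty cart totals 0 instead of throwing.
if (cartTotal([]) !== 0) throw new Error("regression: issue #1234");
// Sanity check that normal carts still sum correctly.
if (cartTotal([2, 3]) !== 5) throw new Error("sum broken");
console.log("ok");
```

Writing even one test like this forces you through the project's test layout, fixtures, and runner config, which is most of the onboarding value.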
I am quite skeptical and reserved when it comes to AI, particularly regarding its impact on the next generation of engineers. But using AI to learn a code base has been life-changing. Using a crutch to feel your way around, then ditching the crutch when things are familiar, like using a map until you learn the road yourself.
Super useful, indeed. My only fear is that at times it can lead to superficial understanding. You don't get the satisfying click of all the pieces falling into place, just a surface-level understanding. I find that once AI gives me the lay of the land I still need to deep-dive myself, but I can take shortcuts I would never have taken, and it feels like traversing the scenery with a map. Pretty nifty!
I think it is possible to use AI in a way that ends up with real understanding, and very easy to use it in a way where nothing at all sticks. Vibe coders, by definition, know or understand 0% of their codebase. But you can use AI in a more questioning manner: get answers that are testable, test them immediately, and add the correct answers to the context right away, embedding a clearer picture in your mental model.
I'm about to start a new role. What have you found most effective in using it to learn a new code base? Just asking questions like "what is this class doing"? Drawing architecture diagrams?
Just ask it whatever naturally draws your curiosity and use it to build your mental model. I may add that our company got us an enterprise subscription (so models aren't trained on our IP), so I can just point it at the entire codebase rather than copying/pasting snippets into a chat window.
What does this program accomplish? How does it accomplish it? Walk me through the boot sequence. Where does it do ABC?
I work in a company where I frequently interact with adjacent teams' code bases. When working on a ticket that touches another system, I'll typically tell it what I'm working on and ask it to point me to areas in the code that are responsible for that capability and which tests exercise that code. This is a great head start for me. I then start "in the ball park".
I would not recommend having it make diagrams for you. I don't know what it is, but LLMs just aren't great at converting information into diagram form. I've had one explain, quite impressively, parts of code, and when I asked it to turn that into a diagram it came up short. Must be low on training data expressing itself in that medium. It's an okay way to get the syntax for a diagram started, however.
Your visualizer looks great! I really like that it queues up tasks to run instead of only operating on the code during runtime attachment. I haven't seen that kind of thing before.
I built my own node graph utility to do this for my code, after using Unreal's blueprints for the first time. Once it clicked for me that the two are different views of the same codebase, I was in love. It's so much easier for me to reason about node graphs, and so much easier for me to write code as plain text (with an IDE/language server). I share your wish that there were a more general utility for it, so I could use it for languages other than js/ts.
I wish you an auspicious time in your new role!
Anyway, great job on this!
https://githubnext.com/projects/repo-visualization/
https://en.wikipedia.org/wiki/Doxygen#/media/File:Doxygen-1....