I’m working on improving my software design skills, and it was recommended that I study existing well designed codebases. What are some publicly accessible codebases you would consider gold standards for software design?
Maybe I’m just not good enough at paying attention, but for me it seems like you have to actually run into problems over and over and figure out how to avoid the problems. Then you end up being able to mentally simulate what problems you will run into, and design is basically all about avoiding future problems of various kinds (and balancing tradeoffs about which future problems to avoid and how much effort to put into each, whether you can solve multiple with one design play, etc).
I have this too, I have never been able to "do exercises" or "study a codebase". I need to be making something that I am excited about, then I'll learn, from examples, from wanting to be thorough and correct.
But sometimes I think I'm just not there yet. If I become able to read code like a book and really understand what happens, which often I don't, then perhaps I'll enjoy the process more.
That's how I learn as well. I've found that reading code is a completely different thing from learning. Not even tangentially related.
I can read a codebase and run simulations in my head. But this almost never results in learning anything. I can study code and see how a particular task is done, but I don't learn it until I put it into practice. Just about the only thing I get from reading other people's code is the most egregious ways to not do something.
This is a pretty common thing, a plurality of humans learn this way. We just have different standards for software, for some reason. We like to pretend books teach you something more "real" than just getting your hands dirty and writing some real fucked up code. But the reality is that figuring out why your disaster code doesn't work, and then fixing it is one of the most educational experiences a programmer can have.
> for me it seems like you have to actually run into problems over and over and figure out how to avoid the problems
This shows how immature the field of software engineering is. Imagine bridges or houses were built like that. Or your surgeon was trained like that.
Over time, we hopefully develop established norms, but at the moment, things are too much in flux. Put 5 sw engineers in a room, pose a problem and you will get not just 5 different solution proposals, but there will likely be strong disagreements on which approach is a good one.
"I recognize a good solution when I see it" is just not good enough for a serious engineering discipline.
Bridge building is a lot more conservative when it comes to taking risks in construction, but that is still how we build bridges, and lots of bridges collapse from similar causes:
- Design Deficiencies
- Construction Mistakes
- Maintenance Issues
- etc.
An average of 128 bridges collapse annually in the United States. More than 17,000 bridges in America are considered "fracture critical" (vulnerable to collapse from a single impact).
> This shows how immature the field of software engineering is. Imagine bridges or houses were built like that. Or your surgeon was trained like that.
It's not that software engineering is immature, it's just more dynamic.
We are not the surgeon, we write the surgeon. We write a surgeon to fix a broken leg. Once that is done, we don't have to fix another leg. Now we need to reattach a finger. Once that is solved, maybe replace a kidney.
You cannot repetitively train or have strict rules for that, because every time it's something new. You need to have broad knowledge and experience to be able to fight the next unknown challenge. It's unknown because it's never been done before, or it has been done but your competitor will not reveal the details.
Building bridges or being a surgeon sounds very boring to me, since it's always the same (maybe some minor variants). Building software? Very much not the same.
If they could afford experimenting and have a few bridges collapse before they get it right with no significant negative consequences IMHO it wouldn’t be the worst way to learn.
Maybe even more so for surgeons, being able to experiment and fail in a risk free environment seems like a good thing.
> This shows how immature the field of software engineering is. Imagine bridges or houses were built like that.
You're forgetting a key difference between software and the physical world.
The blessing and curse of software is that there are few constraints on the solution so it can take an enormous number of different forms and still be valid. There isn't gravity, friction or the various other things that constrain physical solutions.
And each of those almost limitless solutions has different trade-offs related to a large number of different variables. It takes many many years of seeing the different impacts of those design decisions across the different variables within the constraints of different domains and contexts.
It's truly a combinatorial problem with so many dimensions that no human is smart enough to just simulate the impact of design decisions in new and unfamiliar domains+context.
You could argue that there could be standard ways of doing things within specific domains+context, but the context is so varied between companies, and business priorities have so many differences that I don't think it's a solvable problem.
The best we can do is have higher level patterns and approaches to specific types of problems that can guide people based on others' experiences, but all of that still needs to be mixed and matched for the specific problem space.
For me, software design is more comparable to business system and organization design. It’s just that instead of humans executing the processes you design, it’s machines doing it.
I don’t think business system and organization design is approached like bridge design either, is it?
Also, bridges, houses and surgeons can physically kill people if something goes wrong. Software that can physically kill people, such as that in airplanes or missiles, is actually treated quite differently from most software, I think? I don’t have experience in those industries so I can’t comment on the specifics of how it is different, but my impression is that things are a lot more rigid. Business organization design also can’t directly kill people.
In general, I think that there is a fundamental tension between looseness and flexibility of operations and innovation. If you are super rigorous and have set in stone best practices, it is going to be harder to find new ways of doing things that work better.
I’m not really bothered if people don’t consider software to be a serious engineering discipline. I’m not sure I do either. If someone wants that kind of thing, I’d recommend they go into a different engineering discipline, rather than trying to make software like that.
Tell me you don't know the first thing about engineering without telling me...
This is the engineering process. If you put five engineers of any discipline in a room, you will get five different answers. Every contractor and architect has their own ideas about how things should be done.
Furthermore, we do build houses this way, even in the modern age with building codes. The builder is going to do whatever they can get away with and this is a universal truth. The only reason bridges are held to such high standards is because of the monetary cost of a collapse.
The "norms" are not what you think they are. They're tradition, they're "we've always done it this way". What you're talking about are laws written for public safety. Those laws only exist because we tried and failed to do things a certain way and the cost in money or lives was untenable.
> "I recognize a good solution when I see it" is just not good enough for a serious engineering discipline.
You're conflating engineering with science. A scientist will rigorously test and validate, but a huge part of engineering is just recognizing a good solution. Of course there's a lot of testing and validation in engineering as well, but not with the kind of rigor you're implying.
That's kinda like saying you can learn to drive by just getting into a car, crashing then thinking about how not to crash it next time.
In reality both things are necessary. The car analogy doesn't hold for road driving because we drive well within the limits, but for racing it really is necessary to know exactly where the limits are. I don't think we should really be treating our profession like a race, though.
But if you don't read it's going to be an incredibly long slow process and a lot of car crashes and mangled gearboxes etc. So I say read, read, and read some more. Even if you don't see the point of it right now your experience will later find a place for it and you won't end up descending a hill for the first time not knowing to shift to a lower gear.
> That's kinda like saying you can learn to drive by just getting into a car, crashing then thinking about how not to crash it next time.
That would be a perfectly valid way of learning to drive if crashing had no danger or destruction and you could instantly reset the car every time. Software is a special case of engineering where the cost of failure is extremely low, so trial and error is generally the fastest way to get going with actually doing something.
you mention a key point. profession. programmer doesn't really imply professional programmer.
I'd say if you do it for a living, certain tedious chores must be learned. the best programmers i know (professional) can all read code. they spent many junior years learning to read it, being on the code auditing desk.... nowadays idk how the landscape looks, but all of them had to review and read code to find bugs before they were allowed to produce code (they all worked at the same company ofc... so my view is limited!)
i do feel such discipline is needed. they can always poke holes in my code no matter how many holes i plug :) - i am semi professional. i write code for work, but not production code. (experimental). i never learned to audit code and feel that makes it impossible for me to truly create production grade code
I totally agree with you. Some of the most intense thinking I've put into writing code has been when I go out for a run in the middle of the day and come back 30 minutes later with solutions to the thing I spent all morning trying to fix.
This is pretty much how I've learned up to this point (and will of course continue). Trying to learn from real world code will be a new experience for me. Not sure how valuable it will be but should be fun either way.
My immediate reaction to this question is: "your team's". Nothing will teach you more about how to design software than really understanding why good and bad solutions were adopted to solve a certain real problem.
Software exists precisely because there is still a messy layer connecting user requirements to actions on a computer. If there were no messiness then we could just automate it all. Approaching software from some sort of Platonic ideal of what software should be will frequently lead to bad decisions on its own.
When you start to see how certain pressures lead to certain paths you learn to recognize the wrong decisions that are often good at the time, and avoid them. At the same time, you need to learn to develop methods that work quickly and effectively. By far the biggest real challenge in real world software is time constraints. This is almost never discussed in theoretical views of software, but the truth is you're always going to be writing code under pressure to ship. You will come across situations where you do not have time to do what you want to do or think is best.
Good software is software that runs and solves the user need, but you will come to realize that there are design solutions that will make successfully running happen more often. The best way to find these is to study the real software you're writing.
Then it still matters a lot what kind of codebase he is working on. A web project is structured and built differently than the codebase for an embedded operating system. There are different standards and practices in the industry.
I learned a lot by just stepping through the code of libraries I used with the debugger. That brought more practical insight while learning about design patterns etc. In the end, it is all about patterns. Finding the right pattern for a given problem.
I would be interested in recommendations for those who have a poor example or no examples
In my case, I'm a junior engineer that has recently been given more responsibility designing aspects of our product. I'm just trying to learn all I can so my designs will be good!
I haven't considered time to implement as a metric for evaluating design, but it makes a lot of sense.
I definitely learn so much from my team's codebase. Most of what I learn is either from the good designs I see in there or from my googling trying to fix the not so good parts.
These are several years old at this point, but many open source project leaders contributed to the series "The Architecture of Open Source Applications", which is free to read online: https://aosabook.org/en/index.html
I think there is value in grokking entire code bases. It's not just about whether or not they are well designed though. It is an important skill to be able to analyze and see the big picture of how things work together in large systems. For me it often involves drawing diagrams (sometimes UML) to map things out. Being able to view systems in this way is a pre-requisite to intentionally designing your own systems at this level. And yes, once you can work at this level you can learn from good designs, but also see problems with bad ones.
EDIT: and to answer your question, if you're working on something that is "like X but different" then read the source code for X. You could also look at source code for software that you use from day to day: software where you already know what it does. For example, if you're in web, maybe the web framework, or web server, if you write python, maybe a core library that you use, or maybe the python interpreter, or if you use vscode ..., if you use android ..., you get the idea. At the start I would suggest smaller programs, and programs where you already know the domain (e.g. cpython might not be the best place to start if you never implemented an interpreter before, you may spend more time learning about interpreters than the design of this one, still a good thing to learn of course.)
My experience (30 years in software, 25 in practicing architecture, MIT system architecture masters) tells me there is no such thing as abstractly “good” design. There are designs with negative consequences for sure, but “good” depends on the context: what are you building, safety/security requirements etc. Probably most importantly on the implementation team and its structure. A team of juniors will butcher your intricate design and Conway’s Law makes your software reflect the team.
Yeah, as I've been learning more about software design, it's become pretty clear there is no silver bullet. It would be nice though. I find it a bit overwhelming to try to find the best solution for a problem when there are so many different architectures and programming paradigms.
That being said, taking into account the requirements does eliminate quite a few of the options. Right now, I work on safety-critical embedded systems which requires us to make some decisions that would most likely be way different in other environments.
It's about having most compromises be Not Totally Stupid and have the worst ones only be locally stupid. There is a big universe out there of compromises that fulfill those criteria. As you imply, a lot of the time it's more about constantly fighting for Less-Bad-and-avoiding-Dumb than about architecture as such.
Top 5 codebases for changing my mind about things:
Wietse Venema's Postfix mail server. Taught me tons about security posture, the architecture i'd describe as microservices before microservices was a thing, but contrary to the modern take on microservices (it's mostly a tool for decomposing work across large semi-isolated groups) this was primarily about security and simplicity.
Spring framework - this opened my eyes to ways of working that i hadn't really thought enough about before, the developers on that project have a culture of deeply considering the needs of their users (who are java developers often in an enterprise environment).
Git - the thing i like about the git code base is that once you've covered the objects database (e.g. blobs, trees and commits) and the implementation of refs, everything else just feels like additional incremental features. With those core concepts, everything else is kinda harmoniously built on top.
Varnish by Poul-Henning Kamp is another one - feels like he went to great lengths to make that code base a teaching tool despite the fact it's also a top tier reverse proxy.
Last one isn't a code base - but it will help with software design in the large; studying how the lieutenants model works in the linux kernel.
Thinking about my answers, i think i've highlighted something subtly different than "well designed codebases": it's more a list of codebases that left a notable long lasting impression on me because of design decisions they made.
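To make the Git point above a bit more concrete, here is a toy sketch in Python of the "objects database plus refs" idea. It is not Git's actual code: the `ObjectStore` class and its method names are made up for illustration, the tree encoding is a simplified text format rather than Git's binary one, and commits leave out author, committer and timestamps. The one detail borrowed from Git is the object id, which is the SHA-1 of a `<kind> <size>\0` header followed by the payload.

```python
import hashlib


class ObjectStore:
    """Toy content-addressed store (hypothetical, for illustration only)."""

    def __init__(self):
        self.objects = {}  # object id (hex SHA-1) -> (kind, payload bytes)
        self.refs = {}     # ref name -> commit id

    def put(self, kind, payload):
        # Like Git, hash a "<kind> <size>\0" header followed by the payload.
        header = f"{kind} {len(payload)}".encode() + b"\0"
        oid = hashlib.sha1(header + payload).hexdigest()
        self.objects[oid] = (kind, payload)  # identical content -> same key
        return oid

    def blob(self, data):
        return self.put("blob", data)

    def tree(self, entries):
        # Simplified: one "name oid" line per entry, sorted by name.
        # Real Git trees use a binary format with file modes.
        body = "".join(f"{name} {oid}\n" for name, oid in sorted(entries.items()))
        return self.put("tree", body.encode())

    def commit(self, tree_oid, parent, message):
        # Simplified: no author, committer or timestamp lines.
        lines = [f"tree {tree_oid}"]
        if parent is not None:
            lines.append(f"parent {parent}")
        body = "\n".join(lines) + "\n\n" + message + "\n"
        return self.put("commit", body.encode())


store = ObjectStore()

# First snapshot: one file, one tree, one commit, and a branch pointing at it.
readme = store.blob(b"hello\n")
first = store.commit(store.tree({"README": readme}), None, "initial")
store.refs["refs/heads/main"] = first

# Second snapshot: new blob and tree, a commit whose parent is the first one.
readme2 = store.blob(b"hello again\n")
second = store.commit(store.tree({"README": readme2}), first, "update")
store.refs["refs/heads/main"] = second  # "moving the branch" is one assignment
```

In this sketch history is just commits pointing at parents, and a branch is a name pointing at a commit id, which is roughly why so much else can be layered on top of those two pieces.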
> This shows how immature the field of software engineering is.
While I don't disagree with you in general, this does feel a bit off.
By that logic you can call the field of music immature, and all of the arts. I think the difference is that it's easy to experiment without high costs.
I genuinely think that if building bridges was cheap and quick, the fastest way to learn would be to try...
What if the question is asked by a college student?
* https://news.ycombinator.com/item?id=36370684
* https://news.ycombinator.com/item?id=30752540
* https://news.ycombinator.com/item?id=9896369 (Python specific)
I think there was another one with a similar name but I can't think of its name.
0: https://www.spinellis.gr/codereading/, check the TOC https://www.spinellis.gr/codereading/toc.html