I have some paradoxical feelings about "blameless" retro culture that I'll try to sum up.
In general, I'm in favor of the approach. I don't think singling people out and bullying or shaming them for their mistakes ever works. I think most well-intentioned engineers will already beat themselves up plenty for making a serious mistake, and they don't need any encouragement to do so. I know I do.
On the other hand, there is a red line. At a place I worked, a DBA was let go after he repeatedly brought production down for 45 minutes to an hour at a time by running intensive queries of his own design for data-gathering, in some cases, after being explicitly told not to do that against the prod database. This was a person whose job description required him to have access to prod.
There were process problems, maybe (being allowed to run whatever queries you want on production under your own authority, sure), but his cavalier attitude towards a production environment was still unacceptable. Process can only help when people are well-intentioned and doing their best; if people are malicious or negligent or just not good at their jobs, adding more process to get around that only makes things worse.
I think there should be a difference between a postmortem process and a performance management process and just because the first is blameless doesn’t mean that the second can’t look back to find problems or negligence.
That said, even when there is obvious negligence, having the postmortem process look at the issue blamelessly is important for building the tooling and changes that could prevent it from happening again. For example, maybe you could revoke individuals' direct access to the production database unless it is granted through multi-party authentication.
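A minimal sketch of the multi-party idea, assuming a hypothetical in-house access-request flow (the `ProdAccessRequest` class and the two-approver threshold are illustrative, not taken from any particular tool): direct production database access is granted only once enough people other than the requester have signed off.

```python
from dataclasses import dataclass, field


@dataclass
class ProdAccessRequest:
    """Hypothetical request for direct production database access."""
    requester: str
    reason: str
    required_approvals: int = 2  # assumption: a simple two-person rule
    approvers: set[str] = field(default_factory=set)

    def approve(self, approver: str) -> None:
        # The person asking for access can never approve their own request.
        if approver == self.requester:
            raise ValueError("requester cannot self-approve")
        # Stored in a set, so the same person approving twice counts once.
        self.approvers.add(approver)

    def is_granted(self) -> bool:
        # Access is granted only with enough distinct approvers.
        return len(self.approvers) >= self.required_approvals
```

For example, `ProdAccessRequest("dba", "ad hoc data-gathering query")` stays ungranted after `approve("alice")` is called twice and only becomes granted once a second, different person approves. The point is not the code itself but that "go ahead and run it on prod" stops being a decision one person can make alone.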
>I think there should be a difference between a postmortem process and a performance management process and just because the first is blameless doesn’t mean that the second can’t look back to find problems or negligence.
That doesn't make sense. The moment that you look back at a postmortem for use in penalizing someone via performance management, the postmortem is no longer blameless.
It seems like a read replica would have helped out in this instance.
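A rough sketch of how that could look at the application layer, assuming a hypothetical setup with one primary and one read replica (the `route_query` function and the `"primary"`/`"replica"` names are made up for illustration): writes go to the primary, while reads, including ad hoc data-gathering, go to the replica, so a heavy query slows the replica instead of taking production down. The first-keyword check is deliberately crude; a real setup would more likely route at the connection-pool or driver level.

```python
PRIMARY = "primary"  # hypothetical name for the production primary
REPLICA = "replica"  # hypothetical name for the read replica

# Statements whose first keyword marks them as writes or schema changes.
WRITE_KEYWORDS = {
    "INSERT", "UPDATE", "DELETE", "MERGE",
    "CREATE", "ALTER", "DROP", "TRUNCATE",
}


def route_query(sql: str) -> str:
    """Return which database a statement should run against (crude heuristic)."""
    words = sql.split()
    if not words:
        raise ValueError("empty statement")
    if words[0].upper() in WRITE_KEYWORDS:
        return PRIMARY
    # Everything else (SELECT, WITH, EXPLAIN, ...) is treated as read-only
    # and sent to the replica, keeping expensive reads off the primary.
    return REPLICA
```

The trade-off is replication lag: the replica may be slightly behind the primary, which is usually acceptable for data-gathering queries that don't need up-to-the-second data.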
I agree: if somebody keeps doing the same things after being told not to, because those actions would bring down production, and those actions do bring down production, then they should be held accountable.
> At a place I worked, a DBA was let go after he repeatedly brought production down for 45 minutes to an hour at a time by running intensive queries of his own design for data-gathering, in some cases, after being explicitly told not to do that against the prod database. This was a person whose job description required him to have access to prod.
Trying to have some sympathy: Was he given an alternative? Or was it a "stop doing that important thing -- I don't know how else to do it, figure it out" situation?
It wasn't particularly important and we had "offline" copies of most of the DB data for this sort of thing, just somewhat less up to date. I honestly don't know why he did this.
I think maybe it's an attempt to buzzwordify a culture of not holding honest mistakes against people, and pretend it's a discrete separable "thing we do" rather than a pervasively intertwingled aspect of "what we're like here".
The question isn't whether it's your fault, it's whether you take responsibility for it. If no one takes responsibility for anything then you get nowhere. And if you take responsibility for it, it's your fault if it goes wrong.
> "Hero Programmer" is a derogatory name for a programmer who chooses to fix problems in epic, caffeine-fueled 36-hour coding sessions that frequently just kick the can down the road to the next heroic 36-hour coding blitz. Hero programmers would rather react than plan. Projects with hero programmers working on them often make a lot of progress initially, but never arrive at a stable state of completion
Maybe there are workplaces where people get together to collaborate on a design and then break the design down into tasks and assign those tasks to programmers to implement. Maybe this process is performed until the project is done. Maybe. But I've never seen it. I see people taking responsibility for small and large tasks, and the large ones sometimes involve a single person re-implementing entire systems spread across thousands of files (though not necessarily in "36-hour coding blitzes").
Honestly, that whole hero programmer bit seems like a strawman. What's being described sounds like a talented but inexperienced developer (which doesn't necessarily mean they are young or fresh to the field; some people manage to stay beginners for decades). That doesn't mean there aren't highly talented developers who can get a lot of work done in a short amount of time if you let them.
The failure in that case is not having a more senior developer mentor the kid.
This is not true. We (and I) have finished tons of projects. They are done in the true sense. We are OK with their state, and they rarely ever change. They work.
The way to attribute accountability is protocols. Create and maintain protocols that are known ways to do something safely. If you broke something by disregarding the protocol, then you've fucked up. If you broke something by complying with the protocol, then you discovered the protocol needs to be updated.
This may be true, but when things break, people will look for a scapegoat. So when things break and you are mostly responsible for initiating the failure, use collective language ("we" and not "I"), frame the failure as a systems failure when you are talking to management or the executives, and look cool even if you are feeling stressed out. Manage the narrative! Sure, you flipped the switch or whatever, but try to survive the event. Just because you think it's a systems failure doesn't mean other people share that belief; don't volunteer to be thrown under the bus.
In my experience, working at a "classic" Japanese engineering firm, scapegoating was discouraged.
During postmortems, we would often decide something like "Chris made an erroneous assumption that the fix introduced no bugs." (That's a classic "oldtimer" mistake, BTW. I make it all the time; I'm a slow learner.)
Absolutely no blame would be affixed. It was really important for Chris (that's me) to assume Responsibility for the error, and the team would develop a solution.
This being a Japanese company, of course, said "solution" usually ended up being another punchlist item, like "Perform complete regression tests for even the smallest bug fix release," etc.
I'm not thrilled with people using "hero programmer syndrome," or "bus factor" as an excuse to write naive or deliberately dumbed-down code, though.
Sometimes, a program needs to be maintained by skilled, experienced, well-paid, and motivated people. If a company insists on developing code using advanced techniques and then turning maintenance over to junior staff, or on doing a bad job of writing a program because it wants it maintained by the cheapest programmers possible, that's a problem.
I feel like, for me at least, being able to say "That was my fault" makes me remember what I did. It makes me document that failure, and it makes me tell others to avoid what I did.
I've been training a junior developer lately, and I always say, "Here are all the things I did wrong when I did this, so don't do these things."
If I did it, he could easily do it, and if we can all avoid my mistakes, so much the better.
It WAS my fault quite a few times, and I'm ok with that. And luckily, everyone else around here is too. I'd hate to work somewhere that punishes honest mistakes. (there are limits, of course)
I've only been a full time developer for a little while now, but I've absolutely gotten some good lessons out of these mistakes.
I'll tell you what, I always make sure to keep the updated_at column unchanged when I'm doing a manual update query. I also always double-check that I run DB changes before merging code, and I double-check all my code changes in a PR before I merge it in case I left something behind.
All of these are rooted in screw-ups on my end that my company was very understanding about, and they made me grow as an engineer. Would it have been better never to have made these mistakes at all? Yes, but that's probably unrealistic. I've seen the hires after me make mistakes, and I made sure to tell them about my first prod bug too, the way my boss did for me.
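Some context on the `updated_at` habit: in MySQL, a `TIMESTAMP` or `DATETIME` column declared with `ON UPDATE CURRENT_TIMESTAMP` is bumped automatically whenever an UPDATE changes another column in the row, unless the statement explicitly sets that column to its current value. A minimal sketch of a hypothetical helper for one-off data fixes, assuming MySQL, a column literally named `updated_at`, trusted table and column names, and the `%s` placeholder style used by common Python MySQL drivers such as PyMySQL:

```python
def build_manual_update(
    table: str,
    changes: dict[str, object],
    key_column: str,
    key_value: object,
) -> tuple[str, list[object]]:
    """Return (sql, params) for a one-off fix that leaves updated_at alone.

    Hypothetical helper: table/column names are assumed to be trusted
    identifiers; values are passed as bound %s parameters, not inlined.
    """
    if "updated_at" in changes:
        raise ValueError("this helper is meant to leave updated_at untouched")
    set_parts = [f"{col} = %s" for col in changes]
    # Assigning the column to itself stops MySQL's ON UPDATE auto-bump.
    set_parts.append("updated_at = updated_at")
    sql = f"UPDATE {table} SET {', '.join(set_parts)} WHERE {key_column} = %s"
    params = [*changes.values(), key_value]
    return sql, params
```

For example, `build_manual_update("orders", {"status": "refunded"}, "id", 42)` returns `"UPDATE orders SET status = %s, updated_at = updated_at WHERE id = %s"` with params `["refunded", 42]`, ready to pass to a cursor's `execute`.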
It being your fault is a good way to go through life.
It focuses your mind on what you could do to avoid those situations in the first place.
If it’s your fault prod broke you can fix the process or you can look for a new job where the process is already fixed. Or you can find a role that doesn’t involve pushing code to prod, maybe in R&D, etc.
It being your fault in the sense of "you know you made a mistake and you're committed to remedying that mistake" is fine. What we don't need is "you know you made a mistake and now you must endure abuse and have your job threatened over a mistake."
Yeah, shitty managers exist. I was a lead when one of the engineers shipped a debug build that made it past App Store review (our debug builds were obvious). My manager said Mike (name changed) wasn't cutting releases anymore.
I say Mike is cutting releases because he’s now the one person I trust on the team to not fuck it up.
If you need it in writing so you can fire me if Mike fucks it up, let me know.
My manager, Mike, and I all cut the next release at Mike's workstation, with my manager knowing my ass was on the line if we shipped another debug build.
If you mean hold yourself responsible for doing the best you can and learning from mistakes, then I fully agree.
The issue is that fault can also mean carrying guilt around and continuing to be blamed for it. That isn't helpful once you've learned the lessons you needed.
> When you look at things from this lens, all the successes of a website, an application, or an organization flow from the talents and genius of a few individuals. It’s a compelling outlook because, well, empirically it can definitely appear this way, and it’s naturally aligned with the other dominant societal ideas we have about individuality.
In many large organizations, much of the success comes from the foresight, insight, and hard "work" of a few, benefitting the many. The reality is that it is individuals, not some collective group or "teams".
This is the default approach taken by the airline industry, known as just culture:
"Just culture is a concept related to systems thinking which emphasizes that mistakes are generally a product of faulty organizational cultures, rather than solely brought about by the person or persons directly involved."
Surely the first occurrence led to a postmortem that documented and forbade the practices that had become known to be dangerous for production.
That's why there is a hiring and firing process.
> Projects with hero programmers working on them often make a lot of progress initially, but never arrive at a stable state of completion
No project is ever really finished except the ones nobody cares about. Probably because they stopped being maintained by the hero programmer.
We moved on to other projects.
I say Mike is cutting releases because he’s now the one person I trust on the team to not fuck it up.
If you need it in writing so you can fire me if Mike fucks it up, let me know.
My manager, Mike, and I all cut the next release at Mike's workstation, with my manager knowing my ass was on the line if we shipped another debug build.
Mike never shipped another debug build.
"Just culture is a concept related to systems thinking which emphasizes that mistakes are generally a product of faulty organizational cultures, rather than solely brought about by the person or persons directly involved."
https://en.wikipedia.org/wiki/Just_culture