If you are interested in these ideas, you should know that this essay kicks off a series of essays that culminates, a year later, with an examination of the Amazon-style Weekly Business Review:
https://commoncog.com/becoming-data-driven-first-principles/
https://commoncog.com/the-amazon-weekly-business-review/
(It took that long because of a) an NDA, and b) the time it takes to put the ideas into practice, understand them, and then teach them to other business operators!)
The ideas presented in this particular essay really belong to W. Edwards Deming, Donald Wheeler, and Brian Joiner (who created Minitab; ‘Joiner’s Rule’, the variant of Goodhart’s Law cited in the link above, is attributed to him).
Most of these ideas were developed in manufacturing in the post-WW2 period. The Amazon-style WBR merely adapts them for the tech industry.
I hope you will enjoy these essays — and better yet, put them into practice. Multiple executives have told me the series has completely changed the way they see and run their businesses.
FYI, you can also upvote or favorite a comment, and then view those upvoted/favorited comments from your profile (same for submissions). Favorites are public.
This doesn't really touch on the core of the issue, which is business expectations that don't match up with reality.
Business leaders like to project success and promise growth that there is no evidence they will or can achieve, and then put it on workers to deliver that. When there's no way to achieve the outcome other than to cheat the numbers, the workers will (and will have to).
At some point, businesses stopped treating outperforming the same quarter of the previous year as over-delivering, and made it an expectation, regardless of what is actually doable.
The article actually addresses this directly through Wheeler's distinction between "Voice of the Customer" (arbitrary targets/expectations) and "Voice of the Process" (what's actually achievable). The key insight is that focusing solely on hitting targets without understanding the underlying process capabilities leads to gaming metrics. Amazon's WBR process shows how to do this right - they focus primarily on controllable input metrics rather than output targets, and are willing to revise both metrics and targets based on what the process data reveals is actually possible. The problem isn't having targets - it's failing to reconcile those targets with process reality.
I think the problem is dimensionality. Business leaders naturally work in a low-dimensional space - essentially 1D: increase NPV. However, understanding how this translates into high-dimensional concrete action is what separates bad business leaders from good ones.
> Let’s demonstrate this by example. Say that you’re working in a widget factory, and management has decided you’re supposed to produce 10,000 widgets per month...
It then discusses ways that the factory might cheat to get higher numbers.
But it doesn't even mention what I suspect the most likely outcome is: they achieve the target by sacrificing something else that isn't measured, such as quality of the product (perhaps by shipping defective widgets that should have been discarded, or working faster which results in more defects, or cutting out parts of the process, etc.), or safety of the workers, or making the workers work longer hours, etc.
Just a side note that this usage isn't really the application Goodhart had in mind. Suppose you're running a central bank and you see a variable that can be used to predict inflation. If you're doing your job as a central banker optimally, you'll prevent inflation whenever that variable moves, and then no matter what happens to the variable, due to central bank policy, inflation is always at the target plus some random quantity and the predictive power disappears.
As "Goodhart's law" is used here, in contrast, the focus is on side effects of a policy. The goal in this situation is not to make the target useless, as it is if you're doing central bank policy correctly.
I can confirm this. We've standardized Goodhart's law by creating a 90-day rotation requirement for KPIs. We found that managers would reuse the same performance indicators with minor variations and put them on sticky notes to make them easier to target.
If your managers are doing that, it's a strong signal your KPIs are a distraction and your managers are acting rationally within the system they've been placed in.
They need something they can check easily so the team can get back to work. It's hard to find metrics that are both meaningful to the business and track with the work being asked of the team.
Do you have enough KPIs that you can be sure that these targets also serve as useful metrics for the org as a whole? Do you randomize the assignment every quarter?
As I talk through this ... have you considered keeping some "hidden KPIs"?
I'm riffing on password rotation requirements and the meta-nature of trying to make Goodhart's law a target. I could've been a bit more obviously sarcastic.
Goodhart's law is often misunderstood, and the author here seems to both agree and disagree. Goodhart's law is about alignment: every measure is a proxy for the thing you are actually after, and it never perfectly aligns.
Here's the thing: there's no fixing Goodhart's Law. You just can't measure anything directly; even measuring with a ruler is a proxy for the true length, without infinite precision. This gets much harder as the environment changes under you and metrics' utility changes with time.
That said, much of the advice is good: making it hard to hack and giving people flexibility. It's a bit obvious that flexibility is needed if you're interpreting Goodhart's as "every measure is a proxy", "no measure is perfectly aligned", or "every measure can be hacked".
I want to block some time to grok the WBR and XMR charts that Cedric is passionate about (for good reason).
I might be wrong but I feel like WBR treats variation (looking at the measure and saying "it has changed") as a trigger point for investigation rather than conclusion.
In that case, let's say you do something silly and measure lines of code committed. Let's also say you told everyone it will factor into a performance review, and the company is known for stack ranking.
You introduce the LOC measure. All employees watch it like a hawk. While working they add useless blocks of code and so on.
LOC committed goes up and looks significant on the XMR chart.
Option 1: grab champagne, pay exec bonus, congratulate yourself.
Option 2: investigate.
Option 2 is better of course. But it is such a mindset shift. Option 2 lets you see whether Goodhart happened or not. It lets you actually learn.
These ideas come from statistical process control, which is a perspective that acknowledges two things:
(a) All processes have some natural variation, and as long as outputs fall within the range of natural process variation, we are looking at the same process.
(b) Some processes occasionally exhibit outputs outside of their natural variation. When this happens, something specific has occurred, and it is worth trying to find out what.
In the second case, there are many possible reasons for exceptional outputs:
- Measurement error,
- Failure of the process,
- Two interleaved processes masquerading as one,
- A process improvement that has permanently shifted the level of the output,
- etc.
SPC tells us that we should not waste effort on investigating natural variation, and should not make blind assumptions about exceptional variation.
It says outliers are the most valuable signals we have, because they tell us we are not looking only at what we thought we were looking at, but at something else as well.
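To make the natural/exceptional distinction concrete, here is a minimal sketch of how an XmR (individuals and moving range) chart computes its process limits, using Wheeler's standard 2.66 scaling factor on the average moving range. The function names and the widget counts are made up for illustration.

```python
# Sketch of XmR process-limit computation per Wheeler's standard formulas.
# Function names and data are illustrative, not from the article.

def xmr_limits(values):
    """Return (mean, lower natural limit, upper natural limit)."""
    n = len(values)
    mean = sum(values) / n
    # Average moving range between consecutive points
    mr_bar = sum(abs(values[i] - values[i - 1]) for i in range(1, n)) / (n - 1)
    # Natural process limits: mean +/- 2.66 * average moving range
    return mean, mean - 2.66 * mr_bar, mean + 2.66 * mr_bar

def exceptional(values):
    """Points outside the natural process limits -- worth investigating."""
    mean, lo, hi = xmr_limits(values)
    return [(i, v) for i, v in enumerate(values) if v < lo or v > hi]

# Monthly widget counts: mostly routine variation, one unusual month.
counts = [9800, 10100, 9900, 10050, 9950, 10000, 13000]
print(exceptional(counts))  # -> [(6, 13000)]: only the last month is exceptional
```

Everything inside the limits is routine variation and, per SPC, not worth chasing; the 13,000 month is the signal that says "something specific happened here" (which could be a genuine improvement, or Goodhart-style gaming, hence the need to investigate rather than celebrate).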
> I immediately glommed onto this list as a more useful formulation than Goodhart’s Law. Joiner’s list suggests a number of solutions:
> Make it difficult to distort the system.
> Make it difficult to distort the data, and
> Provide facilities for change.
If companies knew how to make it difficult to distort the system/data, don't you think they would have done it already? This feels like telling a person learning a new language that they should try to sound more fluent.
The article goes into (what I consider) actionable methods. Specifically:
* Create a finance department that's independent in both their reporting and ability to confirm metrics reported by other departments
* Provide a periodic meeting (for executives/managers) that reviews all metrics and allows them to alter them if need be
* Don't try to provide a small number of measurable metrics or a "north star" single metric
The idea being that a review meeting covering 500+ metrics gives a better potential model of the business. Further, even though 500+ metrics is a lot to review, each can be reviewed briefly, with most of them being "no change, move on"; this lets managers get a holistic feel for the model and identify metrics that are, or are becoming, outliers (positive or negative).
The independent finance department means that reporting bad data is discouraged, and, coupled with the WBR and the empowerment it brings, it provides the facilities to change the system.
The three main points (make it difficult to distort the system, make it difficult to distort the data, and provide facilities for change) all need to be implemented to have an effect. Providing only the "punishment" (making it difficult to distort the system/data) without any facility for change puts on too much pressure without any relief.
If they knew how to do it, and knew that they should. I think Goodhart’s Law is useful to know about because what it’s really suggesting is that people are shockingly good, probably much better than you thought, at distorting the system.