But doesn’t the “What About the Details?” section acknowledge that you will need many of the other metrics later anyway, for different people, projects, and problems? Isn’t the message then “You should know which metrics are relevant to you and only look at those”?
You can thus have thousands of metrics, but you will hardly ever have thousands that are all red at once.
And when you create a new metric, TDD style, the company can then be designed / changed to make it green.
There are some metrics that are binary good/bad and I agree that in those cases you should just have an alert.
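For those binary cases, the whole “dashboard” can collapse into an alert on the state change. A minimal sketch in Python, assuming a hypothetical probe and paging hook (none of these names come from the article):

    import time

    def replication_is_ok() -> bool:
        # Hypothetical binary probe; swap in a real health check.
        return True

    def page_oncall(message: str) -> None:
        # Hypothetical alert hook (PagerDuty, Slack, ...); print for the sketch.
        print(f"ALERT: {message}")

    def watch(poll_seconds: int = 60) -> None:
        was_ok = True
        while True:
            is_ok = replication_is_ok()
            # Fire only on the good -> bad transition; stay quiet otherwise.
            if was_ok and not is_ok:
                page_oncall("replication check went red")
            was_ok = is_ok
            time.sleep(poll_seconds)

No one needs to stare at a chart of a boolean; the transition is the only interesting event.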
There are a lot of pathological metrics patterns. I used to work at a FAANG/MANGA company that prided itself on being metrics-driven; the challenge was that most people suck at picking metrics. The most common anti-pattern is choosing metrics based on what is easy to measure.
The most valuable metrics I have encountered are the ones that directly measure your team’s success in its stated mission. The problem is that a lot of teams have bad missions. I tried to tell every team I worked with that their mission should read like a problem statement, not a technology statement. For instance, instead of saying “we are the team that owns the foo service”, the team needs to think about the business problem that inspired the foo service, and make solving that business problem their mission statement.
Once you have clarity on the business problem that your team exists to solve, you can start thinking about metrics that measure how well you are solving that problem. These are the most valuable kinds of metrics.
Now, the thesis of the article was that teams had too many metrics, and that this was bad. Once a team has clarity of mission, it has to implement technology to solve the business problem that is its mission. After you have clear metrics that tell you how well you are accomplishing your mission, you then need metrics to tell you how well your technology is functioning. Do not mix the two kinds of metrics. It is a happy coincidence if the technology functioning well corresponds directly to how well you are accomplishing your business mission; more likely, the business metrics and the technology metrics need to be kept separate. You need metrics around the technology so that you can predict whether you are nearing a problem, detect whether you have a technology problem, and identify the nature of the technology problem that you have. Good metrics around technology will let you do all three of these things. You should not add metrics arbitrarily; instead, analyze your technology for where it is likely to break and prioritize adding metrics there.
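One way to keep that separation honest, sketched in Python (the registry and the metric names are illustrative, not from the article): tag every metric as business or technology, and give each technology metric exactly one of the three jobs above.

    from dataclasses import dataclass
    from enum import Enum

    class Kind(Enum):
        BUSINESS = "business"      # how well the mission is being accomplished
        TECHNOLOGY = "technology"  # how well the implementation is functioning

    class TechRole(Enum):
        PREDICT = "predict"    # leading indicator: are we nearing a problem?
        DETECT = "detect"      # do we have a problem right now?
        IDENTIFY = "identify"  # what is the nature of the problem?

    @dataclass
    class Metric:
        name: str
        kind: Kind
        role: TechRole | None = None  # only meaningful for technology metrics

    REGISTRY = [
        Metric("orders_fulfilled_per_day", Kind.BUSINESS),
        Metric("queue_depth", Kind.TECHNOLOGY, TechRole.PREDICT),
        Metric("request_error_rate", Kind.TECHNOLOGY, TechRole.DETECT),
        Metric("errors_by_dependency", Kind.TECHNOLOGY, TechRole.IDENTIFY),
    ]

A metric that fits neither a business question nor one of the three technology jobs is a candidate for deletion.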
My last point is about team dysfunction. Other anecdotes in this comments thread have described organizations that lacked clarity of mission, or that had completely solved their mission yet did not pivot to a new one. In those cases you end up with a lot of make-work, and that make-work may consist in part of implementing new metrics. This is just plain org dysfunction, not really a metrics problem.
Too many metrics can be a problem but it's not the real problem. The real problem is choosing metrics without any regard for the decisions they're supposed to inform.
When you understand that the purpose of all measurement in business is to reduce uncertainty for a decision to be made, everything comes into focus, and you'll have a natural constraint on your scope of measurement.
I think it’s important to distinguish “internal” / operational metrics, which a team monitors but doesn’t expect anyone else to care about (say, error rate of the DB or some internal ops task latency), from “external” metrics, which roll into OKRs / KPIs or are otherwise reported out as health metrics of the team.
I struggle to believe that most companies with 50-100 metrics are actually using most of them as external metrics.
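If that split lives anywhere in code, it can be as simple as an audience tag per metric (an illustrative sketch; all the names are made up): internal ones stay on the team dashboard, external ones roll up into OKR/KPI reporting.

    INTERNAL, EXTERNAL = "internal", "external"

    # Illustrative scope map: who is each metric for?
    METRIC_SCOPE = {
        "db_error_rate": INTERNAL,          # ops signal, team-only
        "ops_task_latency_s": INTERNAL,
        "checkout_success_rate": EXTERNAL,  # reported out as team health
    }

    def reportable() -> list[str]:
        # Only external metrics leave the team.
        return [m for m, scope in METRIC_SCOPE.items() if scope == EXTERNAL]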
"The key is that any given person shouldn’t be using any more metrics than absolutely necessary to do their job well."
To be able to understand your funnel in a marketplace, you’ll need to at least track profile and product views & click-throughs in addition to the metrics mentioned. How can you possibly tell what’s wrong when all you have is an average GMV metric going down?
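To make that concrete, a toy sketch (the step names and counts are invented) of how per-step conversion localizes a drop that a single topline GMV number can’t explain:

    # Users reaching each funnel step (made-up numbers).
    funnel = [
        ("search", 100_000),
        ("profile_view", 40_000),
        ("product_view", 30_000),
        ("click_through", 9_000),
        ("purchase", 4_500),
    ]

    # Step-to-step conversion shows *where* the funnel leaks;
    # an aggregate GMV metric alone can never tell you that.
    for (step, n), (next_step, next_n) in zip(funnel, funnel[1:]):
        print(f"{step} -> {next_step}: {next_n / n:.1%}")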
Tracking metrics because you think they might be useful in the future for diagnosing other problems doesn’t make sense with modern systems, since ad-hoc queries over the raw data are so fast.
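For example, with an in-process engine like DuckDB you can answer a diagnostic question on demand from the raw event log instead of pre-computing a metric “just in case” (the file name and columns here are illustrative):

    import duckdb  # in-process OLAP engine; fast ad-hoc queries over raw events

    # Derive "checkout errors by region, last 24h" on demand rather than
    # maintaining it as a standing metric nobody may ever look at.
    duckdb.sql("""
        SELECT region, count(*) AS errors
        FROM 'events.parquet'
        WHERE event_type = 'checkout_error'
          AND ts >= now() - INTERVAL 1 DAY
        GROUP BY region
        ORDER BY errors DESC
    """).show()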