Thanks for posting this. I'd like to learn more about your work. I was originally drawn to the topic after hearing about Goodhart's Law from developers, even though it interestingly originated in observations on the UK's monetary policy.
I think the same thing applies to some biomarkers; cholesterol, for one.
Cholesterol is an accepted biomarker for stroke & heart disease. Thus, Big Pharma can "prove" that lowering your cholesterol by M adds N.n years to your life. Therefore, take our statins. Daily. Forever.
I'm not buying it. That's optimizing the measure. "Exercise more & lose weight" is thought impossible for the average person, or at least, unlikely to be heeded, so give them a statin instead.
That was my initial understanding, which left me confused.
But they're taking the top n according to the model, then selecting the best of those according to the proxy, not the actual objective. This avoids, with reasonable probability, the Winner's Curse problem of taking the model's single top-ranked item.
They then compare this to the highest-scoring actual preference.
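To make the Winner's Curse concrete, here's a minimal sketch (a hypothetical setup of my own, not the experiment from the post) showing that the candidate that maximizes a noisy proxy typically falls short of the candidate that maximizes the true objective:

```python
import random

random.seed(0)

def winners_curse_gap(n_candidates=1000, noise=1.0, trials=500):
    """Average gap between the best true value and the true value
    of the candidate chosen by maximizing a noisy proxy score."""
    gap = 0.0
    for _ in range(trials):
        # True values are standard normal; the proxy adds independent noise.
        true_vals = [random.gauss(0, 1) for _ in range(n_candidates)]
        proxy = [v + random.gauss(0, noise) for v in true_vals]
        # Select by the proxy, but evaluate with the true objective.
        picked_idx = max(range(n_candidates), key=lambda i: proxy[i])
        gap += max(true_vals) - true_vals[picked_idx]
    return gap / trials

print(winners_curse_gap())  # gap is typically well above zero
```

The more aggressively you optimize the proxy (larger n_candidates, noisier proxy), the larger the gap gets, which is exactly the Goodhart's-Law failure mode the thread is discussing.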
(I wrote a (non-math, non-AI) version to learn more about Goodhart's Law here: https://unintendedconsequenc.es/new-morality-of-attainment-g...)