Goodhart’s Law in Software Engineering and How to Avoid Gaming Your Metrics
Metrics are a hotly debated topic in the engineering management community. What should teams measure, and how much should they focus on those measurements? What should leadership avoid, and how should the data be interpreted? Some teams use metrics and achieve excellent results, such as increased productivity, collaboration, and speed. Others try to use metrics but suffer unforeseen negative consequences in pursuit of a specific metric goal.
The difference between those that successfully integrate metrics into their engineering operations and those that stumble largely centers around the ability to mitigate the effects of Goodhart’s Law. In this post, we’ll explore what Goodhart’s Law is, its relevance in software engineering, and how to mitigate its effects with proper planning and implementation.
What is Goodhart’s Law?
Goodhart’s Law is summarized by the adage, “When a measure becomes a target, it ceases to be a good measure,” and is named after the British economist Charles Goodhart. The law applies across all domains that use metrics to gauge success. My favorite example of Goodhart’s Law is the story of the “cobra effect” in India. As the anecdote is usually told, a bounty offered for dead cobras led people to breed cobras to collect it, and when the bounty was withdrawn the breeders released their now-worthless snakes, leaving more cobras than before. The story shows how quickly focusing on one metric can produce unintended (and in this example, frightening) results.
Why does this happen? In short, when you design a system where achieving one metric is the primary goal, the system will do whatever it takes to ensure that the goal is met. For example, in workplaces where incentives are tied to a metric target, or where bonuses or employment are contingent on that target, it’s easy to see how “the system” (the employees and teams) takes creative liberties to hit it. Even when it’s only perceived, without ever being stated outright, that “performance” will be evaluated based on a given metric, the system will keep focusing on hitting that metric goal.
Goodhart’s Law in Software Engineering
More teams are adopting engineering metrics, and if they’re not intentional and careful about their metrics strategy, the effects of Goodhart’s Law are sure to follow. Even teams that measure more than one metric sometimes over-rotate on individual metrics, to the point where course corrections need to be made. They might ask: are we measuring the right things, or did we miss something critical when rolling these out?
What a team measures speaks to what it values, but what it doesn’t measure often speaks louder about where it is vulnerable. Metrics don’t exist in isolation. Specific technical limitations aside, if your team can measure speed, it should also measure quality. If a company can measure output, it should also measure collaboration. Each of these examples points to a dynamic where engineering metrics influence one another. Whether the metrics are positively or negatively correlated depends on the metric categories involved. The diagram below seeks to articulate this point, with the various possible desired outcomes of the engineering team influencing and competing with one another. At Jellyfish, we believe that teams should focus on affecting outcomes, and develop metric strategies around those outcomes, rather than chasing specific metric goals without context.
Milan Thakker, Product Manager at Jellyfish, recently described at the GLOW 2022 summit how metric categories influence one another. In this discussion, he details why companies like Hootsuite have used DORA and other metrics to (1) help create a balanced metric strategy and (2) drive their desired outcomes. You can check out the full discussion here.
In this example, Hootsuite leveraged DORA metrics to create a metric strategy that considers many key parts of the development process. The diagram below shows that each of the engineering outcomes desired by the team has a metric that can be monitored over time to ensure their teams are not over-rotating in one area. They don’t focus on one category over the others, instead using the full set, together with anecdotal feedback from 1:1s, to paint a picture of what’s going on at a macro level across the teams.
Can you avoid the effects of Goodhart’s Law?
The short answer is yes, but only if you’re intentional about your metrics strategy.
“The system” is going to try to game itself, but by proactively thinking about how metrics can balance each other out, you stand a better chance of avoiding the impact of Goodhart’s Law. It’s important to know the nature of the metrics that you’re measuring and how they could be manipulated. With this knowledge, your teams can monitor for that manipulation. You can’t measure everything, and even if you could, doing so would inevitably lead to a lack of focus. If everything is a priority, nothing is a priority. Somewhere between a “laissez-faire” approach and a “silver bullet” metric lies a balanced metric strategy that is right for your organization.
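One way to monitor for manipulation is to pair each metric with a counter-metric and flag periods where the first improves sharply while the second gets worse. The Python sketch below is only an illustration of that idea: the metric names, the sample numbers, and the 10% threshold are all hypothetical assumptions, not values taken from Jellyfish, DORA, or any specific tool.

```python
from dataclasses import dataclass


@dataclass
class MetricPair:
    """A target metric and the counter-metric meant to keep it honest.

    All names here are hypothetical and chosen for illustration only.
    """
    name: str
    counter_name: str
    higher_is_better: bool          # direction of the target metric
    counter_higher_is_better: bool  # direction of the counter-metric


def relative_change(previous: float, current: float) -> float:
    """Fractional change from previous to current (0.10 means +10%)."""
    if previous == 0:
        raise ValueError("previous value must be non-zero")
    return (current - previous) / abs(previous)


def improvement(previous: float, current: float, higher_is_better: bool) -> float:
    """Signed improvement: positive means the metric moved in the good direction."""
    change = relative_change(previous, current)
    return change if higher_is_better else -change


def possible_gaming(pair: MetricPair, previous: dict, current: dict,
                    threshold: float = 0.10) -> bool:
    """True when the target improved by more than `threshold` while the
    counter-metric worsened by more than `threshold` (an assumed cutoff)."""
    target_gain = improvement(previous[pair.name], current[pair.name],
                              pair.higher_is_better)
    counter_gain = improvement(previous[pair.counter_name],
                               current[pair.counter_name],
                               pair.counter_higher_is_better)
    return target_gain > threshold and counter_gain < -threshold


if __name__ == "__main__":
    # Speed (deploys per week) paired with quality (change failure rate).
    pair = MetricPair("deploys_per_week", "change_failure_rate",
                      higher_is_better=True, counter_higher_is_better=False)
    last_quarter = {"deploys_per_week": 10, "change_failure_rate": 0.10}
    this_quarter = {"deploys_per_week": 15, "change_failure_rate": 0.20}
    print(possible_gaming(pair, last_quarter, this_quarter))
```

A flag from a check like this is a prompt for a conversation, not proof of gaming. The point is simply that a speed metric is never read without its quality counterpart.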
The SPACE Framework is a fantastic resource that outlines a way to begin measuring what matters most to your engineering organization without getting bogged down by analysis paralysis or falling prey to Goodhart’s Law. If you haven’t considered it yet, we’d highly recommend it. And if you have any questions about how Jellyfish can help you measure the metrics that matter most to you, check out our product tour or request a demo today.