Editor’s Note: This article first appeared on DevOps.com.
“GenAI: too much spend, too little benefit?”
That’s the question Goldman Sachs asked in an influential report this summer.
It’s safe to say we’ve entered a trough of AI disillusionment. Some companies are laying off staff, while others are being targeted for acquihires (for example, Character.AI by Google, Inflection by Microsoft, and Adept by Amazon). These factors, combined with swings in tech stocks, indicate that the market has yet to make up its mind when it comes to valuing AI.
Despite the uncertainty, genAI has gained traction in several sectors, and engineers in particular have embraced AI coding tools like GitHub Copilot. Stack Overflow’s 2024 Developer Survey found that “76% of all respondents are using or are planning to use AI tools in their development process this year.”
With AI coding tools, software companies took a test-first, ask-questions-later approach. Those companies are now putting their investments under the microscope and need to determine whether their spending on coding assistants is paying off.
Continued Uncertainty Around Performance and Pricing
Earlier this summer, Jellyfish introduced the Copilot Dashboard to measure the impact of the most widely adopted genAI coding tool. We’ve since gathered data from over 4,000 developers at more than 100 companies, giving us a representative sample of how engineering organizations are using Copilot and what impact it’s having on production. No two teams are the same, but the aggregate data can help engineering and business leaders understand whether they’re getting adequate return on their AI investments.
Without measurement, engineering teams can only gauge the value of their investment in terms of engagement. Are the engineers using Copilot? Sure. But are they faster or more productive? Is the code better, or is that speed coming at the expense of quality?
Pricing for these tools is another open question. In October 2023, The Wall Street Journal reported that Microsoft was losing $20 per month per user on its GitHub Copilot product. GenAI coding assistants like Copilot rely on large language models (LLMs) to drive their service, and querying those models isn’t cheap. Providers of AI coding tools will inevitably correct their pricing, and software companies will have to decide all over again whether they’re getting value for their investment.
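To see why pricing could shift, consider the unit economics: a flat seat price has to cover variable inference costs that scale with usage. The sketch below illustrates the squeeze; every number in it is hypothetical, not GitHub’s actual prices or costs.

```python
# Illustrative unit economics for a flat-rate coding assistant.
# All figures are hypothetical, not GitHub's actual prices or costs.

SEAT_PRICE = 10.00  # hypothetical flat monthly subscription per user, USD

def monthly_margin(completions_per_day: int,
                   cost_per_completion: float,
                   working_days: int = 22) -> float:
    """Per-user margin: subscription revenue minus LLM inference cost."""
    inference_cost = completions_per_day * cost_per_completion * working_days
    return SEAT_PRICE - inference_cost

# A heavy user can push the provider underwater, consistent in magnitude
# with the reported ~$20/month per-user loss:
print(monthly_margin(completions_per_day=300, cost_per_completion=0.005))  # -23.0
```

The point of the sketch is simply that with flat pricing and usage-based costs, the heaviest users set the provider’s losses, which is why pricing models for these tools are likely to keep changing.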
After analyzing aggregate data from Jellyfish’s GitHub Copilot Dashboard, here’s what we found:
- Coding is faster, leading to time savings: Across the board, developers are writing code 23% faster. The impact varies by role: mobile developers and backend developers see more substantial gains, with coding speeds improving by up to 42% and 38%, respectively.
- Coding throughput is increasing: Engineers using Copilot saw a 20% increase in pull requests (PRs) created across all engineering work; in plain terms, these engineers are shipping more code (and therefore more product). One way to approximate this metric from your own data is sketched after this list.
- Junior developers see the largest increase in speed: Of all the engineers we measured, junior developers saw the biggest gains, with a 4x boost in coding speed. This implies that Copilot is more useful for automating entry-level tasks and less useful for the more complicated work tackled by senior developers.
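Metrics like the PR throughput change above can be approximated from your own repository history. Below is a minimal sketch of one way to do it; the record fields (`author`, `merged_at`) and the 90-day before/after windows are illustrative assumptions, not Jellyfish’s actual methodology.

```python
# A minimal sketch of measuring per-developer PR throughput around an
# adoption date. Field names and window sizes are assumptions.
from datetime import date, timedelta

def prs_per_dev_per_week(prs: list[dict], start: date, end: date) -> float:
    """Average merged PRs per developer per week over [start, end)."""
    window = [p for p in prs if start <= p["merged_at"] < end]
    devs = {p["author"] for p in window}
    weeks = (end - start).days / 7
    return len(window) / (len(devs) * weeks) if devs and weeks else 0.0

def throughput_change_pct(prs: list[dict], adoption: date,
                          window_days: int = 90) -> float:
    """Percent change in PR throughput after a Copilot adoption date."""
    d = timedelta(days=window_days)
    before = prs_per_dev_per_week(prs, adoption - d, adoption)
    after = prs_per_dev_per_week(prs, adoption, adoption + d)
    return (after - before) / before * 100 if before else float("nan")
```

Even a rough before/after comparison like this moves the conversation from “are engineers using the tool?” to “what changed when they did?”, though a real analysis would also control for seasonality, team changes, and PR size.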
Where We Go From Here
We’re in the trough of disillusionment of the genAI hype cycle, but there is a bright spot. The technology will continue to develop, and more targeted solutions like AI agents have the potential to disrupt markets more than general-purpose AI tools do.
Copilot is a transformative tool for software engineering, but organizations must be realistic about its limitations. Coding is only a fraction of what engineers spend their time on across the software development life cycle. The first measurements we’re seeing of Copilot adoption show meaningful boosts across different roles and workflows in software organizations. Taken together, these boosts add up to time and cost savings that more than justify the investment now, while setting organizations up to realize larger downstream gains.
Regardless of what happens in the months to come, companies can’t afford to experiment without measurement. The “trial and error” phase for genAI is over; we need to verify whether the investments are paying off. Experiment, measure, adjust, repeat.