We’re Measuring the Wrong Thing
Everyone’s obsessed with speed. “55% faster task completion!” “50% faster time-to-merge!” Every pitch deck about AI coding tools leads with how much faster your team will ship. But after digging into the actual research on GitHub Copilot’s impact on developer productivity and happiness, I think the industry is burying the lede.
The most transformative finding isn’t about velocity at all. It’s that 60-75% of developers feel more fulfilled and less frustrated when coding with AI assistance. 88% feel more productive — not measured, perceived. 73% say Copilot helps them stay in the flow. 87% say it preserves their mental effort during repetitive tasks.
Those aren’t efficiency metrics. Those are human metrics. And I’d argue they matter more than any time-to-merge number ever will.
What the Research Actually Shows
GitHub’s research team, led by Dr. Eirini Kalliamvakou, used a mixed-methods approach combining a large-scale survey of over 2,000 developers, a controlled experiment with 95 professionals, and qualitative interviews. The study was designed around the SPACE productivity framework — a model developed by researchers at GitHub, Microsoft, and the University of Victoria that measures developer productivity across five dimensions: Satisfaction & Well-being, Performance, Activity, Communication & Collaboration, and Efficiency & Flow.
Here’s what they found:
| Metric | Finding |
|---|---|
| Job fulfillment | 60-75% feel more fulfilled |
| Flow state | 73% stayed in the flow |
| Mental energy | 87% preserved mental effort |
| Task speed | 55% faster completion |
| Task success | 78% (with Copilot) vs. 70% (control) completed the task |
The speed number gets all the headlines, but look at the rest of that table. Three out of five metrics are about how developers feel, not how fast they move.
The Enterprise Confirms It — At Scale
In 2024, GitHub partnered with Accenture on a randomized controlled trial measuring Copilot’s real-world impact across a large enterprise engineering organization. The results amplified the original findings.
90% of developers felt more fulfilled with their job. 95% enjoyed coding more. 70% reported significantly less mental effort on repetitive tasks. And 54% spent less time searching for information and examples.
On the output side, the Accenture study showed an 8.69% increase in pull requests per developer, a 15% improvement in PR merge rate, and an 84% increase in successful builds. The adoption numbers were equally telling: 81.4% of developers installed Copilot the same day they received a license, and 67% used it at least five days per week.
These aren’t developers checking a box for management. These are engineers voting with their keyboards.
Why Fulfillment Matters More Than Speed
Here’s where I’ll put on my engineering leadership hat. With 20 million GitHub Copilot users and 90% of the Fortune 100 on board, the ROI conversation needs to evolve beyond “we ship faster.” Replacing one engineer costs 6-9 months of salary. In a market where 76% of developers are already using or planning to use AI tools, the teams that don’t invest in developer experience will lose talent to the ones that do.
The DevEx framework, published by Noda, Storey, Forsgren, and Greiler in ACM Queue, identifies three core dimensions of developer experience: feedback loops, cognitive load, and flow state. Copilot’s measured impact maps directly onto cognitive load reduction and flow state preservation — two of the three pillars. GitHub’s own DevEx research found that developers who block time for deep work report feeling 50% more productive, and that fast code review turnaround correlates with 20% more innovation.
The 2024 DORA report reinforces this: elite-performing teams don’t just deploy more frequently — they invest in developer experience, observability, and continuous improvement. Copilot’s Accenture data showing more PRs and dramatically more successful builds is directionally aligned with DORA’s throughput and stability metrics, even though the study didn’t measure DORA directly.
This is the real business case: happy developers don’t just code faster. They stay. They mentor. They actually want to show up tomorrow.
The Skeptic’s Corner
I’d be doing you a disservice if I didn’t flag the limitations. I believe in this data, but I also believe in intellectual honesty about DevEx research.
Self-selection bias. GitHub’s primary study surveyed developers who opted into the Copilot Technical Preview. Early adopters are inherently more enthusiastic. The 60-75% fulfillment number likely skews higher than what you’d see across all developers.
GitHub researching its own product. Both studies were conducted by GitHub researchers studying GitHub’s product. The methodology is sound — peer-reviewed in Communications of the ACM — but the conflict of interest is real. Independent replication matters.
Novelty effect. A longitudinal study at NAV IT (Norwegian Labour and Welfare Administration) found that while developers perceived strong productivity gains from Copilot, objective commit-based activity metrics showed no statistically significant changes. Subjective experience and objective output don’t always align.
The 55% speed claim. That controlled experiment had developers write an HTTP server in JavaScript — a well-bounded task ideal for AI completion. Speed gains on ambiguous, architecturally complex problems are likely lower.
Cognitive load displacement. Copilot may shift cognitive load rather than reduce it. Instead of writing boilerplate, developers now review AI-generated suggestions for correctness, security, and architectural fit. That’s a different kind of cognitive burden — and it wasn’t measured.
None of this invalidates the research. But it contextualizes it. The fulfillment signal is strong. How strong? We’ll need more independent studies and real-world adoption data to say with certainty.
Stop Measuring Speed. Start Measuring Well-Being.
If you’re evaluating Copilot — or any AI coding tool — and your only KPIs are “lines of code” and “time to ship,” you’re measuring the wrong things. The SPACE framework exists for a reason: its authors explicitly warn that activity metrics alone can’t capture productivity. You need to measure Satisfaction, Efficiency, and Flow alongside Performance and Activity.
Here’s what I’d actually track:
- Flow state disruption rate — how often do developers context-switch?
- Cognitive load surveys — are repetitive tasks consuming mental energy?
- Voluntary attrition — are your best engineers staying?
- Developer satisfaction pulse — monthly, anonymous, 3 questions max
- PR cycle time + build success rate — output metrics that actually correlate with quality (a minimal sketch follows this list)
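For that last item, here’s a minimal sketch of the cycle-time half, approximating PR cycle time as hours from opened to merged via GitHub’s REST API. The owner/repo values and the GITHUB_TOKEN environment variable are placeholders; build success rate could be pulled similarly from the Checks API.

```python
# Minimal sketch: median PR cycle time (hours from opened to merged).
# OWNER, REPO, and the GITHUB_TOKEN env var are placeholders -- adapt to your org.
import os
import statistics
from datetime import datetime

import requests

OWNER, REPO = "your-org", "your-repo"  # hypothetical names

resp = requests.get(
    f"https://api.github.com/repos/{OWNER}/{REPO}/pulls",
    headers={
        "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
        "Accept": "application/vnd.github+json",
    },
    # 100 most recently updated closed PRs; merged ones are filtered below.
    params={"state": "closed", "per_page": 100, "sort": "updated", "direction": "desc"},
    timeout=30,
)
resp.raise_for_status()

def hours_open(pr: dict) -> float:
    """Hours between PR creation and merge."""
    created = datetime.fromisoformat(pr["created_at"].replace("Z", "+00:00"))
    merged = datetime.fromisoformat(pr["merged_at"].replace("Z", "+00:00"))
    return (merged - created).total_seconds() / 3600

cycle_times = [hours_open(pr) for pr in resp.json() if pr["merged_at"]]
if cycle_times:
    print(f"Median PR cycle time: {statistics.median(cycle_times):.1f}h (n={len(cycle_times)})")
```

Track the trend, not the absolute number; a median that drifts over time tells you more than any single snapshot.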
GitHub publishes a Copilot Metrics API and an Engineering System Success Playbook for organizations that want to get serious about measurement. Use them.
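Querying that API is a single call. Here’s a minimal sketch, assuming a token authorized for your org’s Copilot metrics; the field names below follow GitHub’s published response schema, but verify them against the current docs before wiring this into a dashboard.

```python
# Minimal sketch: daily Copilot adoption from GitHub's Copilot Metrics API.
# "your-org" is a placeholder; the token must have access to Copilot metrics.
import os

import requests

ORG = "your-org"  # hypothetical org name
resp = requests.get(
    f"https://api.github.com/orgs/{ORG}/copilot/metrics",
    headers={
        "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
        "Accept": "application/vnd.github+json",
    },
    timeout=30,
)
resp.raise_for_status()

# The response is a list with one element per day of org-wide metrics.
for day in resp.json():
    print(f"{day['date']}: {day['total_engaged_users']} engaged "
          f"of {day['total_active_users']} active users")
```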
The Bottom Line
The most interesting thing about AI-assisted development isn’t that it makes us faster. It’s that it might make us happier. And in an industry with chronic burnout, high turnover, and engineers who dread touching legacy code — that’s not a soft metric. That’s a strategic advantage.
The research isn’t perfect. The sample sizes have limitations, the funding sources have conflicts, and the long-term effects are still unfolding. But the directional signal is clear: when you reduce cognitive load, preserve flow state, and eliminate the soul-crushing boilerplate that nobody signed up for — developers feel more fulfilled. And fulfilled developers build better software.
Stop optimizing for velocity. Start optimizing for the people who create it.