The problems with measuring developer productivity

Instead of trying to measure individual performance, we prefer to focus on making a developer effective by creating an excellent engineering environment for them: the best tooling, processes, engineering culture, incentives, and teams, giving them the best chance to create value for your customers and company.

However, a developer is the most expensive line item in product development for any company, far more than other roles and operational costs. Many of the tasks they do are complex and laborious, requiring focus, experience, and skill. A common question arises: how do we ensure we get the most productivity from our developers, and how do we measure that? In this article, we will explore why it is tough to measure developer productivity, how most efforts tend to measure activity and output instead, and why they will not promote the activities that generate more value.

Why are companies focusing on developer productivity?

Due to recent macroeconomic conditions, companies and investors are tightening their belts; it is harder to hire your way out of productivity problems, and the industry has turned its focus to efficiency and productivity. Companies like Google and Meta are publicly talking about it as a business strategy. There is an explosion of innovative tools that offer increased productivity, such as co-pilots, engineering management platforms, and developer portals. Companies are launching developer productivity initiatives and tasking platform engineering teams and engineering managers with finding improvements.

What are the benefits we are expecting?

The most prevalent reasons for starting an initiative include:

  • Cost efficiency of their engineering organization
  • Faster time-to-market and increased product portfolio
  • Improved product quality
  • Engineering satisfaction and reduced attrition

What does a developer produce?

The problems begin with the wrong picture of what a developer produces. Let’s explore this.

A developer develops code; they spend a lot of their day writing it. But a developer is not trying to produce code, they are trying to produce software – useful, working software. Code is just one of the ingredients. Modern software development expects the developer to think beyond writing perfect code and to be involved in all aspects of creating a great software product. Many companies choose not to use the word developer, as it has a connotation of purely developing code; they prefer engineers and engineering. An engineer, like a structural engineer or a mechanical engineer, thinks about how their product will function in the field – or, in our case, in production when used by a customer.

How does a developer produce great software?

Efficient – Modern applications are often glue between third-party services. In general, accumulating code is not a good thing; it is expensive, and code has to be maintained, supported, and tested. Instead, we can use a wide range of building blocks – cloud services, SaaS products, open-source libraries. The services may also come from internal teams, leveraging already-built functionality, data products, or technical capabilities. This reduces waste and improves productivity, as we are not building the same thing multiple times.

A developer spends time selecting and integrating services.
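
To make the “glue” idea concrete, here is a minimal sketch. `PaymentProviderClient`, `ChargeResult`, and `take_payment` are hypothetical names invented for illustration; the point is that the only custom code is a thin integration layer, while the provider handles the hard parts.

```python
# A sketch of "glue" code: the developer's work is selecting and integrating
# a payment provider, not implementing card handling themselves.
# PaymentProviderClient is a hypothetical stand-in for a real third-party SDK.

from dataclasses import dataclass


@dataclass
class ChargeResult:
    succeeded: bool
    provider_reference: str


class PaymentProviderClient:
    """Stand-in for a third-party SDK; in reality, code we don't write."""

    def charge(self, amount_cents: int, currency: str, token: str) -> ChargeResult:
        # The provider handles card networks, fraud checks, PCI compliance...
        return ChargeResult(succeeded=True, provider_reference="ch_12345")


def take_payment(client: PaymentProviderClient, amount_cents: int) -> str:
    """The only custom code: translating our domain into the provider's API."""
    result = client.charge(amount_cents, currency="USD", token="tok_customer")
    if not result.succeeded:
        raise RuntimeError("payment failed")
    return result.provider_reference
```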

High Quality – The custom code we do write lasts a long time, so it should be high quality: simple, understandable, supportable, and extendable. This is important, as it affects the productivity of whoever will work on it in the future. The code is the realization of that design. Lower-quality code will often be faster to write and more verbose.

The software itself has to be high quality too: it should work and do what it was intended to do. A developer will take time to write tests and make applications testable. The productivity benefit is particularly felt when we have to make changes to the product in the future; a good testing safety net gives a developer confidence that they won’t break any existing functionality. However, low-quality tests can cause more harm than good if they are non-deterministic or very slow.

A developer spends time designing for quality.
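
As a small illustration of designing for testability – and of avoiding the non-deterministic tests warned about above – here is a sketch where the clock is injected rather than read globally. The function and times are invented for the example.

```python
# A time-dependent rule made deterministic by injecting the clock, instead of
# calling datetime.now() inside the function (a common source of flaky tests).

from datetime import datetime


def happy_hour_discount(price: float, now: datetime) -> float:
    """Apply 10% off between 17:00 and 18:00; `now` is injected, not global."""
    if now.hour == 17:
        return round(price * 0.9, 2)
    return price


def test_discount_applies_during_happy_hour():
    fixed_time = datetime(2024, 1, 1, 17, 30)  # deterministic: no real clock
    assert happy_hour_discount(10.0, fixed_time) == 9.0


def test_no_discount_outside_happy_hour():
    fixed_time = datetime(2024, 1, 1, 12, 0)
    assert happy_hour_discount(10.0, fixed_time) == 10.0
```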

Accessible – The software we create has to be deployed into an environment the customer can access, so developers spend time preparing code for deployment, working with configuration for the application and the infrastructure it needs.

Reliable – It has to be reliable (performant, available, secure) so that it can handle customer demand in the production environment. A developer spends time designing for reliability.

Achieves Outcomes – The most important, and often ignored. A developer is part of a cross-functional team trying to build a successful product. As 1000 failed startups know, just because we built software doesn’t mean anyone will use or pay for it. Every technologist has been part of a two-year “program” that resulted in next to no value. Too often, feature requirements are treated like guaranteed value. An effective modern developer acknowledges that requirements are guesses; we might be confident in those guesses, but they are still guesses. This applies to internal applications and technical capabilities too (the “next gen” platform that no one used). The goal is to validate that they are valuable as quickly and cheaply as possible.

What does this have to do with developer productivity? If we are not regularly checking that what we are building is valuable to customers, are we being productive? It’s like building a house that stays vacant. A developer is critical to creating a successful product in many ways: they collaborate closely with product and design to help design experiments and implement ways to validate ideas, and time to market can be influenced massively by creating simple solutions that avoid over-engineering.

It is widely understood that measuring output is not that useful; however, I would say that looking at commits and code is not even measuring the output. The output of a productive developer is working software, with the attributes we mentioned.

A developer doesn’t do these things alone, of course; they are supported by tech leads, architects, and SREs on their team, and ideally their engineering platform will provide paved roads and checks. However, it’s important to acknowledge that a very productive developer will be thinking about all of this. There are also other things a developer does that produce value for your business – helping with support improves customer satisfaction; knowledge transfer avoids productivity interruptions when colleagues resign; mentoring and upskilling increase the productivity of junior developers; thinking about future scalability avoids potential future rework.

The problem we have is that a lot of these are not easily measured, and they are certainly hard to attribute to a single person or change. That doesn’t mean we are in the dark; there are things we can look at, but it won’t be a nice simple graph. Let’s look at the typical quantitative metrics available.

As you can see from this short exploration, a developer produces many different things as part of building a product. The metrics we often see measured are aimed at the simplest tasks (activity and output); the high-value skills you get from really productive developers, such as empathy, quality, pragmatism, collaboration, and learning from mistakes, are complicated to measure quantitatively.

Software development is not like a factory line

A lot of the problems of measuring developer productivity stem from an inaccurate picture of what happens to produce useful software. It can appear that software is simple: a product manager writes the requirements, designers do the design, a developer codes to those designs, the customer uses the software, then profit! This is how you might imagine a factory line to work. The problem with this analogy is that, unlike a factory, we are not creating the same thing over and over again. Each request is built to bespoke requirements.

One of the most important capabilities of an effective software development team is speed of learning.

I liked the discussion Mike Fisher and Abi Noda were having comparing the analogies of mining vs. art. If software development were like mining (or bitcoin mining), you would do some mining (work) and immediately have value (not quite – there is shipping, packaging, sorting, etc.). Whereas with art, what you create doesn’t have any value at all unless a customer is willing to pay for it.

Software development is knowledge work; the product ideas, the design, and the quality of the software make the real difference, and that is what you are hiring your knowledge workers (the developer, product, and design folks) to do. For a developer, the act of coding is realizing that design.

That being said, within software development there is a lot of manual work – for example, setting up environments, running builds and test suites, and performing releases. Removing as much manual work as possible is key to enabling the knowledge work.

So what is software development akin to? I think it’s like graphic design or an advertising agency: there is a lot of manual work that can be automated (Photoshop, the editing booth), but the value you pay for is the quality of the work and the design.

Developers don’t produce value by themselves

A common tactic is employing quantitative metrics to assess individual developers’ productivity. On the surface, this seems logical. However, it is challenging because software is a collaborative effort involving developers, architects, designers, and product managers, plus other important functions we often forget – project management, operations, support, sales, account management, marketing, and user research – all of which contribute to the success of a product.

Accurately attributing contributions is difficult. While we can monitor a developer’s activity in the source control system, this data doesn’t give enough information. It’s normal and desirable for developers to design together, particularly when creating something new; multiple perspectives and diverse expertise improve a design. Additionally, the level of activity is greatly influenced by the state of the existing code base, which they may not have written. If a developer has to refactor extensively, it will take more time and yield fewer lines of code.

Other developer values, like quality and reliability, evolve throughout the product lifecycle, and developers work with their technical leads, site reliability engineers (SREs), and platform teams to enhance these attributes. If a developer did cause a reliability issue, it is likely a deficiency in the system’s safety net (tests, observability, resilience, knowledge) and should be treated as a learning opportunity, not a productivity indicator.

There is too much specialized context to compare team productivity

Comparing Teams

The best practice approach to organizational design is to split your product into autonomous sub-product teams focused on their own value stream. Each team is characterized by distinct product goals, technology stack, usage profile, and performance and scalability criteria. The team’s manager would have formed the team based on their budget constraints and the specific skillsets they require.

For example, a search team is concerned with relevance and click-through; they run experiments to find different ways to improve the search results while keeping within their performance goals. The payments team ensures no lost transactions, providing redundancies in case the payment provider goes down. The checkout team creates a simple user interface to capture information so users don’t get frustrated and leave.

A team working on a monolith will have different metrics from one working on finely grained microservices, and some product teams might be wrestling with technical debt. Some teams can benefit from A/B tests to quickly get customer feedback; others are more revolutionary and require a different validation approach, which would show different patterns.

These contextual differences mean there isn’t a simple way to accurately compare teams. When engineering or delivery metrics are aggregated into charts, compared across teams, and shown to higher management, the result doesn’t represent productivity. It can lead to actions that severely damage the teams.

Comparing Developers

The contextual problems also apply to comparing developers. You can’t determine what constitutes good or bad activity metrics, e.g., merged commits or lines of code. Every requirement story varies in complexity: some require extensive design work, while others, especially if a pattern already exists, are simple – like adding a field to a form. Another scenario is when a developer is creating a spike as opposed to a polished, production-ready feature.

A high amount of activity could indicate a problem: it could be a sign of a “hero developer” attempting to solve all the challenges themselves. They might not be doing refactoring, reducing quality. In contrast, a low activity number might be because a developer spends time mentoring or pairing. None of this is “bad”. We must also remember that developers are human and not homogeneous in how they learn and think. They don’t approach their work in precisely the same manner.
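
To make the critique concrete, this is roughly the kind of activity metric such comparisons rest on – commits per author pulled from `git shortlog`. A sketch, not a recommendation: notice how much context the number ignores (refactoring, pairing, mentoring, spikes vs. production-ready features).

```python
# Commits per author from `git shortlog -sn` – exactly the kind of crude
# activity number that invites misleading developer comparisons.

import subprocess


def commits_per_author(repo_path: str) -> dict[str, int]:
    """Count commits per author in a git repository."""
    output = subprocess.run(
        ["git", "shortlog", "-sn", "HEAD"],
        cwd=repo_path, capture_output=True, text=True, check=True,
    ).stdout
    counts = {}
    for line in output.strip().splitlines():
        count, author = line.strip().split("\t", 1)
        counts[author] = int(count)  # says nothing about value delivered
    return counts
```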

Measuring output will incentivize the wrong thing

Measuring activity (commits, PRs) will just tell us that a developer is sitting at their computer, writing code and checking it in, which is of little value. It is an indicator that an ingredient is being created; in the art analogy, you need a canvas, a brush, and paints (unless it’s conceptual art showing a blank space) to produce art. Frequency and number of deploys also just measure output. It’s a better indicator – we at least have software deployed – but that doesn’t mean the software is being used.

As anyone familiar with Goodhart’s law knows, you get what you measure. If we measure commits and use that to assess developer performance, we will get commits. I have been in situations where the company measured keystrokes and IDE open time. Guess what they got? Lots of developers who didn’t collaborate, and very verbose code. In a way, measuring output actually makes it simple for a developer: they know how to look good. They don’t have to actually think about quality or the customer experience; just keep checking in. It also discourages collaboration and mentoring (if you are pairing, who gets the “credit”?).

Building a product that is both valuable to customers and high quality is hard and messy; we want to expose those challenges to the whole team so we get discussion, dissent, and debate, and the team pivots and improves with each new learning. If you just have your developers fulfill predetermined requirements – a coding robot churning out more widgets – you will get more software, but it is very likely no one will use it.

What do we recommend?

Short iterative milestones with measurable business outcomes

We were recently working with a fast-moving digital native, building out new product features for them in a very competitive marketplace. To begin with, we were using a set of agile metrics; we used burn-downs and estimated everything very carefully. Our stakeholders reviewed them at every showcase, and we talked about them in retrospectives. What we soon realized is that the value of those reports was pretty small; the stakeholders didn’t care that much about them. What they wanted to know was what value we had created for their customers. The company was well versed in lean methodologies, so we changed the things we reported on:

  • Features that have been released to customers
  • What is blocking a feature from being released
  • Leading indicators of the success of those features, e.g., behavior changes, customer complaints, social media response, A/B test results, user research results
  • Lagging indicators, e.g., increase in profit, decrease in cost, market share increase

Essentially, what value is the team providing to their business and customers?

The other approach we took was abandoning a traditional project plan and instead using a collection of hypotheses. As most lean practitioners will know, requirements are just guesses until a customer actually uses the product. Too often, requirements are treated like guaranteed business value, whereas the assumptions used to create the requirements often turn out to be wrong, or we have to iterate until the feature actually resonates; the slightest change can often make a feature successful.

If we work on a 12-month project with little validation with the customer, then our only option is to measure output and throughput. Most technologists have experienced long projects that didn’t have the impact they expected.

Our recommendation here is, before reverting to output- and throughput-based metrics, to work on how we measure the team on business value. It might be too difficult to use lagging indicators like profit, but product managers can come up with leading indicators that are more actionable.

The second recommendation is to have small milestones and batch sizes, with close feedback loops with the customer. If a team can envision a milestone, it’s understandable and attainable. Then you can challenge the team to think about how to achieve it simply and to optimize towards those metrics. This includes the developers, who are critical to developing simple solutions and experiments. We can use the DORA metrics to guide us to that smaller batch size, as sketched below.
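
As a rough sketch of how two of the four DORA metrics – deployment frequency and lead time for changes – can be derived, assuming a simple list of deploy records (the `Deploy` shape here is invented for illustration; a real pipeline would pull this from your CI/CD system):

```python
# Two DORA metrics computed from deploy records: deployment frequency and
# median lead time for changes. Shrinking batch size shows up directly in
# these numbers – more frequent deploys, shorter commit-to-production time.

from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass
class Deploy:
    deployed_at: datetime
    commit_times: list[datetime]  # commits included in this deploy


def deployment_frequency(deploys: list[Deploy], days: int = 30) -> float:
    """Average deploys per day over the most recent window."""
    cutoff = max(d.deployed_at for d in deploys) - timedelta(days=days)
    recent = [d for d in deploys if d.deployed_at >= cutoff]
    return len(recent) / days


def median_lead_time(deploys: list[Deploy]) -> timedelta:
    """Median time from a commit to its code running in production."""
    lead_times = [
        d.deployed_at - commit_time
        for d in deploys
        for commit_time in d.commit_times
    ]
    return median(lead_times)
```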

An antipattern we often see is moving to more frequent deployment without embracing lean and continuous delivery principles, which means the extra deployments have little impact on product success.

Performance management can’t be automated; engineering managers are hired to handle this

We often see performance management under the guise of developer productivity. Leaders want a quick way to optimize their engineering population (by which I mean firing and hiring). This is really performance management, and that is what we hire engineering managers to do. They know the developers on their team, their personalities, their habits, and what motivates them. They will use some of the metrics to inform whether a developer checking in only once a week is a problem, or whether a developer is taking a long time to do code reviews. In most situations, that has nothing to do with productivity and more to do with expectations. The engineering manager, with their tech lead, will be able to assess the contribution a developer makes to the overall productivity of the team, based on many nuanced factors.

A productive developer contributes to:

  • Efficient architecture – utilizes third-party services and libraries, with no unnecessary duplication
  • High-quality code – simple and adheres to standards
  • Reliable software – performant, secure, and available
  • Released software – accessible to customers and able to generate business value
  • Achieved outcomes
  • Shared knowledge – documentation, collaboration
  • Future scalability
  • Upskilling

Create a great developer experience and allow teams to continuously improve

It is often overlooked that developers don’t set out to be unproductive. Their effectiveness is shaped by the environment they find themselves in. Often, if they are unproductive, it’s because that is what their company has taught them to be; we have created a subpar developer experience with no method for bolstering their productivity.

More often than not, the root cause of diminished productivity doesn’t stem from slow coding skills. It’s a series of small inefficiencies that compound to render them unproductive – lack of quality documentation, an unstable test suite, slow development machines, long compile times, etc. Sometimes we aren’t even letting them code, by having too many meetings. The act of coding is just a part of the productivity equation, and with the advent of more sophisticated editors, coding co-pilots, and better information, it’s becoming less of a time sink.

Frequently, the productivity bottlenecks could be fixed if teams had the autonomy to do so. When I work on agile product development teams, friction is addressed through regular retrospectives and continuous improvement. The manager and tech lead listen to the concerns of the developers, either solving the issue or tasking passionate developers with solving it. During hack weeks, a lot of the ideas are focused on making developers’ lives easier across the organization. Backstage at Spotify came from a team of developers collaborating during hack week.

Within a product team, engineering and throughput metrics are the most useful. Technologists who are close to the work can interpret their meaning. To guide continuous improvement, a team would monitor metrics regularly to spot trends and bottlenecks. We can enable this by providing development teams easy access to analytics as part of the engineering platform.
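
As a sketch of the kind of lightweight trend-spotting this enables, assuming the platform exposes weekly cycle-time data (the numbers here are invented):

```python
# Smoothing weekly cycle times with a rolling average so genuine trends stand
# out over one-off spikes – the sort of signal a team would review in a retro.

def rolling_average(values: list[float], window: int = 4) -> list[float]:
    """Average each value with the preceding `window - 1` values."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


weekly_cycle_time_days = [2.1, 2.3, 2.0, 2.4, 3.1, 3.5, 3.8, 4.2]
print(rolling_average(weekly_cycle_time_days))  # steadily rising – worth discussing
```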

Use signals & analysis to improve the effectiveness of our engineering

At the beginning of this article, we stated that we would prefer to focus on the effectiveness of our developers and the engineering organization rather than measuring productivity. The metrics that we see being used to measure developer productivity are better used to optimize tools, processes, and communications. We recommend having an enabling team that is focused on the effectiveness of your engineering organization; often, this is part of a platform organization. A developer experience team is a welcome new trend. They would analyze the engineering value stream to spot waste and optimize the developers’ experience.

Some of the key guiding metrics an enabling team would monitor:

  • Time to productivity for new hires
  • Time to resolution of issues
  • Engineering satisfaction
  • Number of experiments and experiment success rate
  • Level of tech debt (as identified by the teams)
  • The four key DORA metrics
  • Reliability metrics
  • Local developer metrics (compile time, developer environment setup time)
  • Testing metrics (time to reproduce an issue, test suite run time, stability of the build)
  • Time and steps to launch a new service
  • Financial metrics (cloud costs, cost per customer, cost per request)

Some of these we can automate with instrumentation, but some are harder and require deep analysis. Some processes will go across departments. Some will require tracking down detailed information to see what happened; value stream analysis with a cross-functional team is an excellent tool to identify the blockers. Other data is not easily quantifiable; for example, tech debt cannot be easily measured, so instead we use the aggregation of the opinions of the teams to identify the worst productivity offenders. Productivity for new hires doesn’t mean they have completed orientation and been given a laptop; it’s whether they are actually contributing meaningfully.
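
Here is a minimal sketch of that aggregation approach, assuming a simple survey where each team scores the pain of areas of the codebase from 1 to 5 (the survey shape and names are invented for illustration):

```python
# Ranking the "worst productivity offenders" by aggregating team opinions on
# tech debt, since tech debt cannot be measured directly.

from collections import defaultdict
from statistics import mean


def worst_offenders(survey: list[dict], top_n: int = 3) -> list[tuple[str, float]]:
    """Rank codebase areas by average pain score across team responses."""
    scores = defaultdict(list)
    for response in survey:
        scores[response["area"]].append(response["pain"])
    ranked = sorted(
        ((area, mean(pains)) for area, pains in scores.items()),
        key=lambda item: item[1],
        reverse=True,
    )
    return ranked[:top_n]


responses = [
    {"team": "checkout", "area": "flaky e2e tests", "pain": 5},
    {"team": "search", "area": "flaky e2e tests", "pain": 4},
    {"team": "payments", "area": "legacy billing module", "pain": 3},
]
print(worst_offenders(responses))  # flaky e2e tests ranked first
```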

As we do analysis and research, we need rich, detailed data, opinions, and anecdotes to support some of our objective measures. We do this by talking to developers, doing 1:1s, and running surveys. It’s important to get a person who is skilled in analysis to do this; too often, research is done only by engineers who already have a predetermined solution in mind.

An anti-pattern we have seen is only measuring speed. But timing is not everything; the outcome of a process is more important. For example, a code review might be short, but if a junior developer doesn’t learn anything or feels trodden on, then it isn’t an effective process. Or we may be deploying often, but the software isn’t being released to customers, or it’s causing outages. Where possible, we want metrics that balance quality with speed. This will often be a combination of different types of metrics.

I often see teams applying too much emphasis to a metric, not treating it as a signal or an indicator that needs interpretation.

Frequently, our preconceived notions about what hinders team productivity prove to be misguided. We might fixate on trendy methodologies (microservices, we’re looking at you) or specific tools when what’s truly needed is a robust engineering culture that encourages ideas to flourish within the teams.

Engineering metrics can also be misleading. As discussed earlier, there are still a lot of contextual differences: very different technology choices, monoliths versus finely grained microservices, differing levels of technical debt, and different validation approaches.

Some metrics are associated with engineering best practices that we apply across the organization, such as frequent check-ins, unit test coverage, and fast code reviews. We should look for divergences from best practice, but this doesn’t indicate productivity – just that the developer is following a different workflow or doesn’t understand the expectations.
