
What is lead time for changes?

Lead time for changes is the elapsed time from a code change being authored to that change running in production. It is one of DORA's four key metrics and is the most direct measure of an engineering organization's delivery pipeline speed. Elite performers deliver changes in under a day; low performers can take anywhere from a month to six months.

Lead time for changes captures how long it takes a single code change to traverse the full path from a developer's keyboard into production. Like deployment frequency, it is a velocity metric — but it measures the latency of the pipeline rather than its throughput. A team can have high deployment frequency and still have long lead times if changes spend days or weeks queued in review, blocked on CI, or waiting for a release window.

One way engineering leaders describe the metric in plain language: "how long does it take from a request coming in to it actually being in production?" That framing — request-to-production — is more useful than the dictionary definition because it forces a conversation about which parts of the path are slow.

DORA's State of DevOps research consistently finds that lead time bands separate elite performers (under one day) from high performers (one day to one week), medium performers (one week to one month), and low performers (one to six months). The bands span almost three orders of magnitude. Where a team sits in the distribution is a direct reflection of how much friction exists between authoring a change and getting it in front of users.

Where you start the clock changes the answer

The single most important decision in measuring lead time is which event starts the clock. Three options are common, and each one measures a different thing.

First commit on the branch. Starting the clock at the developer's first commit captures the full development cycle, including time spent iterating on the change before opening a PR. This is the most ambitious definition and the hardest to game, but it includes time during which the developer was actively iterating — which is not necessarily "lead time" in the pipeline sense.

PR opened. Starting the clock when the PR is opened captures the review-and-delivery cycle. This is closer to what most engineering leaders intuitively mean by "how long does it take to ship a change once it's ready for review." It excludes the time a developer spent drafting the change but includes the time review took.

PR merged. Starting the clock at merge captures only the delivery cycle — the time from a reviewed, approved, merged change to that change running in production. This is the cleanest measure of CI/CD pipeline speed, isolated from human factors. It is also the most common convention in modern DORA tooling.

There is no universally correct choice. The right answer depends on what the team is trying to learn. A team trying to improve review cycle time wants the PR-opened clock; a team trying to improve CI/CD throughput wants the PR-merged clock. The wrong move is to compare lead times across teams that started the clock at different events — that comparison is meaningless, even if the numbers look similar.
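
To make the difference concrete, here is a minimal sketch that computes all three conventions from the same change. The timestamps and field names are illustrative, not any particular tool's schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Change:
    first_commit_at: datetime  # first commit on the branch
    pr_opened_at: datetime     # PR opened for review
    pr_merged_at: datetime     # PR merged to the main branch
    deployed_at: datetime      # change running in production


def lead_time(change: Change, clock_start: str = "pr_merged") -> timedelta:
    """One change, three possible answers, depending on where the clock starts."""
    starts = {
        "first_commit": change.first_commit_at,  # full development cycle
        "pr_opened": change.pr_opened_at,        # review-and-delivery cycle
        "pr_merged": change.pr_merged_at,        # delivery (CI/CD) cycle only
    }
    return change.deployed_at - starts[clock_start]


change = Change(
    first_commit_at=datetime(2024, 5, 1, 9, 0),
    pr_opened_at=datetime(2024, 5, 2, 14, 0),
    pr_merged_at=datetime(2024, 5, 3, 10, 0),
    deployed_at=datetime(2024, 5, 3, 11, 30),
)

for start in ("first_commit", "pr_opened", "pr_merged"):
    print(f"{start:>12}: {lead_time(change, start)}")
# first_commit: 2 days, 2:30:00
#    pr_opened: 21:30:00
#    pr_merged: 1:30:00
```

The same change produces a sub-two-hour number or a multi-day one purely by changing the convention, which is why the clock-start choice needs to be written down.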

The build-and-test bottleneck

Engineering leaders consistently identify CI build and test time as the dominant component of lead time. A team can have a low review-cycle median, a fast deploy pipeline, and short queue times — and still find that most of its lead time is spent in CI because building and testing each change takes 45 minutes.

This is one of the most useful diagnostics that lead time produces. Lead time itself is an outcome metric, not a target, but the decomposition of lead time tells the team where to invest. If 80% of the median lead time is build-and-test, the work is in test parallelization, selective testing, build caching, and CI infrastructure — not in code review processes. If 60% of lead time is review queue, the work is in reviewer load-balancing, code-ownership ergonomics, and PR size — not in CI.

A useful practice is to publish median lead time broken down by phase: time-in-review, time-in-CI, time-in-canary, time-to-100%. The shape of that breakdown tells the team where the largest single improvement is.
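
As a sketch of what that breakdown looks like in practice (the per-change phase durations below are invented for illustration):

```python
from datetime import timedelta
from statistics import median

# Per-change phase durations in minutes: (review, CI, canary, rollout to 100%).
# Illustrative numbers, not real data.
changes = [
    (600, 45, 30, 15),
    (1200, 50, 30, 20),
    (240, 40, 25, 10),
    (2880, 55, 35, 15),
]

for i, phase in enumerate(["review", "CI", "canary", "rollout"]):
    print(f"median {phase:>7}: {timedelta(minutes=median(c[i] for c in changes))}")
# median  review: 15:00:00   <- the dominant phase, so that is where the work is
# median      CI: 0:47:30
# median  canary: 0:30:00
# median rollout: 0:15:00
```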

What gets gamed

The most common distortion is to count lead time only for the small, fast changes — the documentation typo PRs, the dependency bumps, the configuration nudges — and to either exclude or down-weight the larger changes. Median lead time across "all changes" is honest; median lead time across "changes the team chose to count" is not. Teams that report unusually low lead times often have a definition of "change" that excludes the slow ones.

A subtler distortion comes from teams that use merge-queue systems and report lead time as PR-merged-to-deploy. If the merge queue itself takes hours, that latency disappears from the metric even though it is real.

How Firetiger computes lead time

Firetiger reads each PR diff, generates a deployment-specific monitoring plan, watches the deployment across staging, canary, and production, detects regressions, and investigates root cause. Computing lead time falls out of two of the data flows Firetiger already maintains: GitHub webhooks deliver the PR lifecycle events (opened, merged) and the associated commits with timestamps; deploy webhooks deliver the production deployment event with its commit SHA. Joining the two gives the lead time per change with no extra configuration.
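
A rough sketch of that join, with deliberately simplified payloads — real GitHub and deploy webhook payloads carry far more fields, and the names below are illustrative:

```python
from datetime import datetime

# From GitHub webhooks: merged PRs keyed by their merge commit SHA.
merged_prs = {
    "abc123": {"number": 4217, "merged_at": datetime(2024, 5, 3, 10, 0)},
}

# From deploy webhooks: each production deployment carries the SHA it shipped.
deployments = [
    {"sha": "abc123", "environment": "production",
     "deployed_at": datetime(2024, 5, 3, 11, 30)},
]

for deploy in deployments:
    if deploy["environment"] != "production":
        continue
    pr = merged_prs.get(deploy["sha"])
    if pr is None:
        continue  # deployment of a commit we never saw merge, e.g. a rollback
    lead_time = deploy["deployed_at"] - pr["merged_at"]
    print(f"PR #{pr['number']} merge-to-production lead time: {lead_time}")
# PR #4217 merge-to-production lead time: 1:30:00
```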

Service identity comes from trace tags (service.name, service.version, commit SHA) rather than from customer-maintained YAML files that map repos to services. For monorepos, this means a single PR that touches three services produces three independently measured lead times, one per service, which is the view that matters for finding slow pipelines.
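
Sketched out, the monorepo case looks like this (service names and timestamps are invented):

```python
from datetime import datetime

merged_at = datetime(2024, 5, 3, 10, 0)  # one PR, one merge event

# Production deploy events, identified by the service.name trace tag.
deploys = [
    {"service.name": "checkout", "deployed_at": datetime(2024, 5, 3, 10, 40)},
    {"service.name": "billing",  "deployed_at": datetime(2024, 5, 3, 12, 15)},
    {"service.name": "emails",   "deployed_at": datetime(2024, 5, 4, 9, 0)},
]

for d in deploys:
    print(f"{d['service.name']:>8}: {d['deployed_at'] - merged_at}")
# checkout: 0:40:00
#  billing: 2:15:00
#   emails: 23:00:00
```

The slow pipeline (emails, nearly a day behind the others) is exactly the signal a single blended lead time would hide.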

The clock-start choice — first commit, PR opened, or PR merged — is configurable per team because the right answer depends on what the team is trying to learn. The dashboard makes the choice explicit rather than hiding it.

Where to start

  • Decide which clock you are starting. First-commit, PR-opened, and PR-merged all measure different things. Write down which one you are using and why.
  • Decompose by phase. Median lead time as a single number is less actionable than median lead time broken into review, CI, and deploy phases. The decomposition tells you where the work is.
  • Measure the slow changes too. Excluding large or complex changes from the lead-time numerator produces a flattering number that does not reflect reality.
  • Read lead time alongside change failure rate. Reducing lead time by skipping verification is easy and reduces stability. The two metrics together protect against that tradeoff.

Firetiger uses AI agents to monitor production, investigate incidents, and optimize infrastructure — autonomously. Learn more about Firetiger, get started free, or install the Firetiger plugin for Claude or Cursor.