Firetiger vs Sentry
Sentry is the standard for catching, grouping, and surfacing application exceptions with full stack traces. Firetiger sits in a different layer: it watches each PR's deploy against a monitoring plan derived from the diff and produces a per-deploy verdict on whether the change behaved as expected. Sentry tells you that errors are occurring; Firetiger tells you whether a recent change is responsible and which one.
Sentry and Firetiger get compared often because both surface problems in production, but they solve different problems. Sentry is an error tracking platform — it captures exceptions raised in application code, deduplicates them, attaches stack traces and release tags, and shows the team what is failing. Firetiger is a deploy verification platform — it reads each PR's diff, generates a change-specific monitoring plan, watches the rollout, and posts a per-change verdict.
This article walks through what Sentry is great at, where the gap remains, how Firetiger fits next to it, and when teams need both.
What Sentry is great at
Sentry has spent more than a decade refining the error tracking experience, and the result is one of the more mature developer-facing products in production tooling.
Exception capture across the stack. Sentry SDKs exist for essentially every language and framework worth supporting, and the capture surface — uncaught exceptions, unhandled rejections, error boundaries, custom exception reports — is exhaustive. For frontend code in particular, where errors are otherwise easy to lose, Sentry is the industry default.
Grouping and deduplication. A single bug can produce thousands of error events. Sentry groups them by signature so engineers see one row per bug rather than a flood of duplicates. The grouping logic is well-tuned and configurable, which is what makes the inbox usable at scale.
Rich error context. Stack traces, breadcrumbs, request details, user context, environment, release tag, and (where source maps are available) the original source. Engineers triaging an error in Sentry have a strong starting point compared to a bare server log.
Release tracking. Sentry's release model lets the team associate errors with the deploy that introduced them, and the regression markers (first seen in release X, regressed in release Y) are useful when they work. The model depends on correctly tagging errors with releases, which is straightforward in most environments but not free.
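As a concrete illustration of what that tagging involves, here is a minimal sketch with the Python SDK; the DSN, release string, and environment are placeholders, and the release value has to match whatever the deploy pipeline reports:

```python
import sentry_sdk

# Every error captured after init is tagged with this release and environment,
# which is what lets Sentry say "first seen in release X, regressed in release Y".
sentry_sdk.init(
    dsn="https://examplePublicKey@o0.ingest.sentry.io/0",  # placeholder DSN
    release="my-app@2.3.1",        # must match the version the deploy pipeline emits
    environment="production",
)
```

The "not free" part is keeping that release string consistent across the SDK, the build pipeline, and any source map uploads.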
Performance and replay layers. Beyond error tracking, Sentry offers performance monitoring and session replay. These are real but secondary capabilities; the core product strength is errors.
For most teams, Sentry is the right answer to "where do application exceptions land?", and the case for staying on Sentry rarely needs to be re-made once it is integrated.
Where the gap remains
Error tracking is a specific layer with specific limits. Sentry is excellent at what it does, but several adjacent problems are outside its model.
Not every regression is an exception. Many of the worst post-deploy regressions do not raise exceptions at all. A change that silently returns wrong data, a query that becomes 10x slower, an API endpoint that starts returning 422 instead of 200, a feature flag that stops being honored, a background job that no longer fires — none of these necessarily produce Sentry events. The application "succeeds" from its own perspective. Sentry never sees the regression. The first signal comes from customer complaints or external monitoring.
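A minimal, invented sketch of the failure mode, with all names hypothetical: a broad except added in a diff turns every failure into an apparent success, so nothing ever reaches the error tracker.

```python
CACHED_DEFAULTS = ["top-seller-1", "top-seller-2"]  # stale fallback data

def fetch_personalized(user_id: str) -> list[str]:
    # Stand-in for the changed code path; imagine the refactor broke it.
    raise TimeoutError("recommendation service unreachable")

def get_recommendations(user_id: str) -> list[str]:
    # The new except swallows the failure: callers get stale data, the request
    # returns 200, and no exception propagates to the Sentry SDK, so no issue
    # is ever created.
    try:
        return fetch_personalized(user_id)
    except Exception:
        return CACHED_DEFAULTS

print(get_recommendations("u-123"))  # ['top-seller-1', 'top-seller-2'], silently wrong
```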
Change attribution is approximate. Sentry's release model can say that an error first appeared in release X, but the granularity is the release, not the specific change inside it. A release that bundles five PRs gives Sentry one anchor; the team still has to figure out which of the five PRs is the cause. That work — reading the diffs, reasoning about which code path could produce the observed exception — is the same diagnostic work that consumes most of an incident's wall-clock time.
Intent verification is not the model. Sentry can tell you that no new errors appeared after a deploy. It cannot tell you whether the change did what it was supposed to do. A deploy that was intended to improve checkout success rate and instead left it flat is a silent failure; Sentry has nothing to flag, because nothing threw.
Static thresholds dominate the alerting model. Sentry alerts on error frequency and on new issues; both are useful, both are change-agnostic. There is no per-PR baseline that distinguishes "this PR introduced 50 errors per hour, which is 50 more than its predecessors" from "this service has always had 50 errors per hour from this kind of code." The alerting model is symptom-relative, not change-relative.
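A minimal sketch of the distinction, assuming error counts per hour can be queried for the windows before and after a specific deploy; the function, thresholds, and data are illustrative:

```python
from statistics import mean

def change_relative_verdict(errors_per_hour_before: list[float],
                            errors_per_hour_after: list[float],
                            max_ratio: float = 1.5) -> str:
    """Judge the window after one deploy against that service's own recent
    baseline, rather than against a fixed, change-agnostic threshold."""
    baseline = mean(errors_per_hour_before)
    observed = mean(errors_per_hour_after)
    if observed > max(baseline * max_ratio, baseline + 5):
        return "regression relative to this change's baseline"
    return "consistent with the pre-deploy baseline"

# A service that has always emitted ~50 errors/hour is not blamed on the new PR,
# while a service that jumps from ~0 to 50/hour is, even though a static
# errors-per-hour alert would treat both deploys identically.
print(change_relative_verdict([48, 52, 50], [51, 49, 50]))  # consistent
print(change_relative_verdict([0, 1, 0], [50, 48, 52]))     # regression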
No verdict per change. Sentry produces an error inbox. It does not produce, for each deploy, a structured "verified" or "regression detected" outcome that lands on the PR. The state of a deploy is implicit in whether new issues appeared; the team has to draw the conclusion.
Again, none of this is a defect in Sentry. Error tracking is a coherent, useful category. It is just narrower than "everything bad that can happen after a deploy."
How Firetiger differs
Firetiger is built around the change event, not the error event.
For each PR, Firetiger reads the diff and description, generates a monitoring plan that names the signals the change should affect (and the signals it should not), watches the deploy across environments, and produces a per-deploy verdict. The signals can include errors (from Sentry, from application logs, from observability platforms), but they also include latency percentiles, success rates, database performance, custom business metrics, and any other signal the monitoring plan considers material.
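As a purely illustrative sketch (the structure, field names, and PR number below are invented, not Firetiger's actual plan format), a plan for a PR touching checkout pricing might name signals like these:

```python
# Invented example of what a change-specific monitoring plan can express.
monitoring_plan = {
    "pr": 1234,  # hypothetical PR number
    "watch": [
        {"signal": "checkout_success_rate", "expect": "flat_or_up"},
        {"signal": "p95_latency:/api/checkout", "expect": "no_regression"},
        {"signal": "new_exception_groups:billing/*", "expect": "none"},  # e.g. read from Sentry
        {"signal": "db_query_time:orders", "expect": "no_regression"},
        {"signal": "orders_per_minute", "expect": "flat_or_up"},  # business metric
    ],
    "should_not_affect": ["signup_success_rate", "search_latency"],
    "verdict_window_minutes": 60,
}
```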
Because the verdict is anchored to the PR, attribution is structural: when a regression is detected, the surface that triggers — PR comment, Slack message, incident timeline — names the specific change responsible, the change author, the affected scope, and the supporting evidence. Engineers do not have to reconstruct which PR caused which symptom; the verdict carries that information by construction.
When Sentry is in the stack, Firetiger reads from it. A new exception class that appears after a deploy and matches the changed code path is exactly the kind of signal a Firetiger monitoring plan would watch for. Firetiger amplifies Sentry by interpreting Sentry's signal in the context of the specific change.
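A rough sketch of what reading Sentry as a signal source can look like, assuming Sentry's documented project-issues endpoint and its firstRelease search term; the org, project, release, and token values are placeholders:

```python
import requests

SENTRY_TOKEN = "..."                  # Sentry auth token with read access (placeholder)
ORG, PROJECT = "acme", "checkout"     # placeholder slugs
RELEASE = "checkout@2024.06.01"       # the release that shipped the PR (placeholder)

# Ask Sentry for issues whose first occurrence was in this release. A new
# exception group that also matches the changed code path is exactly the kind
# of evidence a change-aware monitoring plan wants to see (or not see).
resp = requests.get(
    f"https://sentry.io/api/0/projects/{ORG}/{PROJECT}/issues/",
    headers={"Authorization": f"Bearer {SENTRY_TOKEN}"},
    params={"query": f'firstRelease:"{RELEASE}"', "statsPeriod": "24h"},
    timeout=10,
)
resp.raise_for_status()
for issue in resp.json():
    print(issue["title"], issue["count"])
```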
When to use both
Most teams using Firetiger also use Sentry. The two are complementary.
Sentry for the error inbox; Firetiger for the deploy verdict. Engineers triage errors in Sentry the way they always have. After each deploy, they receive a Firetiger verdict on whether the change worked. The two interfaces serve different daily workflows.
Sentry as a signal source. Firetiger's monitoring plan can include Sentry data — new exception groups appearing in the changed code path, error volume regressions in specific releases. The verification reads Sentry rather than replacing it.
Sentry for the bugs that are exceptions; Firetiger for the regressions that are not. Many regressions never raise exceptions. Firetiger's plan watches success rates, latency, and behavioral signals that Sentry's model does not see. Sentry remains the right tool for the bugs that do raise exceptions.
Sentry for general engineering; Firetiger for the release loop. Engineers use Sentry continuously to fix bugs. Firetiger fires per deploy with a verdict that has a clear next action: roll back, fix, or proceed. The two have different cadences in the team's day.
The integration is additive. Adopting Firetiger does not change the Sentry footprint and does not remove value from Sentry; the value of both goes up when wired together.
When to evaluate Firetiger first
Sentry is foundational for exception-raising bugs and rarely needs to be re-justified. The question is when, with Sentry in place, deploy verification becomes worth evaluating.
A few signals:
Regressions keep slipping past Sentry. If the team has experienced incidents that started after a deploy and did not raise any exceptions — silent data quality issues, performance regressions, behavioral changes — Sentry's model structurally cannot catch them. Verification is the missing layer.
Deploys are frequent and the "which PR?" question dominates incidents. Sentry can narrow the cause to a release. If the team's releases bundle multiple PRs, the diagnostic work of identifying the specific PR is still in front of them. Verification produces per-PR attribution directly.
AI-assisted development is increasing PR volume. More PRs per release, faster cadence, more variation per change. The release-level granularity Sentry provides gets coarser as the PR-per-release ratio rises. See What is PR-based monitoring?.
Change failure rate is being measured manually. Teams that calculate CFR from Sentry's regression markers usually know the number is partial — only the changes that broke something visible in Sentry are counted. Verification produces a per-deploy verdict from a broader signal set, which makes CFR measurement more honest; a short worked sketch follows at the end of this section. See What is change failure rate?.
For teams where Sentry catches most production bugs eventually but rarely catches them during the release loop, verification is the layer that closes the timing gap.
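The worked sketch referenced above: with a verdict per deploy, CFR becomes a direct count rather than an inference from which breakages happened to be visible in the error tracker. The data below is invented.

```python
# Invented data: each deploy carries a verification verdict.
deploys = [
    {"pr": 101, "verdict": "verified"},
    {"pr": 102, "verdict": "regression"},  # silent latency regression, no exceptions raised
    {"pr": 103, "verdict": "verified"},
    {"pr": 104, "verdict": "verified"},
    {"pr": 105, "verdict": "regression"},  # new exception group, also visible in Sentry
]

failed = sum(d["verdict"] == "regression" for d in deploys)
print(f"change failure rate: {failed / len(deploys):.0%}")
# 40% here; counting only the Sentry-visible failure would report 20%
```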
Where to start
- Audit your last ten post-deploy incidents. For each, ask: did Sentry surface this? If so, how long after the deploy? If not, what kind of signal would have? The answers map the territory verification has to cover that Sentry does not.
- Keep Sentry as the exception inbox. Verification does not replace the inbox; it sits alongside it. Don't tear out the error tracking when adding deploy verification.
- Wire deploy events into both. Both Sentry's release tracking and Firetiger's verification require clean deploy events. Most teams discover, on evaluation, that the deploy event source needs work before either tool can do its best work; a sketch of the Sentry half follows this list.
- Run a per-PR pilot. A two-to-four-week pilot of deploy verification on a high-frequency service produces real verdicts and surfaces real signal-quality differences against Sentry's existing coverage. See How to evaluate deploy verification tools.
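The sketch mentioned in the deploy-events item above, assuming Sentry's documented organization releases and deploys endpoints; the org, project, version, and token values are placeholders. Whatever emits this event is usually the same source that should notify the verification layer.

```python
import requests

SENTRY_TOKEN = "..."              # Sentry auth token with release permissions (placeholder)
ORG = "acme"                      # placeholder org slug
VERSION = "checkout@2024.06.01"   # must match the release tag the SDK reports
headers = {"Authorization": f"Bearer {SENTRY_TOKEN}"}

# Record the release, then record the deploy of that release to an environment.
requests.post(
    f"https://sentry.io/api/0/organizations/{ORG}/releases/",
    headers=headers,
    json={"version": VERSION, "projects": ["checkout"]},
    timeout=10,
).raise_for_status()

requests.post(
    f"https://sentry.io/api/0/organizations/{ORG}/releases/{VERSION}/deploys/",
    headers=headers,
    json={"environment": "production"},
    timeout=10,
).raise_for_status()
```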