AppSec ROI for Engineering Teams | Semgrep

The sprint is already full and underway. The backlog has three critical security findings that have been "in progress" for six weeks.

Security tooling is an investment. To justify the time and energy spent adopting it, it must produce dramatically higher fix rates and reduce the total engineering time spent on security. The pressure to ship isn’t going away, so the question is whether the security work you do have time for is effective. Unfortunately, for most companies the ROI is ambiguous. Sigh. It doesn’t have to be that way, though: small adjustments to engineering workflows can have an outsized impact.

Semgrep’s analysis of anonymized remediation patterns across 50,000 active repositories over the course of a year shows what separates the best performers from the rest. Five patterns adopted by Leaders (companies in the top 15% by fix rate) stood out from everybody else. To join that top-performing cohort, optimize five things: when a finding surfaces, signal quality, tool consolidation, reachability precision, and how quickly engineering teams act.

Interruption Tax: 4.8 days vs. 43 days

When a Static Application Security Testing (SAST) finding surfaces in a pull request, the leading teams resolve it in 4.8 days on average. The same class of vulnerability, caught in a full scan after the sprint closes and the code ships, takes 43 days.

This 9x difference isn't because PR findings are any less complex to fix. It's a context-switching problem: once the code has shipped, you have to reload the solution and how it works in the code. That means you:

Get assigned a ticket for a codebase you may not have written, or wrote so long ago that it might as well have been somebody else's
Rebuild context around unfamiliar logic
Implement a fix that competes for attention with new sprint goals
Coordinate a code review that requires another person to understand a months-old decision

When code security findings surface in a PR, fixing them becomes part of your normal code quality workflow. The developer of the code still has the context in their prefrontal cortex. The security bug can be resolved immediately, and with capabilities like Semgrep Autofix’s automated code remediation suggestions, it becomes far easier to build reliable code from the get-go.

Pride in the craftsmanship of correct code is a strong motivator. The ROI of PR scanning is recovered engineering time that would otherwise show up later as technical debt. Remediation time matters for security team KPIs, but a growing backlog of technical debt also limits a tech lead's ability to stay in control of their project.
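To make the idea of surfacing findings in the PR concrete, here is a minimal sketch that keeps only the findings landing on lines a pull request actually changed. It is an illustration under stated assumptions, not how Semgrep implements its scanning: the `Finding` record, the `findings_in_diff` helper, the file paths, and the rule names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    # Hypothetical finding record; real scanner output carries more fields.
    path: str
    line: int
    rule_id: str


def findings_in_diff(findings, changed_lines):
    """Keep findings whose line is among the lines the PR changed.

    changed_lines maps a file path to the set of line numbers touched by the PR.
    """
    return [f for f in findings if f.line in changed_lines.get(f.path, set())]


# Hypothetical scan results for a whole repository.
findings = [
    Finding("app/db.py", 42, "sql-injection"),
    Finding("app/db.py", 310, "weak-hash"),
    Finding("legacy/report.py", 7, "path-traversal"),
]

# Lines touched by the (hypothetical) pull request.
changed = {"app/db.py": {40, 41, 42, 43}}

pr_findings = findings_in_diff(findings, changed)
print([f.rule_id for f in pr_findings])  # prints ['sql-injection']
```

Only the finding the author just introduced reaches the review, while the older findings stay in the backlog for a separate decision.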

Signal Quality: 2.4x Fix Rate Gap

The gap between leaders and the field is not about tools, prioritization frameworks, or severity filters. Every organization in our dataset has access to the same scanning capabilities; the difference is in how they use them. Leaders fix 39.6% of SAST findings while everybody else averages 16.8% (a 2.4x advantage).
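As a quick check, the two headline multipliers can be reproduced from the figures quoted in the text. This sketch uses only the article's own numbers; the `ratio` helper and the variable names are made up for illustration.

```python
def ratio(larger: float, smaller: float) -> float:
    """Return how many times larger the first value is than the second."""
    return larger / smaller


# Figures quoted in the article.
pr_days, full_scan_days = 4.8, 43.0  # average days to resolve a SAST finding
leader_fix, field_fix = 39.6, 16.8   # percent of SAST findings fixed

interruption_tax = ratio(full_scan_days, pr_days)
fix_rate_gap = ratio(leader_fix, field_fix)

print(f"Interruption tax: {interruption_tax:.1f}x")  # prints Interruption tax: 9.0x
print(f"Fix rate gap: {fix_rate_gap:.1f}x")          # prints Fix rate gap: 2.4x
```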

For engineering teams, this matters because the typical response to a low fix rate is "prioritize better". Dev teams then go through a cycle of tightening severity filters, holding triage meetings, enforcing stricter remediation SLAs, and so on.

This wastes development time because it's the wrong tactic. Organizations with a higher fix rate aren’t just trying harder; they’re fixing findings earlier and prioritizing critical issues that can be acted upon. The same pattern shows up in many Proof of Value (POV) evaluations for Semgrep: once the rules are tuned, specific, contextualized results attached to PR workflows get fixed, and many false positives are eliminated.

By customizing rules and policies to report only reliable, consistent true positives, engineering teams avoid the overhead of discussing organization-wide trends where the metrics look bad.

Tool Sprawl: 20% of Repositories Use 2+ Languages

Engineering teams rarely write in just one language. Analysis of 448,000+ repositories shows that 20% use two or more scripting languages within the same repository, with trilingual and polyglot repos extremely common in larger organizations' codebases and monorepos.

For many teams, that could mean a separate best-of-breed code scanner for each language: Bandit for Python, gosec for Go, npm audit or an eslint security plugin for JavaScript, Brakeman for Ruby, etc. Each tool brings its own configuration approach, CI integration, alert schema, false positive rate, output parsing logic, and noise suppression annotations, creating an uneven development experience across the engineering organization. The best practices learned in one development workflow then need to be adapted again for each additional development environment.

This maintenance cost compounds: there are tools to track and update, rules and policies to calibrate, output to standardize for reporting in dashboards, and so on. While the appeal of DIY is flexibility and control, the consequence may be fragmentation and inconsistency.

A single engine across the full language stack eliminates coordination overhead: one config, one integration, one alert format, one suppression syntax. Leading teams resolved cross-file findings 69.4% of the time through better data flow analysis.
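To illustrate the coordination overhead that a single alert format removes, here is a minimal sketch of the adapter layer a team ends up maintaining when each scanner reports in its own schema. The tool names, record shapes, and rule IDs below are hypothetical and simplified; real tool output differs and changes between versions, which is exactly the maintenance burden.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    # One normalized alert format for dashboards and reporting.
    tool: str
    rule: str
    path: str
    line: int
    severity: str  # normalized to lowercase: "low", "medium", "high"


# Hypothetical per-tool adapters: one is needed for every scanner in use.
def from_python_scanner(rec: dict) -> Alert:
    return Alert("python-scanner", rec["test_id"], rec["filename"],
                 rec["line_number"], rec["issue_severity"].lower())


def from_go_scanner(rec: dict) -> Alert:
    # This hypothetical tool reports line numbers as strings.
    return Alert("go-scanner", rec["rule_id"], rec["file"],
                 int(rec["line"]), rec["severity"].lower())


ADAPTERS = {
    "python-scanner": from_python_scanner,
    "go-scanner": from_go_scanner,
}


def normalize(tool: str, records: list) -> list:
    """Convert one tool's raw records into the shared Alert format."""
    return [ADAPTERS[tool](rec) for rec in records]


alerts = normalize("python-scanner", [
    {"test_id": "PY-SQL-001", "filename": "app/db.py",
     "line_number": 42, "issue_severity": "MEDIUM"},
]) + normalize("go-scanner", [
    {"rule_id": "GO-SQL-001", "file": "svc/query.go",
     "line": "17", "severity": "HIGH"},
])

for alert in alerts:
    print(alert.tool, alert.rule, alert.path, alert.line, alert.severity)
```

Every new language adds another adapter, another severity mapping, and another place for the dashboard to drift out of sync.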

Reachability Reliability: +25% Fix Rate

When looking at third-party dependencies, leaders fix 93.2% of critical-severity Software Composition Analysis (SCA) findings, whereas the rest of the field falls below 33.8%. That’s a huge gap, and a 25% higher fix rate can be attributed to reachability analysis, where the code scanner traces the code path from your own code into open source third-party dependencies.

Other tools, like npm audit, identify and report CVEs at the dependency version level, but that includes many matches that may not put your code at risk. Procrastination follows when it isn’t clear whether a dependency is exploitable, when too many dependencies are flagged, or when updating a dependency requires verifying that the code still meets its functional requirements.

Prioritization has an important role in fixing security issues, just as it does in any other quality assurance workflow. Reachability analysis directly addresses the noise problem while keeping SCA workflows independent from SAST workflows. Because the effort each requires is different, your approach to each should differ too. One requires coordination to update a library, whereas the other requires more knowledge of the logic of the function calls.
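The difference between version-level matching and reachability can be sketched in a few lines. This is a deliberately simplified illustration: real reachability analysis traces call and data-flow paths through the code, while this sketch assumes the set of called functions is already known. The `Advisory` record, the package names, and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Advisory:
    # Hypothetical advisory: a package plus the functions affected by the CVE.
    package: str
    vulnerable_symbols: frozenset


def split_by_reachability(advisories, called_symbols):
    """Split advisories by whether the application calls an affected function.

    called_symbols maps a package name to the set of its functions that
    first-party code actually calls (assumed to be known here).
    """
    reachable, unreachable = [], []
    for adv in advisories:
        used = called_symbols.get(adv.package, set())
        if adv.vulnerable_symbols & used:
            reachable.append(adv)
        else:
            unreachable.append(adv)
    return reachable, unreachable


# A version-level audit would flag both packages.
advisories = [
    Advisory("yamlish", frozenset({"unsafe_load"})),
    Advisory("imgtools", frozenset({"parse_tiff"})),
]

# The application only ever calls these functions.
called = {"yamlish": {"unsafe_load", "dump"}, "imgtools": {"resize"}}

reachable, unreachable = split_by_reachability(advisories, called)
print([a.package for a in reachable])    # prints ['yamlish']
print([a.package for a in unreachable])  # prints ['imgtools']
```

Both dependencies match at the version level, but only one puts the application at risk, so only one needs an urgent upgrade.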

Delay Decay: After the First 90 Days

Past 90 days, security findings stop being just risks and start becoming research projects too. The longer issues sit, the less likely they are to be fixed. This is not too surprising: code authorship gets murky, the surrounding code shifts, and identifying a fix in an unfamiliar codebase, even for those who wrote it months before, becomes an archaeological dig to surface the context needed to address it. That dig might take just as long as the original sprint took to build the capability in the first place.

The 90-day mark is not technical but organizational: once a security issue is more than one quarter old, its chances of ever being resolved drop sharply. Leading teams resolve issues before that threshold, while the rest of the field gets stuck in a quagmire of unresolved issues, spending proportionally more effort clearing old debt than addressing newly introduced security issues.

This doesn’t mean you can declare findings bankruptcy and clear the queue, but to maximize your ROI you should hold the line and remediate findings while they are fresh. Every finding should get one of two decisions:

(1) Remediate: assign it, allocate time, and ship the fix as soon as possible, prioritizing the newest code being shipped.

(2) Mute: put policies in place to clear the backlog of known false positives, low-signal rules, and non-production code paths. A justification of acceptable risk acknowledges that sometimes it's too costly to fix an issue or upgrade a package with breaking changes.

What doesn't work is leaving issues open as a hedge. An undecided finding in the backlog is noise, and it is deferred risk that accrues compounding interest on the effort required to fix it. An n-day escalation policy can recover time by forcing a decision on each security defect instead of kicking the can down the road.
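An n-day escalation policy can be sketched as a simple age check over undecided findings. This is a minimal illustration, assuming each finding records when it was opened and which decision, if any, has been made. The `Finding` fields, the dates, and the 30-day threshold are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Finding:
    rule_id: str
    opened: date
    decision: Optional[str] = None  # "remediate", "mute", or None if undecided


def needs_escalation(findings, today, max_age_days):
    """Return undecided findings that have been open longer than max_age_days."""
    return [
        f for f in findings
        if f.decision is None and (today - f.opened).days > max_age_days
    ]


# Hypothetical backlog.
backlog = [
    Finding("sql-injection", date(2024, 1, 10)),
    Finding("weak-hash", date(2024, 3, 20), decision="mute"),
    Finding("path-traversal", date(2024, 4, 1)),
]

overdue = needs_escalation(backlog, today=date(2024, 4, 15), max_age_days=30)
print([f.rule_id for f in overdue])  # prints ['sql-injection']
```

Findings that already have a decision are left alone; the policy only targets the ones being held open as a hedge.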

What To Do to Maximize Your ROI

This article shared a fair amount of data about how leading companies perform, but what should you do about it? Here are a few first steps to work on.

Choose a tool based on language coverage. Semgrep covers 30+ languages with hundreds of high-signal rules to precisely identify priority problems. Use this to your advantage. DIY with a bunch of free and open source tools combined with LLMs is cheap to get started with but expensive to maintain.
Shift left with an IDE plugin, or at the very least at merge time. The Shift Left mantra is a well-worn cliche because everybody preaches it, but the numbers do back it up. Investing time in setting up IDE plugins and CI/CD systems has outsized returns in surfacing security findings. It is more efficient overall to deal with them early.

A successful coach, John Wooden, once said, “If you don’t have time to do it right, when will you have time to do it over?” Whether applied to sports, life, or code security, fixing issues as soon as possible with a tool that is reliable, deterministic, and comprehensive will deliver the most successful results.

The full benchmark data, cohort methodology, and ecosystem breakdowns are in the Remediation at Scale report, which is worth a read if you're building the case internally.
