Through a Scanner Falsely: When AI-reported Critical Vulnerabilities Aren’t

Investigating an AI coding agent Host Header Injection alert

Jonathan Werrett
November 4th, 2025

Security folks love automation. We have to. The game is asymmetric; our teams are buried under alerts, dashboards, and ever-expanding backlogs. Attackers aren't. But we've learnt over the years that automation without context can quickly become harmful noise rather than helpful signal.

Recently, one of our internal experiments with an AI-based code reviewer surfaced what it confidently labeled a "Host Header Injection: CRITICAL VULNERABILITY 🚨".

The explanation was thorough and the potential impact severe: OAuth hijacking, account takeover, the works. As a security engineer reading the analysis, I definitely felt my pulse spike. There was even an emoji. We're cooked. Page the engineering team!

Except… it wasn’t actually an exploitable issue. False positives like this waste time, erode trust in our security tools, and, probably worse, risk burning credibility with my peers in engineering.

The Host Header Injection “Vulnerability”

Here’s the gist of the AI’s finding:

  • The code read the hostname from the Host header of incoming requests.

  • That value was used in a way that could influence redirect URIs in an OAuth flow.

  • Therefore, an attacker could control the redirect target and steal OAuth codes.

On paper, this looks damning. The AI even helpfully provided a curl command that would demonstrate the issue.
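
To make that concrete, here is a minimal sketch of the shape of code that trips this alert. It is not our actual code; the framework (Flask), routes, client ID, and endpoints are all invented for illustration.

  from urllib.parse import urlencode

  from flask import Flask, redirect, request

  app = Flask(__name__)

  # Placeholder identity provider endpoint, not a real one.
  AUTHORIZE_ENDPOINT = "https://idp.example.com/oauth/authorize"

  @app.route("/login")
  def login():
      # request.host comes from the client-supplied Host header, so on paper
      # an attacker who controls that header also controls where the OAuth
      # authorization code gets sent.
      redirect_uri = f"https://{request.host}/oauth/callback"
      params = urlencode({
          "client_id": "example-client",  # placeholder
          "response_type": "code",
          "redirect_uri": redirect_uri,
      })
      return redirect(f"{AUTHORIZE_ENDPOINT}?{params}")

Read in isolation, that is exactly the "attacker-controlled value flows into a redirect" pattern a scanner is trained to pounce on.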

But, dear reader, there was no major concern. Our infrastructure won't accept requests with the wrong Host header. Browsers and modern stacks don't just blindly serve requests for unknown hosts. So while the code technically read as vulnerable, the exploit chain was dead on arrival. That semantic context was missing from the AI's analysis.
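
For a rough sketch of what that first layer looks like (illustrative middleware, not a description of our actual edge; the hostnames are placeholders), rejecting unknown hosts before any application logic runs is enough to kill the chain:

  from flask import Flask, abort, request

  app = Flask(__name__)

  # Placeholder allowlist; in practice this comes from deployment config.
  ALLOWED_HOSTS = {"app.example.com", "staging.example.com"}

  @app.before_request
  def reject_unknown_hosts():
      # Strip an optional port before comparing against the allowlist.
      host = request.host.split(":", 1)[0].lower()
      if host not in ALLOWED_HOSTS:
          # A request with a spoofed Host header never reaches the OAuth
          # handler, so the "vulnerable" code is never fed attacker input.
          abort(400)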

Vulnerable ≠ Exploitable

This is the teachable moment.

To be fair, the code could be considered vulnerable to Host header injection, not unlike an Open Redirect issue. But the way the OAuth protocol works, not to mention the surrounding ecosystem of load balancers, reverse proxies, and browser constraints, made exploitation impossible.
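
One reason the protocol gets in the way: authorization servers are expected to validate the redirect URI against what the client registered. Here is a toy sketch of that server-side check, assuming exact-match registration; the client IDs, URIs, and function names are placeholders, not any particular provider's implementation.

  import secrets

  # Placeholder registration data; real servers load this from client config.
  REGISTERED_REDIRECT_URIS = {
      "example-client": {"https://app.example.com/oauth/callback"},
  }

  def authorize(client_id: str, redirect_uri: str, state: str) -> str:
      # A redirect_uri poisoned via the Host header simply fails this check,
      # so no authorization code is ever sent to the attacker's host.
      if redirect_uri not in REGISTERED_REDIRECT_URIS.get(client_id, set()):
          raise ValueError("redirect_uri does not match client registration")
      code = secrets.token_urlsafe(24)  # stand-in for real code issuance
      return f"{redirect_uri}?code={code}&state={state}"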

That distinction matters:

  • Vulnerable: the code has a weakness.

  • Exploitable: that weakness can actually be leveraged to cause harm.

A similar pattern holds for our software supply chain checks (there's a short sketch of this after the list):

  • Vulnerable: the code has a weakness.

  • Reachable: our code actually calls the affected function, so the weakness could cause harm.
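
Here is that sketch: a contrived, self-contained example where vendor_lib stands in for a third-party dependency (it is not a real package) that ships one weak function alongside safe ones.

  class vendor_lib:
      # Invented stand-in for a third-party dependency.

      @staticmethod
      def parse_expr(expr: str):
          # Imagine this is the function named in the advisory: it evaluates
          # arbitrary input.
          return eval(expr)  # intentionally unsafe stand-in

      @staticmethod
      def parse_int(value: str) -> int:
          # A safe sibling function from the same package.
          return int(value)

  def handle_user_input(value: str) -> int:
      # Our code only ever calls the safe function. Dependency scanning will
      # still flag the package as vulnerable, but the weak function is not
      # reachable from any code path we actually ship.
      return vendor_lib.parse_int(value)

  if __name__ == "__main__":
      print(handle_user_input("42"))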

AI analysis (and fair cop, sometimes SAST tools) can blur that line. They’re great at spotting “weak patterns” but blind to real-world context. 

An experienced security engineer will quickly suspect that's what is going on in an example like this. A header controllable by end users? Nice catch! Using it to hijack a user's OAuth flow? Sounds suspect. But spare a thought for the software engineer who lands this scary-looking issue in their ticket queue.

What was missing from this workflow was the ability to customize and tune the policies and rules for what counts as exploitable.

The Cost of False Positives in AI Security

This wasn’t just a funny observation about an overeager LLM. It’s a reminder of why many engineering teams grow frustrated with security tools:

  • Backlogs fill up with issues that have no real impact.

  • Engineers lose trust because of the signal-to-noise ratio.

  • Security burns credibility asking teams to “investigate” things that don’t pose a risk.

If your developers are rolling their eyes at yet another “CRITICAL 🚨” ticket, your security program is losing ground.

What Security Leaders Should Take Away

  1. Defense in Depth Still Wins
    Yes, you should validate browser-provided Host headers. Yes, you should reject unrecognized hosts. But not every weakness needs to be a fire drill if the surrounding layers already protect you. That context matters when setting rules and policies. (There's a short sketch of the config-driven fix after this list.)

  2. Integrate, Don’t Backlog
    The real promise of AI in security isn't dumping alerts into a ticketing system. It's surfacing "belt and suspenders" fixes inline, where developers can apply them immediately while in flow. Taking off my skeptical security hat, this is the thing that excites me the most. Imagine fixing security issues (big and small) without a findings backlog and a loop through engineering's priority stack.

  3. Teach the Difference
    Customize your tools with context and policies. Tuning out false positives takes exactly that: tuning the system on what is and isn't above the cut line.
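
To pull point 1 back to this incident: the "belt and suspenders" fix is to stop deriving externally visible URLs from the request at all. A minimal sketch, assuming a per-environment config value (the variable name and URLs are invented for illustration):

  import os
  from urllib.parse import urljoin

  # Set once per environment, e.g. CANONICAL_BASE_URL=https://app.example.com
  CANONICAL_BASE_URL = os.environ.get("CANONICAL_BASE_URL", "https://app.example.com")

  def oauth_callback_url() -> str:
      # No request-derived input: a spoofed Host header cannot influence the
      # redirect target, regardless of what the upstream layers do.
      return urljoin(CANONICAL_BASE_URL, "/oauth/callback")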

Closing Thoughts

Building security tools has taught me a lot about alert fatigue, and I feel AI-based tools are at a fun inflection point. Are they going to be a security panacea or just another channel trying to manipulate us with "🚨🚨" emojis?

For the time being, I am prompting my little LLM mate to stick to surfacing code facts and suggesting what other context needs to be checked. I am shying away from piping its output straight to my peers in engineering. They have enough emoji-driven tasks in their day-to-day work.

Semgrep has customizable rules and policies to tune findings, with AI Memories that get better through iteration. Teams running the Semgrep MCP server within their IDE can interactively discover and remediate potential issues before committing code, without much time investment.

Learn more: https://semgrep.dev/solutions/secure-vibe-coding/


About


Semgrep enables teams to use industry-leading AI-assisted static application security testing (SAST), supply chain dependency scanning (SCA), and secrets detection. The Semgrep AppSec Platform is built for teams that struggle with noise, helping development teams apply secure coding practices.