A few weeks ago—before Anthropic's code security announcement tanked the market—I was lucky enough to be at a private event with Anthropic's Deputy CISO Jason Clinton and ex-OpenAI CISO Matt Knight (who built OpenAI's vulnerability detection product Aardvark, now Codex Security). The discussion wasn't just about "what is the new AppSec product?" but "will there be more or fewer security engineers in the future?"
Security, already tilted massively towards offense, has a fundamental asymmetry made worse by the probabilistic nature of vulnerability discovery. If we have 10,000 unknown vulnerabilities and spend $1M to find 100 of them, and an adversary spends $1M to find 100, we're not going to find the same 100 vulnerabilities. The attacker will always get their hands on a vulnerability we don't know about.
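To make that asymmetry concrete, here is a quick simulation of the overlap argument. This is a sketch: the 10,000 and 100 figures are the illustrative numbers from the paragraph above, not real-world estimates.

```python
import random

TOTAL = 10_000    # unknown vulnerabilities in existence (illustrative)
FOUND = 100       # found per $1M of effort by each side (illustrative)

random.seed(0)
trials = 2_000
overlaps = []
for _ in range(trials):
    # Each side independently discovers a random subset of the pool
    ours = set(random.sample(range(TOTAL), FOUND))
    theirs = set(random.sample(range(TOTAL), FOUND))
    overlaps.append(len(ours & theirs))

avg_overlap = sum(overlaps) / trials
# Analytically, the expected overlap is FOUND * FOUND / TOTAL = 1:
# of the attacker's 100 vulnerabilities, we know about ~1 on average.
print(f"average overlap: {avg_overlap:.2f} of {FOUND}")
```

In other words, even matching the adversary dollar-for-dollar, roughly 99 of their 100 vulnerabilities stay unknown to us.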
And we are headed into an unprecedented storm of vulnerabilities: it was always fairly cheap to find them, but there was a talent bottleneck. Now foundation models are removing that bottleneck: Anthropic announced 500 "high-severity" findings enabled by Opus 4.6 with no specialized tool use. Even more impressive to me is that independent researchers (with far fewer resources than Anthropic) are following up with vulnerability discovery rates we've never seen before.
Will the foundation models solve this themselves? Will this storm intensify or resolve in the next few years? Some vendors in the AppSec space dismissively point to the fact that today's foundation models generate code that is worse from a security perspective. But I believe they can eventually generate code that is more secure on average than code from today's median developer. In fact, Meta and OpenAI are already starting to use Semgrep to tune the output of their models!
Still, even if a model writes code with half the vulnerability rate of the average developer (2x better), it is generating code at such volume (say 10x) that the final output could contain 5x more vulnerabilities. Hence the need for the foundation providers themselves to offer some basic capabilities to reduce vulnerabilities.
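The arithmetic behind that claim, using the illustrative 2x and 10x figures from the paragraph above:

```python
# Back-of-the-envelope arithmetic; all rates here are illustrative, not measured.
human_rate = 1.0                 # vulnerabilities per unit of code (normalized)
model_rate = human_rate / 2      # model is 2x better per unit of code...
model_volume = 10.0              # ...but generates 10x the code

human_vulns = human_rate * 1.0   # baseline: 1.0
model_vulns = model_rate * model_volume
print(model_vulns / human_vulns)  # → 5.0
```

The per-line quality improvement is swamped by the volume multiplier unless the quality gain outpaces it.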
Still, no one is really sure exactly how to gauge performance in this new world. Will models actually end up being 100x better than the average engineer at writing secure code, thus overcoming the volume issue? Or will it be true that "we cannot solve our problems with the same thinking we used to create them" and an adversarial engine or agent must scrutinize the output? If you look at what companies like Anthropic and OpenAI are announcing, they are betting on the latter so far, which (ironically) means there is plenty of space for independent security vendors to compete with them. And also good news for anyone looking for a security engineer job: there will be plenty of security issues to deal with if the cost of generating code approaches zero.
Claude Code Security: the end of AppSec solutions?
Some others in the AppSec space have thrown shade at Anthropic's Claude Code Security announcement. Here’s my rebuttal to the skeptics:
“LLMs aren't deterministic” — Neither is a human security reviewer, and we already use plenty of non-deterministic tools in security. It's a drawback, but not a reason adoption won't be transformative.
“OpenAI announced Aardvark last year and nothing has changed.” — Just like with code generation, change will be slow and then all at once.
“LLMs can't write secure code” — True today, but it won't be true tomorrow. Foundation model companies are already using Semgrep to tune their outputs, and many low-hanging-fruit vulnerability classes will be eliminated entirely.
I'm firmly convinced LLMs are the future of security, and an accelerant for us. We've designed the next generation of the Semgrep architecture to get better as foundation models get better.
Where does Semgrep fit in?
A good percentage of the buyers for code scanning today are just looking to check the box. They have good free options (open source Semgrep!), built-in platform level solutions (e.g., GitHub Advanced Security) and now foundation models, too.
Semgrep's original product-market fit was around custom rules: if you were bitten by a vulnerability in the past, root-cause it and find all the variations in your codebase that could lead to it happening again. That's earned us incredible popularity: we are by far the most popular open-source code scanner.
But Semgrep 1.0 was for humans. The next version of Semgrep is for agents.
We've been decomposing Semgrep into smaller program analysis units that LLMs can use to ask and answer program analysis questions. As we announced last week at our keynote, that work has turned into a new product: Semgrep Multimodal. In our benchmarks, it finds 8x the true positives of a base foundation model alone, with 50% fewer false positives.
Today, we see the models as a very energetic but junior pentester, one that performs best when instructed: "only come back to me when you can prove your exploit works – not just when you think you have a vulnerability." Combining that persistent enthusiasm with deterministic tools is very high leverage: cheaper, faster, higher-quality results.
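That instruction can be sketched as a triage loop. This is hypothetical illustration code: `Finding`, `triage`, and the lambda proofs-of-concept are stand-ins, not a real Semgrep or Anthropic API.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Finding:
    description: str
    poc: Callable[[], bool]  # returns True only if the exploit demonstrably works


def triage(candidates: List[Finding]) -> List[Finding]:
    """Report only findings whose proof-of-concept actually executes."""
    confirmed = []
    for finding in candidates:
        try:
            if finding.poc():
                confirmed.append(finding)
        except Exception:
            pass  # a crashing PoC is not a proven exploit
    return confirmed


# Illustrative usage: one proven finding, one speculative one.
candidates = [
    Finding("SQL injection in /search", poc=lambda: True),  # PoC succeeds
    Finding("maybe an XSS?", poc=lambda: False),            # agent only *thinks* so
]
confirmed_names = [f.description for f in triage(candidates)]
print(confirmed_names)  # → ['SQL injection in /search']
```

The deterministic gate (does the PoC run?) is what turns the model's enthusiasm into a low-false-positive pipeline.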
Semgrep Multimodal is just one of many new products built on top of a customizable Semgrep Workflow capability, which allows LLMs to use the Pro Engine, Pro rules, and some unannounced new tools and data sources to function as a highly effective virtual security engineer. Custom workflows are going to become the new custom Semgrep rules. All our core workflows will be driven by agents: you'll no longer use Semgrep directly; you'll interact with it through an agent.
Semgrep Multimodal and Workflows have already found dozens of incident-level 0-days for customers. LLMs plus great tools are the future; we believe they are an accelerant to fulfilling our mission of making it expensive to exploit software.
The new version of Semgrep is for agents! We'd love for you to try it out and hear your feedback.