Anthropic Is Coming for the Cybersecurity Industry
Nearly half of all AI-generated code contains security vulnerabilities. Veracode tested over 100 LLMs across 80 coding tasks and found a 45% insecure code rate, with no improvement across newer or larger models. A Tenzai assessment found 69 vulnerabilities across 15 applications built with vibe coding tools, half a dozen of them critical. NYU and BaxBench research puts the range at 40% to 62%.
OpenAI saw this coming. In October 2025, they launched Aardvark, a GPT-5-powered autonomous agent that scans codebases, validates exploitability in sandboxed environments, and proposes patches. It has since been rebranded as Codex Security and expanded to include malware analysis. Now Anthropic has entered the same race with Claude Code Security. Within hours of the announcement, cybersecurity stocks dropped across the board. CrowdStrike fell nearly 7%. Okta dropped over 9%. The Global X Cybersecurity ETF hit its lowest point since November 2023.
The market overreacted. But the fear behind the selloff points at a problem the industry created for itself.
Not Another Static Analysis Tool
Both Codex Security and Claude Code Security make the same foundational bet: LLM-based reasoning about code beats rule-based pattern matching for finding the vulnerabilities that actually matter.
Traditional static analysis tools like SonarQube, Snyk, or Checkmarx work by matching code against a library of known vulnerability patterns. They catch exposed credentials, outdated encryption, common injection patterns. Useful, but fundamentally limited to what someone has already catalogued as a vulnerability pattern.
The new AI-native approach is different. Both tools read code and reason about it. They trace data flows across entire repositories, map how components interact, and identify vulnerabilities that emerge from the logic of how a system works rather than from matching a known signature. Business logic flaws, broken access control chains, authentication bypasses that only manifest when you understand how three different modules interact.
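To make the distinction concrete, here is a minimal hypothetical sketch; the function names, file layout, and scenario are invented, not taken from either product's documentation. A privileged operation is guarded on one call path and not on another. No single line matches a known vulnerability signature, and the flaw only appears once you trace every path that can reach the privileged function.

```c
/* Hypothetical sketch -- function names and file layout are invented.
 * The privileged operation is guarded on one call path and not on the
 * other, so the vulnerability lives in the call graph, not in any single
 * line a pattern matcher could flag. */
#include <stdbool.h>
#include <stdio.h>

typedef struct { int user_id; bool is_admin; } Session;

/* users.c: privileged operation that assumes every caller checked roles */
static void delete_user(int target_id) {
    printf("deleting user %d\n", target_id);
}

/* handlers.c: the front-door route remembers the role check */
void handle_admin_delete(Session *s, int target_id) {
    if (!s->is_admin) { puts("forbidden"); return; }
    delete_user(target_id);
}

/* jobs.c: a cleanup path added later reaches the same operation
 * without any role check */
void handle_cleanup_request(Session *s, int target_id) {
    (void)s;
    delete_user(target_id);
}

int main(void) {
    Session attacker = { .user_id = 42, .is_admin = false };
    handle_admin_delete(&attacker, 7);     /* blocked */
    handle_cleanup_request(&attacker, 7);  /* succeeds: privilege escalation */
    return 0;
}
```

A rule-based scanner sees two syntactically unremarkable functions. A reviewer reasoning about the call graph sees that one of them hands privilege escalation to any authenticated user.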
The implementations differ. Aardvark builds a threat model of the entire repository, then continuously monitors commits against that model. It validates findings by attempting to trigger vulnerabilities in an isolated sandbox, then generates patches via OpenAI Codex. Claude Code Security takes a multi-stage verification approach in which Claude attempts to disprove its own findings before surfacing them; validated issues appear in a dashboard with severity ratings and confidence scores.
Both enforce human-in-the-loop remediation. Neither applies patches without developer approval. This is autonomous discovery, not autonomous patching.
500 Vulnerabilities That Decades of Expert Review Missed
Anthropic’s announcement builds on research published earlier this month. Using Claude Opus 4.6, their Frontier Red Team found over 500 high-severity vulnerabilities in production open-source codebases. Bugs that survived decades of expert review and millions of hours of fuzzer CPU time. OpenAI’s Aardvark has reported 10 CVEs from open-source scanning, with a 92% detection rate in benchmark testing. The scale difference is notable, though the benchmarks are not directly comparable.
The technical details of how Claude found these bugs illustrate why reasoning-based discovery is qualitatively different from existing tools.
Claude was scanning Ghostscript, a widely used PDF and PostScript processor. After initial approaches failed, it started reading the Git commit history. It found a security commit that added stack bounds checking for font handling, then reasoned that if this commit added bounds checking, the code before it was vulnerable. It searched for other code paths calling the same function and found one in a different file that lacked the same bounds check. From observation to proof-of-concept crash in a single session.
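A simplified reconstruction of that pattern looks like the sketch below. The names and structure are invented for illustration; the real Ghostscript code is more involved, but the shape of the reasoning is the same: the security commit added a bounds check on one path, and a second path writing into the same fixed-size structure never got the equivalent check.

```c
/* Illustrative reconstruction -- invented names, not Ghostscript's code. */
#include <stdio.h>

#define STACK_DEPTH 16

typedef struct {
    int depth;
    int stack[STACK_DEPTH];
} FontContext;

/* Call path A: the security commit added this bounds check before
 * writes into the fixed-size stack. */
int push_font_param_checked(FontContext *ctx, int value) {
    if (ctx->depth >= STACK_DEPTH) return -1;   /* the added check */
    ctx->stack[ctx->depth++] = value;
    return 0;
}

/* Call path B, in a different file: writes into the same structure but
 * never received the equivalent check. If path A needed the fix, this
 * path is still vulnerable -- the inference Claude made from the commit. */
void load_font_params(FontContext *ctx, const int *params, int count) {
    for (int i = 0; i < count; i++)
        ctx->stack[ctx->depth++] = params[i];   /* no bounds check */
}

int main(void) {
    FontContext ctx = { .depth = 0 };
    int attacker_controlled[64] = {0};
    /* Proof of concept: more parameters than the 16-slot stack can hold,
     * so the unchecked path writes past the array into adjacent memory
     * (undefined behavior; a sanitizer build flags the first bad write). */
    load_font_params(&ctx, attacker_controlled, 64);
    printf("depth after unchecked load: %d\n", ctx.depth);
    return 0;
}
```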
In the CGIF library for GIF processing, Claude identified that the LZW compression algorithm could produce output larger than its input under specific conditions, violating an implicit assumption in the code. Triggering this required a conceptual understanding of how LZW compression works, not just pattern matching against known exploit signatures. Traditional fuzzers struggle here because even 100% code coverage would not necessarily reveal the flaw. You need to understand what the code is trying to do, not just what paths it executes.
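The shape of that assumption violation is easy to sketch. The example below is illustrative, not CGIF's actual code, and uses an invented stand-in encoder that, like LZW on pathological input, can emit more bytes than it consumes; the caller sizes its output buffer to the input length on the implicit assumption that compression only ever shrinks data.

```c
/* Illustrative sketch -- not CGIF's actual code. toy_encode stands in for
 * LZW: on pathological input it emits more bytes than it consumes. */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Stand-in encoder: escapes 0xFF bytes by doubling them, so worst-case
 * output is twice the input. Real LZW has a smaller but nonzero expansion
 * factor under the right conditions -- the assumption that broke. */
static size_t toy_encode(const uint8_t *in, size_t in_len, uint8_t *out) {
    size_t n = 0;
    for (size_t i = 0; i < in_len; i++) {
        out[n++] = in[i];
        if (in[i] == 0xFF) out[n++] = 0xFF;
    }
    return n;
}

static uint8_t *compress_frame(const uint8_t *pixels, size_t len, size_t *out_len) {
    uint8_t *out = malloc(len);               /* BUG: assumes output <= input */
    if (!out) return NULL;
    *out_len = toy_encode(pixels, len, out);  /* heap overflow when it expands */
    return out;
}

int main(void) {
    uint8_t pixels[32];
    for (size_t i = 0; i < sizeof pixels; i++) pixels[i] = 0xFF;  /* worst case */
    size_t out_len = 0;
    uint8_t *out = compress_frame(pixels, sizeof pixels, &out_len);
    printf("wrote %zu bytes into a %zu-byte buffer\n", out_len, sizeof pixels);
    free(out);
    return 0;
}
```

Every line of this is individually reasonable, which is why coverage-guided fuzzing can execute all of it without ever hitting the expanding case. Finding the bug means noticing that the buffer size encodes an assumption the algorithm does not guarantee.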
These are real bugs in real production code that real fuzzers running for years did not catch.
The Selloff Was Wrong, but the Fear Is Right
The cybersecurity industry has been built on the premise that finding vulnerabilities requires specialized tooling, proprietary rule databases, and deep domain expertise. Vendors like Veracode, Checkmarx, Snyk, and Sonar (maker of SonarQube) have built substantial businesses around static application security testing (SAST). Their products are embedded in CI/CD pipelines across the enterprise world.
Both Claude Code Security and Aardvark/Codex Security threaten the foundational assumption that vulnerability discovery requires purpose-built scanning infrastructure. If general-purpose frontier models can outperform specialized rule-based scanners at finding the vulnerabilities that actually get exploited, the defensibility of those specialized tools erodes fast.
Barclays analysts called the selloff “illogical” and argued Claude Code Security does not directly compete with the established businesses they cover. Technically correct in the short term. Claude Code Security is in limited research preview, available only to Enterprise and Team customers, restricted to code the customer owns. Aardvark is still in private beta. Neither is a GA product competing for CI/CD pipeline integration deals today.
But the analysts are missing the trajectory. The question is not whether these tools compete with Snyk today. The question is what happens when reasoning-based vulnerability discovery becomes a standard capability of AI coding platforms. Both OpenAI and Anthropic are on that path. And SiliconAngle noted that CI/CD integration is the obvious next step for both.
The Runtime Gap
StackHawk published a sharp response within hours of the Anthropic announcement. Their argument: Claude Code Security does not run your application. It cannot send requests through your API stack, test how authentication middleware chains together, or confirm whether a finding is actually exploitable in your runtime environment.
They have a point, though Aardvark partially addresses this with sandbox-based exploit validation. Static analysis, no matter how intelligent, can only reason about code as written. Runtime vulnerabilities only manifest when you test the running system. Dynamic application security testing (DAST) and runtime testing remain necessary layers.
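StackHawk's point is easiest to see with a finding whose exploitability is a deployment question rather than a code question. The sketch below is hypothetical, with invented names: the handler trusts an internal-only header, and whether that matters depends entirely on whether the reverse proxy in front of the service strips that header from external requests.

```c
/* Hypothetical sketch -- invented names. The handler trusts an internal-only
 * header; whether that is exploitable depends on whether the reverse proxy
 * in front of this service strips the header from external requests. */
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

typedef struct { const char *name; const char *value; } Header;

static const char *get_header(const Header *headers, int count, const char *name) {
    for (int i = 0; i < count; i++)
        if (strcmp(headers[i].name, name) == 0) return headers[i].value;
    return NULL;
}

/* Static analysis can flag "trusts a client-suppliable header", but it
 * cannot tell you whether the deployed proxy makes this reachable. */
bool is_internal_request(const Header *headers, int count) {
    const char *v = get_header(headers, count, "X-Internal-Service");
    return v != NULL && strcmp(v, "true") == 0;
}

int main(void) {
    /* Simulates an external request that kept the header because the proxy
     * was never configured to strip it. */
    Header external_request[] = { { "X-Internal-Service", "true" } };
    if (is_internal_request(external_request, 1))
        puts("internal-only endpoint reachable from outside");
    return 0;
}
```

A static reviewer, human or model, can flag the trust decision. Only a request sent through the running stack tells you whether it is reachable.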
The Anthropic examples lean heavily on buffer overflows, memory corruption, and data flow analysis. Real and important vulnerability classes, but not the same as business logic flaws that require understanding what a system is supposed to do in its operational context. Reading code carefully and understanding business intent are different things. Though if these models can reason about code well enough to find bugs that survived decades of expert review, the timeline for reasoning about running systems is measured in model generations, not decades.
Cleaning Up the Mess AI Coding Created
Nobody is saying this out loud: both OpenAI and Anthropic are now selling tools to find the vulnerabilities that their own coding tools introduce.
The Tenzai assessment is worth looking at more closely. They built the same three applications with Claude Code, OpenAI Codex, Cursor, Replit, and Devin. The tools were good at avoiding generic, well-known flaws like SQL injection and XSS. They consistently failed on business logic vulnerabilities where context matters.
That finding is telling. AI coding tools reproduce the patterns they were trained on. They avoid the mistakes that have been catalogued and documented for years. They introduce new ones that require understanding what a system is supposed to do, not just what syntactically valid code looks like. Tenzai’s researchers put it bluntly: AI agents are “very prone to business logic vulnerabilities” because they lack the common sense that human developers bring to understanding how workflows should operate.
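Tenzai's report does not publish the offending code, but the failure mode is straightforward to sketch. The example below is hypothetical, with an invented workflow: the generated code validates inputs and avoids every catalogued bug class while missing the domain rule that makes the workflow safe.

```c
/* Hypothetical sketch -- invented workflow. The code validates its inputs
 * and contains no injection or memory bugs, but never enforces the domain
 * rule that refunds cannot exceed what was actually paid. */
#include <stdio.h>

typedef struct {
    int order_id;
    double amount_paid;
    double amount_refunded;
} Order;

/* Looks reasonable, compiles cleanly, passes a pattern-based scanner.
 * Missing check: refund must not exceed amount_paid - amount_refunded. */
int issue_refund(Order *o, double refund) {
    if (refund <= 0) return -1;     /* input validation: present */
    o->amount_refunded += refund;   /* business rule: absent */
    printf("refunded %.2f on order %d\n", refund, o->order_id);
    return 0;
}

int main(void) {
    Order o = { .order_id = 1001, .amount_paid = 20.0, .amount_refunded = 0.0 };
    issue_refund(&o, 20.0);
    issue_refund(&o, 20.0);   /* second full refund accepted */
    printf("paid %.2f, refunded %.2f\n", o.amount_paid, o.amount_refunded);
    return 0;
}
```

Nothing here would trip a scanner or a fuzzer. The bug is that the code does not know refunds are supposed to stop at the amount paid.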
Google and Microsoft both report that 20-30% of new code in active projects is now AI-generated. Stack Overflow’s 2024 survey found 63% of professional developers using AI in their workflow. Those numbers are only going up. The attack surface expansion is not theoretical. It is arithmetic.
Both companies sell AI coding tools. Those tools generate insecure code at a known, measurable rate. Now both sell tools to find the vulnerabilities their coding tools introduce. The incentives are aligned, even if the optics are circular.
The more interesting question is whether reasoning-based vulnerability discovery can catch the specific classes of bugs that AI coding tools create. The Tenzai data suggests the gap is exactly in business logic and context-dependent flaws. Both Claude Code Security and Aardvark claim reasoning about code logic rather than pattern matching as their core strength. If they can close that gap, they become the natural complement to AI-assisted development. If they cannot, they are just static analysis tools with better marketing.
What This Means for Enterprise Software
For enterprise software buyers and operators, the implications go beyond cybersecurity vendor disruption.
This is another instance of general-purpose AI capabilities displacing specialized point solutions. The same dynamic is playing out across enterprise software. Specialized tools that built their business around proprietary rule databases are exposed whenever a frontier model can replicate their core capability as a feature rather than a standalone product. When both OpenAI and Anthropic ship vulnerability scanning as a feature of their coding platforms, the standalone SAST market has a structural problem.
And the dual-use problem is real. The same capabilities that let defenders find vulnerabilities faster give attackers the same advantage. Anthropic addressed this with new detection probes that monitor for cybersecurity misuse and could potentially block traffic in real time. They acknowledged this will create friction for legitimate security research. OpenAI has taken a different approach, updating its coordinated vulnerability disclosure process and no longer committing to strict disclosure timelines. There is no clean way to give defenders a tool that attackers cannot also use.
90-Day Disclosure Windows Won’t Survive This
This is becoming a feature war between AI platform providers, not a standalone product category. Claude Code Security is in limited research preview. Aardvark/Codex Security is in private beta. Both are headed toward broader availability. Both offer free access for open-source maintainers.
For enterprise teams evaluating their security toolchain: if you are already on one of these AI coding platforms, vulnerability scanning is becoming a bundled capability at effectively zero marginal tooling cost. If you are evaluating AI coding platforms, security scanning capability just became a differentiator.
For cybersecurity vendors, the question is whether your core value can be replicated as a feature of a general-purpose AI platform. If yes, the clock is ticking. If no, because your value depends on runtime context, operational integration, or domain-specific intelligence that models cannot yet replicate, you have time. Use it wisely.
The 90-day responsible disclosure window that has been the industry standard for the past decade may not survive contact with AI-scale vulnerability discovery. When a model can find hundreds of high-severity bugs in a single scanning session, the entire coordinated disclosure process needs to evolve. OpenAI has already moved away from strict disclosure timelines. That is a governance problem, not a technology problem. And governance problems always take longer to solve.
Anthropic’s announcement: Making frontier cybersecurity capabilities available to defenders
Anthropic’s zero-day research: Evaluating and mitigating the growing risk of LLM-discovered 0-days
OpenAI’s Aardvark announcement: Introducing Aardvark: OpenAI’s agentic security researcher