AI security capabilities and the human side of vulnerability management

Mythos, oh Mythos. The whole web started to panic, and leadership started to care about security. "Good," but not really, because this is fear, not genuine interest in securing their clients' data and environments. Mythos came out hard, finding vulnerabilities that had survived for 20+ years because they required the context of multiple users or complex workflows, exactly the kind of pattern a Large Language Model is good at identifying.

The panic is that organizations are going to find hundreds or thousands more vulnerabilities, and attackers will find them too.

The landscape is changing, but there is no reason to panic.

The reality is that, yes, this might require companies to invest in security, and to invest more in real security, not just compliance! Compliance helps move things at the leadership and political level, but it does not secure your organization.

Now, what does Mythos really change for the average company?

Nothing.

More issues will be found, and more patches will need to be applied. Why is this not an issue? Because this process already exists; it will only go faster with AI.

Attackers will find more security issues, and defenders will also use AI to detect and block more attacks; things will level up. But yes, it's an acceleration, an exponential one, not a new attack vector! Well, actually, AI is a new attack vector if you ask yourself "which AI agent has access to what, and is it authorized to do X and Y?"... but that's a different subject.
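Still, to make that question concrete: here is a minimal sketch of an allowlist gate that decides whether a given agent may call a given tool on a given resource. Every name in it is hypothetical, invented for illustration; real agent frameworks handle this very differently.

```python
# Minimal sketch of an authorization gate for AI agent tool calls.
# All identifiers here are hypothetical, for illustration only.
from dataclasses import dataclass, field


@dataclass
class AgentPolicy:
    """Which tools an agent may call, and on which resource prefixes."""
    agent_id: str
    allowed_tools: set[str] = field(default_factory=set)
    allowed_paths: tuple[str, ...] = ()

    def authorize(self, tool: str, resource: str) -> bool:
        # Deny by default: both the tool and the resource must be allowlisted.
        return tool in self.allowed_tools and resource.startswith(self.allowed_paths)


# Example: a triage agent may read and comment on tickets,
# but never touch production configuration.
policy = AgentPolicy(
    agent_id="triage-bot",
    allowed_tools={"read_ticket", "add_comment"},
    allowed_paths=("tickets/",),
)

assert policy.authorize("read_ticket", "tickets/1234")
assert not policy.authorize("delete_file", "prod/nginx.conf")
```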

So what does this really change for the average or even large company? It will require faster decisions and a better comprehension of security risk. Having technical CISOs, not just managers with an MBA. Security leaders who can explain the risk so the business moves faster.

The paradox: machines find bugs faster, organizations fix them slower

A great example is this blog post from my previous team at DoorDash. It took more than 3 years to see that project come to life. This is the real danger to organizations: speed!

Security teams are already fighting the speed at which leadership moves. AI is so fast that we can't follow through on all new features, so imagine non-technical leadership! The issue lies in convincing the executive team, convincing engineering leadership, gathering a budget, developing or implementing a solution, deploying it, testing it, etc. AI won't remove any of those steps; it might actually make them more complex over time.

While researchers and attackers will find novel ways to create new bypasses and new vulnerabilities to exploit... your leadership will be thinking about what they should do, when, and how. And it might be too late.

Weirdly, this post is similar to my previous blog post, whose thesis was that fraud and application security dysfunction originate from organizational silos, not technical gaps.

The same concept applies to AI: machines can now surface vulnerabilities faster than any human, but with teams in silos, the political, organizational, and governance machinery required to actually reduce risk remains stubbornly, irreducibly human. And for organizations to save money, or to save themselves from a breach, speed and understanding of the business risks is where things will be decided.

Anthropic and other companies can create FUD as much as they want, but it's usually a sales tactic. And today looks like more of the same.

And adding to the paradox, AI-generated code itself introduces vulnerabilities. Veracode's 2025 GenAI Code Security Report found that 45% of AI-generated code samples failed security tests, with Java showing a 72% security failure rate. A large-scale study found 62% of 330,000+ C programs generated by LLMs contained at least one vulnerability.
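To make that failure mode concrete, here is an illustration (my own, not a sample from either study) of one of the most common flaw classes such reports flag: a SQL query built by string interpolation, next to its parameterized fix.

```python
# Illustration only: the kind of injection flaw security tests catch in
# AI-generated code. Not an actual sample from the cited studies.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'admin')")


def find_user_unsafe(name: str):
    # Common generated pattern: the query is built by string interpolation,
    # so input like "x' OR '1'='1" rewrites the query itself (SQL injection).
    return conn.execute(f"SELECT * FROM users WHERE name = '{name}'").fetchall()


def find_user_safe(name: str):
    # Parameterized query: the driver treats the input strictly as data.
    return conn.execute("SELECT * FROM users WHERE name = ?", (name,)).fetchall()


print(find_user_unsafe("x' OR '1'='1"))  # [('alice', 'admin')] -- every row leaks
print(find_user_safe("x' OR '1'='1"))    # [] -- no such user
```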

The past

Let's not forget that Mythos might be fast and great at contextualizing an application's source code, but that's not completely new.

Google's Big Sleep agent (a collaboration between Project Zero and DeepMind using Gemini 1.5 Pro) achieved this first. In October 2024, it found an exploitable stack buffer underflow in SQLite that their existing fuzzing infrastructure - Google's own OSS-Fuzz - had not caught, marking the first publicly documented case of an AI agent discovering an unknown exploitable memory-safety issue in widely deployed software.

DARPA's AI Cyber Challenge (AIxCC) provided the most rigorous competitive benchmark. At the August 2025 finals at DEF CON 33, seven finalist Cyber Reasoning Systems scanned 54 million lines of code across real open-source projects including Jenkins, the Linux kernel, Nginx, and SQLite3. The systems identified 86% of synthetic vulnerabilities (up from 37% at semifinals), patched 68% of those identified, and discovered 18 real-world vulnerabilities that DARPA had not planted, submitting 11 viable patches.

The AI bug-hunting startup XBOW reached #1 on HackerOne's U.S. leaderboard in Q2 2025, submitting approximately 1,060 vulnerability reports (54 critical, 242 high) and completing 104 real-world security challenges in 28 minutes versus a human pentester's 40 hours, an 85x speed advantage. The open-source tool Vulnhuntr, powered by Claude, found over a dozen zero-day vulnerabilities in popular GitHub projects within hours.

The statistics

While it might be a sales tactic, the technical results around Mythos are great and will improve application security in the mid to long term.

AI has crossed a decisive threshold in vulnerability discovery. In early 2026, Anthropic's Claude found 22 CVEs in Firefox in two weeks, Google's Big Sleep agent foiled an active exploit before attackers could use it, and DARPA's AI Cyber Challenge saw machines identify 86% of planted vulnerabilities and patch 68% of them — at an average cost of $152 per task. Yet organizations still take an average of 252 days to fix known security flaws, 82% carry unresolved security debt, and 60% of all breaches still involve the human element.

This is impressive, and exactly what we would expect from ever-improving LLM systems.

The main reason: the AI capability gains of the past 18 months are not incremental, they're exponential, and they're changing the dynamics of how things will work.

The cost

Here is the issue most are not talking about: AI might be fast and great at finding issues, but it comes at a cost, one that is not really cheaper than hiring a human unless... you have a data center full of GPUs like Anthropic, or hundreds of thousands of dollars to spend on tokens!

Per-campaign cost figures Anthropic published (these are official, not estimates):

| Target | Cost / effort | Finding | Notes |
| --- | --- | --- | --- |
| OpenBSD TCP SACK | <$20,000 / ~1,000 runs | 27-year-old DoS via 2 packets | The specific run that surfaced the bug cost <$50; Anthropic notes this only makes sense in hindsight |
| FFmpeg H.264 codec | ~$10,000 / several hundred runs | 16-year-old bug | Fuzzers had hit it 5M times without triggering |
| FreeBSD NFS RCE CVE-2026-4747 | Several hours / hundreds of files | 17-year unauthenticated remote root | Specific run cost <$50; fully autonomous discovery + exploit |
| Firefox 147 JS shell | Mythos 181× vs. Opus 4.6: 2× | Turning known bugs into exploits | A ~90× generation-over-generation jump |
| Linux kernel | Several thousand scans | LPE chains via race conditions + KASLR bypass | No total dollar figure given |
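Back-of-envelope from the two campaigns with full dollar figures (my arithmetic, not Anthropic's; "several hundred runs" is not precise, so 500 is an assumed midpoint):

```python
# Rough per-run cost from Anthropic's published per-campaign figures.
# The FFmpeg run count of 500 is my assumption for "several hundred".
campaigns = {
    "OpenBSD TCP SACK":   {"total_usd": 20_000, "runs": 1_000},
    "FFmpeg H.264 codec": {"total_usd": 10_000, "runs": 500},
}

for name, c in campaigns.items():
    per_run = c["total_usd"] / c["runs"]
    print(f"{name}: ~${per_run:.0f} per run "
          f"(${c['total_usd']:,} total / {c['runs']} runs)")
# Both land around $20 per run, consistent with the "<$50 for the
# specific run that surfaced the bug" figure quoted above.
```

Individual runs are cheap; the totals come from needing hundreds or thousands of runs to surface one bug.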

The human side

Let's not forget the real change here. AI will find more issues, and it will eventually (soon) fix them too, in a well-contextualized manner so that the application or system does not crash.

But security jobs will remain. Patch management will still exist, just AI-led, and the risk will move toward behaviour analysis. Some startups have already started: they can identify who is doing what, when, and in what business context. They build a baseline of a user's average usage of Google Drive, email, Teams, etc., and anything outside that normal behaviour can be flagged. So we will be able to better address fraud, insider risk, and more.
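Here is a minimal sketch of that baselining idea, flagging a day that deviates sharply from a user's own history. Real products use far richer features than a z-score on daily event counts; this only shows the core mechanic, and all the numbers are invented.

```python
# Behavioural baselining sketch: flag activity far outside a user's own norm.
# A z-score on daily event counts stands in for much richer production models.
from statistics import mean, stdev


def build_baseline(daily_counts: list[int]) -> tuple[float, float]:
    """Mean and standard deviation of a user's historical daily activity."""
    return mean(daily_counts), stdev(daily_counts)


def is_anomalous(today: int, baseline: tuple[float, float],
                 z_threshold: float = 3.0) -> bool:
    mu, sigma = baseline
    if sigma == 0:
        return today != mu  # no historical variance: any change is notable
    return abs(today - mu) / sigma > z_threshold


# Invented example: a user who normally downloads ~20 files/day from Drive.
history = [18, 22, 19, 25, 21, 17, 23, 20, 24, 19]
baseline = build_baseline(history)
print(is_anomalous(21, baseline))   # False: within the user's normal range
print(is_anomalous(400, baseline))  # True: possible exfiltration, flag it
```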

Think of vulnerability management as the accounting/admin work that can easily be automated. We will adapt, but access management risk will remain, for humans and agents alike. Physical security will remain, phishing will remain (for a while at least), and more.

The statistics

Social engineering is now the top attack vector, driving 36% of intrusions (Unit 42) and 60% of breaches (Verizon 2025 DBIR), with users falling for phishing in under 60 seconds and training showing no measurable effect on failure rates. AI is accelerating the problem: 82.6% of phishing emails now use AI, and AI-generated campaigns have surged 1,265%.

All this to say: don't panic, but start thinking about how you will really secure your organization once patch management is automated... social engineering, insider threats, access management (human and agentic), fraud, bad business logic, ...
