The Mythos Hype Index: What AI Really Did to the Zero-Day Curve

Tova Dvorin (00:01.612) Welcome back to the Cyber Resilience Brief, a SafeBreach podcast. I’m Tova Dvorin, your host, and as always, I’m joined by Adrian Culley, offensive cybersecurity engineer here at SafeBreach. Adrian, today’s question is the one almost every CISO has asked us in the last six months. The frontier models, Claude Opus 4.6, the Mythos Preview, ChatGPT 5.5, now have serious offensive cyber capability baked in. So the question is, has any of that actually moved the numbers? Are we seeing more zero days, faster exploits, slower patching because of AI? Or is the headline running ahead of the data?

Adrian (00:38.034) Tova, it’s the right question, and I want to set a date stamp before we start because of how quickly things are moving. Everything we say on this recording is as of mid-May 2026. That matters because the most important sources for this episode landed in the last 12 weeks. Google Threat Intelligence Group’s 2025 zero-day review dropped in March. Mandiant’s M-Trends 2026 in March. Rapid7’s 2026 Global Threat Landscape Report just after. And the GTIG report on adversaries using AI for vulnerability exploitation came out on just the 11th of May. So this is a very current picture.

Tova Dvorin (01:16.704) Indeed. Now before we get into the data, let’s talk about Mythos. Some of our listeners will know that name, some won’t. Let’s do a quick clarification.

Adrian (01:25.01) Claude Mythos Preview is Anthropic’s internal but publicly evaluated preview model. It’s the one the UK AI Safety Institute published a cyber capability evaluation of earlier this year, alongside their evaluation of GPT-5.5. So when we say Mythos today, we mean a real frontier model that government evaluators have publicly measured. Not marketing. The findings are on the aisi.gov.uk website.

Tova Dvorin (01:57.376) Okay, thanks for filling us in. Now let’s start with capabilities.
What has actually arrived in the model class itself? Because the vendor admissions in the last few months have been remarkable.

Adrian (02:08.306) There have been. Let’s start with DARPA. The AI Cyber Challenge final ran at DEF CON 33 in August 2025. The cyber reasoning systems found 54 of the 63 synthetic vulnerabilities and generated patches for 43 of them. On top of that, they discovered 18 previously unknown real bugs in open source code bases, six in C and 12 in Java.

Tova Dvorin (02:34.028) And the prize money here was real.

Adrian (02:36.626) Team Atlanta won 4 million US dollars, Trail of Bits’ Buttercup took 3 million for second place, and Theori got 1.5 million for third, but the bigger move was the license. All seven finalist cyber reasoning systems were open-sourced under an OSI-approved license. That permanently lowered the floor of autonomous vulnerability-finding tooling globally.

Tova Dvorin (03:01.784) So what we’re talking about here has been the lab-scale baseline, but what about the production-level frontier models?

Adrian (03:08.06) So the Anthropic Claude Opus 4.6 system card, published in February, is the document I keep going back to. Anthropic state explicitly that the model has saturated their cyber benchmark stack: roughly 100% on Cybench at pass@30, 66% on CyberGym at pass@1. And they say, in their own words, that the benchmark stack can no longer be used to track capability progression. The vendor is publicly admitting it has lost its ability to ceiling-check its own model on offensive cyber.

Tova Dvorin (03:50.316) That’s not terrifying. Did they do anything about it?

Adrian (03:53.916) Well, two months later, in April this year, they shipped Claude Opus 4.7 with cyber capability, in their phrasing, differentially reduced during training.
And they launched something called the Cyber Verification Program to selectively reopen access for vetted researchers and red teamers. That’s the first time, to my knowledge, that a frontier vendor has dialed back offensive cyber capability at the training stage rather than just filtering it at inference. OpenAI did something similar. GPT-5.5’s system card classifies the model as high capability in cybersecurity under the Preparedness Framework, short of critical, and they ship a separate GPT-5.5 cyber variant with trust-based access.

Tova Dvorin (04:42.359) Right, and the UK’s AISI is the independent voice on this.

Adrian (04:46.161) Yes, and they’re worth quoting, Tova, because they don’t sell models. AISI rates GPT-5.5 as their strongest model on narrow cyber tasks, 90.5% at pass@5. At their expert-difficulty cyber range, they have GPT-5.5 averaging 71.4% and Claude Mythos Preview at 68.6%. That’s statistical parity at the frontier.

Tova Dvorin (05:13.867) And the end-to-end simulation they built, the headline finding from AISI, was the cyber range.

Adrian (05:19.575) The Last Ones. 32 steps, 20 human hours of expert effort, from initial reconnaissance, call it T1595 Active Scanning, through privilege escalation and lateral movement on T1021 Remote Services, all the way to full domain takeover. An early Mythos checkpoint was the first model ever to solve it end to end. A newer Mythos solves it six times out of 10 attempts. GPT-5.5 does three out of 10.

Tova Dvorin (05:48.875) What about trend lines?

Adrian (05:50.684) Well, AISI’s progression analysis estimated that the 80% reliability cyber time horizon was doubling every four and a bit months between late 2024 and February this year. These latest models have substantially exceeded that. The capability curve isn’t bending, it’s exponentially pointing up.

Tova Dvorin (06:10.871) So in other words, it’s not just hype. The capability is real. It’s at the frontier.
It’s the bleeding edge of cybersecurity and AI technologies right now. And the vendors and the regulators agree on roughly where it sits. Now let’s get to the question that listeners care about. Is any of this actually changing the zero-day curve in practice? Or are we just talking a lot about it?

Adrian (06:29.649) So this is where I want to be careful and detailed, because the data has a different shape than the headline suggests. GTIG’s annual zero-day count for in-the-wild exploitation went from roughly 98 to 100 in 2023, down to 75 to 78 in 2024, and back up to 90 in 2025. So the headline is a rebound, not a surge. We’re still in the same 60 to 100 band that’s held since 2021.

Tova Dvorin (07:00.181) I still think it’s important for us, though, to understand where change actually happened, even if we’re roughly within the same band. What happened here?

Adrian (07:07.665) Absolutely, Tova. The change was shape, not volume. Enterprise targeting hit an all-time high: 43 of the 90 zero days, so 48%, were aimed at enterprise products, particularly edge devices and identity infrastructure. Browser zero days collapsed to historical lows. Operating system and edge appliance zero days went up. So the attacker’s center of gravity moved from the browser to your VPN concentrator, your SAP NetWeaver server, your SharePoint farm, your Ivanti Connect Secure box. That’s where the action is.

Tova Dvorin (07:42.891) And more than that, we saw that attribution flipped. I think that’s where the gold is in the headline.

Adrian (07:47.9) That’s the one. For the first time in GTIG’s tracking, commercial surveillance vendors out-attributed nation states. 18 of the 90 were attributed to CSVs, Intellexa being the named example, versus 15 attributed to state-sponsored espionage. The Intellexa leak in December showed at least 15 zero days burned since 2021 to keep Predator spyware running.
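
The shares quoted above can be checked with quick arithmetic. The sketch below uses only the counts stated in the episode (43, 18 and 15 out of 90); the variable and function names are illustrative. Note that the enterprise figure and the two attribution figures are different cuts of the same 90 zero days, so they are not expected to sum to 100%.

```python
# Counts as quoted in the episode from GTIG's 2025 zero-day review.
TOTAL_ZERO_DAYS = 90
counts = {
    "enterprise products": 43,
    "commercial surveillance vendors": 18,
    "state-sponsored espionage": 15,
}

def share_percent(count, total=TOTAL_ZERO_DAYS):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)

shares = {label: share_percent(n) for label, n in counts.items()}
for label, pct in shares.items():
    print(f"{label}: {pct}% of {TOTAL_ZERO_DAYS}")
```

The enterprise share comes out just under the 48% quoted, which is consistent with ordinary rounding.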
Tova Dvorin (08:16.599) So if I’m picking up what you’re putting down here, a large chunk of the rebound actually has nothing to do with AI.

Adrian (08:21.527) Exactly, it’s mercenary spyware vendors adapting their exploit chains to new mobile and OS mitigations. It’s not that AI suddenly made vulnerability discovery cheap for everyone. Worth noting also is that observation bias plays a role. Apple’s Lockdown Mode forensics, Citizen Lab and the CityTech have all matured. We’re seeing more CSV activity partly because we got better at instrumenting their victims.

Tova Dvorin (08:47.679) Okay, so back to the topic of this episode, though. Where does AI, where does Mythos, where does the new ChatGPT 5.5 model actually show up in the larger picture of zero days?

Adrian (08:58.385) So two firsts landed in the window, and they’re both important. On the defender side, Google’s Big Sleep agent, which is a Project Zero plus DeepMind collaboration, preempted CVE-2025-6965 in SQLite in July last year. GTIG had artifacts suggesting threat actors were staging that exploit but couldn’t pin it to a CVE. Big Sleep, given those leads, identified the underlying bug. It got patched before exploitation. Google called it the first time an AI agent has been used to directly foil an in-the-wild exploitation attempt.

Tova Dvorin (09:37.761) And what about the attacker side?

Adrian (09:39.858) So that’s the GTIG report from the 11th of May. They observed the first AI-developed zero-day exploit in the wild, a Python script bypassing two-factor authentication in a widely deployed open source web admin tool. T1556 Modify Authentication Process, if you want the technique. They didn’t name the tool or the actor, but they say it was being staged for a mass exploitation event and they disrupted it.

Tova Dvorin (10:05.665) You know, it’s always easy, at least for those of us who work in marketing, to spot when something’s been written with AI, when something’s been created image-wise with AI.
When it comes to these attack detection vectors, how is this spotted as being AI-authored?

Adrian (10:19.131) This is the part I loved, Tova. The script had a hallucinated CVSS score in the comments. Didactic Python docstrings written in textbook style. An ANSI colour helper class with the kind of comprehensive help menu a human exploit author would never bother with. The AI style became its fingerprint.

Tova Dvorin (10:37.703) It’s ironic. The thing that was supposed to make these exploits invisible was the same thing that gives them away.

Adrian (10:43.267) It cuts very hard against the “AI exploits are undetectable” framing. The opposite is true so far. They’re actually putting a target on their own backs. There’s also AISLE on HackerOne, who reportedly attributed all 12 zero days in the January OpenSSL security release to their AI system. That’s a community-sourced number; primary OpenSSL advisory text and HackerOne disclosure timestamps would tighten it. But if it stands, it’s a real data point.

Tova Dvorin (11:11.213) So if those are the firsts, and Big Sleep has disclosed about 20 open source zero days across 2025, plus the SQLite case, where does the volume sit? How many CVEs are publicly AI-credited?

Adrian (11:24.081) This is the single best counter-hype number in the corpus. Barracuda runs something they call the Mythos Hype Index. Yeah, that’s what they’ve named it. Their May 12 tracker reports fewer than 200 CVEs publicly credit AI or large language model assisted discovery, against a 2025 CVE population of roughly 46,000 published. Staggering numbers. So less than half a percent of disclosed CVEs publicly say AI helped find them.

Tova Dvorin (11:55.916) Right, but remember that that’s the publicly credited number. Researchers who use LLMs as a code review aid don’t have to declare this.
Adrian (12:30.289) Right, there’s almost certainly more uncredited assistance, but the headline still stands. The AI discovery story is real, but small relative to the CVE population. And by the way, the CVE population itself is being reshaped by something that has nothing to do with artificial intelligence. CVE volume hit roughly 46,000 to 48,000 in 2025, a record, up 16 to 20% year on year. But that growth is structurally driven by CNA expansion. There are now 484 CVE Numbering Authorities. WordPress plugin-focused CNAs alone, Patchstack and Wordfence, accounted for 44% of the 2024 CVEs.

Tova Dvorin (13:13.867) Wow, what about NVD?

Adrian (13:16.049) NVD effectively collapsed as a canonical enrichment service, and NIST formally narrowed its scope on the 15th of April this year. They’ll prioritise CVEs in CISA KEV, federally used software and EO 14028 critical software. Everything else goes into a triage queue. The NVD enriched 28% of the 2025 published CVEs versus 46% in 2024. So the denominator we use to compute AI’s share of vulnerability discovery is shifting under our feet.

Tova Dvorin (13:51.053) Let’s move to exploit timelines, because this is where I think a lot of listeners have heard the most alarming numbers. Mandiant’s time to exploit.

Adrian (14:38.609) The number is real and it deserves careful framing. M-Trends 2026, published in March, reports that the mean time to exploit across Mandiant’s observed exploited CVE population in 2025 was minus seven days.
Tova Dvorin (14:46.284) Okay.

Tova Dvorin (15:05.591) Wait, negative seven days? I’m reading that out loud. How is that even possible? It just doesn’t sound right.

Adrian (15:12.465) It doesn’t sound right, does it? Across the same metric, the arc was 63 days back in 2018-19, 44 days in 2021, 32 days in 2021-22, 5 days in 2023, 5 days in 2024 and now negative. Exploitation precedes patch, on average, for the CVEs Mandiant saw weaponised last year. Examples on the books: SharePoint ToolShell, CVE-2025-53770; CitrixBleed 2, CVE-2025-5777; Ivanti Connect Secure, CVE-2025-0282. These are the population.

Tova Dvorin (15:50.615) What’s the caveat here?

Adrian (15:55.378) The metric definition shifted. The 2018 to 2022 numbers were medians of patch-to-exploit days. The 2023 onwards numbers are means over the observed exploited population. So the long arc from 63 days to negative seven is directional, not statistically apples to apples.

Tova Dvorin (16:18.123) And moreover, the population here is selection-biased as well.

Adrian (16:21.361) It is biased towards incidents where Mandiant was retained for IR. Big game ransomware, advanced persistent threats, mass exploitation events. That said, multiple independent telemetries agree on the direction. Rapid7’s 2026 report puts mean time to exploit at 28 and a half days, down from 61. Median CVE-to-KEV listing dropped from 8.5 days to 5, and confirmed exploitation of new CVSS 7 to 10 vulnerabilities went up 105% year on year: 71 in 2024, 146 in 2025.

Tova Dvorin (16:59.009) Wow, and we get even more specific with VulnCheck’s figure for same-day exploitation.

Adrian (17:04.561) 29% of 2025 KEVs were exploited on or before the day the CVE was published. That’s up from 23.6% in 2024. And Verizon’s DBIR 2025, for edge device and VPN CVEs specifically, found the median time between vulnerability publication and mass exploitation is zero days. Zero days for zero days.
Tova Dvorin (17:28.557) Yeah.

Adrian (17:30.993) Edge device exploitation grew eightfold as a share of vulnerability exploit actions year on year, 3% to 22%. So when people say the patch versus exploit gap has inverted, the data supports that for the worst class of vulnerabilities.

Tova Dvorin (17:50.573) So the related Mandiant number there, that’s the access handoff figure.

Adrian (17:54.13) 22 seconds. Median time from an initial access broker handing off compromised access to the ransomware affiliate who’s going to exploit it. That was hours in 2024. 22 seconds in 2025. The criminal supply chain has industrialized. That number reads as if it should be implausible. It isn’t. It’s just what happens when you put a Telegram channel and a payment rail between specialists.

Tova Dvorin (18:19.883) Now, we just spent a significant amount of time going through all of the data and really doing a deep dive into the full picture. But let’s zoom out a little bit. How much of all of this is actually AI?

Adrian (18:29.489) Less than the framing suggests. The cleanest piece of academic evidence for AI-driven exploit capability is CVE-Genie, and that’s an arXiv paper from September last year. It’s a multi-agent LLM framework that reproduced 51% of CVEs published between June 2024 and May 2025 with verifiable working exploits, at an average cost of $2.77 per CVE. On a pilot of 60 CVEs published after the model’s knowledge cutoff, the success rate was 63%. That’s a serious data point. It proves both capability and economics.

Tova Dvorin (19:10.165) Right, but proof of capability isn’t proof of causation when it comes to these timeline numbers.

Adrian (19:19.395) Right, and here’s the corrective. CybelAngel published a frank piece on this. They wrote, and I quote, attributing the time to exploit collapse solely to AI oversimplifies a complex phenomenon.
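
The median-versus-mean caveat Adrian raises earlier in this exchange is easy to see with a toy sample. The sketch below is illustrative only: the time-to-exploit values are invented for the example and are not Mandiant data. It shows how a handful of long pre-patch exploitation cases can pull a mean negative while the median of the same sample stays positive, which is why a median series and a mean series should not be read as one continuous trend.

```python
from statistics import mean, median

# Hypothetical time-to-exploit values in days (exploit date minus patch date).
# Negative values mean exploitation was observed before the patch shipped.
# These numbers are invented for illustration; they are not Mandiant data.
tte_days = [-60, -30, -14, -3, 1, 2, 4, 7, 30]

def summarize(values):
    """Return (mean, median) so the two summary statistics can be compared."""
    return mean(values), median(values)

avg, mid = summarize(tte_days)
print(f"mean TTE:   {avg:.1f} days")
print(f"median TTE: {mid:.1f} days")
```

In this invented sample the mean is negative while the median is positive, so which statistic a report uses changes the sign of the headline.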
Tova Dvorin (19:33.879) So as much as we’re talking about AI and how much AI has changed our world, both in cyber and outside it, we have several alternative drivers. Let’s start going into what’s driving these changes that’s not directly related to all of these AI models.

Adrian (19:45.916) So broadly there are three of them. Mass scanning infrastructure: ZMap, Masscan, Shodan, Censys. It has been able to scan the entire IPv4 space in under an hour for over a decade. Turnkey PoC release norms: publishing a proof of concept alongside or right after the advisory has been the trend for years. And CISA’s KEV mandate, BOD 22-01, is itself a defender visibility compression. Once a CVE goes on the catalog, exploitation gets logged. So a lot of the exploit timeline collapse story has actually got perfectly good non-large-language-model explanations.

Tova Dvorin (20:26.069) Okay, thank you for unpacking those. You know, I think there’s another counter-signal we haven’t discussed yet, which is the PoC pollution problem.

Adrian (20:33.009) Yes, Tova, and it cuts both ways. GreyNoise published a piece called the PoC pollution problem last year. They documented AI-generated exploits referencing non-existent endpoints, wrong parameter names, broken auth bypass logic. VulnCheck reports public exploits and PoCs went up 16.5% year on year and explicitly flagged that a large share of that increase is AI-generated content, much of it non-functional. Only about 1% of the 2025 CVEs were exploited in the wild by year end, despite the PoC flood. So we have to be precise. The proof-of-concept volume looks like an AI surge. The working exploit volume hasn’t moved the same way.

Tova Dvorin (21:16.725) Adrian, let’s pivot to the defender side, the patch timeline. Did AI move that curve?

Adrian (21:23.749) This is the part of the answer that should worry CISOs most. The defender curve has not meaningfully moved, and in places it’s gone the wrong way.
In Edgescan’s 10th edition Vulnerability Stats Report, that’s the 2024 data report, mean time to remediate high or critical application vulnerabilities was 54.8 days. For critical-only application vulnerabilities, it was 74.3 days. 37% of the vulnerabilities discovered in a 12-month window in larger enterprises were still unresolved at year end.

Tova Dvorin (21:55.746) And that’s only one report. If you read Veracode’s report, it’s an even starker difference.

Adrian (22:00.082) Absolutely. So, Veracode’s State of Software Security 2026, they titled it drowning in security debt. Mean flaw fix time has risen 47% since 2020. The flaw half-life is 252 days overall, 315 days in the public sector. 82% of organizations now carry security debt. 60% of those carry critical debt. Year on year, critical debt is up 20%.

Tova Dvorin (22:33.633) So we’ve got two distinct stories here, right? It’s the introduction velocity versus fix velocity story.

Adrian (22:39.121) Exactly, detection is faster than remediation and the gap is widening. The number that should make the room go quiet, though, is from Qualys. Their TRU team analysed over a billion CISA KEV remediation records across 10,000 organisations over four years. The percentage of critical vulnerabilities still open at the seven-day mark has worsened from 56% to 63%.

Tova Dvorin (23:05.909) Wait, so you’re saying that the defender population is getting worse at remediating, not better, even though we have all of those new tools at our disposal. How’s that possible?

Adrian (23:13.893) Well, at the population level that’s absolutely what the figures say. Bitsight’s analysis of 1.4 million organisations against the KEV catalogue found more than 60% of KEVs are remediated past CISA’s deadline. Federal agencies are about 56% more likely to meet the deadline than non-federal. But even there, there are plenty of misses, and the GAO and OIG audit reports continue to flag KEV-overdue items in HVA systems.
Adaptiva’s State of Patch Management survey is the operational number. 77% of organizations need at least a week to deploy patches, against a CrowdStrike-cited 48-minute e-crime breakout time.

Tova Dvorin (24:00.014) So this is the picture. Ultimately, attackers can weaponize in hours, but defenders need a week to deploy. Let’s go through one more round of statistics with the IBM Cost of a Data Breach report to figure out what the numbers are on AI helping the defender side in practice.

Adrian (24:15.025) So IBM’s Cost of a Data Breach 2025 found mean time to identify and contain breaches fell to 241 days, a nine-year low. Organizations using AI and automation extensively saved $1.9 million per breach and cut the breach life cycle by about eight days. Average global breach cost fell to 4.4 million dollars, a 9% year-on-year drop. So AI on the defender side does show up. But, and this is the bit that matters, 63% of breached organisations have no AI governance policies at all. So the AI uplift is concentrated in a mature minority while a long tail goes the other way.

Tova Dvorin (25:00.609) So this is the adoption asymmetry that we’re talking about.

Adrian (25:04.293) The asymmetry is not about capability, Tova. The capability is symmetric. The same model class powers XBOW’s autonomous pen testing agent and Google’s Big Sleep defender. The same model class is what AISLE used on OpenSSL and what Anthropic ships in Claude Code Security. They launched that in beta on the 30th of April with 500 plus previously unknown open source vulnerabilities surfaced.

Tova Dvorin (25:30.157) So wait a second, if the capabilities are symmetric, where does the asymmetry lie here?

Adrian (25:34.693) Deployment scaffold. There’s an excellent piece on CJE.io from last month that frames it as well as anyone. Offense scales with compute, defense scales with committees.

Tova Dvorin (25:48.297) Adrian, can you unpack that for us?
Adrian (25:52.582) The attacker’s velocity is gated by compute and creativity, with a low cost of failure. The defender’s velocity is gated by enterprise policy, change control, production stability, compliance review, procurement, vendor consolidation, board appetite, insurance posture, and organizational consensus. The Strike 48 CISO survey published two weeks ago captures this exactly. 84% of CISOs believe AI agents should run tier-one SOC work, but only 22% will actually fully automate even basic level-one tasks. A 62-point gap between belief and deployment.

Tova Dvorin (26:31.585) Wow, and there’s even a benchmark finding here that backs the structural point.

Adrian (26:35.417) Yes, CyberSOCEval, the CrowdStrike and Meta benchmark published last September. Their headline finding is that reasoning model gains do not transfer cleanly to defensive SOC reasoning under current training recipes. So even if defender deployment caught up tomorrow, the model class isn’t yet trained to think like a defender in the same way it’s trained to think like a coder. That’s a real ceiling that the offensive side isn’t constrained by.

Tova Dvorin (27:03.467) Okay, we went through a lot of numbers, more than usual. We went through a deep dive into what the landscape looks like in practice and different estimates. Listeners, if you’re still with us by now, what should you actually do about this?

Tova Dvorin (27:32.983) They’ve got six months until the end of the financial year 2026 budget. What’s the operational call here?

Adrian (27:39.964) Four moves, and they’re not new. They’re the moves you’d make even if AI weren’t in the picture.
Just with more urgency. One, assume time to patch is not your control. The Rapid7 phrase is the right one: the obsolescence of time to patch. Stop measuring yourself by patch SLA alone. Measure yourself by exposure window, the gap between vulnerability publication and the moment your compensating controls would catch an exploit attempt. That’s the number that actually correlates with breach outcomes.

Tova Dvorin (28:11.147) Okay, what’s your second suggestion?

Adrian (28:13.581) Validate your controls continuously. This is the AEV part of the conversation, adversarial exposure validation, which Gartner formalized as a category in its first AEV-specific market guide on the 24th of March. AEV is the validation layer, pillar four of the CTEM cycle. You simulate the attack, not against a synthetic target, but against your actual production controls. And you measure whether your network inspection, your DLP or EDR would have caught it. SafeBreach’s own 2026 State of the Breach report aggregated 1.8 million attack simulations executed by customer enterprises in 2025. The block rate baselines were sobering. Network inspection 65%, DLP 70%, endpoint 53%. And against Russian GRU attack techniques specifically, a 28% miss rate. That’s not vendor pitching. That’s an empirical control-effectiveness baseline of the customer population that has the discipline to measure itself.

Tova Dvorin (29:37.793) And I understand that point three is the bug bounty signal-to-noise problem.

Adrian (29:41.458) Absolutely, Tova. Daniel Stenberg, the curl maintainer, ended the curl bug bounty on the 1st of February this year. Over seven years, he paid about $90,000 across 81 findings.
The genuine vulnerability share of submissions collapsed from roughly 17% in early 2025 to one in 20 or one in 30 by late 2025. Linus Torvalds said the kernel security mailing list is almost entirely unmanageable. FFmpeg publicly called Google’s Big Sleep reports CVE slop. If your defender program accepts vulnerability submissions, you need a triage stack designed for AI-generated noise. Assume the signal-to-noise ratio has degraded by a factor of three to five.

Tova Dvorin (30:30.903) And that brings us to our final point, which is AI governance. It’s a must-have.

Adrian (30:35.441) 100%, Tova. Govern your own agentic AI before you deploy more of it. The Kiteworks 2026 forecast report found 100% of organizations have agentic AI on the roadmap, but 60% cannot quickly terminate a misbehaving agent. 63% cannot enforce purpose binding on agents. 33% lack adequate audit trails. The defender-side AI surge is itself a fresh attack surface. The threat model is no longer just what’s the attacker doing with their LLM, it’s what’s our own agent doing with the credentials we just gave it.

Tova Dvorin (31:15.829) Adrian, for those of our listeners who are listening to this at 1.5 or 1.7 times speed and just got to this part, can you give me a one-line summary?

Adrian (31:24.305) Capability is at the frontier and vendors are dialing it back at training time, which tells you the line is in view. The zero-day curve moved in shape, not in volume. And the commercial spyware industry explains more of the rebound than AI does. The exploit timeline is real. Exploitation precedes patch on average now, but mass scanning, turnkey PoCs and KEV each get credit too. The patch curve hasn’t moved. and the Defender Deployment Velocity

Tova Dvorin (32:03.469) And before I move on to the next prompt, I do have to point out that we opened up with all of these AI models at the beginning of the episode and we haven’t really talked about them at all.
We’ve been talking about all of the circumstances surrounding them. I don’t know if you want to pause and dedicate some time to actually speaking about what Mythos and ChatGPT 5.5 do, or if you just want to move on.

Adrian (32:22.51) So that probably should be an episode on its own, because that’s not a brief explanation.

Tova Dvorin (32:33.235) Right. I’m just worried we’re actually misframing the episode then, because we’re not really talking about them.

Tova Dvorin (32:43.201) I can rename this episode and we could just say that Mythos and everything.

Adrian (32:57.125) Yeah, we need to rename, because we can explain what GPT and what generative language models are. But actually giving away the fine points of what Mythos Preview does, we’d be teaching people to hack, and we’ve got to tread carefully.

Tova Dvorin (33:15.841) Mm-hmm.

Adrian (33:16.881) So yeah, all in all, we need to rename it.

Tova Dvorin (33:19.627) Okay, I’m just putting that on the table now. Don’t worry, I’ll cut this out. All right. And the honest answer to the question we opened with, Adrian: has AI changed the vulnerability life cycle?

Adrian (33:21.785) Yeah, you’re quite right.

Adrian (33:30.223) Yes, at the margin, with two clean firsts on the board. Big Sleep on the defender side, the major two-factor authentication bypass story on the attacker side. And no, not yet in the way the headlines suggest. The labs with the strongest commercial incentive to declare a capability movement, Anthropic’s RSP version 3, OpenAI’s Preparedness Framework version 2, explicitly do not declare cyber at threshold. That’s the single biggest evidence point against “AI changes everything”.

Tova Dvorin (34:01.065) Adrian, as always, thank you so much for doing all of this deep research on the AI attack landscape and the AI defense landscape today.
And listeners, sources for everything Adrian referenced today, including the GTIG zero-day review, the Mandiant M-Trends 2026 report, Rapid7’s Global Threat Landscape, the AISI evaluations, the Barracuda Mythos Hype Index, CyberSOCEval and the curl, FFmpeg and Linux maintainer posts. That’s a lot, but they’re all in the show notes. We will be back at a later time with more on what these models that we mentioned in the opening of the episode actually do, without giving away too much, for those of you who are curious and want to separate hype from fact. But until then, and until next time, stay safe, stay safe with SafeBreach.

Adrian (34:52.978) There’s a lot of-
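
The exposure window metric from the first of Adrian’s four moves (the gap between vulnerability publication and the moment compensating controls would catch an exploit attempt) can be sketched as a simple date calculation. This is a minimal sketch under stated assumptions, not a SafeBreach or Rapid7 implementation: the `Exposure` record, its field names, and the placeholder identifiers are all hypothetical, and how a team establishes the "control validated" date is left open.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class Exposure:
    """Hypothetical record for one vulnerability; field names are assumptions."""
    cve_id: str                         # placeholder identifier, not a real CVE
    published: date                     # date the vulnerability was published
    control_validated: Optional[date]   # date a control was shown to catch it, or None

def exposure_window_days(e: Exposure, today: date) -> int:
    """Days between publication and validated coverage.

    If no control has been validated yet, the window is still open and is
    measured up to `today`. Coverage that predates publication counts as 0.
    """
    end = e.control_validated if e.control_validated is not None else today
    return max((end - e.published).days, 0)

today = date(2026, 1, 20)  # fixed date so the example is reproducible
records = [
    Exposure("EXAMPLE-0001", date(2026, 1, 10), date(2026, 1, 13)),  # covered after 3 days
    Exposure("EXAMPLE-0002", date(2026, 1, 10), None),               # still exposed
    Exposure("EXAMPLE-0003", date(2026, 1, 10), date(2026, 1, 5)),   # control predates publication
]
windows = {r.cve_id: exposure_window_days(r, today) for r in records}
for cve_id, days in windows.items():
    print(f"{cve_id}: exposure window {days} day(s)")
```

Tracking this number alongside patch SLA makes the difference visible: a vulnerability can be unpatched for weeks yet have a short exposure window if a validated compensating control was in place early.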

In This Episode


Are frontier AI models actually flooding the market with zero days—or is that just hype?

Host Tova Dvorin and SafeBreach offensive engineer Adrian Culley cut through the noise with mid-2026 data from GTIG, Mandiant M-Trends, Rapid7, and AISI to answer the question every CISO is wrestling with right now.

AI reshaped the zero-day curve in shape, not volume—here’s what that means
The two verified AI “firsts”: Google’s Big Sleep discovery and a novel 2FA-bypass exploit
Why commercial spyware—not AI—is the dominant driver behind the zero-day rebound
What a negative-seven-day time-to-exploit figure tells us about attacker speed
Why defender deployment lag remains the real bottleneck—and what to do about it

Essential listening for CISOs, red teams, and security leaders navigating AI-era threat intelligence.

Podcast: The Mythos Hype Index: What AI Really Did to the Zero-Day Curve
