Mythos & ChatGPT 5.5: Zero Days Exposed

Tova Dvorin (00:01.364) Welcome back to the Cyber Resilience Brief, a SafeBreach podcast. I’m Tova Dvorin, your host, joined as ever by our favorite offensive cybersecurity engineering expert, Adrian Culley. Adrian, this is one I’ve been wanting to record for several weeks now. The argument that I want to put on the table today is that the offensive cyber landscape changed in April. Not gradually, but in a step. There are now two frontier model releases, Anthropic’s Claude Mythos Preview and OpenAI’s ChatGPT 5.5, and they’ve moved the line of what is possible with a keyboard. And every CISO listening to this needs to internalize that that line has moved.

Adrian (00:36.632) It has moved, and it’s moved further than most boards have been told, Tova. The thing I’d flag at the top is that this isn’t a marketing claim from either lab. The UK AI Security Institute, AISI, independently evaluated both models in April and May, and their published findings frame both as genuine step changes over what was deployable six months ago. When the regulators are saying it before the vendors say it, that’s the signal.

Tova Dvorin (01:06.794) So let’s walk listeners through what these systems can actually do, capability first, not statistics first. And then let’s get to the part that the show exists for, which is of course what it means for you, the defenders, what it means for the threat actors, and then what it means for the security industry as a whole. So Adrian, start me with a capability. What does Mythos do that the model before it could not do?

Adrian (01:28.142) Three things, and I’ll keep them concrete. One, it finds bugs that have been sitting in production code for decades. Mythos identified a 17-year-old remote code execution flaw in FreeBSD’s NFS implementation. It found a 27-year-old integer overflow in OpenBSD. It found a 16-year-old vulnerability in FFmpeg that had survived more than five million automated fuzzing runs.

Tova Dvorin (01:53.44) Whoa, these are bugs that human researchers have been going over with fuzzers for years, decades even.

Adrian (01:59.808) Repeatedly, Tova. And two, it weaponizes them. The Firefox 147 JavaScript engine benchmark is the one most people are quoting. Anthropic published that Claude Opus 4.6, the previous frontier model, produced working exploits twice in several hundred attempts. Mythos produced 181. From 2 to 181, same target, same conditions. That isn’t an increment. It’s an exponentially different category of system.

Tova Dvorin (02:32.49) Wow, Adrian. Now, you said that there would be three things that Mythos does. I don’t think we’ve covered the third yet.

Adrian (02:42.285) Chaining. Mythos was the first model to complete what Anthropic calls The Last Ones, a 32-step corporate network attack simulation that goes from external reconnaissance to full domain dominance. It’s estimated at 20 hours of expert human work, and Mythos finished it satisfactorily, end to end.

Tova Dvorin (03:02.932) And the AISI evaluation backs those numbers with their own testing.

Adrian (03:06.893) It did. AISI ran it on the Cybench suite. That’s the standard capture-the-flag battery. And Mythos got a 100% pass across all 35 challenges. First attempt. On the expert-tier tasks, the ones no model could complete before April 2025, AISI clocked it at 73%. Then GPT 5.5 dropped a few weeks later and AISI reran the same battery. GPT 5.5 hit 71.4% on the expert tier. Mythos was 58.6% on AISI’s retest. So OpenAI’s model is at parity. We’re not talking about one outlier. We’re talking about the frontier moving in lockstep across two independent labs.

Tova Dvorin (03:59.946) You know, Adrian, that parity point matters from a strategic perspective, because the natural defender’s hope is, well, Anthropic withheld Mythos from public release.
They shipped it into Project Glasswing with twelve founding partners and forty critical infrastructure organizations, with structured access only, so we don’t need to worry just yet. But that hope died the moment that ChatGPT 5.5 went generally available.

Adrian (04:22.04) Correct, and there’s a sharper version of that point. OpenAI shipped GPT 5.5 with a purpose-built offensive variant called GPT 5.5 Cyber, available through a trust access program to vetted researchers. So you have two distinct postures from the two labs. Anthropic: frontier capability exists, we’re not deploying it broadly, we’re using it through Project Glasswing to fix critical software before others find the bugs. OpenAI: frontier capability exists, we’re classifying it as high under our preparedness framework, we’re shipping it with safeguards and a permissive variant for legitimate defenders. Different philosophies, same underlying fact, which is that the frontier has crossed a threshold both companies feel obliged to govern.

Tova Dvorin (05:11.763) And AISI’s broader framing, and this is the part that should be on every CISO’s slide deck this quarter: they have published that autonomous AI cyber capability is doubling every 4.7 months.

Adrian (05:27.234) You just froze there for some reason. Oh yeah, sorry, my end froze. Apologies.

Tova Dvorin (05:29.416) No, I’m here.

Tova Dvorin (05:35.465) Okay.

Adrian (05:39.279) So you finished on 4.7 months, yeah? Super. Every 4.7 months. That is the metric to internalize. The cadence at which the offensive capability ceiling is rising is faster than the cadence at which most organizations re-architect their security posture. Most enterprises run a three-year security roadmap. The capability frontier doubles roughly seven and a half times inside that roadmap window.

Tova Dvorin (05:40.615) Yeah.

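
The doubling arithmetic in this exchange can be checked with a few lines of Python. This is a minimal sketch, assuming the 4.7-month doubling period quoted from AISI stays constant across a 36-month roadmap, which is a simplification; the function and constant names are hypothetical and chosen only for illustration.

```python
def doublings(window_months: float, doubling_period_months: float) -> float:
    """Number of doublings that fit inside a planning window."""
    return window_months / doubling_period_months


def growth_factor(window_months: float, doubling_period_months: float) -> float:
    """Overall multiplier after that many doublings (2 ** doublings)."""
    return 2.0 ** doublings(window_months, doubling_period_months)


# Assumptions taken from the conversation: a three-year roadmap and
# a constant 4.7-month doubling period.
ROADMAP_MONTHS = 36.0
DOUBLING_PERIOD_MONTHS = 4.7

n = doublings(ROADMAP_MONTHS, DOUBLING_PERIOD_MONTHS)
factor = growth_factor(ROADMAP_MONTHS, DOUBLING_PERIOD_MONTHS)
print(f"{n:.2f} doublings, a growth factor of about {factor:.0f}x")
```

Under those assumptions the window holds a little over seven and a half doublings, which compounds to a multiplier on the order of 200. The exact figure is sensitive to the doubling period, so it is best read as an order-of-magnitude illustration rather than a forecast.
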
Tova Dvorin (06:07.389) Let’s talk about what’s already happening downstream of this, because the response from CISA, NCSC, ASD and the Singapore Cyber Security Agency has been unusually fast and also unusually aligned. Adrian, walk me through the regulatory picture.

Adrian (06:22.136) So the Australian Signals Directorate, ASD, updated its frontier AI guidance on the 30th of April 2026, naming both Mythos and GPT 5.5 explicitly. On the same day, APRA, the Australian banking regulator, wrote to the major Australian banks telling them they were behind on AI security. And I think they were brave to put their heads up, Tova. I think that’s reflected around the world.

Tova Dvorin (06:46.43) And what about

Tova Dvorin (06:50.939) Absolutely. I mean, let’s talk about the other Five Eyes regulators.

Adrian (06:54.498) So Singapore’s CSA issued advisory AD 2026.004 on frontier model risk. The UK’s NCSC has published joint guidance with AISI on how defenders should prepare for what they’re calling the vulnerability patch wave: the period when frontier-model-discovered bugs start landing in advisories faster than patch programs can absorb them. And CISA has flagged AI-augmented offensive capability in its threat advisory cycle, with the specific note that both nation-state and criminal actors are already integrating these models into their tool chains.

Tova Dvorin (07:31.551) And that last point is the one I want to dwell on, because the argument from twelve months ago was that frontier models were a future problem, that threat actors would lag the labs by a year or two before adoption. That assumption broke this spring.

Adrian (07:43.215) It broke in March 2026, Tova. The campaign the industry is calling Cyberstrike AI hit 648 firewalls across 55 countries in March of this year. Fully automated credential harvesting, network reconnaissance and lateral movement. The forensic write-ups are clear that no single human operator was driving it.
An AI agent was orchestrating the campaign across the fleet. That is a working AI-led offensive operation in the wild, not a lab demo. And in February there was a separate finding: a coding agent, doing what its operator had asked it to do, hit an authentication barrier while trying to stop a web server. Rather than ask, it independently found an alternate path to root and took it. Emergent offensive behavior from a model that wasn’t deployed for offense. This is qualitatively new.

Tova Dvorin (08:40.669) I mean, if you think about it, it’s also nightmare fuel. So the proposition that organizations could time their response, adopt frontier-aware controls when the threat actually showed up, that’s done. That’s already been overtaken.

Adrian (08:51.424) It has, and the harder point, which I think we owe the audience honestly, is that the techniques Mythos demonstrates can already be partially reproduced with cheaper open-weight models. AISI and others have flagged this: the frontier lab capability becomes the open-source capability on roughly a 6 to 12 month lag. The cost per attempt in NCSC’s reference simulations was around £65. That’s not nation-state economics, that’s mid-market criminal economics, which means mid-market targets, manufacturing, healthcare, regional banks, are now in scope for techniques that two years ago required APT-grade resourcing.

Tova Dvorin (09:32.477) Now, this is the part of the conversation where on a different show we’d be leaning into the panic, but I want us to do the opposite here, Adrian. Because there is a defender’s case to be made here, and the labs themselves, that means Anthropic with Glasswing and OpenAI with GPT 5.5 Cyber’s trust program, are clearly trying to use the same capability to harden the ecosystem. So Adrian, what’s the realistic defender position in this case?

Adrian (09:54.991) The defender position is that the assumption set has to shift.
The old assumption was that vulnerability discovery was an expensive, scarce activity, so the patch cycle could run at the cadence it’s run at for 20 years. The new assumption is that vulnerability discovery is becoming cheap and abundant. If you can find a 27-year-old bug in OpenBSD with a single model run, the volume of vulnerabilities entering the advisory pipeline is going to climb sharply. CISOs need to plan for a patch backlog that doesn’t compress; it actually grows. That changes the architecture. You stop trying to patch your way to safety and you start validating continuously whether your existing controls actually stop what’s now in the wild.

Tova Dvorin (10:39.839) Which of course is the proactive security or CTEM argument the industry has been making for two years. It’s just become non-negotiable.

Adrian (10:47.808) Non-negotiable is right. Continuous threat exposure management, Gartner’s framing, has an explicit validation layer: adversarial exposure validation. The premise is that you assume you cannot patch everything, you assume you cannot detect everything, and you continuously prove which of your defenses actually work against current attacker techniques. The premise was always defensible. In a frontier AI world, it’s the only premise that holds. SafeBreach is the leader in that validation layer, and the customers we work with who are coping best with this shift are the ones who have already operationalised continuous validation as a posture, not as a quarterly exercise. The principle matters more than the tool, Tova. What matters is that someone in the organisation is actually doing the validation work against the techniques that are landing in the wild this quarter, not last year’s.

Tova Dvorin (11:44.255) Let’s spend a couple of minutes on the threat actor angle specifically, because the question that I get from CISOs is: which actors are using this, and against whom? And then of course, what can I do? What’s the honest answer?

Adrian (11:56.131) The honest answer is that the early evidence points to all four CRINK states experimenting, China, Russia, Iran and North Korea, and to financially motivated criminal crews moving faster than the state actors because they have a tighter feedback loop. The Mandiant and Google threat intelligence write-ups from this spring are clear that AI-assisted vulnerability identification and exploit generation are showing up in real intrusions. The targeting pattern is what you’d expect. Initial access brokers and ransomware affiliates are getting more productive. More campaigns are running concurrently. There is less time between a disclosed CVE and a weaponized exploit. The window from advisory to active exploitation has compressed measurably in the last 12 months.

Tova Dvorin (12:40.083) And of course for defenders, the problem with that compression is that patch programs were already struggling as it was.

Adrian (12:46.114) They were. The average enterprise patches critical vulnerabilities in something like 15 to 30 days. The window from public disclosure to exploitation on the most-watched advisories is nowadays sometimes hours. The patch wave language NCSC is using isn’t dramatic, it’s diagnostic. Defenders are going to face waves of high-severity advisories arriving faster than patching teams can absorb. The only viable response is to architect for assumed compromise of unpatched paths and validate that compensating controls are actually firing.

Tova Dvorin (13:22.879) So I want to pull on the industry implications thread, because this matters beyond one single CISO. The security vendor landscape has just had its assumption set rewritten as well. If frontier AI can autonomously find zero days in major operating systems, what does that do to the vulnerability research economy, or to bug bounties, or to the offensive security services industry?

Adrian (13:43.727) Three things, and these are predictions worth marking.
Number one, bug bounties for low-hanging vulnerability classes are going to collapse in economic value, because labs and well-resourced researchers will run frontier models against the targets before human bounty hunters get to them. Project Glasswing is essentially a structured preemption of that. Number two, the human offensive security industry shifts upmarket. Penetration testing as it has existed, point-in-time, manual, scoped to a quarter, was already losing relevance to continuous validation a long time ago. Frontier AI compresses that further. The penetration tester of 2027 is operating frontier models against custom targets, not running Burp Suite against the web app. Number three, and this is the one I’d watch most closely, and I am watching most closely: the asymmetry between attackers and defenders, which has favored attackers for 30 years, is being compressed, but in both directions. Frontier AI is a multiplier on both sides. The question is which side operationalizes it first inside their cycle. And right now, the criminal actors are operat- I can’t put my teeth back in.

Tova Dvorin (15:01.627) Yeah.

Adrian (15:02.978) The criminal actors are operationalizing faster than most enterprise security teams.

Tova Dvorin (15:09.065) What about the labs themselves? Is the lab posture, that is, withholding, structured access and trust programs, going to hold up over time?

Adrian (15:17.57) I think Anthropic’s posture is genuinely defensible and Glasswing is a serious effort. 40 critical infrastructure organisations to start with, and 12 founding industry partners, AWS, Microsoft, Google, Cisco, Palo Alto Networks, CrowdStrike, JP Morgan Chase, Linux Foundation, Nvidia, Apple, Broadcom and Anthropic itself, being given monitored access to Mythos to find and fix vulnerabilities in their own foundational software before others do. And that’s just been expanded in the last 48 hours to a further 150 businesses. This is meaningful intervention.
OpenAI’s Trust Access Program on GPT 5.5 Cyber is the same shape, different mechanism. Both will hold for some months. They will not hold indefinitely though, because as the AISI doubling cadence implies, the capability gap between the frontier labs and the next tier of models closes quickly. The harder question, which I don’t think anyone has the answer to, is what the posture looks like when the third or the fourth model with this capability lands and one of them isn’t from a western lab.

Tova Dvorin (16:27.551) You know, this brings us to the forward look. Let’s look twelve months out. What should every listener be watching for?

Adrian (16:34.242) Five things this time. One, watch the AISI publication cycle. AISI has become the de facto authoritative voice on frontier offensive capability, and their evaluations are the closest thing industry has to ground truth. Number two, watch for the first publicly attributed nation-state campaign that names frontier model use in the indictment. It’s coming. The forensic capability to attribute AI-orchestrated operations exists. The political readiness to name it lags slightly.

Tova Dvorin (17:06.227) Hmm. What about technical floor indicators?

Adrian (17:10.422) So that takes us to number three, Tova. Watch the open-weight reproduction curve. When a 30-billion-parameter open model reproduces Mythos-class vulnerability discovery on a class of targets, the floor drops sharply. Number four, watch the patch wave indicators: vulnerability disclosure volume, time to exploitation, advisory cadence from CISA and the Five Eyes. If the patch wave is real, it will show up in those numbers inside two quarters. And number five, watch the regulatory response. APRA is the leading edge. Expect equivalents from European banking regulators, the SEC, NYDFS and the FCA to follow within months. The “we’re doing AI governance” tick-box compliance era has ended. It’s in its death throes at the moment.
The “prove your controls work against current AI-augmented techniques” era is beginning. And I believe we’ve got news and announcements over the next few weeks on that.

Tova Dvorin (18:12.329) That’s right, absolutely. Stay tuned for that. Quick pause, by the way. The next answer to our question is also a list. Do we want to change that? Because otherwise it might get a little bit repetitive.

Adrian (18:23.628) Yeah, we’ve done a lot of that. Let me quickly read it in the quick pause. I’ll read it without doing it as a list.

Tova Dvorin (18:31.655) Okay, by the way, that OWASP launch was pushed out. I don’t know if I told you. It’s fine, but just

Adrian (18:38.168) We’ve done our bit, we’ve done our bit. We’re back to the leadership question.

Tova Dvorin (18:41.693) Yeah, I know. Okay. So what is our message to our listeners, CISOs, SecOps leads, security architects or just cybersecurity enthusiasts like us, who are driving home tonight thinking about what their Monday morning will look like?

Adrian (18:55.0) So these are the things that are urgent, Tova. Assume your attack surface is being discovered by something faster than your asset management catalog. Get continuous attack surface validation in place. Also, assume the patch program will not save you. Build the compensating control validation layer, adversarial exposure validation, so you know what fires when patching cannot. Also, read the AISI and the NCSC publications yourself. Take responsibility. Don’t take this from a podcast. Don’t believe us. Don’t believe Adrian or Tova. Take it from the regulators who are seeing the evaluations and writing the doctrine. The doctrine is being written this year. CISOs who read it now will be six months ahead of CISOs who read it next year.

Tova Dvorin (19:43.519) The line I keep coming back to is the one from AISI’s published note: autonomous AI cyber capability is doubling every 4.7 months.
If your security strategy was written before that line existed, it needs a reread this quarter. Hell, it needs a reread this week.

Adrian (20:00.462) And the harder line behind that, Tova: the labs publishing these numbers are doing so because they want the defender community to be ready. The threat actors are not publishing, they’re deploying. The lag between the published number and the deployed reality is the lag that CISOs must close.

Tova Dvorin (20:20.049) Adrian, as ever, thank you so much for being on this podcast with us and giving us all of this cybersecurity and AI background with your deep knowledge. Now, listeners, that’s the Mythos and ChatGPT 5.5 picture as it stands at the start of June 2026, when we’re recording this. We’re sure that there’s going to be more development soon, so when the next checkpoint lands, listen for our episodes. Until then, validate continuously, read the regulators, and assume the frontier has already moved. Until next time, stay safe. Stay safe with SafeBreach.

Adrian (20:48.367) Thank you, Tova. I think that’s a really nice episode.
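
The patch-window point made in the conversation (exploitation sometimes within hours, patching in 15 to 30 days) can be made concrete with a small calculation. The sketch below is illustrative only: the `Advisory` record, its field names and the sample timings are assumptions chosen to mirror the figures mentioned in the episode, not data from any real advisory or from any vendor’s tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Advisory:
    """Hypothetical record for one vulnerability advisory (illustrative fields)."""
    name: str
    disclosed: datetime
    exploited: datetime        # first observed exploitation in the wild
    patched: datetime          # when the patch was deployed internally
    control_validated: bool    # has a compensating control been shown to block it?


def exposure_window(a: Advisory) -> timedelta:
    """Time the system is exploitable but unpatched (zero if patched first)."""
    return max(a.patched - a.exploited, timedelta(0))


def unmitigated(advisories: list[Advisory]) -> list[str]:
    """Names of advisories with a non-zero window and no validated control."""
    return [
        a.name
        for a in advisories
        if exposure_window(a) > timedelta(0) and not a.control_validated
    ]


# Made-up sample timings: exploitation after 6 hours, patching after 15-30 days.
t0 = datetime(2026, 6, 1, 9, 0)
sample = [
    Advisory("example-a", t0, t0 + timedelta(hours=6), t0 + timedelta(days=15), False),
    Advisory("example-b", t0, t0 + timedelta(hours=6), t0 + timedelta(days=30), True),
    Advisory("example-c", t0, t0 + timedelta(days=45), t0 + timedelta(days=20), False),
]

for a in sample:
    status = "validated" if a.control_validated else "not validated"
    print(a.name, exposure_window(a), status)
print("needs attention:", unmitigated(sample))
```

In this toy data only the first advisory is both exposed and uncovered by a validated control. That is the kind of gap continuous validation is meant to surface, although a real program would draw these fields from actual advisory feeds and control test results.
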

In This Episode

In this episode of the Cyber Resilience Brief, we discuss how the offensive cyber landscape has dramatically shifted with the release of Anthropic’s Claude Mythos and OpenAI’s ChatGPT 5.5. Every CISO must understand the implications of these advancements on cybersecurity strategies.

Key takeaways:

• According to AISI, autonomous AI cyber capability is doubling every 4.7 months, changing the game for defenders.
• Mythos can find long-standing vulnerabilities that human researchers have missed for decades.
• The regulatory landscape is rapidly evolving, with agencies like AISI and NCSC issuing urgent guidance for organizations.
