June 30, 2026 (14 min read)

GLM-5.2 and Mythos: What This AI Cybersecurity Shift Means for Security Teams in 2026

The cybersecurity world moved quickly in June 2026. A Chinese open-weight AI model, freely available to anyone with an internet connection, posted benchmark results that placed it alongside one of the most tightly restricted AI systems in the world. No vetting process. No access approval. No government clearance.

Nishith Rajyaguru


That model is GLM-5.2, developed by Zhipu AI. The system it matched on real cybersecurity tasks is Claude Mythos, Anthropic's frontier AI that the U.S. government deemed too sensitive for public release. Understanding what both models represent, how the benchmarks compare, and what the gap between them means for defenders and attackers alike is now essential for any security professional or technology leader.


1. What Is GLM-5.2 and Why Does It Matter?

GLM-5.2 is the latest release in the General Language Model series developed by Zhipu AI, a Beijing-based artificial intelligence company. It became available to GLM Coding Plan members on June 13, 2026, with open weights published three days later on June 16 under an MIT open-source license.

The MIT license is significant. It means anyone, in any country, can download the model, run it on local hardware, modify its parameters, remove its safety filters, and fine-tune it for specific tasks without any involvement from the original developer. There are no geographic restrictions, no usage agreements requiring compliance monitoring, and no dependency on a commercial API.

Three characteristics make GLM-5.2 particularly relevant for the cybersecurity industry.

First, it is truly open-weight. Unlike proprietary models that require API access through a provider, GLM-5.2's parameters are public. A security team can run it inside a fully air-gapped environment with no external network calls. For organizations handling sensitive infrastructure or classified code, this matters enormously.

Second, independent benchmarks indicate its agentic capabilities are comparable to Claude Opus 4.8 and OpenAI GPT-5.5 on certain tasks, at roughly half the operating cost. This is not a marketing claim from Zhipu AI. It comes from security firms that tested it against the same evaluation sets they use for closed frontier models.

Third, because it runs entirely locally, no provider has visibility into how it is being used. This cuts both ways. For defenders, it means no external data exposure. For attackers, it means no API log, no suspicious activity flag, and no account termination.

Cybersecurity researchers at Graphistry noted a concern that GLM-5.2 may have been trained on outputs from GPT-5.5 and Opus 4.8 without authorization, a practice known as distillation. Zhipu AI did not address the claim publicly. Regardless of how the model achieved its performance, the benchmark results have been independently verified.


2. How GLM-5.2 Performs on Real Cybersecurity Benchmarks


Benchmark results for AI models can be selectively framed, so it matters who is doing the testing and what methodology they are using. In the case of GLM-5.2, the evaluations came from two well-regarded independent firms: Semgrep and Graphistry.

2.1 Semgrep's IDOR Benchmark Results

Semgrep is a security tooling company that regularly evaluates AI models against real vulnerability detection tasks. Their benchmark of choice for this comparison was IDOR detection.

IDOR stands for Insecure Direct Object Reference. It is an access control vulnerability in which a web application exposes an internal object identifier, such as a user ID, a file path, or a database key, without verifying whether the requesting user has permission to access it. IDOR vulnerabilities are among the most frequently exploited in production web applications because they are subtle. Detecting one requires recognizing that an authorization check is absent, not simply flagging a dangerous function call. This makes IDOR a meaningful test of reasoning ability rather than pattern recognition.
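To make the "absent authorization check" concrete, here is a minimal sketch of the pattern in Python. The `Invoice` type, the in-memory `INVOICES` store, and both handler functions are hypothetical names invented for illustration; they are not taken from Semgrep's benchmark.

```python
from dataclasses import dataclass


@dataclass
class Invoice:
    invoice_id: int
    owner_id: int
    amount: float


# Hypothetical in-memory store standing in for a database table.
INVOICES = {
    101: Invoice(101, owner_id=1, amount=250.0),
    102: Invoice(102, owner_id=2, amount=975.0),
}


def get_invoice_vulnerable(current_user_id: int, invoice_id: int) -> Invoice:
    """IDOR: the object is fetched by its identifier and the caller is never checked."""
    return INVOICES[invoice_id]


def get_invoice_fixed(current_user_id: int, invoice_id: int) -> Invoice:
    """Same lookup, plus the ownership check that is missing above."""
    invoice = INVOICES[invoice_id]
    if invoice.owner_id != current_user_id:
        raise PermissionError("caller does not own this invoice")
    return invoice
```

In the vulnerable version, user 1 can read invoice 102 simply by changing the identifier in the request. Nothing in that function looks dangerous on its own, which is why spotting the flaw requires reasoning about what is missing rather than matching a known-bad call.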

On Semgrep's IDOR benchmark, GLM-5.2 scored 39% F1. Claude Code, which runs on Claude Opus 4.8, scored 32% on the same dataset using the same prompt. The cost difference was also notable: GLM-5.2 found vulnerabilities at roughly $0.17 each, a fraction of the per-finding cost typical of frontier API pricing.

Semgrep published these results transparently and included an important caveat. Their own purpose-built multimodal pipeline, which incorporates guided navigation, endpoint discovery, and additional scaffolding, scored between 53% and 61% F1 on the same benchmark, well above GLM-5.2's 39%. The finding is therefore not that GLM-5.2 is the best tool available for IDOR detection. The finding is that a freely downloadable model whose safety guardrails can be removed, given nothing but a prompt, outperformed a leading closed frontier model on a real-world security task.
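For readers less familiar with the metric, F1 is the harmonic mean of precision (how many reported findings were real) and recall (how many real vulnerabilities were reported). The sketch below shows the calculation. The counts passed in are hypothetical and chosen only to make the arithmetic easy to follow; they are not Semgrep's data.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Harmonic mean of precision and recall; 0.0 when nothing was found correctly."""
    if true_positives == 0:
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only (not taken from any benchmark).
score = f1_score(true_positives=40, false_positives=40, false_negatives=40)
print(f"F1 = {score:.2f}")  # prints "F1 = 0.50"
```

Because F1 penalizes both false alarms and missed vulnerabilities, a model cannot raise its score just by flagging everything or by reporting only the easiest cases.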

2.2 Graphistry's Blue Team CTF Evaluation

Graphistry runs BotsBench, a continuous evaluation platform for AI models on agentic cybersecurity investigation tasks. They tested GLM-5.2 on the CyBT-CTF and Splunk Botsv3 capture-the-flag benchmarks, which simulate the kind of multi-step investigation work that blue team analysts perform in real environments.

GLM-5.2 achieved a solve rate of 28 out of 59, or about 47%. This placed it at the top of all open-weight models tested and, as Graphistry noted, made it comparable to proprietary systems on the same benchmarks. Graphistry called it the first open-weight model they would recommend for a frontier-like cybersecurity experience.

That description carries weight because Graphistry applies it with discipline. Earlier open-weight models were not close enough to proprietary performance to earn that recommendation. GLM-5.2 changed that.


3. What Is Claude Mythos and Why Is It the Industry Benchmark?


Claude Mythos Preview is Anthropic's most advanced frontier AI model. It was announced publicly on April 7, 2026, alongside the launch of Project Glasswing, a controlled defensive cybersecurity initiative.

3.1 What Mythos Can Do

During internal testing before its announcement, Mythos identified thousands of previously unknown zero-day vulnerabilities across every major operating system and web browser. Zero-day vulnerabilities are flaws that are unknown to the software's developers at the time of discovery, making them particularly dangerous because no patch exists until the flaw is disclosed and addressed.

Mythos found bugs that had survived decades of expert human security review. In over 83% of cases, it developed working exploits on the first attempt. On the CyberGym benchmark, which tests an AI's ability to reproduce known vulnerabilities, Mythos scored 83.1%. The previous generation Claude model scored 66.6% and had a near-zero rate of autonomous exploit development. In one documented Firefox evaluation, the earlier model produced working exploits twice across several hundred attempts. Mythos produced working exploits 181 times in the same test.

These numbers represent a qualitative shift in what AI can do independently in a security context. Mythos is not augmenting human researchers. In many cases, it is completing the full cycle of vulnerability discovery and exploit development without human direction.

3.2 Project Glasswing and Controlled Access

Because of these capabilities, Anthropic chose not to release Mythos publicly. Instead, it launched Project Glasswing, a coalition of approximately 200 organizations given controlled access to Mythos for defensive security work. Launch partners include AWS, Apple, Microsoft, Google, CrowdStrike, NVIDIA, Palo Alto Networks, and the Linux Foundation. Anthropic committed up to $100 million in usage credits to support the program, alongside $4 million in direct donations to open-source security organizations.

Project Glasswing partners have collectively found more than 10,000 high or critical severity security flaws using Mythos. The program has since expanded to include organizations across more than 15 countries, covering sectors such as power, water, healthcare, and telecommunications.


4. The U.S. Government Decision on Mythos Access

The U.S. government's handling of Mythos access in June 2026 reflects how seriously advanced AI cybersecurity capabilities are being treated at a national security level.

On June 12, 2026, the government issued an export control order that led Anthropic to disable both Mythos 5 and Fable 5 for all users. The concern was that models with these capabilities could be misused by intelligence actors in adversarial nations. The action was sweeping: it applied to all users, including the organizations that had been using Mythos for defensive purposes.

On June 27, 2026, the government partially reversed course. More than 100 companies and institutions, primarily Project Glasswing members, had their access to Mythos 5 restored. Anthropic confirmed that the government had cleared Mythos 5 for redeployment to organizations that operate and defend critical infrastructure in the United States.

The partial reversal left an important gap unresolved. GLM-5.2, which had demonstrated comparable benchmark performance on specific security tasks, remained freely available to anyone in the world. The export controls that restricted Mythos had no effect on an open-weight model hosted on Hugging Face and distributed under a permissive license. This is precisely the tension the independent benchmark results made visible.


5. What GLM-5.2 and Mythos Mean for Security Teams and Developers

5.1 Opportunities for Defenders

GLM-5.2's open-weight nature creates real options for organizations that previously had limited access to AI-powered security tooling.

Security teams in air-gapped environments, classified settings, or organizations with data residency requirements can now run frontier-adjacent vulnerability scanning entirely on local infrastructure. The model does not need to call home, which eliminates a category of data exposure risk that comes with commercial API-based tools.

The cost profile is also meaningful. At approximately $0.17 per IDOR vulnerability found, GLM-5.2 makes high-volume scanning economically viable for organizations that could not absorb frontier API pricing at scale.
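A quick way to reason about that figure is to treat it as total spend divided by confirmed findings, then scale it to an expected volume. This is a minimal sketch under that assumption. The function names are hypothetical, and the article does not say whether the $0.17 figure includes hardware or hosting costs, so the default below covers only what was reported.

```python
def cost_per_finding(total_spend_usd: float, confirmed_findings: int) -> float:
    """Average spend per confirmed vulnerability."""
    if confirmed_findings <= 0:
        raise ValueError("need at least one confirmed finding")
    return total_spend_usd / confirmed_findings


def projected_spend(findings_expected: int, unit_cost_usd: float = 0.17) -> float:
    """Spend implied by a per-finding cost; 0.17 is the figure cited above."""
    return findings_expected * unit_cost_usd


# Hypothetical volume: 1,000 confirmed findings at the cited unit cost.
print(f"${projected_spend(1_000):.2f}")  # prints "$170.00"
```

Teams running the model on their own hardware would need to substitute a unit cost that reflects their actual infrastructure spend.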

One of the clearest lessons from the Semgrep evaluation is that engineering scaffolding matters more than model selection. Semgrep's purpose-built pipeline significantly outperformed GLM-5.2 on the same task. Security teams investing in AI tooling will get more return from building strong orchestration, context management, and workflow design than from simply swapping one model for another.
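As a rough illustration of what scaffolding means in practice, the sketch below separates endpoint discovery, per-endpoint context, and orchestration into three small steps. It is not Semgrep's pipeline. The Flask-style route pattern, the function names, and the `ask_model` callable (a stand-in for whatever model client a team uses) are all assumptions made for the example.

```python
import re
from typing import Callable

# Assumes Flask-style decorators such as @app.route("/invoices/<int:invoice_id>").
ROUTE_PATTERN = re.compile(r'@app\.route\(\s*["\']([^"\']+)["\']')


def discover_endpoints(source: str) -> list[str]:
    """Endpoint discovery: list the route paths declared in one source file."""
    return ROUTE_PATTERN.findall(source)


def build_prompt(route: str, source: str) -> str:
    """Context management: one narrowly scoped question per endpoint."""
    return (
        f"Review the handler for route {route}.\n"
        "Does it verify that the caller may access the referenced object?\n"
        f"Source:\n{source}"
    )


def scan(source: str, ask_model: Callable[[str], str]) -> dict[str, str]:
    """Orchestration: query the model once per discovered endpoint."""
    return {
        route: ask_model(build_prompt(route, source))
        for route in discover_endpoints(source)
    }


# Usage with a stub in place of a real model call.
example = '@app.route("/invoices/<int:invoice_id>")\ndef get_invoice(invoice_id): ...\n'
print(scan(example, ask_model=lambda prompt: "needs review"))
```

The design point is that the model receives a focused question about one endpoint at a time instead of an entire repository in a single prompt, and the same scaffolding works regardless of which model sits behind `ask_model`.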

5.2 Risks That Require an Updated Threat Model

The same accessibility that makes GLM-5.2 useful for defenders creates a distinct set of risks.

Safety guardrails can be removed from an open-weight model. Jailbreak techniques for using GLM-5.2 for offensive purposes were being shared on hacker forums within days of its release. An attacker running the model locally generates no API log, triggers no provider alert, and leaves no account trail. Detection frameworks that assumed some level of provider-side visibility over AI-powered attack tooling need to account for this.

The structural shift is significant. When attackers depended on commercial APIs, providers could monitor for suspicious usage patterns and terminate access. That mechanism does not exist for a model running on a local machine. Security teams should treat AI-powered vulnerability discovery as a capability that is now available to both sides, without the provider acting as a passive gatekeeper.


6. Key Takeaways: GLM-5.2, Mythos, and the Road Ahead


GLM-5.2 is the first open-weight model to achieve frontier-comparable results on specific cybersecurity investigation and vulnerability detection benchmarks, confirmed by independent evaluations from Semgrep and Graphistry.

Claude Mythos remains the most capable AI system for cybersecurity at full scope. Its autonomous zero-day discovery and exploit development capabilities represent a level of performance that no open-weight model has yet matched end to end.

The concern GLM-5.2 introduces is not that it surpasses Mythos across the board. It is that frontier-adjacent vulnerability discovery capability is now freely available with no guardrails, no monitoring, and no access controls. The gap that export controls were designed to maintain has become much harder to enforce.

For security teams and developers, the path forward is not waiting for access to the most advanced gated models. It is building the scanning pipelines, remediation workflows, disclosure processes, and hardened development practices that can keep pace with the volume and speed of AI-discovered vulnerabilities. Anthropic itself noted that hundreds of thousands of organizations will eventually need access to Mythos-class capabilities to address the coming challenge at scale.

The competitive timeline between open and closed AI models is compressing. GLM-5.2 reached comparable performance on specific tasks within roughly seven months of the frontier model it is being compared against. That compression will continue. Organizations that invest in the infrastructure and processes to handle AI-powered security work today will be far better positioned as these capabilities become increasingly widespread.

The era of AI-powered vulnerability discovery at scale is not approaching. It is already here.

Frequently Asked Questions

What is GLM-5.2 and why does it matter for cybersecurity?

GLM-5.2 is an open-weight AI model developed by Zhipu AI and released in June 2026 under an MIT open-source license. It is significant for cybersecurity because independent evaluations by Semgrep and Graphistry confirmed it achieved frontier-comparable performance on real vulnerability detection tasks, including IDOR detection and blue team CTF investigations. Unlike closed frontier models, GLM-5.2 can be downloaded and run locally by anyone, making advanced AI-powered security scanning accessible without requiring API access or provider approval.


What is Claude Mythos and how does it compare to GLM-5.2?

Claude Mythos is Anthropic's most advanced frontier AI model, and it is used under controlled access for defensive cybersecurity work. It identified thousands of zero-day vulnerabilities across every major operating system and browser during testing, developed working exploits in over 83% of cases on the first attempt, and scored 83.1% on the CyberGym vulnerability reproduction benchmark. GLM-5.2 reached frontier-comparable results on specific benchmark tasks such as IDOR detection and blue team investigations, but has not replicated Mythos's full autonomous zero-day discovery and exploit development capabilities end to end. The key difference is access: Mythos is restricted to vetted Project Glasswing partners, while GLM-5.2 is freely available to anyone.
