50x Faster Than Human Experts — Setting a New Benchmark for AI-Driven Offense
San Francisco, CA – August 14, 2025 — Horizon3.ai today announced that NodeZero®, its autonomous penetration testing platform, is the first AI to fully solve the Game of Active Directory (GOAD) — a respected benchmark for Active Directory exploitation — completing the challenge in just 14 minutes.
GOAD, developed by Orange Cyberdefense, simulates a realistic multi-domain enterprise network with the same trust abuses, misconfigurations, and security controls attackers exploit in the wild. Solving it requires chaining reconnaissance, credential abuse, privilege escalation, lateral movement, and persistence across multiple hosts and domains.
Recent Carnegie Mellon University research underscores how difficult this is: state-of-the-art LLMs like GPT-4o, Gemini 2.5 Pro, and Sonnet 3.7 — even with advanced prompting frameworks — failed to reliably execute multi-host intrusions, capturing less than 30% of attack graph states in labs capped at 50 hosts.
Why GOAD is Hard
For both humans and algorithms, GOAD is a stress test of scale, reasoning, and persistence. Attack paths are not linear — they require maintaining multi-hop memory across dozens of steps, adapting execution priorities based on partial successes, and exploiting inter-domain trust boundaries under realistic constraints.
- For expert human pentesters: Completing GOAD typically takes 12–16 hours of sustained effort, deep AD exploitation expertise, and careful sequencing of tools and tactics.
- For algorithms and LLMs: The complexity forces reasoning systems to juggle conditional execution, state tracking, and dynamic reprioritization, the capabilities where the LLMs in the Carnegie Mellon study failed to perform reliably.
NodeZero’s solve time of 14 minutes is roughly 50× faster than the 12-hour low end of that expert range, with perfect execution of the full attack chain from initial foothold to complete domain compromise.
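The 50× figure can be checked against the numbers quoted above. The following is a minimal sketch of that arithmetic, not part of the benchmark methodology. The function name `speedup` and the constant names are illustrative assumptions; only the 14-minute solve time and the 12–16 hour expert range come from this release.

```python
# Rough check of the speedup claim, using only figures stated in the release.
# The names below (speedup, NODEZERO_MINUTES, HUMAN_HOURS_RANGE) are illustrative.

def speedup(human_hours: float, solve_minutes: float) -> float:
    """Return how many times faster a solve is than the human effort."""
    return human_hours * 60 / solve_minutes

NODEZERO_MINUTES = 14          # solve time stated in the release
HUMAN_HOURS_RANGE = (12, 16)   # expert human range stated in the release

low, high = (speedup(h, NODEZERO_MINUTES) for h in HUMAN_HOURS_RANGE)
print(f"{low:.1f}x to {high:.1f}x")  # prints "51.4x to 68.6x"
```

On these inputs the ratio falls between about 51× and 69×, so "50×" corresponds to the conservative end of the quoted human range.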
NodeZero operates in a different league — applying the same architecture that solved GOAD in minutes to production-scale networks across industries. Its campaigns directly map to the breach patterns documented in the 2025 Verizon DBIR, IBM X-Force, and Mandiant M-Trends reports:
- Initial access via public-facing vulnerabilities and valid accounts — Verizon: ~20% of breaches start with exploited vulnerabilities, 22% with stolen credentials; IBM: 40% involve public-facing apps, 27% valid cloud accounts. NodeZero safely exploits these same weaknesses in live environments.
- Identity and directory abuse — Mandiant: Account Manipulation (T1098) in 19.9% of cases, External Remote Services (T1133) in 22.4%. GOAD’s domain trusts, Kerberoasting, and SSO abuse chains mirror these tactics, and NodeZero executes them autonomously.
- Lateral movement and privilege escalation — IBM and Mandiant highlight the use of living-off-the-land techniques post-access; NodeZero’s graph-driven orchestration prioritizes these pivots to reach high-value targets.
- Persistence and re-entry — Mandiant: backdoors in 35% of intrusions, often outpacing ransomware loaders. NodeZero validates whether these footholds can be established and detected.
In the NSA’s Continuous Autonomous Penetration Testing (CAPT) program, NodeZero has already shown this capability at national scale:
- Expanded coverage from 200 to 1,000 defense contractors.
- Discovered 50,000+ vulnerabilities, with 70% remediated — many within days.
- Achieved domain compromise in as little as 77 seconds and uncovered catastrophic exposures in under five minutes.
- Gained access to sensitive CAD drawings of aircraft carriers and nuclear submarines in less than five minutes.
“GOAD is an excellent benchmark, but its real value is how closely it reflects what’s happening in the wild,” said Snehal Antani, CEO and Co-Founder of Horizon3.ai. “When you can solve GOAD in minutes, and then turn that same capability loose in production networks — aligned to the exact tactics in the DBIR, IBM, and Mandiant reports — you’re not just testing security, you’re closing the gap between how attackers operate and how defenders respond.”
While competitors make bold marketing claims about being “#1,” NodeZero delivers proof — in the lab and in live, complex environments — at a scale no other platform has matched.
Learn More About NodeZero vs. GOAD.
About Horizon3.ai
Horizon3.ai empowers organizations to continuously verify their security posture with NodeZero®, the industry’s leading autonomous pentesting platform. Built to think and act like an attacker — but operate safely in production — NodeZero identifies exploitable weaknesses, prioritizes fixes based on real-world impact, and verifies remediation at scale. Customers across manufacturing, healthcare, finance, and national security rely on NodeZero to reduce risk and accelerate security outcomes.
Follow Horizon3.ai on LinkedIn and X.
Horizon3.ai Media Contact
Brittney Blanchard
Highwire
horizon3.ai.pr@teamhighwire.com