Researchers from Anthropic said they recently observed the “first reported AI-orchestrated cyber espionage campaign” after detecting China-state hackers using the company’s Claude AI tool in attacks on dozens of organizations. Outside researchers were much more measured in describing the significance of the discovery.
Anthropic laid out its findings in two reports published on Thursday. In September, the reports said, Anthropic discovered a “highly sophisticated espionage campaign,” carried out by a Chinese state-sponsored group, that used Claude Code to automate up to 90 percent of the work. Human intervention was required “only sporadically (perhaps 4-6 critical decision points per hacking campaign).” Anthropic said the hackers had employed AI agentic capabilities to an “unprecedented” extent.
“This campaign has substantial implications for cybersecurity in the age of AI ‘agents’—systems that can be run autonomously for long periods of time and that complete complex tasks largely independent of human intervention,” Anthropic said. “Agents are valuable for everyday work and productivity—but in the wrong hands, they can substantially increase the viability of large-scale cyberattacks.”
“Ass-kissing, stonewalling, and acid trips”
Outside researchers weren’t convinced the discovery was the watershed moment the Anthropic posts made it out to be. They questioned why these sorts of advances are often attributed to malicious hackers when white-hat hackers and developers of legitimate software keep reporting only incremental gains from their use of AI.
“I continue to refuse to believe that attackers are somehow able to get these models to jump through hoops that nobody else can,” Dan Tentler, executive founder of Phobos Group and a researcher with expertise in complex security breaches, told Ars. “Why do the models give these attackers what they want 90% of the time but the rest of us have to deal with ass-kissing, stonewalling, and acid trips?”
Researchers don’t deny that AI tools can improve workflow and shorten the time required for certain tasks, such as triage, log analysis, and reverse engineering. But the ability of AI to automate a complex chain of tasks with such minimal human interaction remains elusive. Many researchers compare the advances AI brings to cyberattacks to those provided by hacking tools such as Metasploit or SEToolkit, which have been in use for decades. There’s no doubt that these tools are useful, but their advent didn’t meaningfully increase hackers’ capabilities or the severity of the attacks they produced.
Another reason the results aren’t as impressive as Anthropic makes them out to be: The threat actors, which Anthropic tracks as GTG-1002, targeted at least 30 organizations, including major technology corporations and government agencies. Of those attacks, only a “small number” succeeded. That, in turn, raises questions. Even assuming so much human interaction was eliminated from the process, what good is the automation when the success rate is so low? Would the number of successes have increased if the attackers had used more traditional, human-driven methods?
According to Anthropic’s account, the hackers used Claude to orchestrate attacks using readily available open source software and frameworks. These tools have existed for years and are already easy for defenders to detect. Anthropic didn’t detail the specific techniques, tooling, or exploitation that occurred in the attacks, but so far, there’s no indication that the use of AI made them more potent or stealthy than more traditional techniques.
“The threat actors aren’t inventing something new here,” independent researcher Kevin Beaumont said.
Even Anthropic noted “an important limitation” in its findings:
Claude frequently overstated findings and occasionally fabricated data during autonomous operations, claiming to have obtained credentials that didn’t work or identifying critical discoveries that proved to be publicly available information. This AI hallucination in offensive security contexts presented challenges for the actor’s operational effectiveness, requiring careful validation of all claimed results. This remains an obstacle to fully autonomous cyberattacks.
How (Anthropic says) the attack unfolded
Anthropic said GTG-1002 developed an autonomous attack framework that used Claude as an orchestration mechanism that largely eliminated the need for human involvement. This orchestration system broke complex multi-stage attacks into smaller technical tasks such as vulnerability scanning, credential validation, data extraction, and lateral movement.
“The architecture incorporated Claude’s technical capabilities as an execution engine within a larger automated system, where the AI performed specific technical actions based on the human operators’ instructions while the orchestration logic maintained attack state, managed phase transitions, and aggregated results across multiple sessions,” Anthropic said. “This approach allowed the threat actor to achieve operational scale typically associated with nation-state campaigns while maintaining minimal direct involvement, as the framework autonomously progressed through reconnaissance, initial access, persistence, and data exfiltration phases by sequencing Claude’s responses and adapting subsequent requests based on discovered information.”
The attacks followed a five-phase structure, with the AI’s autonomy increasing at each phase.
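Anthropic didn’t publish the group’s actual framework, but the pattern it describes (an outer loop that owns campaign state, advances through phases, and feeds each model response back as context for the next narrowly scoped request) is the standard agent-orchestration design. A minimal, hypothetical sketch of that loop follows; every name, phase label, and stubbed function here is illustrative and assumed, not taken from Anthropic’s reports, and the model call does nothing:

```python
from dataclasses import dataclass, field

# Illustrative phase labels only; Anthropic describes a five-phase structure
# but doesn't publish the actors' actual task breakdown.
PHASES = ["reconnaissance", "initial_access", "persistence", "exfiltration"]

@dataclass
class CampaignState:
    phase: int = 0
    findings: dict = field(default_factory=dict)  # results aggregated across sessions

def call_model(task: str, context: dict) -> dict:
    """Stand-in for a Claude-style API call that performs one small, isolated step."""
    return {"task": task, "claimed_result": "..."}  # stub; takes no real action

def human_checkpoint(result: dict) -> bool:
    """The sporadic human decision points Anthropic describes; its report notes
    the model often overstated or fabricated results, so operators had to
    validate everything it claimed."""
    return True  # stub

def run(state: CampaignState, plan: dict) -> dict:
    """The orchestration logic, not the model, maintains attack state and
    manages phase transitions."""
    while state.phase < len(PHASES):
        for task in plan[PHASES[state.phase]]:
            # Each request is a narrow technical task; prior findings are fed
            # back in as context so later requests adapt to what was discovered.
            result = call_model(task, state.findings)
            if human_checkpoint(result):
                state.findings[task] = result
        state.phase += 1
    return state.findings
```

The division of labor is the point of the sketch: the model only ever sees small, discrete tasks, while the surrounding framework supplies the continuity that makes the campaign coherent. That also helps explain how individual requests could look innocuous.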
The attackers were able to bypass Claude’s guardrails in part by breaking tasks into small steps that, in isolation, the AI tool didn’t interpret as malicious. In other cases, the attackers couched their requests as routine work by security professionals using Claude to improve defenses.
As noted last week, AI-developed malware has a long way to go before it poses a real-world threat. There’s no reason to doubt that AI-assisted cyberattacks may one day produce more potent attacks. But the data so far indicates that threat actors, like most others using AI, are seeing mixed results that aren’t nearly as impressive as those claimed by the AI industry.