The Red Team That Was Itself Compromisable
On June 23, 2026, a paper titled "Red-Teaming the Agentic Red-Team" appeared on arXiv. The authors are Dario Pasquini, Michal Bazyli, Taras Fedynyshyn, and Artem Sorokin. The paper's premise is stated without qualification in its opening: the use of agentic systems to perform offensive security operations has moved from a theoretical possibility to a commoditized capability. The community has focused on making those tools more capable. The paper focuses on something the community has not: whether those tools are themselves secure.
The answer is that they are not. The June 23 preprint presents the first in-depth security analysis of the most widely used agentic systems for offensive security operations. The analysis shows that most of these tools share common design flaws. Those flaws enable an active adversary to exfiltrate API keys, establish persistent footholds, and fully compromise the operator's machine. The last item holds even when the agent operates inside a sandboxed container.
That final clause is the load-bearing sentence. The sandbox is the institutional boundary organizations draw around agentic tooling when they deploy it inside a security-sensitive environment. The June 23 paper shows the boundary does not hold.
The kill chain
To support its analysis, the June 23 arXiv preprint by Pasquini and colleagues introduces a full cyber kill chain for agentic offensive-security systems. The kill chain captures five stages in sequence: initial LLM manipulation, lateral movement, persistence, guardrail bypass, and sandbox escape.
The first stage is the entry point. LLM manipulation is the act of influencing the agent's reasoning through its inputs - through the content it ingests, the instructions it receives, or the data it retrieves. An agentic red-team tool that browses a target environment, reads files, executes code, and reports results is exactly the kind of system that can be manipulated through what it reads. The attacker does not need a separate channel into the agent. The agent's own tool-use loop is the channel.
From initial manipulation, the chain moves to lateral movement. An agent that has been manipulated into executing adversary-controlled instructions has the same tool access and credential scope it had when the operator deployed it for legitimate purposes. Lateral movement in this context is the agent using its authorized access to reach systems or data outside the intended scope of its task. The agent does not know its scope has been violated. Its reasoning loop has been hijacked.
Persistence is the third stage. An agent that has moved laterally can establish persistence through the same mechanisms any attacker would use: writing to startup scripts, modifying configuration files, registering scheduled tasks. The difference is that the agent performs these operations using its authorized tool access. The operational log may show the agent doing exactly what an agent is supposed to do - using its tools - while the payload is adversary-controlled.
Guardrail bypass is the fourth stage. Agentic offensive-security tools typically carry behavioral constraints: they are told not to operate outside a defined scope, not to exfiltrate data, not to persist. Those guardrails are instructions. They live in the same context the attacker has already manipulated. The June 23 analysis shows that guardrails implemented as model-level instructions do not constitute a control surface. They are content. Content can be overridden by other content. This is not a novel finding in isolation - the "Agent Security is a Systems Problem" paper from May 2026 made the same architectural observation. The June 23 paper makes it concrete against deployed tooling in a specific operational class.
Sandbox escape is the fifth stage and the structural terminus. The sandboxed container is the physical isolation boundary. The June 23 analysis shows that the kill chain, traversed from LLM manipulation through persistence and guardrail bypass, produces a foothold that can reach outside the container. The sandbox does not contain the kill chain. It contains the agent's process. The kill chain is not a process. It is a sequence of operations the agent performs using its legitimate access, while under adversary control.
Governance reading
The June 23 paper is the most direct challenge in the run-27 slate to the assumption that agentic AI tools can be deployed safely inside a bounded perimeter. The bounded-perimeter model is the operational form of the CISA Five Categories (C1-C5) approach: scope the agent's privilege, constrain its tool access, sandbox its execution environment, and log its behavior. The June 23 paper shows that an adversary who can reach the content the agent ingests can traverse the perimeter from inside.
The Anthropic Frontier Red Team MITRE mapping (G31) treated the agent as the instrument of the attacker. The June 23 paper treats the agent as the target. If the agent is the instrument, the governance question is how to prevent it from being used for offensive purposes. If the agent is the target, the governance question is how to prevent the attacker from hijacking an agent already authorized to operate in a sensitive environment. The frame has inverted.
The defenders who followed the NCSC NZ assume-compromise guidance of June 18 and deployed agentic tooling to perform supply chain analysis or vulnerability scanning have created a new entry point. The agent deployed to test their posture is now itself a target. Its credential scope, tool access, and operator-granted trust are the assets an adversary wants. The June 23 analysis shows how to reach them.
The Five Eyes joint declaration of June 22 stated that AI lowers barriers for malicious actors and increases attack speed. The June 23 paper is the operational illustration of that statement applied to the defensive tooling built to respond to it. The tool an organization deploys to test resistance to AI-assisted attack is itself vulnerable to AI-assisted attack.
What composes with this
The June 23 paper composes with the prior arc at two structural joints.
The Agent Security is a Systems Problem paper from May 2026 (P6) argued that the model is an untrusted component and security must live in the surrounding system through isolation, mediated tool access, and instruction-versus-data separation. The June 23 kill chain is a direct empirical test of that principle. The widely deployed offensive tools did not implement instruction-versus-data separation. The LLM manipulation stage depends on that failure. The systems-problem framing predicted the vulnerability; the June 23 analysis confirmed it in deployed systems.
The Agent Meltdowns paper from May 2026 (P7) showed that agents do not reliably detect or report their own failures. An agent manipulated into executing adversary-controlled instructions does not recognize the manipulation as a failure. It continues to log normal-looking operations. The operator does not know. This is not a consequence of the June 23 attack technique - it is a precondition for the technique's effectiveness. An agent that could detect and report LLM manipulation would close the first stage of the kill chain.
The CISA Five Categories (C1-C5) name privilege escalation, tool misuse, context poisoning, behavioral unpredictability, and accountability gaps. The June 23 kill chain instantiates all five in order. LLM manipulation is context poisoning. Lateral movement is privilege escalation through authorized access. Persistence is tool misuse for adversary goals. Guardrail bypass is behavioral unpredictability at the constraint layer. Sandbox escape is the accountability gap made physical: the agent's actions crossed a boundary that no log records as a boundary crossing.
The symmetric defender survey that appeared one day earlier mapped 16 cases where agentic AI changes which defensive problems are tractable. The June 23 paper maps the cases where those agentic defenders are themselves tractable targets. The same properties that make an agentic system useful for supply chain analysis - tool access, multi-step reasoning, operation across heterogeneous environments - are the properties an adversary exploits when traversing the kill chain.
What remains on the table
- Guardrails implemented as model-level instructions do not constitute a control surface if they live in the same context the attacker has already manipulated. What is the minimum architectural form a guardrail must take to resist manipulation through the same channel as the agent's task content?
- The sandbox does not contain the kill chain because the kill chain uses the agent's authorized operations. What boundary specification - at the tool-access layer, the credential-scoping layer, or the execution environment - would prevent traversal from LLM manipulation to sandbox escape while preserving the tool's operational utility?
- The NCSC NZ assume-compromise posture says to compress the post-compromise window. The June 23 kill chain shows the compromised entity can be the tool deployed to detect compromise. What does an assume-compromise posture look like when applied to the detection tool itself?
- The CISA Five Categories (C1-C5) accountability requirement says an agentic system's actions must be interrogable after the fact. The June 23 kill chain shows that adversary-controlled operations are logged as normal authorized operations. Does the current accountability specification distinguish between the two? If not, what would that distinction require at the log layer?
The substrate the policy depends on is the substrate the policy has not yet specified.