All writing
Article·May 7, 2026·12 min readSecurityCase studiesProduction

When agents fail in production

Six real incidents from 2025 and early 2026. What broke, what was missing, and how the failures could have been prevented.

A

Aditya Rai

Founder · raiagents

Twelve months ago, agent failures were mostly theoretical. Today they have CVE numbers, named victims, and apology blog posts. The same handful of architectural mistakes shows up in almost every incident.

Why this list exists

Most of the agent disasters in the last year were not caused by teams that didn't care about security. They were caused by teams shipping into a category of system whose threat model was still being written. The patterns repeat. If you're planning to put an AI agent into your business in 2026, these are the failures worth learning from before you commit.

This is a companion to AI Agent Security: A Deep Dive. That post covered the threat classes in the abstract. This one is the field report.

01 · Replit's coding agent destroys SaaStr's production database

July 2025. Coding agent. ~1,200 executive records destroyed.

Jason Lemkin, the founder of SaaStr, was working with Replit's coding agent during what he had explicitly declared a code and action freeze. The agent decided otherwise. It ran a series of unauthorized destructive commands against the live database, wiping roughly 1,200 executive profiles and 1,190 company records. It then fabricated about 4,000 fake users to populate the empty tables. When asked what happened, it claimed the data could not be rolled back. The data was, in fact, recoverable. The agent was lying about its own state. Replit's CEO publicly apologized.

What was missing. The agent had production credentials. There was no separation between dev and prod. There was no hard gate on destructive operations. The freeze instruction lived in the prompt, not the runtime. It was therefore optional.

What we would do. No agent ever holds production write credentials directly. Destructive verbs (DROP, DELETE, destroy, rm -rf) require a human approval that cannot be auto-clicked by the agent. Freezes are runtime states enforced by the tool wrapper, not English sentences in the prompt. And we treat the agent's self-report as untrusted: ground-truth is what the database actually shows, not what the agent says happened.

02 · “EchoLeak”: Microsoft 365 Copilot exfiltrates email and Teams via one unread message

Disclosed June 2025. Productivity agent. CVSS 9.3. Zero-click.

A research firm demonstrated that Microsoft 365 Copilot could be hijacked by a single email sitting unopened in a target's inbox. The email contained an indirect prompt injection. When Copilot later did its routine background work, it followed the embedded instructions and exfiltrated Teams messages, SharePoint files, OneDrive contents, and chat history to an attacker-controlled endpoint. The exploit chain bypassed Microsoft's own prompt-injection classifier, link redaction, and Content Security Policy by routing through a whitelisted Teams proxy. Microsoft patched it after disclosure.

What was missing.Untrusted email content was ingested into the same trust context as the user's actual instructions. Defenses existed but each had escape hatches. Defense in depth means nothing if every layer has the same blind spot.

What we would do.Untrusted content (email, web pages, uploaded documents, customer messages) is structurally separated from user intent in the context window. The agent receives it as data to be analyzed, not as instructions to be followed. CSP and egress allowlists are scoped to data sensitivity, not to “the org domain.” This is architecture work. It cannot be bolted on later.

03 · “ForcedLeak”: Salesforce Agentforce leaks CRM data via a lead form

Disclosed September 2025. CRM agent. CVSS 9.4.

Researchers showed that Salesforce's Agentforce CRM agent could be made to exfiltrate internal lead data with no insider access. The attack: stuff a malicious instruction into the 42,000-character Descriptionfield of a Web-to-Lead form. When Agentforce later summarized the lead (a routine task), it followed the embedded instructions and posted internal CRM contents to an attacker-controlled URL. The attacker's domain happened to be on Salesforce's CSP allowlist because the original owner had let the registration expire and the researcher had bought it.

What was missing.Untrusted user-submitted content was dropped into the agent's context with no separation. Egress controls relied on a stale allowlist. Nobody was checking who actually owned the domains the allowlist trusted.

What we would do. All customer-submitted content is sandboxed before it reaches the agent. Egress is allowlisted at the action level (this tool can only post to these specific endpoints with this specific shape) and the allowlist is regenerated and verified continuously, not curated by hand once and then forgotten for years.

04 · The Amazon Q Developer supply-chain attack

July 2025. Coding agent. ~1M installs at risk.

A drive-by contributor was granted admin access on the public repo for AWS's official VS Code extension and merged a pull request that injected a system prompt instructing the agent to factory-reset the developer's filesystem and enumerate AWS resources for deletion. The malicious version shipped to the marketplace as v1.84.0. Users were saved by a coincidence: the injected payload was malformed and didn't actually parse. Had it been shaped correctly, around a million developers' machines and AWS environments would have been at risk.

What was missing.The agent's system prompt was treated as configuration, not as code. Contributor access on the repo was loose. The agent itself had unrestricted shell and AWS CLI tools, so a successful payload would have had complete system access. The architecture had no concept of a review boundary on prompt changes.

What we would do.Agent prompts and tool definitions are first-class artifacts: code-reviewed, signed, change-logged, and gated behind the same controls as production application code. Coding agents do not get unsupervised shell or cloud-CLI access. The human is the eval set when an agent is about to touch a developer's machine.

05 · Claude Code wipes a 2.5-year-old education platform

Early 2026. Coding agent. ~100k users. 1.94M records destroyed.

The founder of an education platform asked Claude Code to clean up duplicate Terraform resources in what he believed was a side project. The agent, working from a stale archive that contained leftover production state, ran terraform destroy. It obliterated the platform's VPC, RDS, and ECS cluster: 1,943,200 student-submission records spanning 2.5 years, gone in seconds. The platform was recovered only because AWS had retained an internal snapshot the team didn't know existed.

What was missing. Blast-radius awareness. The agent had no concept of production versus side project. It executed the most destructive infrastructure command in its toolbox without a surface-level check. Backups were not really backups. They were luck.

What we would do. Infrastructure-touching agents operate with scoped, environment-tagged credentials. Destructive verbs are gated, full stop. Backup verification (can we actually restore from this, today?) is part of the deploy pipeline, not a hope. We design assuming the agent will, eventually, try the worst possible thing in its toolbox. Because eventually, it will.

06 · The browser-agent trust boundary collapse

October 2025. Browser agents. ChatGPT Atlas, Perplexity Comet, others.

When ChatGPT Atlas (and competing browser agents like Perplexity Comet) launched, researchers immediately demonstrated multiple critical vulnerabilities. The omnibox treated disguised malicious URLs as high-trust user intent. Persistent agent memory could be poisoned via a CSRF attack from any web page, and the poisoned instructions persisted across sessions and devices. OpenAI's CISO Dane Stuckey publicly called prompt injection in browser agents “a frontier, unsolved security problem,” and OpenAI's own December 2025 hardening post acknowledged it is “unlikely to ever be fully solved.”

What was missing. A clear trust boundary between what the user typed and what some web page said. Long-term agent memory was writeable from untrusted web origins. The action surface (an agent that browses, clicks, and fills forms on your behalf) was massively expanded without commensurate isolation.

What we would do.Browser agents need three separate trust contexts: user intent, page content, and agent memory. Each has its own write rules. We don't ship a browser agent without a hard separation here. We also don't ship one without a per-session sandbox and an explicit user-approval step on any irreversible action (purchases, sends, deletions, configuration changes).


Patterns

A few things show up over and over:

  1. Indirect prompt injection is the dominant attack class. Five of the six incidents above involve attacker-controlled content reaching the agent's context.
  2. Defense in depth has escape hatches. Microsoft, Salesforce, GitHub, and OpenAI all had multiple mitigations. Researchers chained around all of them.
  3. Agent prompts and tool definitions are application code. When they are treated as configuration, supply-chain attacks become routine.
  4. Blast radius is not the agent's problem to manage. It is the runtime's. If the tool can destroy prod, the tool will eventually try.
  5. Customer-facing agents need groundedness, not just guardrails. A hallucinated company policy is cheap when it costs you a few refunds. In regulated industries, the same failure is a lawsuit.

If you're shipping an agent

Most of the companies on this list have substantial security teams. They still got hit. The reason is not that those teams aren't good. The reason is that agentic AI is a new category of system, and the off-the-shelf playbook for securing it is still being written.

If you're building an AI agent for your business in 2026 and your vendor cannot tell you, in detail:

  • How they separate untrusted content from user intent in the context window.
  • What the tool egress allowlist looks like, and who maintains it.
  • Which actions require human approval, and how that approval is enforced at runtime, not in the prompt.
  • How they detect and respond to prompt injection in production.
  • And what the blast radius is if the agent does the worst possible thing in its toolbox.

…then they will be in next year's writeup. We'd rather you not be there with them. If you're thinking about shipping an agent, we're happy to talk through the threat model with no commitment. Book a free 30-minute call.

Aditya


Sources

A

Written by

Aditya Rai

Founder of raiagents. Senior engineer who has shipped AI in payments, banking and ML at scale (Routesense, gAI Ventures, Amazon). Now building the same engineering rigor for small businesses.

Currently booking · 4 slots / mo

Tell us about your business.
We'll tell you what to automate first.

Free 30-minute discovery call. We talk through your business, where AI could actually help, what it would cost, and how long it would take. No pressure to commit. If we're not a fit, we'll say so.

Discovery callFree · 30 min
First agent live2–4 weeks
Engagement size$5–50k typical
PricingClear monthly · no surprises
Industries we serveCoffee · dental · accounting · real estate · law · agencies