Anthropic’s engineering team recently published a detailed architecture post called “Scaling Managed Agents: Decoupling the Brain from the Hands.” It’s excellent work that confirms what every team shipping AI agents already knows: infrastructure around the model is the hard part, not the model itself.

Here’s what the post says, why it matters, and where a critical gap remains.

What Anthropic Built

Anthropic’s Managed Agents is a hosted service that runs long-horizon Claude agents on your behalf. The core insight is an analogy to operating systems: just as OSes virtualized hardware into stable abstractions (process, file, read()) that outlasted any particular disk or CPU, Managed Agents virtualizes the components of an agent into three interfaces:

Session — a durable, append-only log of everything that happened, stored outside both Claude’s context window and the harness. This means context survives crashes and can be queried programmatically.

Harness — the loop that calls Claude, routes tool calls, and manages context engineering. It’s stateless. If it crashes, a new one replays the session log and picks up where it left off.

Sandbox — an isolated execution environment where Claude runs code and edits files.
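The three abstractions above can be sketched as interfaces. This is a minimal illustration in Python, not Anthropic's actual API; all class and method names here are assumptions made for the sketch:

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """Durable, append-only event log; lives outside the harness and survives its crashes."""
    events: list = field(default_factory=list)

    def append(self, event: dict) -> None:
        self.events.append(event)

    def replay(self):
        # A fresh harness iterates the full log to reconstruct where things left off.
        return iter(self.events)


class Sandbox:
    """Isolated environment where generated code and tool calls run; holds no secrets."""
    def execute(self, name: str, input: str) -> str:
        # Dispatch a tool call; per the post's contract, the result is always a string.
        return f"ran {name} with {input}"


class Harness:
    """Stateless loop: reads the session, calls the model, routes tool calls."""
    def __init__(self, session: Session, sandbox: Sandbox):
        self.session = session
        self.sandbox = sandbox

    def step(self, tool_name: str, tool_input: str) -> str:
        result = self.sandbox.execute(tool_name, tool_input)
        # Every action is persisted to the session, so any replacement
        # harness can replay the log and continue.
        self.session.append({"tool": tool_name, "result": result})
        return result
```

The crash-recovery story falls out of the separation: because `Harness` keeps no state of its own, constructing a new one over the same `Session` is all "recovery" means.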

The key architectural move is decoupling these three things. In the old design, everything lived in one container — a “pet” you couldn’t afford to lose. When it failed, the session died with it. By separating the brain from the hands, each component becomes a Lego brick: replaceable, independently scalable, and debuggable. The results bear this out: p50 time-to-first-token dropped ~60%, and p95 dropped by over 90%.

They also made a smart security decision: credentials never enter the sandbox. OAuth tokens live in a vault, and a proxy handles authenticated calls on the agent’s behalf. Claude’s generated code can never exfiltrate secrets because the secrets simply aren’t there.
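The vault-plus-proxy pattern can be made concrete with a short sketch. This is an illustration of the idea, not Anthropic's implementation; the `Vault` and `AuthProxy` names and their methods are assumptions:

```python
class Vault:
    """Stores OAuth tokens outside the sandbox, keyed by (user, provider)."""
    def __init__(self):
        self._tokens = {}

    def put(self, user: str, provider: str, token: str) -> None:
        self._tokens[(user, provider)] = token

    def get(self, user: str, provider: str) -> str:
        return self._tokens[(user, provider)]


class AuthProxy:
    """Sits between the sandbox and the outside world; injects credentials."""
    def __init__(self, vault: Vault):
        self.vault = vault

    def call(self, user: str, provider: str, request: dict) -> dict:
        token = self.vault.get(user, provider)
        # The Authorization header is attached here, after the request has
        # already left the sandbox -- generated code never sees the token.
        headers = {"Authorization": f"Bearer {token}"}
        return {"request": request, "headers": headers}
```

Note what the sandbox-side request never contains: the token. Code running inside the sandbox can only name a user and a provider; the secret material is resolved on the other side of the boundary.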

What Problem This Actually Solves

Anthropic is solving the agent orchestration problem: how do you reliably run a model in a loop over long time horizons, survive failures, manage context that exceeds the window, and scale to many concurrent sessions? It’s infrastructure for the “brain,” the reasoning engine and the plumbing that keeps it running.

This is genuinely hard, and the post is a strong contribution. But notice what’s conspicuously underspecified: the hands.

The post defines a hand as anything that implements execute(name, input) → string. That’s it. What the hands actually connect to, how they authenticate as a specific user, how you govern what tools are available to which agents, how you manage OAuth flows across dozens of SaaS providers for thousands of end users… that’s left as an exercise for the reader.
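To see how thin that contract is, here is the entire "hands" interface expressed as a Python type, plus one plausible (and deliberately naive) implementation. The `make_registry` helper is an assumption for illustration; nothing like it appears in the post:

```python
from typing import Callable

# The whole tool contract from the post, as a type: (name, input) -> string.
Execute = Callable[[str, str], str]


def make_registry(tools: dict[str, Callable[[str], str]]) -> Execute:
    """Naive dispatch-by-name. Everything the post leaves unspecified --
    per-user auth, scoping, rate limits, audit logging -- would have to
    live behind this one function."""
    def execute(name: str, input: str) -> str:
        if name not in tools:
            return f"error: unknown tool {name!r}"
        return tools[name](input)
    return execute
```

Ten lines satisfy the interface; none of them answer who the agent is acting as, what it is allowed to touch, or who can audit what it did.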

And that exercise is enormous.

What Arcade.dev Solves

Arcade.dev is an MCP runtime: the production infrastructure for the “hands” that Anthropic abstracts away behind a function signature.

If Anthropic’s Managed Agents answers “how do I keep an agent’s brain running reliably at scale,” Arcade answers the complementary question: “how does that agent securely take real actions, as real users, across real business systems?”

Three specific jobs to be done:

1. Per-user authorization at scale. When your agent sends an email, it can’t use a service account. It needs to act as that specific user, with that user’s OAuth token, scoped to that user’s permissions. Arcade handles the full OAuth lifecycle (token acquisition, refresh, storage, scoping) across providers like Google, Slack, Salesforce, and GitHub. It integrates with your existing identity provider. This is exactly the credential management problem Anthropic acknowledged (“tokens live in a vault, a proxy handles calls”), but didn’t ship a general solution for.

2. Agent-optimized tools, not API wrappers. Raw API surfaces are designed for developers, not LLMs. They have pagination, complex schemas, and ambiguous error codes. Arcade provides a catalog of MCP tools built specifically for agent consumption, providing higher reliability, lower token cost, and better structured outputs. You can also build custom tools using their open-source framework with OAuth and evals built in.

3. Governance and deployment flexibility. Enterprises need to know which agents can call which tools, see an audit trail, and deploy in their VPC or air-gapped environment. Arcade provides lifecycle governance for visibility and control over every tool and agent in the organization, with deployment options ranging from cloud to fully on-premises.
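The first job — per-user tokens with refresh — is worth making concrete. This sketch is a generic illustration of the pattern, not Arcade's actual SDK; the `UserTokenStore` class and the `refresh_fn` callback shape are assumptions:

```python
import time


class UserTokenStore:
    """Per-user, per-provider token management: each (user, provider) pair
    has its own token, re-fetched when it nears expiry."""
    def __init__(self, refresh_fn):
        # refresh_fn(user, provider) -> (token, ttl_seconds); in practice
        # this would drive a real OAuth refresh-token exchange.
        self._store = {}  # (user, provider) -> (token, expires_at)
        self._refresh = refresh_fn

    def token_for(self, user: str, provider: str) -> str:
        entry = self._store.get((user, provider))
        if entry is None or entry[1] <= time.time():
            token, ttl = self._refresh(user, provider)
            entry = (token, time.time() + ttl)
            self._store[(user, provider)] = entry
        return entry[0]
```

The point of the sketch is the key: tokens are scoped to a (user, provider) pair, never shared across users the way a service-account credential would be — which is exactly why an agent using this store sends email as Alice, not as "the bot."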

How They Fit Together

These aren’t competing products. They’re complementary layers. Anthropic built the OS for the agent’s brain. Arcade built the runtime for the agent’s hands.

Anthropic’s own architecture requires something like Arcade. Their execute(name, input) → string interface is deliberately tool-agnostic. As they say, “the harness doesn’t care whether the sandbox is a container, a phone, or a Pokémon emulator.” But when an enterprise needs that agent to book a meeting on a user’s Google Calendar, update a Salesforce opportunity, and post a summary to Slack, someone has to build and run that infrastructure. That means making authenticated calls on behalf of each end user, with proper scoping, token refresh, and audit logging.

That’s Arcade. The brain is spoken for. The hands need a runtime. Arcade is that runtime.