Someone on your revenue operations team got tired of nagging account executives about CRM hygiene. So they wired up an agent. Salesforce has an MCP server, the model can call tools, and the workflow is obvious: take the meeting transcript, pull out the next steps, update the opportunity, log the activity, push a follow-up task. An afternoon of work, one API token in a .env file, and the thing runs.
It works. AEs stop complaining. The demo gets passed around. Within a week, two other teams want the same thing for Zendesk and Jira, and you have quietly become the owner of production agentic AI inside the company.
Then it stops being an afternoon project. Not because the agent got worse, but because the moment it acts on behalf of other people, every shortcut that made the prototype fast turns into a question you cannot answer with a print() statement.
TL;DR
- You need an MCP runtime for your AI agents when auth, permissions, audit logs, integrations, reuse, or risk ownership start moving out of the prototype phase.
- MCP standardizes tool connections, but it does not, by itself, solve production governance.
- An MCP runtime centralizes identity, policy, tool execution, and evidence so that you do not need to rebuild those layers from scratch to deploy AI agents in production.
The wall is predictable, not a failure
You built the right thing. The prototype-first path is the correct first move. Prototypes are cheap to assemble once a model can call tools and an MCP server can expose capabilities, and a small team can tolerate a narrow happy path. Every team now running agents at scale started exactly where you are, with one workflow, one tenant, light usage, and a forgiving risk posture.
The first version works because it quietly cheats. The engineer who built it is the security boundary. They know which records the agent can touch, they wrote the prompt, and they hold the token. There is no question of “what should this agent be allowed to do,” because the answer is “whatever I, the builder, can do.” That assumption holds right up until the agent is doing things for people who are not you.
Six signs separate a working prototype from something that needs real infrastructure. None is exotic. You will recognize your own repo in most of them.
Sign 1: You’re writing more auth and login plumbing than agent logic
The identity layer: proving who is actually calling the tools.
You’ve crossed it when: your auth/ directory is bigger than your tools/ directory.
Look at your repository. The tools/ directory grew a sibling auth/ directory, and auth/ is now bigger. Standups have shifted from “how do we improve the agent” to “why did this user’s refresh token fail” and “which account is the agent using.” A new engineer’s first ticket is “add Slack,” and it takes two weeks.
Why it happens
Acting agents sit in the hard middle between a user and a downstream API, which forces you to own multi-user AI agent authentication and authorization mechanics you used to get for free, and enterprise APIs do not share an identity standard. Slack rotates access tokens every 12 hours; GitHub expires installation tokens in an hour and refresh tokens in six months; Microsoft Graph splits delegated from app-only access with its own consent model. Implementing one is a week. Implementing five, with refresh, rotation, revocation, and per-user storage, is a sustained quarter. Get the concurrency wrong and two threads refresh the same single-use token at once, the provider reads it as a replay attack, and the user is locked out.
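The refresh race described above can be avoided by letting only one thread per user perform the refresh. The following is a minimal sketch, not a real SDK: `TokenStore`, `TokenSet`, and the `refresh_fn` callback are hypothetical names, and it assumes a single process (a multi-worker deployment would need a distributed lock instead).

```python
import threading
import time
from dataclasses import dataclass


@dataclass
class TokenSet:
    access_token: str
    refresh_token: str
    expires_at: float  # Unix timestamp


class TokenStore:
    """Per-user token cache that serializes refreshes (hypothetical helper)."""

    def __init__(self, refresh_fn, skew_seconds=60):
        self._refresh_fn = refresh_fn   # assumed callback that calls the provider
        self._skew = skew_seconds       # refresh slightly before actual expiry
        self._tokens = {}               # user_id -> TokenSet
        self._locks = {}                # user_id -> threading.Lock
        self._guard = threading.Lock()  # protects the _locks dict itself

    def put(self, user_id, tokens):
        self._tokens[user_id] = tokens

    def _lock_for(self, user_id):
        with self._guard:
            return self._locks.setdefault(user_id, threading.Lock())

    def get_access_token(self, user_id, now=None):
        with self._lock_for(user_id):
            current = self._tokens[user_id]
            t = time.time() if now is None else now
            if current.expires_at - self._skew > t:
                return current.access_token
            # Only one thread per user reaches this point at a time, so a
            # single-use refresh token is spent once. Waiting threads then
            # see the fresh token on the fast path above.
            fresh = self._refresh_fn(current.refresh_token)
            self._tokens[user_id] = fresh
            return fresh.access_token
```

The design choice here is one lock per user rather than one global lock, so a slow refresh for one rep does not stall every other rep's tool calls.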
How it plays out
Trace it through the Salesforce agent. The first version runs on one static admin token. Then the sales director wants updates recorded under the rep who was on the call, so you build per-user OAuth with a background worker to store, encrypt, and refresh tokens (Salesforce access tokens expire in two hours by default). Then security asks what happens when an AE leaves, so revocation has to tie into your IdP’s deprovisioning. Each request is reasonable. Together, they are an IAM client you never set out to build.
Bottom line: the prototype needed a login; the production agent needs an identity model, especially once enterprise teams expect SSO for AI agents to work like the rest of their software stack.
Sign 2: Your permissions are a growing pile of hand-maintained if-then rules
The policy layer: deciding what they’re allowed to do.
You’ve crossed it when: nobody can say what the agent is allowed to do without reading the code.
Sign 1 was authentication: proving who is asking. This is authorization, the harder half: deciding what they are allowed to do. In production, that usually means a delegated authorization stack that evaluates the user, the agent, and the action together. It starts with one clean check (update the record only if the signed-in rep can edit it), and then the rules multiply. Update Stage, but only with the “Pipeline Manager” permission set. Closed-won updates need manager approval. EMEA is exempt. SDRs can edit notes but not the advanced stages. Each is a defensible business need. Together, they are configuration hell, accreting in one file: permissions.py fills with branches and comments like # do NOT remove, breaks renewals team, and new permission requests take a sprint.
Why it happens
The problem is structural: authorization depends on subject, object, and context, and inline conditionals collapse those dimensions into procedural mush. NIST’s ABAC guidance exists for exactly this reason, and tools like Open Policy Agent externalize policy to keep it out of application code.
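To make the contrast with inline conditionals concrete, here is a rough sketch of the same kind of rules expressed as data, with subject, object, and context kept as separate fields. It is plain Python, not Open Policy Agent or any real policy engine. The `Request` fields and rule names are invented for illustration, and two readings are assumptions: that "EMEA is exempt" means exempt from manager approval on closed-won updates, and that SDRs may edit only Notes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    # Subject
    role: str
    permission_sets: frozenset
    # Object
    field_name: str            # e.g. "Stage", "Notes"
    new_value: str = ""
    # Context
    region: str = "NA"
    manager_approved: bool = False


def _is_closed_won(r):
    return r.field_name == "Stage" and r.new_value == "Closed Won"


# Ordered rules: (name, predicate, effect). First match wins; default is deny.
RULES = [
    ("sdr-notes-only",
     lambda r: r.role == "SDR" and r.field_name != "Notes", "deny"),
    ("stage-needs-pipeline-manager",
     lambda r: r.field_name == "Stage"
     and "Pipeline Manager" not in r.permission_sets, "deny"),
    ("emea-closed-won-exempt",          # assumed meaning of the EMEA exemption
     lambda r: _is_closed_won(r) and r.region == "EMEA", "allow"),
    ("closed-won-needs-approval",
     lambda r: _is_closed_won(r) and not r.manager_approved, "deny"),
    ("editable-field",
     lambda r: r.field_name in {"Stage", "Notes"}, "allow"),
]


def decide(request):
    """Return (effect, rule_name) so every decision names the rule behind it."""
    for name, applies, effect in RULES:
        if applies(request):
            return effect, name
    return "deny", "default-deny"
```

Because `decide` returns the matching rule name, someone can answer "why was this denied" by reading a list instead of tracing branches, which is the property externalized policy is meant to buy.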
How it plays out
Salesforce is the cautionary tale. Its permission model, profiles, permission sets, sharing rules, field-level security, and more, is two decades of mature hand-maintained authorization. An agent re-implementing a slice of that in Python is starting the same journey with a fraction of the staff.
Bottom line: at first, if-statements are the fastest way to encode context; later, they are an undocumented policy system, and a single wrong branch has blast radius across every tenant the agent touches.
Sign 3: You need agent audit logs for every tool call
The evidence layer: reconstructing what actually happened.
You’ve crossed it when: you can’t reconstruct who did what after the fact.
Suppose the permission rules are right. You still cannot prove they were followed. The clearest version of this sign is a Slack thread:
“Hey, did the bot just close that opp?” “I think so?” “Can you check?” “The logs rolled over.”
That conversation is the finding. When something looks wrong, you need to answer fast: which run did it, who authorized it, what was the input, what changed downstream, and was there an approval? If you cannot, you do not have guardrails. You have an opinionated wrapper.
Why it happens
An auditable action comprises at least five facets that must be recorded together: the requesting user, the agent identity, the authorization decision, the input, and the resulting change. Ad-hoc logging captures one or two, and they live in different systems. Salesforce Field History has the state change but not the reasoning; the LLM trace has the reasoning but not the change; nothing correlates them. Guardrails are point-in-time controls; audit trails are durable evidence, and acting systems need both. Slack’s Audit Logs API gives you actor, action, entity, and context, but explicitly will not tell you whether the action was appropriate.
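As an illustration of recording those five facets together, the sketch below writes one record per tool call under a single correlation ID. The field names and the `record_tool_call` helper are hypothetical, and the `sink` is a plain list standing in for whatever append-only store a team would actually use.

```python
import json
import uuid
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditRecord:
    correlation_id: str      # ties this entry to traces in other systems
    timestamp: str
    requesting_user: str     # facet 1: who asked
    agent_id: str            # facet 2: which agent acted
    decision: str            # facet 3: the authorization decision
    decision_reason: str
    tool: str
    tool_input: dict         # facet 4: the input
    resulting_change: dict   # facet 5: what changed downstream


def record_tool_call(sink, *, requesting_user, agent_id, decision,
                     decision_reason, tool, tool_input, resulting_change):
    """Append one JSON line holding all five facets; returns the record."""
    record = AuditRecord(
        correlation_id=str(uuid.uuid4()),
        timestamp=datetime.now(timezone.utc).isoformat(),
        requesting_user=requesting_user,
        agent_id=agent_id,
        decision=decision,
        decision_reason=decision_reason,
        tool=tool,
        tool_input=tool_input,
        resulting_change=resulting_change,
    )
    sink.append(json.dumps(asdict(record), sort_keys=True))
    return record
```

The point of the single record is correlation: the state change and the reason for it are written in the same place at the same time, instead of being reassembled later from separate logs.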
How it plays out
When finance flags a deal at quarter close because the amount was moved after the close date, you can see the new value but cannot determine who changed it, on whose behalf, or based on what input. And the moment the agent mutates regulated data the question stops being internal: HIPAA at 45 CFR §164.312(b) requires systems handling ePHI to record and examine activity.
Bottom line: “we have the LLM transcript” is not an answer an auditor accepts. What they need instead is audit logs and telemetry for every tool call: who requested it, which tool ran, what changed, and how the action was authorized.
Sign 4: Every new system multiplies the work instead of adding to it
The integration layer: running one action across many systems.
You’ve crossed it when: the fifth connector costs more than the first, not less.
Everything so far has been one agent against essentially one system. Then the roadmap arrives: “add Gmail, Calendar, Zendesk, Jira, Slack, and Salesforce.” You budget for six connectors and price them roughly equal. Instead you get six different auth models, scope vocabularies, rate-limit behaviors, schemas, pagination styles, and audit surfaces. Adding Slack should have been easier than adding Salesforce. It was not. The first integration took two weeks; the fifth took five.
Why it happens
You did not add six tools, you added six governance surfaces, and each one drags the earlier signs in behind it: another identity model to wire (Sign 1), another permission surface to encode (Sign 2), another audit stream to correlate (Sign 3). Every tool you bolt on imports a full instance of each. It gets worse when an agent composes across systems, because a single logical action (read from Calendar, look up in Salesforce, post to Slack) has to reconcile three identity propagations, three permission checks, three rate limits, and three failure modes within a single operation. This is the connector-count fallacy, and it is exactly the problem the MCP gateway pattern is meant to avoid.
How it plays out
The rate limits alone will stop a roadmap. Microsoft Graph caps you at four concurrent requests per mailbox, a ceiling that bites harder once Exchange Web Services retires on October 1, 2026 and its far roomier limit (27 connections) goes with it. Add Outlook so the agent can schedule follow-ups, and the first time it reads inbox threads while booking a meeting it trips that limit and starts collecting 429s. The roadmap stops while you build a centralized queue and rate limiter the prototype never needed.
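A centralized limiter of the kind described above might start as a per-mailbox concurrency cap. This is a minimal sketch under stated assumptions: `MailboxLimiter` and `fake_graph_call` are hypothetical, the limit of four is taken from the text, and no real Microsoft Graph client is involved. It covers only the concurrency ceiling, not retry or backoff after a 429.

```python
import asyncio


class MailboxLimiter:
    """Caps in-flight calls per mailbox (hypothetical helper)."""

    def __init__(self, max_concurrent=4):
        self._max = max_concurrent
        self._semaphores = {}  # mailbox -> asyncio.Semaphore

    def _semaphore(self, mailbox):
        if mailbox not in self._semaphores:
            self._semaphores[mailbox] = asyncio.Semaphore(self._max)
        return self._semaphores[mailbox]

    async def run(self, mailbox, call):
        # Callers beyond the cap wait here instead of hitting the API.
        async with self._semaphore(mailbox):
            return await call()


async def demo(n_calls=10):
    """Fire n_calls at one mailbox and report the peak concurrency seen."""
    limiter = MailboxLimiter(max_concurrent=4)
    in_flight = 0
    peak = 0

    async def fake_graph_call():  # stand-in for a real API request
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)
        in_flight -= 1
        return "ok"

    results = await asyncio.gather(
        *(limiter.run("rep@example.com", fake_graph_call) for _ in range(n_calls))
    )
    return peak, results


if __name__ == "__main__":
    peak, results = asyncio.run(demo())
    print(f"peak concurrency: {peak}, completed: {len(results)}")
```

Every agent that touches the same mailbox has to share this limiter for it to work, which is why it ends up as shared infrastructure rather than per-agent code.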
Bottom line: one more tool is not additive; it multiplies against everything already connected.
Sign 5: Each new team rebuilds the same infrastructure
The reuse layer: amortizing the work across agents and teams.
You’ve crossed it when: the next team forks nothing and starts from zero.
Sign 4 was the cost of adding tools to one agent. This is the cost of adding agents to the company, the same multiplication seen from the other axis. The sales ops agent ships, after months of security clearance, token storage code, and custom audit logging. A month later the support team wants an agent that pulls Salesforce records when a Zendesk ticket opens. They look at the sales team’s repo and start over. The auth is entangled with one queueing model, one set of scopes, one audit sink. Their logging assumes Salesforce. Nothing lifts cleanly, so both teams now maintain parallel auth code, and the security team has reviewed two patterns for the same risk.
Why it happens
By the third agent (customer success wants a renewal-risk updater, finance wants an invoice assistant), you have three implementations of the same core layers (identity, policy, integration, evidence) and three separately approved patterns for the same risk. Put formally, you are solving an N × M problem by hand: N agents, each rebuilt against M systems. None of those layers is agent-specific, but in every repo they were written application-specific, so there is no interface to extract. A shared layer collapses the problem to N + M, where each connector is built once and every agent inherits it.
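The N × M versus N + M arithmetic is easy to check with small numbers. The sketch below simply counts integration points under each model; the function names are illustrative, and the six-system figure is borrowed from the roadmap in Sign 4.

```python
def bespoke_integrations(agents, systems):
    """Every agent rebuilds its own connector to every system: N x M."""
    return agents * systems


def shared_layer_integrations(agents, systems):
    """Each connector is built once and each agent plugs in once: N + M."""
    return agents + systems


if __name__ == "__main__":
    for agents, systems in [(1, 6), (3, 6), (10, 6)]:
        print(
            f"{agents} agents, {systems} systems: "
            f"{bespoke_integrations(agents, systems)} bespoke vs "
            f"{shared_layer_integrations(agents, systems)} shared"
        )
```

With three agents and six systems, that is 18 hand-built integrations against 9, and the gap widens with every agent added.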
How it plays out
This is the canonical platform-engineering trigger, and the industry has run the play before. Spotify built Backstage because its engineers were drowning in fragmented tooling, and Netflix calls this same idea the “paved road.” An MCP runtime acts as this exact shared substrate for AI agents. By providing a centralized control plane and a shared registry for team-level access, the runtime ensures that identity, policy, evidence, and integrations are built once. Every new agent simply connects to the runtime and inherits this infrastructure, making the safe path the easy one.
Bottom line: copying the first agent feels faster, right up until every copy inherits a private auth stack, permission model, and audit story. A centralized MCP runtime collapses this into a single integration point that every new team and agent can safely reuse.
Sign 6: Sensitive or legacy systems are entering scope, and nobody wants to personally own the risk
The ownership layer: deciding who carries the risk.
You’ve crossed it when: the pull request to a sensitive system sits open because no one will approve it.
This sign is psychological before it is technical, and that is the point. You were fine letting the agent draft notes and update low-risk CRM fields. You are not fine pointing the same stack at payroll, refunds, or the ERP. A pull request to give it write access to NetSuite or Workday sits open. Reviewers comment but will not approve. The engineer asks security for sign-off; security asks the engineer. Nothing ships.
Why it happens
That hesitation is correct, and notice what it is not about. The earlier signs were about building the mechanics, and by now you have most of them. This one is about who answers for the outcome when those mechanics touch something irreversible. A Salesforce note is recoverable in minutes; a journal entry in NetSuite hits the general ledger. These systems carry formal control expectations: Dynamics 365 ties segregation of duties to SOX, IFRS, and FDA controls. “The agent probably did the right thing” is not part of the operating model there. Legacy systems sharpen it further, since they often lack what makes a bad write survivable: no fine-grained permissions, no auditable API, no transactional undo.
How it plays out
When the lead architect is asked to point the agent at a legacy financial database, the question is not “can I build the connection?” It is “if a malicious email steers this agent into the wrong write, the damage cannot be undone, and the name on the change is mine.” Blocking that deploy is a rational refusal to personally absorb an institutional risk. This is where an MCP runtime steps in. By providing features like mandatory out-of-band human approvals (“read, draft, and commit”), contextual access policy hooks, and immutable OpenTelemetry-compatible audit logs, the runtime shifts the burden of trust from the developer to secure, verifiable infrastructure.
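The "read, draft, and commit" idea can be sketched as a small state machine in which the agent may only draft a write, and nothing executes until a human approves it. This is an illustration, not any product's API: `ApprovalQueue`, `PendingWrite`, and `execute_fn` are hypothetical, and the rule that the approver must differ from the requesting user is an assumption added to show segregation of duties.

```python
import uuid
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    DRAFT = "draft"
    APPROVED = "approved"
    COMMITTED = "committed"


@dataclass
class PendingWrite:
    id: str
    agent_id: str
    on_behalf_of: str
    target: str                     # e.g. "netsuite.journal_entry"
    payload: dict
    status: Status = Status.DRAFT
    approver: Optional[str] = None


class ApprovalQueue:
    def __init__(self, execute_fn):
        self._execute = execute_fn  # assumed callback that performs the write
        self._items = {}

    def draft(self, agent_id, on_behalf_of, target, payload):
        """The agent's only power: propose a write. Nothing is executed."""
        write = PendingWrite(str(uuid.uuid4()), agent_id, on_behalf_of,
                             target, payload)
        self._items[write.id] = write
        return write

    def approve(self, write_id, approver):
        """Called from a human-facing channel, outside the agent's reach."""
        write = self._items[write_id]
        if write.status is not Status.DRAFT:
            raise ValueError("only drafts can be approved")
        if approver == write.on_behalf_of:
            raise PermissionError("approver must differ from the requester")
        write.status = Status.APPROVED
        write.approver = approver
        return write

    def commit(self, write_id):
        write = self._items[write_id]
        if write.status is not Status.APPROVED:
            raise PermissionError("write has not been approved")
        result = self._execute(write.target, write.payload)
        write.status = Status.COMMITTED
        return result
```

The record of who approved what stays attached to the write itself, so accountability rests on the approval rather than on whoever merged the connector.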
Bottom line: when accountability exceeds what one person can absorb, the work requires institutional ownership through an MCP runtime. With versioned policy, retained audit logs, routed approvals, and credentials kept entirely out of the LLM execution environment, the engineer’s name is on the code, not on the risk of the decision.
The pattern behind the signs
These are not six unrelated problems. They are one problem wearing six masks. You set out to build an agent and ended up hand-building a runtime, one feature at a time, without the architecture to hold it together. The auth daemon, the growing permissions.py, the scattered logs, the per-connector rate limiter, the copy-pasted glue, the deploy nobody will approve: each is a piece of execution infrastructure that should exist once and apply to every agent, reinvented inside a single application instead. Identity, policy, evidence, integration, reuse, ownership: six names for the same missing layer.
An MCP runtime is that missing layer. Not a framework for building agents, and not a platform that hosts them. It is the standard execution layer agents act through, where those six concerns live once, as infrastructure, the same way a language runtime or a container runtime is not something application code opts into so much as the substrate it cannot act without. The agent proposes; the runtime authenticates the call, enforces policy, executes the tool, and records what happened.
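The propose, authenticate, enforce, execute, record sequence can be sketched as one function that every tool call passes through. This is a toy illustration of the shape, not how any particular runtime is implemented: `run_tool_call`, the `call` dictionary layout, and the injected `authenticate` and `authorize` callbacks are all hypothetical.

```python
class ToolCallDenied(Exception):
    """Raised when policy rejects a proposed tool call."""


def run_tool_call(call, *, authenticate, authorize, tools, audit_log):
    """Run one agent-proposed call through the four runtime steps.

    call: dict with "user_token", "agent_id", "tool", and "args" keys.
    """
    # 1. Authenticate: resolve the token to a user before anything else.
    user = authenticate(call["user_token"])

    # 2. Enforce policy: decide on user, tool, and arguments together.
    allowed, reason = authorize(user, call["tool"], call["args"])

    entry = {
        "user": user,
        "agent": call["agent_id"],
        "tool": call["tool"],
        "args": call["args"],
        "allowed": allowed,
        "reason": reason,
        "result": None,
    }
    try:
        if not allowed:
            raise ToolCallDenied(reason)
        # 3. Execute: the runtime, not the model, invokes the tool.
        entry["result"] = tools[call["tool"]](**call["args"])
        return entry["result"]
    finally:
        # 4. Record: denied and failed calls are logged too.
        audit_log.append(entry)
```

The agent never holds a credential or calls a tool directly in this sketch; it only proposes a call, and the audit entry is written whether the call succeeds, fails, or is denied.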
Adopt one and your effort moves from building security boundaries to designing what the agent should actually do. The six concerns become properties of the layer rather than per-agent plumbing, and the next team inherits the safe path rather than rebuilding it. Arcade.dev, the MCP runtime, is built for exactly this. It delivers per-action authorization that evaluates the intersection of agent and user permissions at the moment of the call (with credentials kept out of the model), a catalog of 8,000+ agent-optimized MCP tools that translate intent into safe API calls instead of letting the model hallucinate parameters, and centralized lifecycle governance with an OpenTelemetry-compatible audit record per user per service. How you get a runtime, build or buy, is its own conversation.
Quick checklist: Have you outgrown the prototype?
You do not need a runtime the first time an agent works. You need one when the agent becomes important enough that the surrounding questions matter as much as the prompt.
Run the checklist against the agent you already have:
- It acts on behalf of more than one user.
- It uses per-user OAuth instead of one developer-owned token.
- It can write to systems of record, not just read from them.
- Permissions depend on role, team, region, record owner, approval state, or business context.
- You cannot reconstruct every tool call from request to downstream change.
- Adding a new connector means rebuilding auth, scopes, rate limits, retries, and audit behavior.
- A second team is copying the first agent rather than reusing the shared infrastructure.
- Sensitive systems such as payroll, refunds, ERP, finance, healthcare, and customer data are coming into scope.
- Security, legal, or compliance has started asking who approved the action, not just whether the code works.
- A pull request is stalled because everyone agrees the agent is useful, but nobody wants to personally own the risk.
If you checked one or two, you may still be in prototype territory. If you checked three or more, the agent is probably no longer the hard part. The missing layer around it is.
Bottom line: once the questions are about identity, permission, evidence, reuse, and ownership, you are no longer debugging an agent. You are discovering the runtime it needs.
You’ve crossed the threshold
If these signs sound like your standups, your repo, and your stalled pull requests, the conclusion is simple: you have outgrown the DIY approach. Not because you built it wrong, but because you built it well enough to hit the same wall that web apps hit before centralized identity, that deployments hit before CI/CD, and that infrastructure hit before container orchestration. The move that resolved each of those was the same every time: extract the execution layer. You are not late. You are exactly on time for the transition that every infrastructure category before this one has already made.
How you get that runtime, whether you build or buy an MCP runtime, and how to evaluate the options, is the next conversation.
Frequently Asked Questions
What is an MCP runtime?
An MCP runtime is the governed execution layer for agents that use Model Context Protocol tools. It sits between the agent and the MCP servers it calls, handling identity, authorization, tool execution, credential isolation, policy enforcement, and audit logging.
Why do AI agents need a runtime?
AI agents need a runtime when they move from prototypes to production. MCP helps agents connect to tools, but teams still need a governed layer to decide who the agent is acting for, what it is allowed to do, how credentials are protected, and how each action is recorded.
Is MCP itself a runtime?
No. MCP standardizes how agents connect to tools and context. An MCP runtime governs what happens when those tools are used, including authorization, credential handling, policy checks, approvals, retries, rate limits, and audit trails.
When should a team use an MCP runtime?
A team should use an MCP runtime when an agent acts on behalf of multiple users, connects to sensitive systems, writes to systems of record, requires per-user OAuth, needs audit logs, or is being reused across multiple teams and workflows.
How is an MCP runtime different from an MCP server?
An MCP server exposes tools, resources, or prompts to an agent. An MCP runtime governs the execution of those tools in production. The server defines what is available. The runtime controls who can use it, under what policy, with which credentials, and with what audit record.
How is an MCP runtime different from an MCP gateway?
An MCP gateway primarily federates tools from multiple MCP servers into a single endpoint for simplified routing and single-URL configuration. While useful for connectivity, a gateway just routes requests. An MCP runtime is a complete execution layer that goes beyond routing to include delegated multi-user authorization, intent-level tool execution, contextual policy enforcement, and immutable audit logging. A gateway routes; a runtime executes, enforces, and audits.
How does an MCP runtime improve security?
An MCP runtime improves security by separating the agent from raw credentials, enforcing per-user and per-action authorization, limiting tool access, routing sensitive actions through policy checks, and recording what happened for every tool call.
Should companies build or buy an MCP runtime?
Build an MCP runtime only if your agent is single-user, your APIs are fully internal, or your agent infrastructure is your core product. For multi-user production agents that need OAuth, credential vaulting, permissions, audit logs, or SaaS integrations, buying a runtime usually lets the team ship faster while avoiding the need to own permanent infrastructure.


