When Good Tools Fail: Making MCP Durable With Temporal

Melissa Herrera is a Senior Developer Advocate at Temporal who previously built multi-agent systems at DataStax and Langflow. She’s now making the case that the fragility of complex agentic tool-calling chains is a solved problem if you treat your tools as workflows rather than fire-and-forget functions.

This article is adapted from an MCP MVP interview with Melissa Herrera.

Most agentic workflows are doomed to failure because of the mathematical inevitability of error compounding.

If each step in an agentic workflow succeeds 95% of the time, a five-step chain succeeds only 77% of the time. A twenty-step workflow drops to 36%. And these failures can be hard to spot because agents are good at producing plausible-looking but wrong outputs that subsequent steps build on, a pattern researchers call error amplification. The longer the chain, the more certain the collapse.
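These percentages come from raising the per-step success rate to the number of steps. A quick Python check reproduces them. It makes the simplifying assumption that each step succeeds or fails independently; real-world failures are often correlated.

```python
def chain_success_rate(step_success: float, steps: int) -> float:
    """Probability that every step in a chain succeeds, assuming independent steps."""
    return step_success ** steps


# Print the end-to-end success rate for chains of increasing length.
for steps in (1, 5, 10, 20):
    print(f"{steps:>2} steps at 95% each: {chain_success_rate(0.95, steps):.0%}")
```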

Now apply this to MCP tools in production. Your agent calls a flight booking tool, which in turn calls a payment API that times out. The chain falls over. You’re left in an unrecoverable state: half-charged credit cards, phantom bookings, and calendar invites for flights without tickets (true story).

Melissa thinks the answer has been hiding in plain sight: distributed systems engineers solved this problem before the AI boom. Temporal’s Durable Execution model gives your tools checkpoints, automatic retries, and self-healing, turning compounding failures into recoverable ones.

Drawing on her vast experience with video games, including Super Mario Bros., Melissa explained to RL Nabors during an interview for Arcade’s MCP MVP series, “If you hit a Goomba in your code, Temporal ensures that you go back to your last savepoint, not to the start of the game.”

What durability actually means

Durability, or Durable Execution, means ensuring your code completes, even when failures occur along the way. If any part of the process breaks, the system recovers from where it left off. This preserves progress and state, and it is not a new concept.

Temporal has been solving distributed systems problems since 2019, well before the current AI wave. But the pattern is newly relevant because AI agents have exactly the same failure profile as any other distributed system: they make network calls, they depend on external APIs, they chain together multiple non-deterministic steps. The only difference is that instead of microservices calling microservices, you have an LLM calling tools that call APIs.

“All of these patterns and distributed system problems are just emerging again in a different form,” Melissa said.

Temporal’s three primitives

Temporal’s architecture consists of three building blocks: Workflows, Activities, and Workers.

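A Workflow is the orchestrator. It defines what should happen from start to finish. Workflow code must be deterministic: given the same inputs and the same recorded history, it makes the same decisions in the same order. This is what makes replay and recovery possible, because Temporal can re-run the Workflow code against its history to rebuild exactly where it was.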

Activities are where the non-deterministic work happens: API calls, database queries, LLM inferences, tool executions. Anything that could fail lives in an Activity. When an Activity fails, Temporal automatically retries it using a configurable backoff strategy. The Workflow doesn’t need to know or care about individual failed attempts; it just sees the Activity eventually succeed, or fail for good once its retries are exhausted.

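Workers execute your code. They poll Temporal for Tasks, run your Workflows and Activities, and report results back. Workers hold no durable state of their own. The Temporal Service records each Workflow’s event history, so if a Worker goes down, another Worker can replay that history and pick up where the first one left off.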

Write your business logic as a Workflow and put your side effects in Activities. When your code’s execution is interrupted, it resumes from its last good checkpoint.

MCP tools as Temporal Workflows

To show what this looks like with MCP, Melissa demoed a weather tool connected to Claude. When she asked Claude for the weather in New York, the request flowed through an MCP server where each tool was backed by a Temporal Workflow. In Temporal’s UI, you could see the Workflow kick off, the Activity call the National Weather Service API, and the result flow back to Claude.

To demonstrate retry behavior, Melissa then asked for the weather in London, a city outside the National Weather Service’s coverage area. That request highlights an important distinction: Temporal is built to handle transient failures like network blips, timeouts, or a temporarily unavailable API, the kind that might succeed if you simply try again.

But this was a permanent failure: no amount of retrying would make the National Weather Service support a non-US city. Temporal didn’t know that upfront, so it did exactly what it was designed to do: it retried automatically 19 times before reaching its retry limit. Each attempt was logged, visible, and inspectable in Temporal’s UI.

When it finally gave up, it failed gracefully with a clear error rather than silently losing context or crashing mid-process. That’s Temporal working correctly. The retry limit is configurable, so you as the developer decide how persistent each Activity should be before it stands down. By default, Temporal retries a failed Activity indefinitely, so choosing a maximum number of attempts or a timeout is part of designing the tool.

To implement this, you define your MCP tools on an MCP server as usual, then wrap the tool logic in Temporal’s Workflow and Activity decorators. The MCP server code barely changes: in Temporal’s Python SDK, you add @workflow.defn and @workflow.run to a Workflow class and move your API calls into @activity.defn functions.

Signals, Queries, and human-in-the-loop

Temporal has two more primitives that become critical for long-running agents: Signals and Queries.

A Signal lets you send information into a running Workflow from the outside. For agents, this is the human-in-the-loop mechanism. Imagine an agent booking travel on your behalf. It finds flights and picks one, but before it charges your card, the Workflow pauses and waits for you to send a confirmation Signal. Once you approve, it resumes. If you reject, it backtracks to the search step.

A Query lets you inspect the state of a running Workflow without interrupting it. How far along is my agent? What tools has it called? What’s the current state of the booking? You can answer these questions at any time without affecting execution.

“These long-running Workflows and long-running agents need something to keep them in check,” Melissa said. “A Signal allows us to give more information to our agent and maybe tell it to divert. So your agent’s not just running off willy-nilly.”

RL connected this to a broader pattern: the future of human-agent interaction isn’t micromanagement. It’s more like managing a toddler. The agent runs autonomously, but occasionally checks in with the human. Signals and Queries make that possible without breaking the Workflow’s continuity.

Agents as tools, tools as agents

Agents can also be exposed as MCP tools for other agents to call. The MCP specification supports this, though the pattern hasn’t been widely adopted yet.

Melissa, who previously built multi-agent systems at Langflow, sees this as the natural architecture for complex agentic workflows. You have an orchestrator agent with several specialized sub-agents in its toolbox—one for flight booking, one for calendar management, one for email. Each sub-agent is its own MCP tool, backed by its own Temporal Workflow. If the flight-booking agent fails, only that Workflow retries. The orchestrator agent doesn’t lose its conversation state.

“The line between agents and tools is starting to blur,” Melissa said. “Any agent that is specialized in something could also be a tool. And when you put each one in its own Temporal Workflow, they’re all durable by default.”

This is where Temporal’s architecture pays compound dividends. Instead of one giant, fragile workflow, you get a tree of independently durable Workflows. Failures are isolated. Retries are targeted. And the whole system is observable through Temporal’s UI.

Everything old is new again

The conversation took a philosophical turn when RL drew a parallel between the distributed systems patterns Temporal solves and the terminal-mainframe architecture of the 1980s. Cloud computing is, in a sense, just terminals connecting to someone else’s computer. When processing demands exceed local capacity, computation moves off-device, and all the networking, reliability, and state management problems come roaring back.

“Everything old is new again,” RL said. “If you start at one point and walk around a tree, you’ll come back to the same place you started. And tech is much like that.”

Melissa agreed. AI agent engineers are rediscovering the same distributed systems challenges that engineers have been solving for decades. The difference is that the “distributed system” now includes an LLM that might do something unexpected. If anything, that makes durability more important than ever.

Try it yourself

The Temporal AI Cookbook has the exact demo Melissa showed, plus additional recipes for building durable agents with Temporal. Start there.

For a deeper dive into the architecture, read “Building Long-Running Interactive MCP Tools with Temporal” and “Durable MCP: Using Temporal to Give Agentic Systems Superpowers” on the Temporal blog.

If you’re in San Francisco this May, Replay 2026 runs May 5–7 at the Moscone Center. Speakers from Instacart, OpenAI, Alaska Airlines, and more. Use code MELISSA85 for 85% off your ticket.

Temporal’s official SDKs support seven programming languages. Get started at temporal.io or sign up for Temporal Cloud with $1,000 in free credits. For questions and discussion, join the #topic-ai channel on Temporal Community Slack.

Follow Melissa on LinkedIn and on GitHub.

MCP MVP is a video series from Arcade.dev with RL Nabors that spotlights the builders shaping the agentic ecosystem. Watch the full interview with Melissa Herrera →