AX Is the New DX: ElevenLabs on Agent Skills & MCP

Paul Asjes has been building SDKs at Stripe and WorkOS for years. Now, as DevX Lead at ElevenLabs, he’s rethinking what developer education means when the learner is a developer’s agent!

This article is adapted from an interview with Paul Asjes, DevX Lead at ElevenLabs.

A worry for every developer tooling company right now: the models that agents run on are trained on data fixed in time, so they can, perfectly innocently, generate code from out-of-date knowledge. An engineer’s agent might generate code against an API surface that hasn’t existed for three versions, leaving the engineer with a broken experience and a reason to move on to the next tool. In short, the agent’s experience is now the developer’s experience.

Paul Asjes, DevX Lead at ElevenLabs, has been solving this problem with a three-part approach: skills that teach agents how to reason about your product, MCP tools that give them deterministic actions to execute, and MCP apps that give developers plug-and-play interfaces to present to users. “[My] goal is not so much developer education as much as it is agent education,” Paul told RL Nabors in a recent interview for Arcade’s MCP MVP series. “Because the agents are the ones writing a lot of the code.”

Paul has spent years at Stripe and WorkOS, building SDKs and crafting the kind of documentation that humans read. Now the audience has changed. Here’s what he’s learned about building for the new one.

The knowledge gap problem

When an agent doesn’t know how to do something, it rarely seeks to educate itself. Because of how LLMs are trained, they make guesses.

Paul referenced research from OpenAI that found LLMs approach questions like a multiple-choice exam. Just like taking the SAT or other standardized tests in American high schools, it is statistically better for the model to guess an answer than to give no answer. No answer is always wrong, while a guess might be right. This is fundamental to how current models work and not a bug that can be patched or trained out.
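The arithmetic behind that incentive is short. Here is a minimal sketch, assuming a grading scheme that awards 1 point for a correct answer and 0 for either a wrong answer or no answer. The function name and the numbers are illustrative and are not taken from the research Paul cited.

```python
def expected_score(p_correct: float, abstain: bool) -> float:
    """Expected score on one question.

    Assumed grading scheme (illustrative): 1 point for a correct answer,
    0 points for a wrong answer, 0 points for no answer.
    """
    if not 0.0 <= p_correct <= 1.0:
        raise ValueError("p_correct must be a probability between 0 and 1")
    # Declining to answer always scores 0; a guess scores 1 with probability p_correct.
    return 0.0 if abstain else p_correct


# A blind guess among four options is right a quarter of the time.
questions = 100
guessing_total = questions * expected_score(0.25, abstain=False)
abstaining_total = questions * expected_score(0.25, abstain=True)

print(f"guess every question:   {guessing_total:.1f} expected points")
print(f"abstain every question: {abstaining_total:.1f} expected points")
```

Under this scheme, any nonzero chance of being right makes guessing the better strategy, which is the behavior the exam analogy describes.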

For API companies, this means an agent that hasn’t been trained on your latest SDK will confidently generate code against an API surface that may not exist anymore. Anyone who’s asked Claude to build something with React and gotten React 18 patterns back (despite React 19 being current) has experienced this pain.

The industry has tried a few approaches. llms.txt lets documentation sites serve raw markdown for agents to ingest. MCP servers give agents deterministic tools to call. But Paul and his team found something counterintuitive: giving an agent a giant data dump of your entire documentation actually makes it perform worse than giving it a carefully curated subset.

Tessl, a company that evaluates agent skills, confirmed this finding in an interview with its CEO. Agents with access to concise, well-curated instructions outperformed agents given comprehensive documentation. There is such a thing as too much information for an LLM.

Skills: documentation for agents

An agent skill is a markdown file with concise instructions optimized for agents to follow. If that sounds underwhelming, Paul agrees, “It kind of shocks me that we didn’t think of this before.”

The Agent Skills format was originally developed by Anthropic and has since been adopted by Microsoft, OpenAI, GitHub, and others. It standardizes how these instructions are packaged. ElevenLabs has published a set of skills covering their core products: text-to-speech, speech-to-text, voice agents, sound effects, and API key setup.

What makes skills different from regular documentation is both what they include and what they leave out. Skills explicitly tell agents what not to do: “do NOT hard code the API key!”

Agents, unlike human developers, won’t infer that from context or prior experience. Skills are written for a reader who is brilliant at following instructions and terrible at judgment.
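To make the shape concrete, here is a minimal sketch of what such a file might look like and how a tool could read it. It assumes the common layout of a SKILL.md file with name and description frontmatter followed by a markdown body. The skill text itself is hypothetical and is not copied from ElevenLabs’ published skills, and the parser handles only flat key: value frontmatter.

```python
# Hypothetical skill file; the wording is illustrative, not ElevenLabs' actual skill.
SKILL_MD = """\
---
name: text-to-speech-example
description: Hypothetical skill showing how to structure a text-to-speech integration.
---

# Text-to-speech (illustrative)

## Steps
1. Read the API key from an environment variable.
2. Send the text to the text-to-speech API.
3. Save or stream the returned audio.

## Pitfalls
- Do NOT hard code the API key.
- Do NOT invent endpoints or parameters; stop and ask if unsure.
"""


def parse_skill(text: str) -> tuple[dict[str, str], str]:
    """Split a skill file into frontmatter fields and markdown body.

    Simplified on purpose: only flat `key: value` lines are supported.
    A real loader would use a YAML library.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("skill must start with a '---' frontmatter fence")
    try:
        end = lines.index("---", 1)  # closing fence of the frontmatter
    except ValueError:
        raise ValueError("frontmatter is not closed") from None
    meta: dict[str, str] = {}
    for line in lines[1:end]:
        if not line.strip():
            continue
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed frontmatter line: {line!r}")
        meta[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).strip()
    return meta, body


def prohibitions(body: str) -> list[str]:
    """Collect the explicit 'Do NOT' bullet points from the skill body."""
    return [
        line.strip().lstrip("- ").strip()
        for line in body.splitlines()
        if line.strip().startswith("- Do NOT")
    ]


meta, body = parse_skill(SKILL_MD)
print(meta["name"])
for rule in prohibitions(body):
    print("rule:", rule)
```

The point of the sketch is the content rather than the parser: a short description, a handful of ordered steps, and prohibitions spelled out instead of left to inference.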

Tools: deterministic function calling for agents

Paul draws a clean line between skills and MCP tools: “Skills teach the agent how to do a task. MCP tools give the agent deterministic actions to execute.” An MCP tool says, “call this function with this JSON schema.” A skill says, “here’s how to think about building a text-to-speech integration, here are the steps, here are the pitfalls.” One handles execution; the other handles reasoning.

MCP tools and agent skills are complementary. An MCP tool can prevent an agent from hallucinating the wrong API endpoint, but it can’t prevent the agent from passing nonsensical parameters. A skill can teach the agent what a sensible request looks like, but the agent still needs a tool to execute it.
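The gap between a valid call and a sensible one can be sketched in a few lines. The tool below is hypothetical: its name and parameters are invented for illustration and are not taken from ElevenLabs’ MCP server. It follows the shape MCP uses for tool definitions (name, description, inputSchema), and the validator covers only a small subset of JSON Schema.

```python
from typing import Any

# Hypothetical tool definition; the name and parameters are assumptions.
TEXT_TO_SPEECH_TOOL: dict[str, Any] = {
    "name": "text_to_speech",
    "description": "Convert text to spoken audio.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "text": {"type": "string"},
            "speed": {"type": "number"},
        },
        "required": ["text"],
        "additionalProperties": False,
    },
}

# Only the JSON Schema types this sketch needs.
_JSON_TYPES: dict[str, Any] = {"string": str, "number": (int, float), "boolean": bool}


def schema_errors(tool: dict[str, Any], arguments: dict[str, Any]) -> list[str]:
    """Return schema violations for a tool call (simplified JSON Schema subset)."""
    schema = tool["inputSchema"]
    errors: list[str] = []
    for key in schema.get("required", []):
        if key not in arguments:
            errors.append(f"missing required argument: {key}")
    for key, value in arguments.items():
        spec = schema["properties"].get(key)
        if spec is None:
            if schema.get("additionalProperties", True) is False:
                errors.append(f"unknown argument: {key}")
            continue
        expected = _JSON_TYPES[spec["type"]]
        # bool is a subclass of int in Python, so reject it explicitly for "number".
        is_bool_as_number = isinstance(value, bool) and spec["type"] == "number"
        if is_bool_as_number or not isinstance(value, expected):
            errors.append(f"{key} should be a {spec['type']}")
    return errors


# A hallucinated call is caught by the schema.
print(schema_errors(TEXT_TO_SPEECH_TOOL, {"speed": "fast"}))

# A nonsensical call passes: empty text at 250x speed is still type-correct.
print(schema_errors(TEXT_TO_SPEECH_TOOL, {"text": "", "speed": 250}))
```

The second call is the case the schema cannot catch. Knowing that empty text or an extreme speed makes no sense is guidance that belongs in a skill.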

MCP apps: UI for the end user

ElevenLabs has also been building for the other side of agent experience. Paul built the ElevenLabs MCP Player, an MCP app available in Claude Desktop’s extensions library. It’s an MCP server with a React-based UI, so when Claude generates audio for you, instead of just returning a file path, it displays an audio player with controls for play, pause, scrub, playback speed, and repeat.

This is possible because of MCP Apps (called “MCP bundles” in Claude’s implementation), a recent evolution of the MCP protocol that lets servers render UI alongside their tool responses. Before this, every MCP interaction was text-in, text-out. Now it can be text (or voice!) in, interface out.

Paul thinks this is key to taking MCP beyond enthusiast technology. “MCP apps open up this world to non-technical people,” he said. “You don’t really need to know what an MCP server is. All you need to know is that this makes Claude do things better.”

He envisions something like an app store for agent capabilities. Right now, if you want Claude to make phone calls through ElevenLabs, you need an ElevenLabs account, possibly a Twilio account, and the technical knowledge to wire them together. An MCP app could package all of that into a single app that a non-developer can install for 99 cents. “If you add money into the mix,” Paul said, “that’s where an ecosystem really thrives.”

Discovery remains a challenge, though. There’s little vetting of what goes into skill registries like skills.sh, which means prompt injection is a real risk on marketplaces like OpenClaw’s. Versioning for skills is poorly documented, and if a skill becomes outdated or needs a security fix, there’s no mechanism to notify users. “If you don’t run npx skills update you’ll never know that your skill is outdated,” Paul said. Competing standards remain an issue too, with Anthropic, OpenAI, Cursor, and others all running variations on the same idea.

“I’m reminded of that classic XKCD strip,” Paul said. “There are 14 competing standards, so we make one that covers all the use cases. Soon there are 15 competing standards.”

The abstraction continues

Paul sees the shift from writing code to managing agents as the latest in a long line of abstractions, from punch cards to keyboards to GUIs to natural language. “For a generation, we have been leaning forward, trying to communicate with the computer on its terms,” he said. “For the first time, the computers are leaning in the opposite direction.”

He wouldn’t be surprised if writing code becomes a review activity within a few years and the “software engineer” role morphs into one of “agent management.” It’s a claim that will make some engineers uncomfortable, and Paul knows it. “I think that’s only going to get worse/better, depending on your point of view.”

Whether or not you share that timeline, the practical implications for developer experience teams are already here. If agents are writing most of the integration code, then the quality of everything agent-facing (documentation, skills, MCP tools, error messages) matters at least as much as your human-facing videos and tutorials. Possibly more.

Try it yourself

The ElevenLabs MCP Player is available now as an extension in Claude Desktop. Search for “ElevenLabs” in settings under extensions. The source is open at github.com/elevenlabs/elevenlabs-mcp-player.

To install ElevenLabs’ agent skills for your coding agent, run npx skills add elevenlabs/skills. The skills repo is at github.com/elevenlabs/skills. If you want to evaluate skills before adopting them, Tessl’s registry provides quality scores across structure, implementation, and discoverability.

MCP MVP is a video series from Arcade.dev with RL Nabors that spotlights the builders shaping the agentic ecosystem. Watch the full interview with Paul Asjes →
