Shaun Smith built Fast Agent, the first agent framework designed from the ground up for MCP. He also built the Hugging Face MCP server, and in a recent interview, he showed us how to combine them by deploying a sub-agent as a remote MCP server.
This article is adapted from an interview with Shaun Smith, Open Source/MCP at Hugging Face.
Before MCP, agent frameworks were each developing their own tool-integration and adapter patterns. When MCP arrived, frameworks added it as a feature but didn’t retool their underlying systems around it. According to Shaun Smith, the problem with this bolt-on approach is that MCP is actually all you need.
Shaun works at Hugging Face on open source and MCP. He’s an active contributor to the MCP specification as part of the transports working group, reviewing proposals like Arcade’s elicitations contribution (SEP-1036). He built Fast Agent, the first agent framework architected from the ground up around the Model Context Protocol, and the Hugging Face MCP server, which gives any MCP client dynamic access to the thousands of AI models hosted on the platform.
In a conversation with RL Nabors for Arcade’s MCP MVP series, Shaun demonstrated what an MCP-native workflow looks like, starting from a single prompt that generates a cat-themed news website, and ending with a sub-agent deployed as a remote MCP server in under ten minutes.
The Hugging Face MCP server: one tool, hundreds of models
When most developers think about Hugging Face, they think about model cards and research papers. But the Hub also hosts thousands of runnable AI applications through Spaces: image generators, video generators, OCR models, speech synthesis, translation, and more.
The Hugging Face MCP server makes all of that accessible through any MCP client. You configure which capabilities you want through your MCP settings, and the server exposes them as tools. A feature called Dynamic Spaces collapses dozens of specialized models into a single tool: your agent makes one tool call, and the server decides which model to route to based on the task.
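The routing idea behind Dynamic Spaces can be sketched in a few lines of plain Python. This is a conceptual illustration only; the task names, Space IDs, and `route` helper are assumptions, not the server's actual implementation:

```python
# Illustrative sketch of capability-based routing: one tool entry point,
# with the server choosing a backing Space per task. Space IDs are made up.
SPACES = {
    "image-generation": "example/image-gen-space",
    "ocr": "example/ocr-space",
    "speech-synthesis": "example/tts-space",
}

def route(task: str) -> str:
    """Resolve a requested task to a Space, as Dynamic Spaces does conceptually."""
    try:
        return SPACES[task]
    except KeyError:
        raise ValueError(f"No Space registered for task: {task!r}")

# A single tool call names the task; the router picks the model behind it.
print(route("ocr"))  # example/ocr-space
```

The agent never needs to know which Space is behind the call, which is what lets the server swap or add models without changing the tool surface.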
“If you want to generate images or do OCR or speech synthesis, you can embed all of that in your MCP server,” Shaun said. “And it [picks the right one] dynamically.”
This means you can right-size your AI instead of routing everything through a single, expensive frontier model. Your agent can delegate specialized tasks to purpose-built models that can often deliver better results faster at a lower price. Your local agent handles reasoning and orchestration; Hugging Face models handle image generation, audio processing, document analysis, or whatever else you need. As RL put it, “Frontier models are often overkill for what you need to do with them. It’s like the Walmart of models at your beck and call.”
To demonstrate, Shaun fired off a single prompt: get recent news headlines, generate a cat diorama image for the top three, and create a newspaper-style homepage to present them. Fast Agent searched the web, selected an image generator from Hugging Face, generated three cat-themed news illustrations in parallel, built a full HTML page, and published it. The result: “The Daily Paws,” a cat-filtered news site with headlines like “CES Tech: That’s the Cat’s Meow” and “Olympic Overture Gets Feline Finale.”
“We could change this to run every morning,” Shaun noted. RL was sold: “I would love to get my news cat-ified every morning.”
Why Fast Agent exists
Every agent framework promises tool integration. Fast Agent’s bet is that if you build around MCP from the start rather than bolting it on as an afterthought, everything else gets simpler.
Fast Agent is a Python framework with a declarative syntax that lets you define agents, wire them to MCP servers, and compose them into workflows using decorators. It supports the full MCP feature set: tools, resources, prompts, sampling, elicitations, and roots. Most other frameworks implement a subset; Fast Agent is end-to-end tested against the complete spec.
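The declarative, decorator-driven style can be illustrated with a stripped-down registry. This toy is not Fast Agent's actual API; the decorator signature and field names are assumptions meant only to show the shape of the pattern:

```python
# Toy sketch of a decorator-based agent registry in the spirit of Fast Agent's
# declarative syntax. Not the real API; all names are illustrative.
from dataclasses import dataclass, field

AGENTS: dict = {}

@dataclass
class Agent:
    name: str
    instruction: str
    servers: list = field(default_factory=list)  # MCP servers the agent may call

def agent(name, instruction, servers=None):
    """Register a function as a named agent wired to MCP servers."""
    def decorator(fn):
        AGENTS[name] = Agent(name, instruction, servers or [])
        return fn
    return decorator

@agent("url_fetcher", instruction="Fetch and summarise URLs", servers=["fetch"])
def url_fetcher(prompt: str) -> str:
    return f"[{prompt}] handled by url_fetcher"

print(AGENTS["url_fetcher"].servers)  # ['fetch']
```

The point of the pattern is that defining an agent and wiring it to servers is one annotation, and composition into workflows becomes a matter of referencing registered names.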
What makes it practically different from frameworks like LangChain or Mastra:
- Multi-provider model support. Fast Agent ships with native integrations for Anthropic, OpenAI, Google, Azure, Ollama, DeepSeek, and dozens more through TensorZero. You can plan with Opus, develop with Codex, and search with an open-weight model, all within a single workflow, mixing and matching providers.
- Full MCP client implementation. Fast Agent supports sampling, elicitations, resources, roots, and OAuth. Shaun noted that Fast Agent has “often been at the lead of MCP implementation” because he builds protocol features as they’re drafted, not months later.
- Transport-layer visibility. Fast Agent exposes what’s happening at the transport level (HTTP connections, SSE streams, tool call routing) which most frameworks hide. When you’re debugging why a remote MCP server isn’t responding, seeing the transport layer matters.
- Skills support. Fast Agent ships with a skill system for encoding reusable behaviors. One built-in skill, the “tool builder,” teaches the LLM to navigate Hugging Face’s OpenAPI spec and create custom CLI tools it can use for further automation.
Agents as tools: the underused MCP pattern
Next it was time to deploy an agent as an MCP tool.
MCP allows agents to be exposed as tools on a server. Although this has been in the MCP spec since 2024, the pattern hasn’t gotten much attention. Shaun walked through a concrete implementation.
First, he showed a sub-agent running locally. The sub-agent was a lightweight Hugging Face search agent backed by a fast open-weight model. It knew how to use the Hugging Face CLI to answer questions about trending models and datasets. The main agent delegated queries to it, keeping the sub-agent’s multi-turn context isolated from its own. This meant the orchestrating agent’s context stayed clean while the sub-agent did the looping and querying.
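The context-isolation point can be sketched directly: the sub-agent keeps its own multi-turn history, and only its final answer lands in the orchestrator's context. Everything below is illustrative; the function names and the pretend search loop are assumptions, not the demo's code:

```python
# Sketch of context isolation between an orchestrator and a sub-agent:
# the sub-agent owns its multi-turn history and returns one final answer,
# so the orchestrator's context stays clean. Names are illustrative.
def search_subagent(question: str) -> str:
    history = []  # private to the sub-agent
    for query in (question, question + " (refined)"):  # pretend multi-turn loop
        history.append(f"searched: {query}")
    return f"answer after {len(history)} turns"

orchestrator_context = ["user: what's trending on the Hub?"]
result = search_subagent("trending models")
orchestrator_context.append(f"subagent: {result}")  # only the final answer lands here

print(orchestrator_context)
```

However many turns the sub-agent loops for, the orchestrator sees exactly one new message, which is what keeps the main agent's context from filling with intermediate queries.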
Shaun deployed that same sub-agent as a remote MCP server on Hugging Face Spaces. Once live, any MCP client could connect to the server and use the search agent with inference running on Hugging Face’s infrastructure, not the caller’s machine.
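The "agent as a tool" pattern amounts to putting a whole sub-agent behind a single named tool entry. The registry below is a toy, not the MCP SDK or the Spaces deployment; the tool name, description, and stubbed agent are assumptions:

```python
# Toy sketch of the agent-as-tool pattern: an entire sub-agent sits behind
# one MCP-style tool with a name, description, and handler. Illustrative only.
TOOLS = {}

def register_tool(name, description, handler):
    TOOLS[name] = {"description": description, "handler": handler}

def hub_search_agent(query: str) -> str:
    # In the demo, the agent loops over the Hugging Face CLI; stubbed here.
    return f"top result for {query!r}"

register_tool(
    "hf_search",
    "Answer questions about trending Hugging Face models and datasets",
    hub_search_agent,
)

# A remote MCP client calls the tool by name; the agent runs server-side.
print(TOOLS["hf_search"]["handler"]("trending datasets"))
```

From the caller's side this looks like any other tool call; the looping, querying, and inference all happen on the server.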
The whole process took under ten minutes, from the local agent to the deployed remote MCP server.
Agents and sub-agents depend on LLM inference. “Hugging Face has hundreds of LLMs available. So deploying it this way means people can use their Hugging Face account to use these sub-agent style tools,” Shaun explained.
Skills, code mode, and the security question
RL raised a pattern they’ve been tracking across the MCP MVP interview series: skills teach the agent how to reason about a task, code mode lets it write scripts to execute efficiently, but tools remain the most secure way to access external resources. You wouldn’t hand over your API keys to an LLM. MCP tools, especially when using OAuth via URL elicitation, maintain a layer between the model and sensitive operations.
Shaun pushed the thinking further. It’s not enough to secure individual tools. You must pay attention to commingling tools with different authorization scopes. If you give elevated privileges to one set of tools and use them alongside tools or skills with lower authorization, you’ve made the security problem worse. The agent might route privileged data through an unprivileged channel without anyone noticing.
“Every time I’m inviting something with an elevated privilege level in,” Shaun said, “I’m potentially commingling data that I may not want to commingle.”
RL offered an analogy: it’s like a hot gossip session with close friends, and someone invites a stranger into the room. The conversation doesn’t change, but the audience has. Shaun sharpened it: “Or someone left an open telephone on the table!”
This points toward a pattern the MCP community is actively working on: thinking about the agent environment as the unit of security, not individual tools. A task-specific bundle of models, tools, skills, and data with well-defined boundaries around what can flow where.
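One way to make the environment-as-security-unit idea concrete is to tag data with the clearance of the tool that produced it and refuse flows that would downgrade it. The scope labels, tool names, and `check_flow` helper below are all assumptions for illustration, not a proposed MCP mechanism:

```python
# Sketch of treating the agent environment, not individual tools, as the
# security unit: data inherits the clearance of the tool that produced it,
# and the environment blocks flows from privileged data to public tools.
SCOPES = {"read_private_repo": "privileged", "post_to_forum": "public"}

def check_flow(data_scope: str, target_tool: str) -> bool:
    """Allow a flow only if it does not move privileged data to a public tool."""
    return not (data_scope == "privileged" and SCOPES[target_tool] == "public")

print(check_flow("privileged", "read_private_repo"))  # True: stays in scope
print(check_flow("privileged", "post_to_forum"))      # False: commingling blocked
```

This is the "open telephone on the table" check: the individual tools may each be fine, but the boundary has to be enforced over the whole bundle.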
Transports matter more than you know
Shaun is part of the MCP Transports Working Group, so no conversation would be complete without discussing transports!
MCP currently supports two transports: standard IO for local servers (a process running on your machine) and HTTP for remote servers. As MCP tools move from local development machines to hosted services, HTTP transports with proper authentication become essential. OAuth support means the server knows who’s requesting, can customize responses per account, and can log what’s been accessed.
“The really important thing with the remote transport is we’ve got a good authentication story,” Shaun said. “We can log in with OAuth, we know from the MCP server who you are and what you’re requesting.”
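What that authentication story buys the server can be sketched in a few lines: identify the caller, customize the response per account, and log the access. The token table and handler below are fakes for illustration; real servers validate OAuth access tokens rather than looking them up in a dict:

```python
# Toy illustration of why HTTP transport plus OAuth matters for remote MCP:
# the server knows who is calling, can tailor responses per account, and can
# keep an access log. Tokens and users here are fake; no real validation.
ACCOUNTS = {"token-abc": "rlnabors", "token-xyz": "evalstate"}
AUDIT_LOG = []

def handle_request(bearer_token: str, tool: str) -> str:
    user = ACCOUNTS.get(bearer_token)
    if user is None:
        return "401 Unauthorized"
    AUDIT_LOG.append((user, tool))      # per-account access log
    return f"200 OK: {tool} as {user}"  # response can be customised per user

print(handle_request("token-abc", "hf_search"))
print(handle_request("bad-token", "hf_search"))
```

None of this is possible over bare standard IO, which is why authentication becomes the central concern as servers move from local processes to hosted services.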
The working group is focused on making remote transports scale more efficiently. RL connected it to a broader trend they’ve observed across the MCP MVP series: the shift from running MCP servers locally to consuming them as hosted services.
Try it yourself
Get started with Fast Agent:
uv pip install fast-agent-mcp
fast-agent setup
Documentation and examples at fast-agent.ai. Source code at github.com/evalstate/fast-agent.
To connect to the Hugging Face MCP server, visit your MCP settings to configure tools and Spaces, then add the server to your MCP client. For the community Spaces server, run it with npx:
npx -y @llmindset/mcp-hfspace
Documentation at huggingface.co/docs/hub/en/agents-mcp. The server is open source.
Follow Shaun on GitHub as evalstate. His blog is at llmindset.co.uk.
MCP MVP is a video series from Arcade with RL Nabors that spotlights the builders shaping the agentic ecosystem. Watch the full interview with Shaun Smith.
Want to give your agents authenticated access to APIs without managing tokens yourself? See how Arcade handles OAuth for MCP tools.