Keep file bytes out of your AI agent's context window

How do you move a file off someone’s laptop and into a remote MCP server without routing the bytes through the AI agent’s context window?

Ask an AI agent to email a PDF to your accountant and it handles the hard parts: it finds the recipient, writes the body of the email, and calls an MCP tool to send it. But it stalls on the step that should be trivial: attaching the PDF that is sitting on your laptop. The obvious way to attach it is to let the model read the file and write it into the tool call itself, and that is exactly where it falls apart.

The 1.5-million-token attachment

The default everyone reaches for is base64: the file’s bytes encoded as text and inlined straight into a tool argument. It seems to work for very small files, but it quickly becomes context poison for any real-world use case.

Base64 inflates the payload by about a third, and every one of those characters becomes tokens the model has to emit and pay attention to. A one-megabyte file is around 1.5 million tokens before the model has written a single word of the actual email. Most models will not even get there, because the payload alone overflows their context window. And even if they could, the model would have to reproduce every character exactly: getting even a few of them wrong corrupts the file.
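The one-third inflation is easy to check. The sketch below is illustrative only: it counts characters, not tokens, because the token count depends on the tokenizer, and `base64_length` is a hypothetical helper name, not part of any tool described here.

```python
import base64
import os


def base64_length(num_bytes: int) -> int:
    """Characters in the padded base64 encoding of num_bytes bytes.

    Base64 emits 4 characters for every 3 input bytes, rounding up.
    """
    return 4 * ((num_bytes + 2) // 3)


one_mib = 1024 * 1024
payload = os.urandom(one_mib)            # stand-in for a 1 MiB file
encoded = base64.b64encode(payload)

print(len(encoded))                      # 1398104
print(round(len(encoded) / one_mib, 3))  # 1.333
```

Every one of those characters would have to be generated by the model, exactly, before the tool call could be sent.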

The context window is the visible bottleneck, but the deeper issue is that the bytes are in the model’s context at all. The model is being asked to act as a clipboard for data it will never read, reason about, or change.

The model only needs the name

When you say “send this to my accountant,” the model’s job is to figure out the recipient, the subject, the body, and which file you mean. The bytes of that file are not part of any decision the model makes. The model needs the file’s name. The bytes can be bound in later, by something that is not the model.

So that is what we did. The model emits a short reference, a file:// path, in the attachment field. A small hook on the user’s machine runs just before the tool call leaves the client, sees the file:// reference, reads the file locally, and swaps in the bytes. By the time the request crosses the wire, the attachment is populated. The model never saw the bytes. It only ever wrote a path.

// What the model emits
{ "to": "accountant@example.com",
  "subject": "Q2 earnings",
  "attachments": [{ "source": "file:///Users/me/q2-earnings.pdf" }] }

// What leaves the client, after the hook binds the bytes locally
{ "to": "accountant@example.com",
  "subject": "Q2 earnings",
  "attachments": [{ "source": "data:application/pdf;base64,JVBERi0xLj...",
                    "filename": "q2-earnings.pdf", "mimeType": "application/pdf" }] }

We shipped this with no changes required on the server, and it works today. Here is what it takes.

How the binding works

The piece doing the work is a hook. Several agent clients let you register hooks, small programs the client runs at fixed points in a tool call’s life. Ours runs at exactly one point: after the model has finished writing a tool call and before the client sends it. The model writes the call, the hook receives it as JSON, and the hook is the last thing to touch it before it leaves your laptop.

When it sees a file:// reference in an attachment’s source, it reads that file with your own permissions, base64-encodes it, and rewrites source into the inline data: value. It fills in the filename and MIME type from the file if the model left them blank, and it touches nothing else: a file:// anywhere outside the attachments list is passed through as written. The client then serializes the populated call and sends it, and the server receives an ordinary attachment.

Two properties make this work where inlining base64 fails. It runs after the model is finished, so the bytes are never tokens the model has to generate or hold. And it runs on your machine, so the bytes go straight into the outbound request instead of detouring through the context window to get there. File size stops being the enemy, because the thing that made size hurt was tokenization, and there is no tokenization left in the path.
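The substitution described above can be sketched in a few lines. This is a minimal illustration, not the shipped hook: the function name `bind_attachments` is hypothetical, the field names follow the example payload earlier in this piece, it assumes POSIX-style file:// paths, and it leaves out the host-specific wrapping of the hook's response.

```python
import base64
import mimetypes
from pathlib import Path
from urllib.parse import unquote, urlparse


def bind_attachments(tool_input: dict) -> dict:
    """Return a copy of tool_input with file:// attachment sources bound to data: URIs.

    Only entries in the "attachments" list are rewritten; every other
    field is passed through exactly as the model wrote it.
    """
    bound = dict(tool_input)
    if "attachments" not in tool_input:
        return bound

    attachments = []
    for attachment in tool_input["attachments"]:
        attachment = dict(attachment)  # do not mutate the caller's data
        source = attachment.get("source", "")
        if isinstance(source, str) and source.startswith("file://"):
            # Assumption: POSIX paths, e.g. file:///Users/me/q2-earnings.pdf
            path = Path(unquote(urlparse(source).path))
            guessed_type, _ = mimetypes.guess_type(path.name)
            mime_type = (attachment.get("mimeType")
                         or guessed_type
                         or "application/octet-stream")
            data = base64.b64encode(path.read_bytes()).decode("ascii")
            attachment["source"] = f"data:{mime_type};base64,{data}"
            # Fill metadata only where the model left it blank.
            attachment["mimeType"] = mime_type
            attachment["filename"] = attachment.get("filename") or path.name
        attachments.append(attachment)

    bound["attachments"] = attachments
    return bound
```

The file is read with the permissions of whoever runs the hook, and the bytes appear only in the returned request, never in anything the model reads or writes.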

Hosts wire up pre-call hooks differently. Cursor expects a flat response; Claude Code, Codex, and VS Code’s agent mode expect a nested one. So the substitution is a single script, and each host’s config tells it which shape to emit. It runs with no extra privileges, makes no network calls of its own, and rejects a file:// smuggled into a filename or MIME field. It is deliberately boring, because the less it does, the less there is to trust.
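The rejection of a smuggled file:// can be as plain as the rest of the script. A minimal sketch, assuming the `filename` and `mimeType` field names from the example payload; the function name `check_attachment_metadata` is hypothetical and the real check may differ.

```python
def check_attachment_metadata(attachment: dict) -> None:
    """Raise ValueError if a metadata field carries a file:// reference.

    Only "source" is allowed to name a local file; a file:// anywhere in
    the filename or MIME type is treated as an attempted smuggle.
    """
    for field in ("filename", "mimeType"):
        value = attachment.get(field)
        if isinstance(value, str) and "file://" in value.lower():
            raise ValueError(f"file:// reference not allowed in {field!r}")
```

Failing loudly here, rather than quietly stripping the value, keeps the hook's behavior easy to audit.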

The harder question is how a script like that gets onto your machine in the first place, on whatever agent you happen to use, without you turning into a system administrator.

Setting it up

You do not install the hook by hand. The first time you attach a file, the tool notices the hook is not there yet and, instead of sending, returns a short setup response. Your agent reads it and offers to set everything up for you.

That response carries the hook’s source inline, so there is nothing to fetch from an outside server and nothing for a security team to chase down to an unfamiliar domain. With your go-ahead, the agent writes the script to a file and adds a few lines to your client’s settings to point at it. The setup is one-time and local: the hook lives on your machine, touches only the file you attach, and is there for you to read before you approve. Taking it back out means deleting those few lines.

Installing it changes your client’s own settings, so it runs through the same approval your client asks for on any other configuration change. The agent walks you through that step instead of trying to skip it.

All of this needs an agent that supports these pre-call hooks and can reach your files, which today means Cursor, Claude Code, Codex, and VS Code’s agent mode. On a client with no such hook, or with no local filesystem to read from, the swap cannot happen, and the tool says so plainly and points you to one that can.

The idea behind the hook

That is the shipped solution, and it works today. What we built is a hook: Arcade reaching into each client, one pre-call substitution at a time. The hook is one implementation of an idea that is older and larger than email or attachments, and the protocol these agents speak is only beginning to formalize it. The rest of this piece steps off our tools and onto that idea, and onto getting it right while it is still being written down.

You have seen a version of this before. The model emits a reference, something else binds the real value, and the value never has to pass through the model to do its job. Careful systems already lean on it, in two flavors worth telling apart.

Secrets are the strict flavor. The model writes ${API_KEY}, and the platform swaps in the real key after the model has responded, on the way out to the API. The model never holds the key and cannot read it if it tries, so a prompt-injected model cannot leak what it never had. Privacy vaults do the same for personal data, standing in tokens before anything reaches the model and restoring the real values on the way out. The defining property is force: the real value is kept away from the model whether the model wants it or not.
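The strict flavor can be sketched the same way. This is an illustration under assumptions, not any particular platform's implementation: `bind_secrets` is a hypothetical name, and the `${NAME}` syntax is taken from the example above.

```python
import re

# Matches placeholders such as ${API_KEY}.
PLACEHOLDER = re.compile(r"\$\{([A-Z_][A-Z0-9_]*)\}")


def bind_secrets(value, secrets: dict):
    """Recursively replace ${NAME} placeholders with values from secrets.

    Runs on the outbound request after the model has responded, so the
    real values never enter the model's context.
    """
    if isinstance(value, str):
        def lookup(match):
            name = match.group(1)
            if name not in secrets:
                raise KeyError(f"no secret named {name}")
            return secrets[name]
        return PLACEHOLDER.sub(lookup, value)
    if isinstance(value, dict):
        return {key: bind_secrets(item, secrets) for key, item in value.items()}
    if isinstance(value, list):
        return [bind_secrets(item, secrets) for item in value]
    return value


request = {"headers": {"Authorization": "Bearer ${API_KEY}"}, "retries": 3}
outbound = bind_secrets(request, {"API_KEY": "sk-test-123"})
print(outbound["headers"]["Authorization"])  # Bearer sk-test-123
```

The model only ever produced the text `${API_KEY}`; the lookup table lives with the platform, outside anything the model can read.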

Files are the loose flavor. A model with filesystem access usually could open the attached file; nothing stops it, and it might even read a few lines to confirm it has the right one. What makes a file easy is something humbler than secrecy: the model has no reason to pull a megabyte of bytes into itself to attach a file, and pulling them in is what breaks. The bytes stay out of its context here because they are unnecessary to the task. The model only needs the name.

It helps to separate two goals that get conflated constantly. One is keeping bytes out of the model’s context window, so they never become tokens. The other is keeping bytes off the wire, so a large file never gets serialized into the protocol message at all. They are different problems with different answers, and a good design lets you choose. Our hook nails the first with zero server work. Truly enormous files want the second, which means moving bytes out of band. Both can be true at once.

MCP is starting to formalize this, and there is an assumption to watch

The same idea is beginning to surface in MCP itself, the protocol most of these agents use to talk to tools. There is an early effort to make file inputs first-class, and it gets the core right: the model leaves the file slot empty, and the client fills it before the request goes out, so the bytes stay out of the context window. It is the shape from earlier, on its way into a standard.

It is early, though. Changes to MCP move through a process it calls SEPs, or Standards Enhancement Proposals, and this one is still an open draft with nothing adopted. MCP ships only about twice a year and it will not make the next release, so the capability is realistically many months out. That distance is the reason to raise the gap now rather than later: the design is still malleable, the cost of changing it only climbs from here, and almost no one outside the handful drafting it has read it.

The draft already does more than a raw picker. The host can fill the slot from a file you already attached to the conversation, from a picker it renders, or from a short list of candidates it puts in front of the model to choose from, and in every case the bytes stay out of the model’s context. The one thing it will not do is let the client resolve a file the model named on its own. A host must not hand a model-supplied value to anything that resolves a path. That is a sensible, safe default while a person is in the loop watching each step.

The trouble starts when the file is not one the host can surface for you to pick, or when nobody is there to pick at all. The draft is candid about the second case: a host that cannot show a picker (a CLI with no interactive terminal, say, or a CI pipeline) should prompt for a path or decline. For the fastest-growing way people run agents, that breaks down in two places.

The first is reach. Choosing from a list the host assembles is narrower than finding the file. A coding agent locates a source file from a fragment of a description without thinking about it, and “send my Q2 earnings to my accountant,” with no drag and drop at all, is a request a capable agent can satisfy by resolving the file itself. A list the host built in advance cannot cover that.

The second is that all of it assumes a human is present. Set an agent loose on an eight-hour job overnight, go to sleep, and come back to find it stopped five minutes in behind a file-picker dialog that no one was there to answer. Headless agents, background agents, scheduled runs: none of them have a human to pick, and the draft’s own fallback for them is to prompt or give up. A file-input design whose only autonomous answer is to give up cannot serve the part of the ecosystem that is growing fastest.

Let the client find the file, safely

The fix is one more fill mode. Alongside the modes that need a person, add one that does not: the client resolves. The agent names the file, and the client resolves and binds the bytes at tool-execution time from data it can already reach. No picker, no human required, and the bytes still never touch the model’s context.

The obvious objection is the reason the picker rule exists. If the client will read whatever path the model names, a hijacked model can name ~/.ssh/id_rsa, and with no human in the loop there is nothing to catch the send. The answer is to enforce scope at the client instead of trusting the model. The agent may name files only inside a boundary the user authorized up front: the workspace, a declared allowlist, the files already in the conversation. The client resolves only inside that fence and refuses anything outside it. Exfiltration is bounded to what the user already chose to expose, and that guarantee holds whether or not anyone is watching. This is the direction MCP is heading on its own: access control is moving out of the protocol and into the client, where a boundary like this is enforced rather than merely advertised.
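Enforcing the fence at the client is not much code. A minimal sketch, assuming the authorized scope is a list of directory roots; `resolve_in_scope` and `OutOfScopeError` are hypothetical names, not part of MCP or of any client. It requires Python 3.9 or later for `Path.is_relative_to`.

```python
from pathlib import Path


class OutOfScopeError(PermissionError):
    """Raised when a model-named path falls outside the authorized roots."""


def resolve_in_scope(requested: str, allowed_roots: list[Path]) -> Path:
    """Resolve a model-supplied path, refusing anything outside allowed_roots.

    resolve() collapses ".." segments and follows symlinks, so neither a
    traversal nor a link planted inside the workspace can escape the fence.
    Relative paths are resolved against the current working directory.
    """
    candidate = Path(requested).expanduser().resolve()
    for root in allowed_roots:
        if candidate.is_relative_to(root.resolve()):
            return candidate
    raise OutOfScopeError(f"{requested} is outside the authorized scope")
```

With the workspace as the only root, a request for ~/.ssh/id_rsa raises `OutOfScopeError` before any bytes are read, whether or not anyone is watching.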

Sensitive work keeps stricter modes. A server handling something it does not trust the client to touch can still demand a picker or an out-of-band upload. Client-resolution is the mode for the common, scoped, autonomous case, which today has no good answer at all.

The bigger prize

Files are step one, and the loosest one, since the model never needed the bytes in the first place. The primitive underneath is stricter and more interesting: declare an input the model must not see, let only the client resolve it from an authorized scope, and you get a real guarantee the model cannot read it, even with no human in the loop. That guarantee is what turns the loose file case into something stronger, and once you can write it down it stops being about attachments.

Health records that an agent has to send but must never put in a transcript. Customer PII an agent routes between systems. Credentials. Today each of those is solved by a different bolt-on: secret injection in one place, a privacy vault in another, a clever harness somewhere else. As an open protocol capability it becomes one idea: a model-invisible, client-resolved input, scoped for safety, that works while you sleep.

That is the conversation we want to have with the MCP community. The file work in flight is most of the way there. The missing piece is letting the client resolve the input itself, safely, so the next generation of agents does not hit a file picker at 2am.

Try it

The Gmail attachments capability is live in Arcade.dev’s tools today. Point your agent at it, attach a local file by reference, and watch the bytes never enter the conversation. If you are wrestling with the same problem from the cloud side, our earlier piece on why MCP gets stuck on your laptop is a good companion read.