Tonight, as AI Engineer World’s Fair wraps up, Arcade.dev is sharing a stage with Cursor, Vapi, and Inngest at Inngest HQ for {AI} in Production. Four teams comparing notes on what actually breaks when agents leave the demo and hit production. I’m bringing a phone number.
Call it, say “the quickstart is missing the Node version requirement,” speak a four-digit PIN, and a GitHub issue lands on the repo moments later, authored under your own GitHub account. Not a bot. Not a shared service token. Yours.
I built it that narrow on purpose. Hand a room full of strangers a phone number and one shared identity, and the interesting question stops being “does the demo work” and becomes “what’s the worst thing someone in that room can make it do.” So for anyone calling in, the agent does exactly one thing: file an issue, attributed to them. As the host, saying “triage the backlog” unlocks the rest: an AI maintainer that judges the backlog, picks the best five, and launches Cursor Cloud Agents that write the code and open real pull requests, while a Slack channel narrates the run. Same agent, same prompt; the caller’s identity decides how much of it they can touch.
That split (one action for a stranger, the full workflow for whoever the system already trusts) is also the honest answer to what changes in production. You wouldn’t hand out PINs to a room; you’d resolve the caller against the entitlements your org already manages (your IdP, your existing RBAC) and let the same per-action check decide what they can do. This tutorial walks through how the demo is actually built, phone number and all, and where each piece would change if you pointed it at your own org instead of a stage.
Key takeaways
- A shared bot token can’t express “act as this caller.” Per-user OAuth via Arcade means two callers making the same request get different results. Each acts only as themselves, with only what they’ve connected.
- Voice agents need durable orchestration. A coding agent runs for minutes; a phone call can’t wait. Inngest lets the webhook return instantly while the workflow sleeps, polls, and retries in the background.
- Identity is the security boundary. Caller ID is spoofable, so unrecognized callers resolve to an anonymous user with zero grants. They can’t act as anyone.
- Governance is a layer, not a rewrite. Arcade’s Contextual Access hooks add RBAC, input policy, and output redaction on top of the tools with no changes to the agent.
- The whole stack is five services and one Cloudflare Worker: Vapi (voice), Inngest (orchestration), Cursor (code), Arcade (per-user auth + governance), Workers (hosting).
The problem: one agent, many humans, whose credentials?
Most “AI agent does things for you” demos run on a god-mode credential: one bot token that can post anywhere, one service account that owns every PR. That works until a second person uses it. Then every action is attributed to the bot, everyone has the same permissions, and one prompt injection has the blast radius of your entire workspace.
A voice interface makes this concrete fast. If anyone can dial the number, the agent’s permissions can’t come from the agent. They have to come from the caller. The agent’s reach should be the intersection of what the agent can do and what that specific caller is allowed to do.
That’s the contract Arcade enforces: every tool call carries a user_id, runs under that user’s own OAuth grant, and fails with an auth-required signal (not a silent fallback) when the grant doesn’t exist.
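The intersection rule is easy to state and easy to get wrong, so here is a minimal sketch of it. It models tools as plain strings and grants as a list per user. The function name, the sample grants, and the user ids are illustrative assumptions, not Arcade's API.

```typescript
// Illustrative model only: none of these names come from Arcade's SDK.
type ToolName = string;

// What the agent is wired up to call at all.
const agentTools: ToolName[] = [
  "github.createissue",
  "github.createpullrequest",
  "slack.sendmessage",
];

// What each human has actually authorized (hypothetical sample data).
const grantsByUser: Record<string, ToolName[]> = {
  "alice@acme.com": ["github.createissue", "github.createpullrequest"],
  "web-1234": ["github.createissue"],
};

/** Effective reach = tools the agent has, filtered to those this caller granted. */
function effectiveTools(userId: string): ToolName[] {
  const granted = new Set(grantsByUser[userId] ?? []);
  return agentTools.filter((tool) => granted.has(tool));
}
```

A user id with no entry gets an empty list, which is the behavior the anonymous caller relies on later in this tutorial.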
The stack
| Layer | Tool | Role |
|---|---|---|
| Voice | Vapi | Answers the call, turns speech into tool calls, speaks results back. |
| Durable orchestration | Inngest | Runs the multi-step workflow; sleeps and polls while agents work; retries each step. |
| Coding agent | Cursor Cloud Agents | Writes the code and pushes a branch. |
| Per-user authorization | Arcade | Runs every authenticated action (issues, comments, PRs, Gmail, Slack) as a specific user. |
| Hosting | Cloudflare Workers | Always-on public endpoint for webhooks and the web UI. |
The flow: caller → Vapi assistant → Worker webhook → identity resolution → Arcade tool call (fast path) or Inngest event (slow path) → Cursor agent → PR, email, Slack. Every write scoped to a human.
Everything below lives in one repo. It runs in mock mode with zero external accounts (npm install && cp .env.example .env && npm run dev), and each integration flips to live when its API key shows up in .env.
Prerequisites
To follow along in mock mode, you need Node 20+ and this repo cloned. Mock mode has no other dependencies. Every integration returns a canned response until you add its key.
To run any part of it live, add the matching account:
- An Arcade account and API key, for per-user auth and governance
- A Vapi account, for the voice interface (a phone number is optional; the /call page works without one)
- A Cursor account with API access, for the coding agent
- An Inngest account, for the durable workflow (Inngest Cloud, or npx inngest-cli dev locally)
- A GitHub account and a repo you’re willing to point issues and PRs at
Step 1: Give the assistant tools, not freedom
The Vapi assistant (assistant/vapi-assistant.json) is a system prompt plus exactly three tools:
- submit_issue: file a GitHub issue on the demo repo as the caller
- triage_issues (host-only): unleash the AI maintainer on the backlog
- brief_me: read the caller’s own inbox and open PRs (per-user, read-only)
Every tool accepts a spoken access_code, the credential that maps a voice to a user:
{
  "name": "submit_issue",
  "description": "File a GitHub issue on the demo repo AS THE CALLER (under their own GitHub account, via their connected grant).",
  "parameters": {
    "type": "object",
    "properties": {
      "instruction": {
        "type": "string",
        "description": "The bug or feature in one or two sentences."
      },
      "access_code": {
        "type": "string",
        "description": "Optional. The caller's short access code (digits). Enrolled callers are identified by phone automatically; only ask for a code if a tool call says the caller couldn't be verified."
      }
    },
    "required": ["instruction"]
  }
}
Narrow tools with strong descriptions matter more in voice than anywhere else: there’s no screen to correct a misfire, so the model has to pick the right tool from a one-sentence utterance.
Step 2: Authenticate the webhook, then resolve the caller
Vapi POSTs tool calls to /api/vapi on the Worker. Two gates run before any identity logic: the shared webhook secret (so a forged POST never becomes a PIN-guessing oracle) and a rate limit.
const secret = config.vapi.serverSecret;
if (secret && c.req.header("x-vapi-secret") !== secret) {
  return c.json({ error: "unauthorized" }, 401);
}
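The rate limit is the second gate, and its code isn't shown here. A minimal fixed-window limiter could look like the sketch below. The class name, the limits, and the in-memory Map are all assumptions: on Workers, memory is per isolate, so a real deployment would likely need shared state rather than a local Map.

```typescript
// Hypothetical fixed-window limiter; the repo's actual limiter may differ.
interface Bucket {
  windowStart: number; // ms timestamp when the current window opened
  count: number; // requests seen in the current window
}

class FixedWindowLimiter {
  private readonly buckets = new Map<string, Bucket>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  /** Returns true if this request is allowed for `key`, false if the key is over its limit. */
  allow(key: string, now: number = Date.now()): boolean {
    const bucket = this.buckets.get(key);
    if (!bucket || now - bucket.windowStart >= this.windowMs) {
      // First request, or the previous window has expired: open a new window.
      this.buckets.set(key, { windowStart: now, count: 1 });
      return true;
    }
    if (bucket.count >= this.limit) return false;
    bucket.count += 1;
    return true;
  }
}

// Example: at most 5 tool calls per key per minute.
const limiter = new FixedWindowLimiter(5, 60000);
```

Keying the limiter by call id or caller number (rather than globally) keeps one noisy caller from locking out the room.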
Then each tool call resolves the caller to an Arcade user_id. A static CALLER_MAP handles the host and pre-enrolled teammates. It maps phone numbers and short PINs to user ids:
{
  "+15551234567": "alice@acme.com",
  "4242": "alice@acme.com",
  "1337": "bob@acme.com"
}
The resolution logic is small enough to read in full, and the last line is the security model:
export function identifyCaller(input: {
  number?: string;
  accessCode?: string;
}): CallerIdentity {
  const map = config.callerMap;
  const number = input.number?.trim();
  // 1) Spoken access code, the credential you hand out (and rotate) for a demo.
  const code = input.accessCode?.trim();
  if (code && map[code]) {
    const presenter = config.presenterNumber;
    if (!presenter || !number || samePhone(number, presenter)) {
      return { userId: map[code], method: "access_code", known: true };
    }
  }
  // 2) Pre-enrolled phone number, convenience for known teammates only.
  if (number && map[number]) {
    return { userId: map[number], method: "phone", known: true };
  }
  // 3) Everyone else: no identity we trust -> no grants -> no actions.
  return { userId: ANONYMOUS_USER_ID, method: "anonymous", known: false };
}
Caller ID is spoofable, so a phone number only resolves to a real person if it’s explicitly pre-enrolled. Anyone else becomes anonymous@voice-to-pr.demo, a user id that has authorized nothing, so every tool call for it trips Arcade’s auth-required gate. An unrecognized caller can never act as you. There is no fallback user.
One refinement for live demos: when PRESENTER_NUMBER is set, the static on-stage PIN is only honored from the presenter’s phone (or from web calls, which carry no caller ID). So a heckler who dials in and repeats the PIN they just heard gets nothing.
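The presenter gate in identifyCaller leans on a samePhone helper that isn't shown. Here is a sketch of what such a helper and the gate's truth table might look like. Both function bodies are assumptions: the repo's samePhone may normalize numbers differently (for example, by tolerating a missing country code, which this version does not).

```typescript
// Hypothetical helpers; the repo's samePhone may normalize differently.
function digitsOnly(phone: string): string {
  return phone.replace(/\D/g, "");
}

/** Two numbers match if they contain the same digits once formatting is stripped. */
function samePhoneSketch(a: string, b: string): boolean {
  const da = digitsOnly(a);
  return da.length > 0 && da === digitsOnly(b);
}

/** Mirrors the guard in identifyCaller: is the static PIN honored for this call? */
function staticPinHonored(
  presenter: string | undefined,
  callerNumber: string | undefined,
): boolean {
  if (!presenter) return true; // gating is off when no presenter number is configured
  if (!callerNumber) return true; // web call: there is no caller ID to compare
  return samePhoneSketch(callerNumber, presenter);
}
```

The only case that returns false is the one that matters on stage: a phone call, with a caller ID, from a number that is not the presenter's.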
identifyCaller only reads the static CALLER_MAP, and that is deliberately half the story. The self-serve PINs from the next step live in Workers KV, not in the map. A thin wrapper, resolveCaller (in src/callers.ts), tries the static map first and, only when that comes back anonymous, looks the spoken code up in KV to resolve the enrolled web-<uuid> user:
export async function resolveCaller(input: {
  number?: string;
  accessCode?: string;
  store: CallerStore;
}): Promise<CallerIdentity> {
  const stat = identifyCaller({
    number: input.number,
    accessCode: input.accessCode,
  });
  if (stat.method !== "anonymous") return stat; // static map hit (host / pre-enrolled)
  const code = input.accessCode?.trim();
  if (code) {
    const userId = await input.store.getUser(code); // Workers KV: self-serve PIN -> web-<uuid>
    if (userId) return { userId, method: "access_code", known: true };
  }
  return stat; // still anonymous
}
Static map first, KV second, anonymous if neither matches. So a /connect PIN maps to its own web-<uuid> even though it never appears in CALLER_MAP, and the security model from identifyCaller still holds: a code in neither place resolves to no one. In mock mode (no Arcade API key) the demo falls back to the configured host user so the zero-config quickstart still works end to end; in live mode an unmatched caller stays anonymous.
Step 3: Let strangers enroll themselves
The multi-user beat needs strangers, and strangers need grants. The /connect page mints a fresh, isolated user id and a short PIN, stored in Workers KV with a 6-hour TTL:
export async function enrollSelf(store: CallerStore): Promise<Enrollment> {
  const userId = "web-" + crypto.randomUUID();
  const pin = await uniquePin(store);
  await store.putCode(pin, userId, CODE_TTL);
  const links = await connectLinks(userId); // Arcade OAuth links, e.g. GitHub
  return { pin, userId, links };
}
Two properties do the security work. Every signup gets a brand-new web-<uuid> (never an email, never a pre-enrolled identity), so typing someone else’s address can’t hijack their grants. And the only way to attach a GitHub account to that id is to complete GitHub’s OAuth yourself, through Arcade. The PIN is just a pointer; the OAuth grant is the credential.
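enrollSelf calls a uniquePin helper that isn't shown. One plausible shape is a draw-and-check loop against the store, sketched below. Everything here is an assumption: the CodeLookup interface, the injected randomInt source, the retry cap, and the function name are mine, not the repo's.

```typescript
// Hypothetical stand-in for the part of the store this sketch needs.
interface CodeLookup {
  getUser(code: string): Promise<string | null>;
}

/**
 * Draw zero-padded numeric PINs until one is not already in the store.
 * `randomInt(n)` must return an integer in [0, n); it is injected so the
 * caller can supply a cryptographically strong source.
 */
async function uniquePinSketch(
  store: CodeLookup,
  randomInt: (maxExclusive: number) => number,
  digits: number = 4,
  maxAttempts: number = 20,
): Promise<string> {
  const space = Math.pow(10, digits);
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const pin = String(randomInt(space)).padStart(digits, "0");
    const existing = await store.getUser(pin);
    if (!existing) return pin;
  }
  throw new Error("could not find an unused PIN; widen the PIN space or retry later");
}
```

In a Worker, randomInt would typically be backed by crypto.getRandomValues rather than Math.random. Note that check-then-write is not atomic, so two simultaneous signups could in principle draw the same PIN; a short TTL and a small audience make that an acceptable risk for a demo, less so at scale.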
Step 4: File the issue as the caller
With an identity in hand, the tool handler is almost boring. That’s the point. There’s no token handling in the app code; Arcade executes Github.CreateIssue under whatever grant userId has:
const issue = await createGitHubIssue({
  repoUrl,
  title,
  body,
  userId: identity.userId,
});
The interesting part is the failure mode. When the caller hasn’t connected GitHub yet, Arcade doesn’t fall back to a shared credential. It raises auth-required, and the assistant turns that into a spoken interrupt:
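try {
  // ... the createGitHubIssue call shown above ...
} catch (err) {
  if (err instanceof ArcadeAuthRequiredError) {
    return {
      toolCallId: call.id,
      result: "You'll need to connect your GitHub first. Tap Connect on the sign-up page, then call me back.",
    };
  }
  throw err; // assumption: anything else propagates; the excerpt does not show that branch
}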
Run the same request as two different callers and you get the demo’s whole thesis: one files an issue under their own GitHub login; the other is told exactly what to connect. Same agent, same prompt, different humans, different capabilities.
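You can see that behavior without any accounts at all. The sketch below simulates it with stand-ins: the error class, the grant set, and both functions are hypothetical substitutes for the app's ArcadeAuthRequiredError and the real Arcade tool call.

```typescript
// Stand-ins for illustration; the real error class and tool call live in the app and Arcade.
class AuthRequiredError extends Error {
  readonly provider: string;
  constructor(provider: string) {
    super("authorization required for " + provider);
    this.name = "AuthRequiredError";
    this.provider = provider;
  }
}

// Hypothetical grant table: user ids that have completed GitHub OAuth.
const connectedGitHub = new Set<string>(["web-aaa"]);

/** Pretend tool call: succeeds only under the caller's own grant, never a shared one. */
function createIssueAs(userId: string, title: string): string {
  if (!connectedGitHub.has(userId)) throw new AuthRequiredError("GitHub");
  return `issue "${title}" filed by ${userId}`;
}

/** Tool handler: turn a missing grant into a spoken interrupt, rethrow everything else. */
function handleSubmitIssue(userId: string, title: string): string {
  try {
    return createIssueAs(userId, title);
  } catch (err) {
    if (err instanceof AuthRequiredError) {
      return "You'll need to connect your GitHub first.";
    }
    throw err;
  }
}

console.log(handleSubmitIssue("web-aaa", "Add Node version to quickstart"));
console.log(handleSubmitIssue("web-bbb", "Add Node version to quickstart"));
```

The first call files the issue as web-aaa; the second gets the connect prompt. Nothing in the handler knows which outcome it will get until the grant check runs.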
Step 5: A durable AI maintainer, because agents are slow
triage_issues is where all four services chain together, and it’s host-gated in one line: the resolved identity must be the maintainer’s user id. The handler enqueues an Inngest event and returns immediately. A phone call cannot wait twenty minutes for Cursor agents to finish.
Roasting, the maintainer’s quick comment on each freshly filed issue, is not part of triage at all. It happens live. Because every issue arrives through your own submit_issue handler, the handler fires a per-issue Inngest event as it files, and a small workflow roasts it (an Arcade LLM call plus Github.CreateIssueComment as the maintainer) within seconds. No polling, no GitHub webhook: you already own the moment of creation.
Triage is the build pass. The workflow lists open issues, has the LLM judge the backlog for build-worthiness (clearly scoped, high-impact work first; vague asks and joke issues get skipped, and it falls back to the most recent if the judge flakes), then fans the picks out to a build workflow:
const picks = await step.run("pick-best", () => pickBestIssues(issues, 5));
const built = await Promise.all(
  picks.map((iss) =>
    step.invoke(`build-${iss.number}`, {
      function: buildIssueWorkflow, // Cursor agent codes it, Arcade opens the PR
      data: {
        repoUrl,
        issueNumber: iss.number,
        issueTitle: iss.title /* ... */,
      },
    }),
  ),
);
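The judge-with-fallback selection that feeds this fan-out can be sketched as follows. The issue shape, the judge callback, and the use of issue number as a recency proxy are assumptions for illustration; the repo's pickBestIssues takes only the issues and a count, and its internals may differ.

```typescript
// Hypothetical issue shape; the real objects carry more fields.
interface IssueSketch {
  number: number;
  title: string;
}

/**
 * Ask a judge (e.g. an LLM call) for issue numbers in priority order.
 * If the judge throws, fall back to the most recent issues instead.
 */
async function pickBestIssuesSketch(
  issues: IssueSketch[],
  limit: number,
  judge: (issues: IssueSketch[]) => Promise<number[]>,
): Promise<IssueSketch[]> {
  const byNumber = new Map<number, IssueSketch>();
  issues.forEach((iss) => byNumber.set(iss.number, iss));
  try {
    const ranked = await judge(issues);
    const picks: IssueSketch[] = [];
    for (const n of ranked) {
      if (picks.length >= limit) break;
      const iss = byNumber.get(n);
      // Drop numbers the judge invented and any duplicates.
      if (iss && picks.indexOf(iss) === -1) picks.push(iss);
    }
    return picks;
  } catch {
    // Judge flaked: newest first, assuming higher issue numbers are more recent.
    return issues
      .slice()
      .sort((a, b) => b.number - a.number)
      .slice(0, limit);
  }
}
```

Validating the judge's output against the real issue list matters as much as the fallback: a model that returns an issue number that doesn't exist should cost you a pick, not a crashed run.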
Each step.run is durable: it records its result, retries independently, and survives restarts. While a Cursor agent works, the build workflow polls with step.sleep between checks, which costs nothing on Workers because the sleep is Inngest’s, not your process’s. One issue failing to build never takes down the run, and the final step posts a recap to Slack: how many considered, how many shipped, with PR links.
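The poll loop itself is short. The sketch below shows its shape against a minimal stand-in for the two step tools it uses; the StepLike interface, the status values, the 30-second interval, and the check cap are assumptions, and Inngest's real step object offers more than this.

```typescript
// Minimal stand-in for the two step tools used here.
interface StepLike {
  run<T>(id: string, fn: () => Promise<T>): Promise<T>;
  sleep(id: string, duration: string): Promise<void>;
}

// Hypothetical status values; check the coding agent's API for the real ones.
type AgentStatus = "running" | "finished" | "failed";

/** Check the agent, sleep durably, repeat, and give up after maxChecks. */
async function waitForAgent(
  step: StepLike,
  agentId: string,
  getStatus: (agentId: string) => Promise<AgentStatus>,
  maxChecks: number = 40,
): Promise<AgentStatus | "timeout"> {
  for (let i = 0; i < maxChecks; i++) {
    // Each check is its own step, so a retry or restart resumes here
    // instead of replaying earlier checks.
    const status = await step.run(`check-${agentId}-${i}`, () => getStatus(agentId));
    if (status !== "running") return status;
    await step.sleep(`wait-${agentId}-${i}`, "30s");
  }
  return "timeout";
}
```

The cap matters: without it, an agent that never reports a terminal status would keep the workflow alive indefinitely.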
Division of labor worth stating plainly: Cursor pushes branches with a service credential; Arcade performs the human-attributed writes, filing the issue as the caller and the comments and PRs as the maintainer. Code generation and authenticated action are different problems, and they compose.
Step 6: Govern it with Contextual Access
Per-user OAuth answers “can this user?” Arcade’s Contextual Access (CATE) hooks answer “should they, in this context, and what comes back?” These are three webhooks the same Worker serves, with no changes to the tools or the agent.
Access (RBAC): read-only users can’t even see write tools. The deny list is a set of toolkit.tool pairs. When you add a write tool to the app, it must be added here too (we caught exactly that in code review):
const WRITE_TOOLS = new Set([
  "slack.sendmessage",
  "github.createpullrequest",
  "github.createissue",
  "github.createissuecomment",
  "gmail.sendemail",
]);
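How that set gets applied might look like the sketch below. The Role type, the visibleTools function, and the Demo.ReadThing tool name are hypothetical; the actual request and response shape of Arcade's access hook is not shown in this post and is not modeled here.

```typescript
// Same deny list as above, repeated so this sketch stands alone.
const WRITE_TOOLS = new Set([
  "slack.sendmessage",
  "github.createpullrequest",
  "github.createissue",
  "github.createissuecomment",
  "gmail.sendemail",
]);

type Role = "read-only" | "writer"; // hypothetical roles

/** Hide every write tool from read-only users; everyone else sees the full list. */
function visibleTools(role: Role, tools: string[]): string[] {
  if (role !== "read-only") return tools;
  // Compare in lowercase so "Github.CreateIssue" matches "github.createissue".
  return tools.filter((tool) => !WRITE_TOOLS.has(tool.toLowerCase()));
}
```

Because this is a deny list, a new write tool is visible to read-only users until someone adds it here. An allow list of read tools would fail closed instead, at the cost of more upkeep.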
Pre-execution (input policy): PRs only on allow-listed GitHub orgs, email only within your domain. Anything else is blocked with a reason the assistant can read back over the phone:
if (id === "github.createpullrequest") {
  const allowed = config.cate.allowedGithubOwners.map((o) => o.toLowerCase());
  if (allowed.length && !allowed.includes(owner)) {
    return {
      code: "CHECK_FAILED",
      error_message: `Policy: pull requests are only allowed on ${config.cate.allowedGithubOwners.join(", ")} repos.`,
    };
  }
}
A second rule is live in the deployed demo: any GitHub issue or pull request whose title or body touches the license is denied outright. Ask the agent to change the LICENSE and Arcade’s pre-hook blocks the tool call before it runs; the assistant reads the reason back over the phone.
if (id === "github.createissue" || id === "github.createpullrequest") {
  const text = `${inputs.title ?? ""} ${inputs.body ?? ""}`.toLowerCase();
  if (/licen[sc]e|licens(?:ing|or)/i.test(text)) {
    return {
      code: "CHECK_FAILED",
      error_message:
        "Policy: changes involving the license are off-limits. Not even the maintainer can touch it.",
    };
  }
}
Note that even the host hits this wall. The check runs on the tool call, not on who is asking, so the maintainer’s own privileges do not override it.
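The email half of the input policy (email only within your domain) follows the same pattern as the GitHub owner check. A sketch, assuming the hook can read a recipient address and a configured list of allowed domains; the function name and both parameter names are hypothetical, and only the CHECK_FAILED return shape is taken from the snippets above.

```typescript
interface PolicyDenial {
  code: "CHECK_FAILED";
  error_message: string;
}

/** Returns a denial when the recipient's domain is not allow-listed, otherwise null. */
function checkEmailPolicy(recipient: string, allowedDomains: string[]): PolicyDenial | null {
  const at = recipient.lastIndexOf("@");
  const domain = at >= 0 ? recipient.slice(at + 1).trim().toLowerCase() : "";
  const allowed = allowedDomains.map((d) => d.toLowerCase());
  // As with the owner check, an empty allow-list means "no restriction configured".
  if (allowed.length && allowed.indexOf(domain) === -1) {
    return {
      code: "CHECK_FAILED",
      error_message: `Policy: email is only allowed to ${allowedDomains.join(", ")} addresses.`,
    };
  }
  return null;
}
```

An address with no @ yields an empty domain, which never matches, so malformed recipients are denied whenever an allow-list is set.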
Post-execution (redaction): one-time codes, API keys, GitHub tokens, and SSNs are scrubbed from tool output before it reaches the agent. In a voice app this is not theoretical. Anything the model sees, it might literally read aloud.
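A redaction pass like this is typically a list of patterns applied in order. The sketch below covers three of the categories named above. The patterns are illustrative assumptions, not the demo's actual rules, and regex-based scrubbing will always miss some formats.

```typescript
interface RedactionRule {
  pattern: RegExp;
  replacement: string;
}

// Illustrative patterns only; tune these for the secrets your tools can actually return.
const REDACTION_RULES: RedactionRule[] = [
  // GitHub tokens with the ghp_/gho_/ghu_/ghs_/ghr_ prefixes.
  { pattern: /\bgh[pousr]_[A-Za-z0-9]{36,}\b/g, replacement: "[REDACTED:github_token]" },
  // US Social Security numbers in the common 3-2-4 layout.
  { pattern: /\b\d{3}-\d{2}-\d{4}\b/g, replacement: "[REDACTED:ssn]" },
  // One-time codes that follow a phrase like "verification code".
  {
    pattern: /\b((?:verification|security|login)\s+code\D{0,12})\d{4,8}\b/gi,
    replacement: "$1[REDACTED:otp]",
  },
];

/** Apply every rule in order and return the scrubbed text. */
function redact(text: string): string {
  return REDACTION_RULES.reduce(
    (out, rule) => out.replace(rule.pattern, rule.replacement),
    text,
  );
}
```

Labeled placeholders beat silent deletion in a voice app: the assistant can say that something was withheld instead of reading a sentence with a hole in it.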
These are not illustrative. CATE is registered as an active webhook plugin on the Arcade project (in the dashboard or the admin API), so it fires on every tool call the agent makes. Set CATE_HOOK_TOKEN so only Arcade can invoke the hooks.
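Checking that token on the Worker side could look like the sketch below. It assumes the token arrives as an Authorization: Bearer header, which this post does not confirm, so check Arcade's hook documentation for the actual header. Both function names are hypothetical.

```typescript
/** Compare two strings without returning early, so timing does not reveal how many characters matched. */
function timingSafeEqual(a: string, b: string): boolean {
  let diff = a.length ^ b.length;
  const len = Math.max(a.length, b.length);
  for (let i = 0; i < len; i++) {
    // charCodeAt past the end yields NaN; treat that as 0.
    diff |= (a.charCodeAt(i) || 0) ^ (b.charCodeAt(i) || 0);
  }
  return diff === 0;
}

/**
 * Assumption: the hook token is sent as "Authorization: Bearer <token>".
 * Fails closed: with no token configured, every request is rejected.
 */
function isAuthorizedHook(
  authorizationHeader: string | undefined,
  expectedToken: string | undefined,
): boolean {
  if (!expectedToken || !authorizationHeader) return false;
  const prefix = "Bearer ";
  if (authorizationHeader.slice(0, prefix.length) !== prefix) return false;
  return timingSafeEqual(authorizationHeader.slice(prefix.length), expectedToken);
}
```

Failing closed is a deliberate choice here: a governance hook that accepts unauthenticated calls when its token is missing is worse than one that rejects everything.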
Step 7: Deploy and get a phone number
The whole thing is one Worker:
npx wrangler secret put ARCADE_API_KEY # + ARCADE_USER_ID, CURSOR_API_KEY, CALLER_MAP,
# VAPI_PRIVATE_KEY, VAPI_PUBLIC_KEY, VAPI_SERVER_SECRET,
# INNGEST_EVENT_KEY, INNGEST_SIGNING_KEY
npm run deploy
curl -X PUT https://voice-to-pr.your-team.workers.dev/api/inngest # sync with Inngest Cloud
npm run create-assistant # upsert the Vapi assistant, prints its id
Then provision a number and attach the assistant. create-assistant prints the assistant id; save it as VAPI_ASSISTANT_ID in your environment and:
curl -X POST https://api.vapi.ai/phone-number \
-H "Authorization: Bearer $VAPI_PRIVATE_KEY" -H "Content-Type: application/json" \
-d '{"provider":"vapi","assistantId":"'"$VAPI_ASSISTANT_ID"'","numberDesiredAreaCode":"580"}'
Browser demos work too: /call is a click-to-talk page and /connect is enrollment, no phone required.
Production checklist
Before opening this up to real callers and a real repo:
- Webhook secret set (VAPI_SERVER_SECRET). Without it, /api/vapi is an open PIN oracle.
- No fallback identity in live mode. Unrecognized callers must resolve to a grantless anonymous user, never to you.
- PIN hygiene. npm run rotate-pin revokes your old on-stage PIN, mints a new one, and pushes the updated CALLER_MAP in one atomic step; self-serve PINs expire on their own.
- CATE hooks registered and token-protected. RBAC, org/domain allow-lists, and redaction are your blast-radius controls.
- Protected main branch. The maintainer opens PRs; nothing it does should be able to land without review.
Try it
- Clone the repo: github.com/arcadeai-labs/voice-to-pr-vapi-inngest-cursor-arcade
- Run the mock-mode quickstart: the full pipeline, zero external accounts.
- Add an ARCADE_API_KEY from arcade.dev and watch the same request start acting as you.
The pattern generalizes past phones: any surface where multiple humans share one agent (Slack, WhatsApp, a kiosk, a support line) has the same shape. Identify the human, resolve their grants, act as them or interrupt for auth, and govern every call inline.
FAQ
Why not just use one GitHub token with repo scope? Attribution and blast radius. With a shared token every issue and PR is authored by a bot, every caller has identical permissions, and a single prompt injection can act with the token’s full power. Per-user grants make the agent’s reach per-human and the audit trail honest.
What happens when a caller hasn’t connected GitHub? Arcade raises an auth-required error instead of falling back. The assistant tells the caller what to connect and where, during the call. Nothing executes as anyone else.
Isn’t a spoken PIN weak authentication? For a demo, it’s the right trade, and it’s layered: webhook secret, rate limiting, presenter-number gating for the static PIN, short-TTL self-serve PINs, and rotation. For production, the identity layer is one function: swap CALLER_MAP for a verified identity (signed JWT / SSO) and nothing else changes.
Why Inngest instead of just awaiting the Cursor API? A Cursor agent can run for many minutes; webhooks and Workers can’t (and shouldn’t) hold a connection that long. Durable steps mean the poll loop survives restarts, each side effect retries independently, and the phone call gets an instant answer.
Do Cursor and Arcade overlap? No, they compose. Cursor writes code and pushes branches with a service credential. Arcade performs the authenticated, human-attributed actions: the issue as the caller, the PR and comments as the maintainer, email and Slack as their respective users.
How much of this is demo-ware? The identity model (anonymous-by-default, per-user grants, auth interrupts) and the CATE governance layer are exactly what you’d run in production. The parts you’d replace are the PIN map (use SSO) and the roast comments (your call).
Built with Vapi, Inngest, Cursor, and Arcade. Get an API key at arcade.dev and start building.


