Summary
WebMCP should include an API that lets web pages request LLM completions from the visiting agent's model. This enables agentic interactions on any website without requiring backend AI infrastructure or token costs.
What is Sampling?
In the Model Context Protocol, "sampling" refers to a mechanism where a tool provider (in our case, a web page) can ask the agent's LLM to generate a completion on its behalf. It's a reverse call — instead of the agent calling a tool and getting structured data back, the tool asks the agent to think about something and return its reasoning.
Concretely: a web page provides context (a product catalog, form data, an error log) and a prompt ("rank these for the user", "validate this input", "summarize this issue"), and the agent's model produces a completion. The page gets AI capabilities without running its own model.
In MCP, this is implemented as `sampling/createMessage` — a JSON-RPC request from server to client. The server sends a `messages` array, an optional system prompt, and constraints (like `maxTokens`), and the client returns a model completion.
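As a concrete sketch, the wire-level request could be built as follows. The field names (`method`, `messages`, `maxTokens`) follow the MCP sampling spec; the helper function itself is purely illustrative:

```js
// Build an MCP-style sampling/createMessage JSON-RPC request.
// Field names follow the MCP sampling spec; the helper is illustrative.
function buildSamplingRequest(id, promptText, maxTokens) {
  return {
    jsonrpc: "2.0",
    id,
    method: "sampling/createMessage",
    params: {
      messages: [
        { role: "user", content: { type: "text", text: promptText } },
      ],
      maxTokens,
    },
  };
}

const request = buildSamplingRequest(1, "Summarize this error log.", 200);
// The client's response carries the completion, roughly:
// { role: "assistant", content: { type: "text", text: "..." },
//   model: "...", stopReason: "endTurn" }
```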
A note on naming: "Sampling" is MCP's term, inherited from ML terminology (sampling tokens from a probability distribution). It's admittedly opaque. This proposal uses the MCP term for consistency, but the WebMCP API surface could use a more descriptive name — the concept is closer to "delegated completion" or "reverse completion." The proposed API method is `createMessage`, matching MCP's naming.
Context
WebMCP currently implements only the tools primitive from MCP, intentionally omitting resources, prompts, and sampling. This proposal argues that sampling deserves inclusion in the standard — it solves a distinct problem that tools alone cannot address.
Notably, the MCP-B project (@mcp-b/global) — an unofficial community polyfill unaffiliated with the W3C effort — has shipped sampling support since December 2025 (PR #16, PR #98). While MCP-B is not an official WebMCP implementation, its adoption of sampling serves as prior art demonstrating real-world demand for this capability in browser contexts.
Relationship to the Prompt API
Chrome's Prompt API (LanguageModel.create() / session.prompt()) also lets web pages request LLM completions. The overlap is real, but the two serve fundamentally different roles:
| | Prompt API | WebMCP Sampling |
|---|---|---|
| Model | Gemini Nano, on-device, bundled in Chrome | The visiting agent's model (Claude, GPT, Gemini Pro, etc.) |
| Capability | Small model — classification, extraction, simple Q&A | Frontier models — complex reasoning, code generation, large context |
| Cost | Free — local inference, no tokens consumed | Uses the user's agent tokens/quota |
| Privacy | All data stays on-device | Data flows to the agent's cloud model |
| Agentic context | Standalone — fresh session, page provides all context | Part of an active agent session — the client may incorporate user intent and cross-tool context at its discretion |
| Permission | None needed — local, free | Permission required — spending the user's resources |
| Cross-browser | Chrome-only | W3C standard track, cross-browser by design |
| Hardware | Requires 22GB+ free disk, 4GB+ VRAM or 16GB RAM | No hardware requirements — inference is remote |
They're complementary, not competing:
- Prompt API → lightweight local tasks the page handles independently (classify text, extract fields, summarize a paragraph)
- Sampling → the page needs the agent to reason about data using a frontier model. The client may optionally incorporate conversational context, but the spec leaves this to the implementation
Example: a store with a product catalog could use the Prompt API to classify products locally. But to ask "given this customer's browsing session and these products, which is the best fit?" — that requires a frontier model's reasoning. And if the client chooses to include the agent's conversational context, the results get even richer — but sampling is valuable either way.
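To make the division of labor concrete, a page might route cheap classification through the Prompt API where it exists and reserve sampling for agent-grade reasoning. Both API surfaces below are assumptions for illustration — `LanguageModel` is Chrome's Prompt API shape, and `createMessage` is the surface proposed here:

```js
// Local path: Chrome's Prompt API (where available) for cheap classification.
async function classifyLocally(text) {
  if (!("LanguageModel" in globalThis)) return null; // Prompt API unavailable
  const session = await globalThis.LanguageModel.create();
  return session.prompt(`Classify this product in one word: ${text}`);
}

// Agent path: delegate frontier-model reasoning via the proposed createMessage.
async function rankWithAgent(products, sessionSummary) {
  return navigator.modelContext.createMessage({
    messages: [{
      role: "user",
      content: {
        type: "text",
        text: `Browsing session: ${sessionSummary}\n` +
              `Products: ${JSON.stringify(products)}\n` +
              `Which product is the best fit for this customer?`,
      },
    }],
    maxTokens: 400,
  });
}
```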
Motivation
1. Lowers the barrier to entry for the agentic web
Not every website has backend AI infrastructure. Most sites — small businesses, blogs, local services, community forums — cannot afford LLM API keys, inference costs, or the engineering effort to integrate AI backends.
Sampling flips the economics: the user's agent provides the model. The website provides context (product catalogs, session state, page content) and the agent does the reasoning. This means any site can offer agentic experiences with zero AI backend investment.
2. The browser trust model is a natural fit
In standard MCP, sampling is a harder sell: an opaque remote server asks the client to spend tokens. The trust relationship is indirect. In WebMCP, sampling fits naturally:
- The "server" is a page the user actively navigated to — there's already implicit trust
- The browser can gate sampling behind a permission prompt (like camera, microphone, or notifications)
- Permissions can be scoped per origin with user-controlled policies
- The browser already has robust models for resource-access consent — sampling slots right in
This is arguably safer than MCP sampling because the trust boundary is more visible and user-controlled.
3. Pages become collaborators, not just tool bags
Without sampling, WebMCP pages are passive — they register tools and wait. The interaction is one-directional: agent → page → result. With sampling, the page can delegate reasoning back to the agent mid-workflow:
- "I have this form data — validate it before submission"
- "Here's a product catalog and user preferences — rank these"
- "This error log needs interpretation — summarize the issue"
The page has rich client-side context (DOM, session, cookies, app state). The agent has reasoning capabilities and model access. Sampling bridges these two contexts, enabling true collaboration rather than simple tool invocation. (Note: in MCP's sampling spec, the server constructs the message payload explicitly — the parent conversation history is not automatically included. The client may inject additional context at its discretion, but this is not guaranteed.)
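For illustration, mid-workflow delegation could sit inside a tool's execute callback, using WebMCP's existing `provideContext` tool-registration shape. The `createMessage` call is the surface proposed here, and `collectFormData` is a hypothetical page-side helper; the context object is passed in as a parameter only to keep the sketch testable outside a browser:

```js
// Register a tool whose execute callback delegates reasoning back to the agent.
function registerCheckoutTool(modelContext, collectFormData) {
  modelContext.provideContext({
    tools: [{
      name: "submit-form",
      description: "Validate and submit the checkout form",
      inputSchema: { type: "object", properties: {} },
      async execute() {
        const formData = collectFormData(); // rich client-side state
        // Sampling: ask the agent's model to vet the data before committing.
        const verdict = await modelContext.createMessage({
          messages: [{
            role: "user",
            content: {
              type: "text",
              text: `Validate before submission: ${JSON.stringify(formData)}`,
            },
          }],
          maxTokens: 200,
        });
        return { content: [{ type: "text", text: verdict.content.text }] };
      },
    }],
  });
}
```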
4. Prior art exists in the community
The MCP-B community polyfill (not an official WebMCP implementation) has shipped sampling support since December 2025, including:
- `navigator.modelContext.setSamplingHandler()` / `clearSamplingHandler()`
- React hooks for sampling integration
- Capability negotiation for sampling support
While MCP-B is an independent project, its design choices validate that sampling is a natural extension of the navigator.modelContext surface. The API patterns it has explored can inform — though not dictate — standardization.
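The handler pattern MCP-B explores can be sketched as a small registry — one sampling handler at a time, with page requests routed through it. The registry below illustrates the plumbing only; it is not MCP-B's actual implementation:

```js
// Minimal sketch of a setSamplingHandler/createMessage round trip.
const samplingRegistry = {
  handler: null,
  setSamplingHandler(fn) { this.handler = fn; },     // agent side registers
  clearSamplingHandler() { this.handler = null; },   // agent disconnects
  async createMessage(request) {                     // page side requests
    if (!this.handler) throw new Error("No agent connected");
    return this.handler(request);
  },
};

// The agent side decides how sampling requests are fulfilled:
samplingRegistry.setSamplingHandler(async (request) => ({
  role: "assistant",
  content: { type: "text", text: `Echo: ${request.messages[0].content.text}` },
}));
```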
Proposed API Shape
Sampling in WebMCP could follow the existing permission model:
```js
// Collect context from the page's application state
const cart = getCartContents();
const shippingAddress = getShippingAddress();

// Ask the agent's model to validate before submission
const response = await navigator.modelContext.createMessage({
  systemPrompt: `
    Flag potential issues: mismatched currencies,
    missing fields, unusual quantities. Be concise.
  `,
  messages: [
    {
      role: "user",
      content: {
        type: "text",
        text: `
          Review this order for issues before I submit it:
          Cart: ${JSON.stringify(cart)}
          Ship to: ${JSON.stringify(shippingAddress)}
        `
      }
    }
  ],
  maxTokens: 300,
});
```

The browser mediates this request:
- Checks if the origin has sampling permission (prompts user if not)
- Forwards to the connected agent's model
- Returns the completion to the page
Permission Model
> example.com wants to request AI completions from your agent.
> [Allow Once] [Allow for this site] [Block]
This mirrors the existing browser permission UX for camera, location, and notifications.
Security Considerations
- Token cost: Sampling consumes the user's agent tokens. The permission prompt must make this clear — "This site wants to use your AI agent to process requests."
- Prompt injection: The page controls the sampling prompt, which could be adversarial. Agents should treat sampled content with the same caution as tool results — as untrusted input.
- Unsolicited sampling / "shadow agent" risk: Nothing in the MCP spec restricts sampling to within a tool call cycle — a page could fire sampling requests unprompted, effectively running its own reasoning loop on the user's agent. This is a real concern: a page with sampling permission could burn the user's tokens for its own purposes while the user and agent are idle. Browsers should enforce per-origin rate limits, require user-visible activity indicators (similar to the camera/mic recording dot), and potentially restrict sampling to contexts where the agent has actively engaged with the page.
- Rate limiting: Beyond the shadow agent concern, browsers should enforce per-origin rate limits on sampling requests to prevent abuse more broadly.
- Data exfiltration: Sampling results flow back to the page. If the agent's context contains sensitive information from other tabs/sources, the system prompt and context isolation must prevent leakage. This is related to the existing "lethal trifecta" discussion (#11).
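The per-origin rate limits suggested above could be implemented as a token bucket — each origin gets a small burst allowance that refills slowly. The class and its numbers below are arbitrary placeholders for illustration, not proposed defaults:

```js
// Illustrative per-origin token bucket for sampling requests.
class OriginRateLimiter {
  constructor(capacity = 10, refillPerSec = 0.5) {
    this.capacity = capacity;         // max burst per origin
    this.refillPerSec = refillPerSec; // tokens regained per second
    this.buckets = new Map();         // origin -> { tokens, last }
  }
  // Returns true if the origin may issue a sampling request right now.
  allow(origin, now = Date.now()) {
    const b = this.buckets.get(origin) ?? { tokens: this.capacity, last: now };
    const elapsedSec = (now - b.last) / 1000;
    b.tokens = Math.min(this.capacity, b.tokens + elapsedSec * this.refillPerSec);
    b.last = now;
    const allowed = b.tokens >= 1;
    if (allowed) b.tokens -= 1;
    this.buckets.set(origin, b);
    return allowed;
  }
}
```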
Open Questions
Relationship to the Prompt API: Chrome's Prompt API has a similar shape — pages requesting LLM completions — but targets a local on-device model rather than the visiting agent's model. Two possible unification directions exist:
- Prompt API gains a `local`/`remote` parameter — a single API where the page specifies the model source. This is clean but papers over real differences: local is free/no-permission, agent-routed costs tokens and needs consent. Different failure modes, different trust models.
- WebMCP's `createMessage` subsumes the Prompt API — `navigator.modelContext.createMessage()` becomes the universal completion interface, falling back to a local model when no agent is connected. This benefits adoption but is a much larger scope.
Either direction could work. A key consideration is the cross-browser story: the Prompt API is currently Chrome-only, tied to Gemini Nano. Keeping sampling as a separate WebMCP primitive means it stands on its own as a cross-browser standard without coupling to any browser's local model strategy. Unification could help adoption but might complicate cross-browser consensus if browsers disagree on local model support.
This proposal intentionally keeps sampling scoped to WebMCP. Whether unification is the right endgame is a question for the working group.