Summary
WebMCP should include an API that lets web pages request LLM completions from the visiting agent's model. This enables agentic interactions on any website without requiring backend AI infrastructure or token costs.
What is Sampling?
In the Model Context Protocol, "sampling" refers to a mechanism where a tool provider (in our case, a web page) can ask the agent's LLM to generate a completion on its behalf. It's a reverse call — instead of the agent calling a tool and getting structured data back, the tool asks the agent to think about something and return its reasoning.
Concretely: a web page provides context (a product catalog, form data, an error log) and a prompt ("rank these for the user", "validate this input", "summarize this issue"), and the agent's model produces a completion. The page gets AI capabilities without running its own model.
In MCP, this is implemented as `sampling/createMessage` — a JSON-RPC request from server to client. The server sends a `messages` array, an optional system prompt, and constraints (like `maxTokens`), and the client returns a model completion.
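As a concrete sketch, the wire-level request could be built as follows. The field names (`method`, `messages`, `maxTokens`) follow the MCP sampling spec; the helper function itself is purely illustrative:

```js
// Build an MCP-style sampling/createMessage JSON-RPC request.
// Field names follow the MCP sampling spec; the helper is illustrative.
function buildSamplingRequest(id, promptText, maxTokens) {
  return {
    jsonrpc: "2.0",
    id,
    method: "sampling/createMessage",
    params: {
      messages: [
        { role: "user", content: { type: "text", text: promptText } },
      ],
      maxTokens,
    },
  };
}

const request = buildSamplingRequest(1, "Summarize this error log.", 200);
// The client's response carries the completion, roughly:
// { role: "assistant", content: { type: "text", text: "..." },
//   model: "...", stopReason: "endTurn" }
```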
A note on naming: "Sampling" is MCP's term, inherited from ML terminology (sampling tokens from a probability distribution). It's admittedly opaque. This proposal uses the MCP term for consistency, but the WebMCP API surface could use a more descriptive name — the concept is closer to "delegated completion" or "reverse completion." The proposed API method is `createMessage`, matching MCP's naming.
Context
WebMCP currently implements only the tools primitive from MCP, intentionally omitting resources, prompts, and sampling. This proposal argues that sampling deserves inclusion in the standard — it solves a distinct problem that tools alone cannot address.
Notably, the MCP-B project (@mcp-b/global) — an unofficial community polyfill unaffiliated with the W3C effort — has shipped sampling support since December 2025 (PR #16, PR #98). While MCP-B is not an official WebMCP implementation, its adoption of sampling serves as prior art demonstrating real-world demand for this capability in browser contexts.
Relationship to the Prompt API
Chrome's Prompt API (LanguageModel.create() / session.prompt()) also lets web pages request LLM completions. The overlap is real, but the two serve fundamentally different roles:
| | Prompt API | WebMCP Sampling |
|---|---|---|
| Model | Gemini Nano, on-device, bundled in Chrome | The visiting agent's model (Claude, GPT, Gemini Pro, etc.) |
| Capability | Small model — classification, extraction, simple Q&A | Frontier models — complex reasoning, code generation, large context |
| Cost | Free — local inference, no tokens consumed | Uses the user's agent tokens/quota |
| Privacy | All data stays on-device | Data flows to the agent's cloud model |
| Agentic context | Standalone — fresh session, page provides all context | Part of an active agent session — the client may incorporate user intent and cross-tool context at its discretion |
| Permission | None needed — local, free | Permission required — spending the user's resources |
| Cross-browser | Chrome-only | W3C standard track, cross-browser by design |
| Hardware | Requires 22GB+ free disk, 4GB+ VRAM or 16GB RAM | No hardware requirements — inference is remote |
They're complementary, not competing:
- Prompt API → lightweight local tasks the page handles independently (classify text, extract fields, summarize a paragraph)
- Sampling → the page needs the agent to reason about data using a frontier model. The client may optionally incorporate conversational context, but the spec leaves this to the implementation
Example: a store with a product catalog could use the Prompt API to classify products locally. But to ask "given this customer's browsing session and these products, which is the best fit?" — that requires a frontier model's reasoning. And if the client chooses to include the agent's conversational context, the results get even richer — but sampling is valuable either way.
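To make the division of labor concrete, a page might route cheap classification through the Prompt API where it exists and reserve sampling for agent-grade reasoning. Both API surfaces below are assumptions for illustration — `LanguageModel` is Chrome's Prompt API shape, and `createMessage` is the surface proposed here:

```js
// Local path: Chrome's Prompt API (where available) for cheap classification.
async function classifyLocally(text) {
  if (!("LanguageModel" in globalThis)) return null; // Prompt API unavailable
  const session = await globalThis.LanguageModel.create();
  return session.prompt(`Classify this product in one word: ${text}`);
}

// Agent path: delegate frontier-model reasoning via the proposed createMessage.
async function rankWithAgent(products, sessionSummary) {
  return navigator.modelContext.createMessage({
    messages: [{
      role: "user",
      content: {
        type: "text",
        text: `Browsing session: ${sessionSummary}\n` +
              `Products: ${JSON.stringify(products)}\n` +
              `Which product is the best fit for this customer?`,
      },
    }],
    maxTokens: 400,
  });
}
```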
Motivation
1. Lowers the barrier to entry for the agentic web
Not every website has backend AI infrastructure. Most sites — small businesses, blogs, local services, community forums — cannot afford LLM API keys, inference costs, or the engineering effort to integrate AI backends.
Sampling flips the economics: the user's agent provides the model. The website provides context (product catalogs, session state, page content) and the agent does the reasoning. This means any site can offer agentic experiences with zero AI backend investment.
2. The browser trust model is a natural fit
In standard MCP, sampling is a harder sell: an opaque remote server asks the client to spend tokens. The trust relationship is indirect. In WebMCP, sampling fits naturally:
- The "server" is a page the user actively navigated to — there's already implicit trust
- The browser can gate sampling behind a permission prompt (like camera, microphone, or notifications)
- Permissions can be scoped per origin with user-controlled policies
- The browser already has robust models for resource-access consent — sampling slots right in
This is arguably safer than MCP sampling because the trust boundary is more visible and user-controlled.
3. Pages become collaborators, not just tool bags
Without sampling, WebMCP pages are passive — they register tools and wait. The interaction is one-directional: agent → page → result. With sampling, the page can delegate reasoning back to the agent mid-workflow:
- "I have this form data — validate it before submission"
- "Here's a product catalog and user preferences — rank these"
- "This error log needs interpretation — summarize the issue"
The page has rich client-side context (DOM, session, cookies, app state). The agent has reasoning capabilities and model access. Sampling bridges these two contexts, enabling true collaboration rather than simple tool invocation. (Note: in MCP's sampling spec, the server constructs the message payload explicitly — the parent conversation history is not automatically included. The client may inject additional context at its discretion, but this is not guaranteed.)
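For illustration, mid-workflow delegation could sit inside a tool's execute callback, using WebMCP's existing `provideContext` tool-registration shape. The `createMessage` call is the surface proposed here, and `collectFormData` is a hypothetical page-side helper; the context object is passed in as a parameter only to keep the sketch testable outside a browser:

```js
// Register a tool whose execute callback delegates reasoning back to the agent.
function registerCheckoutTool(modelContext, collectFormData) {
  modelContext.provideContext({
    tools: [{
      name: "submit-form",
      description: "Validate and submit the checkout form",
      inputSchema: { type: "object", properties: {} },
      async execute() {
        const formData = collectFormData(); // rich client-side state
        // Sampling: ask the agent's model to vet the data before committing.
        const verdict = await modelContext.createMessage({
          messages: [{
            role: "user",
            content: {
              type: "text",
              text: `Validate before submission: ${JSON.stringify(formData)}`,
            },
          }],
          maxTokens: 200,
        });
        return { content: [{ type: "text", text: verdict.content.text }] };
      },
    }],
  });
}
```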
4. Prior art exists in the community
The MCP-B community polyfill (not an official WebMCP implementation) has shipped sampling support since December 2025, including:
- `navigator.modelContext.setSamplingHandler()` / `clearSamplingHandler()`
- React hooks for sampling integration
- Capability negotiation for sampling support
While MCP-B is an independent project, its design choices validate that sampling is a natural extension of the navigator.modelContext surface. The API patterns it has explored can inform — though not dictate — standardization.
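The handler pattern MCP-B explores can be sketched as a small registry — one sampling handler at a time, with page requests routed through it. The registry below illustrates the plumbing only; it is not MCP-B's actual implementation:

```js
// Minimal sketch of a setSamplingHandler/createMessage round trip.
const samplingRegistry = {
  handler: null,
  setSamplingHandler(fn) { this.handler = fn; },     // agent side registers
  clearSamplingHandler() { this.handler = null; },   // agent disconnects
  async createMessage(request) {                     // page side requests
    if (!this.handler) throw new Error("No agent connected");
    return this.handler(request);
  },
};

// The agent side decides how sampling requests are fulfilled:
samplingRegistry.setSamplingHandler(async (request) => ({
  role: "assistant",
  content: { type: "text", text: `Echo: ${request.messages[0].content.text}` },
}));
```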
Proposed API Shape
Sampling in WebMCP could follow the existing permission model:
```js
// Collect context from the page's application state
const cart = getCartContents();
const shippingAddress = getShippingAddress();

// Ask the agent's model to validate before submission
const response = await navigator.modelContext.createMessage({
  systemPrompt: `
    Flag potential issues: mismatched currencies,
    missing fields, unusual quantities. Be concise.
  `,
  messages: [
    {
      role: "user",
      content: {
        type: "text",
        text: `
          Review this order for issues before I submit it:
          Cart: ${JSON.stringify(cart)}
          Ship to: ${JSON.stringify(shippingAddress)}
        `
      }
    }
  ],
  maxTokens: 300,
});
```

The browser mediates this request:
- Checks if the origin has sampling permission (prompts user if not)
- Forwards to the connected agent's model
- Returns the completion to the page
Permission Model
> example.com wants to request AI completions from your agent.
> [Allow Once] [Allow for this site] [Block]
This mirrors the existing browser permission UX for camera, location, and notifications.
Security Considerations
- Token cost: Sampling consumes the user's agent tokens. The permission prompt must make this clear — "This site wants to use your AI agent to process requests."
- Prompt injection: The page controls the sampling prompt, which could be adversarial. Agents should treat sampled content with the same caution as tool results — as untrusted input.
- Unsolicited sampling / "shadow agent" risk: Nothing in the MCP spec restricts sampling to within a tool call cycle — a page could fire sampling requests unprompted, effectively running its own reasoning loop on the user's agent. This is a real concern: a page with sampling permission could burn the user's tokens for its own purposes while the user and agent are idle. Browsers should enforce per-origin rate limits, require user-visible activity indicators (similar to the camera/mic recording dot), and potentially restrict sampling to contexts where the agent has actively engaged with the page.
- Rate limiting: Beyond the shadow agent concern, browsers should enforce per-origin rate limits on sampling requests to prevent abuse more broadly.
- Data exfiltration: Sampling results flow back to the page. If the agent's context contains sensitive information from other tabs/sources, the system prompt and context isolation must prevent leakage. This is related to the existing "lethal trifecta" discussion (#11).
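The per-origin rate limits suggested above could be implemented as a token bucket — each origin gets a small burst allowance that refills slowly. The class and its numbers below are arbitrary placeholders for illustration, not proposed defaults:

```js
// Illustrative per-origin token bucket for sampling requests.
class OriginRateLimiter {
  constructor(capacity = 10, refillPerSec = 0.5) {
    this.capacity = capacity;         // max burst per origin
    this.refillPerSec = refillPerSec; // tokens regained per second
    this.buckets = new Map();         // origin -> { tokens, last }
  }
  // Returns true if the origin may issue a sampling request right now.
  allow(origin, now = Date.now()) {
    const b = this.buckets.get(origin) ?? { tokens: this.capacity, last: now };
    const elapsedSec = (now - b.last) / 1000;
    b.tokens = Math.min(this.capacity, b.tokens + elapsedSec * this.refillPerSec);
    b.last = now;
    const allowed = b.tokens >= 1;
    if (allowed) b.tokens -= 1;
    this.buckets.set(origin, b);
    return allowed;
  }
}
```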
Open Questions
Relationship to the Prompt API: Chrome's Prompt API has a similar shape — pages requesting LLM completions — but targets a local on-device model rather than the visiting agent's model. Two possible unification directions exist:
- Prompt API gains a `local`/`remote` parameter — a single API where the page specifies the model source. This is clean but papers over real differences: local is free/no-permission, agent-routed costs tokens and needs consent. Different failure modes, different trust models.
- WebMCP's `createMessage` subsumes the Prompt API — `navigator.modelContext.createMessage()` becomes the universal completion interface, falling back to a local model when no agent is connected. This benefits adoption but is a much larger scope.
Either direction could work. A key consideration is the cross-browser story: the Prompt API is currently Chrome-only, tied to Gemini Nano. Keeping sampling as a separate WebMCP primitive means it stands on its own as a cross-browser standard without coupling to any browser's local model strategy. Unification could help adoption but might complicate cross-browser consensus if browsers disagree on local model support.
This proposal intentionally keeps sampling scoped to WebMCP. Whether unification is the right endgame is a question for the working group.