-
Notifications
You must be signed in to change notification settings - Fork 131
Description
WebMCP diverges strongly from "traditional" MCP because both the MCP Host (browser) and the MCP Server (website) control UI simultaneously. This means that either or both can involve the "Human In The Loop." User agent developers, LLM agents, and web application developers must make judgment calls regarding when to collect user consent before executing a tool. Currently, web application developers lack visibility into the consent already collected for a specific tool execution. I propose that the user agent informs the web application of the type of consent, if any, that was collected for a specific tool execution.
Originally I wanted to propose a simple boolean 'was user consent collected' on each tool execution, but I think a bit more granularity is required:
userConsent |
Definition |
|---|---|
Explicit |
The user specifically clicked "Allow" for this exact execution and parameters. |
PreAuthorized |
Matches a "Remember my choice" or "Always allow" rule for this specific tool. (Maybe not necessary and this could be lumped into PolicyDriven below) |
PolicyDriven |
The user has configured an "Autonomous" or "Guided" mode in the User Agent, granting broad permission for certain classes of actions. (see #53 ) |
None |
No consent was collected. |
Related: It might make sense for Web Application developers to inform the User Agent that they take full responsibility for managing consents. readonlyHint kind of gives this, however that would be abusing the specification a bit 😅 (e.g. imagine a banking application with a "send money to" tool marking the tool as readonlyHint: true). On a similar note, one could imagine a tool property that would tell User Agents that Explicit consent is always required.
Note
Hypothetically, User Agent developers could decide not to collect consent at all, leaving it entirely to web applications. The advantage of this approach is that web application developers have the best context to provide relevant information about what the tool call will do. For example, while a user agent might only display "Delete calendar event with ID 123456789," the web application can display the event itself. The disadvantage is a potentially inconsistent UX. Furthermore, if even one User Agent does collect consent (perhaps due to using a less reliable local model), the user will risk facing redundant "double consent" prompts. (Plus I really want this specification to improve the web for visually impaired users, and 'messy inconsistent consent flows' would be a super high point of friction)
Maybe one could have the best of both worlds if there was another optional callback on Tools that would take tool parameter inputs, and return a human readable string.