
fix: add allowlist for from_serializable_dict type instantiation #1856

Open
jannahopp wants to merge 1 commit into unclecode:main from jannahopp:fix/deserialize-allowlist


jannahopp commented Mar 23, 2026

Why

from_serializable_dict instantiates any class exported by crawl4ai (roughly 100 or more classes) based on a user-supplied "type" string in JSON. When crawl4ai runs as an MCP server, this input comes from untrusted MCP clients via the /crawl endpoint, so an attacker can trigger instantiation of classes that were never intended for deserialization (crawlers, dispatchers, Docker clients, etc.).

Rather than guessing which classes are safe, this fix lets operators declare exactly which types their deployment needs via an environment variable. Empty or unset = deny all (default-deny).
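
A minimal sketch of the default-deny gate described above, for illustration only. The env var name comes from this PR; `parse_allowlist` and `is_type_allowed` are hypothetical helper names and are not claimed to match the functions in the diff. Whether the real code reads the variable once at import time or on every call is also an assumption here (this sketch reads it per call).

```python
import os

ENV_VAR = "CRAWL4AI_DESERIALIZE_ALLOW"  # name taken from this PR


def parse_allowlist(raw):
    """Split a comma-separated value into a set of class names.

    Hypothetical helper: None, empty, or whitespace-only input yields
    an empty set, which means "deny everything".
    """
    if not raw:
        return frozenset()
    return frozenset(part.strip() for part in raw.split(",") if part.strip())


def is_type_allowed(type_name, environ=None):
    """Return True only if type_name is explicitly listed (default-deny)."""
    env = os.environ if environ is None else environ
    return type_name in parse_allowlist(env.get(ENV_VAR))


if __name__ == "__main__":
    # Unset variable: nothing is allowed.
    print(is_type_allowed("BrowserConfig", {}))
    # Explicitly listed: allowed.
    print(is_type_allowed("BrowserConfig", {ENV_VAR: "BrowserConfig,CacheMode"}))
```

Passing `environ` explicitly keeps the gate easy to test without mutating the process environment.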

Changes

  • Adds CRAWL4AI_DESERIALIZE_ALLOW env var: comma-separated list of permitted class names
  • Empty/unset = deny all typed deserialization
  • Wraps .load() call sites in server.py and api.py to return HTTP 400 (instead of 500) for disallowed types
  • Adds tests for allowlist gate logic and env var parsing
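
The 400-instead-of-500 behaviour in the list above could look roughly like the following framework-free sketch. Everything here is an assumption made for illustration: `DisallowedTypeError`, `load_config`, `handle_crawl`, the `browser_config` request key, and the `params` key are hypothetical stand-ins, not the actual code in server.py or api.py.

```python
class DisallowedTypeError(ValueError):
    """Hypothetical error raised when a "type" is not on the allowlist."""


def load_config(data, allowed):
    """Stand-in for a .load() call: reject a type that is not in allowed."""
    type_name = data.get("type")
    if type_name is not None and type_name not in allowed:
        raise DisallowedTypeError(
            f"type {type_name!r} is not permitted for deserialization"
        )
    return data.get("params", {})


def handle_crawl(payload, allowed):
    """Return (status, body); a rejected type is a client error, not a crash."""
    try:
        config = load_config(payload.get("browser_config", {}), allowed)
    except DisallowedTypeError as exc:
        # Bad input from the caller maps to 400 rather than an unhandled 500.
        return 400, {"error": str(exc)}
    return 200, {"config": config}
```

Catching only the dedicated exception type keeps genuine server faults visible as 500s while reporting disallowed types as bad requests.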

Test plan

  • Existing tests pass
  • New TestDeserializationAllowlist tests pass (8 tests)
  • Docker: disallowed type in /crawl returns HTTP 400 with clear error message
  • Docker: /crawl with default (empty) config works normally
  • Docker: /md, /screenshot endpoints unaffected

🤖 Generated with Claude Code

from_serializable_dict resolves any class name exported by crawl4ai
from a user-supplied "type" string, enabling arbitrary class
instantiation via crafted JSON (reachable through the MCP /crawl
endpoint via BrowserConfig.load() and CrawlerRunConfig.load()).

Add CRAWL4AI_DESERIALIZE_ALLOW env var: a comma-separated list of
class names permitted for deserialization. Empty or unset = deny all
typed deserialization (default-deny). Operators configure exactly the
types their deployment needs.

Example: CRAWL4AI_DESERIALIZE_ALLOW=BrowserConfig,CrawlerRunConfig,CacheMode
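
As a rough illustration of how such an allowlist can gate nested typed values, here is a self-contained sketch. It assumes a {"type": ..., "params": {...}} payload shape, and uses a toy `REGISTRY` with stand-in dataclasses. `from_dict`, `REGISTRY`, and both classes are hypothetical and are not the real crawl4ai implementation.

```python
from dataclasses import dataclass


@dataclass
class BrowserConfig:  # stand-in, not the real crawl4ai class
    headless: bool = True


@dataclass
class Crawler:  # stand-in for a class never meant to be deserialized
    target: str = ""


# Toy registry playing the role of "every class the package exports".
REGISTRY = {"BrowserConfig": BrowserConfig, "Crawler": Crawler}


def from_dict(data, allowed):
    """Recursively rebuild typed values, checking the allowlist first."""
    if isinstance(data, list):
        return [from_dict(item, allowed) for item in data]
    if not isinstance(data, dict):
        return data
    if "type" in data:
        name = data["type"]
        # The gate runs before any class lookup or instantiation.
        if name not in allowed:
            raise ValueError(f"type {name!r} is not in the allowlist")
        cls = REGISTRY.get(name)
        if cls is None:
            raise ValueError(f"unknown type {name!r}")
        params = {k: from_dict(v, allowed) for k, v in data.get("params", {}).items()}
        return cls(**params)
    return {k: from_dict(v, allowed) for k, v in data.items()}


if __name__ == "__main__":
    allowed = {"BrowserConfig", "CrawlerRunConfig", "CacheMode"}
    print(from_dict({"type": "BrowserConfig", "params": {"headless": False}}, allowed))
```

With the example allowlist above, a payload naming `Crawler` is rejected even when it is nested inside a list or another dict, because the check runs at every level of the recursion.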

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>