Plasmate as an alternative extraction backend - 16x fewer tokens #1867

Hey, big fan of Crawl4AI. The "LLM Friendly" framing is exactly right.

Wanted to share [Plasmate](https://github.com/plasmate-labs/plasmate) as a potential extraction backend. It's an open-source browser engine (Rust, Apache 2.0) that compiles HTML into a Semantic Object Model (SOM) instead of markdown or raw HTML.

The key difference: instead of cleaning up Chrome's DOM output after the fact, Plasmate skips rendering entirely and goes straight to semantic structure. In our 49-URL benchmark, this produces 16.6x fewer tokens on average than raw HTML.
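To make the "straight to semantic structure" idea concrete, here is a toy sketch using only Python's standard-library `html.parser`. It is an illustration under stated assumptions, not Plasmate's SOM: the node roles (`heading`, `text`, `link`), the `ToySemanticExtractor` class, and the `extract` helper are hypothetical names chosen for this example. It parses static HTML without rendering anything, drops `<script>`/`<style>`/`<noscript>` payloads, and keeps only headings, paragraphs, list items, and links. Discarding markup, attributes, and scripts is the general reason this kind of output is much smaller than raw HTML.

```python
from html.parser import HTMLParser


class ToySemanticExtractor(HTMLParser):
    """Reduce static HTML to a flat list of semantic nodes.

    Hypothetical and illustrative only: the node roles and shape are
    made up here and are NOT Plasmate's actual SOM format.
    """

    SKIP = {"script", "style", "noscript"}
    ROLES = {"h1": "heading", "h2": "heading", "h3": "heading",
             "p": "text", "li": "text"}

    def __init__(self):
        super().__init__()
        self.nodes = []
        self._skip_depth = 0   # > 0 while inside script/style/noscript
        self._block = None     # (tag, role) of the open text block, if any
        self._block_text = []
        self._href = None      # href of the open <a>, if any
        self._link_text = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.ROLES:
            self._block, self._block_text = (tag, self.ROLES[tag]), []
        elif tag == "a":
            self._href, self._link_text = dict(attrs).get("href"), []

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif self._block and tag == self._block[0]:
            # Collapse whitespace so layout indentation costs no tokens.
            text = " ".join("".join(self._block_text).split())
            if text:
                self.nodes.append({"role": self._block[1], "text": text})
            self._block = None
        elif tag == "a" and self._href is not None:
            text = " ".join("".join(self._link_text).split())
            self.nodes.append({"role": "link", "text": text, "href": self._href})
            self._href = None

    def handle_data(self, data):
        if self._skip_depth:
            return  # drop script/style payloads entirely
        if self._block:
            self._block_text.append(data)
        if self._href is not None:
            self._link_text.append(data)


def extract(html: str) -> list[dict]:
    """Parse static HTML (no rendering, no JS) into toy semantic nodes."""
    parser = ToySemanticExtractor()
    parser.feed(html)
    parser.close()
    return parser.nodes


if __name__ == "__main__":
    sample = """<html><head><style>body{color:red}</style>
<script>var x = 1;</script></head>
<body><div class="wrapper"><h1>Docs</h1>
<p>See the <a href="/guide">guide</a> for details.</p></div></body></html>"""
    print(extract(sample))
```

For the sample page, this yields one heading node, one link node, and one text node; the stylesheet, script, wrapper `div`, and attributes are all discarded. Nested blocks (e.g. a `<p>` inside an `<li>`) are only partially handled here, which a real engine would need to get right.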

One possible integration path is an alternative `CrawlerStrategy` that uses Plasmate for static and server-rendered pages and falls back to Chrome for heavy SPAs.
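A minimal sketch of that routing logic, assuming async extractors in the style of Crawl4AI's async crawler strategies. `static_extract` and `browser_extract` are hypothetical callables standing in for a Plasmate-backed call and the existing Chrome path; this does not implement Crawl4AI's actual `CrawlerStrategy` interface, and the `min_chars` threshold is an arbitrary illustrative default.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable

# Hypothetical extractor signature: URL in, extracted content out.
Extractor = Callable[[str], Awaitable[str]]


@dataclass
class RoutedResult:
    url: str
    content: str
    backend: str  # "static" or "browser"


async def extract_with_fallback(
    url: str,
    static_extract: Extractor,   # hypothetical stand-in for a Plasmate call
    browser_extract: Extractor,  # hypothetical stand-in for the Chrome path
    min_chars: int = 200,        # arbitrary threshold, for illustration only
) -> RoutedResult:
    """Try the no-render path first; fall back to the browser when the
    static result is too thin (typical of a client-rendered SPA shell)
    or when the static path fails."""
    try:
        content = await static_extract(url)
    except Exception:
        content = ""  # any static-path failure triggers the fallback
    if len(content.strip()) >= min_chars:
        return RoutedResult(url, content, "static")
    return RoutedResult(url, await browser_extract(url), "browser")


# Fake extractors for demonstration: pretend URLs containing "app" are
# SPAs whose static HTML is an empty shell.
async def fake_static(url: str) -> str:
    return "" if "app" in url else "x" * 500


async def fake_browser(url: str) -> str:
    return "rendered content"


async def main() -> None:
    docs = await extract_with_fallback("https://example.com/docs", fake_static, fake_browser)
    app = await extract_with_fallback("https://example.com/app", fake_static, fake_browser)
    print(docs.backend, app.backend)  # static browser


if __name__ == "__main__":
    asyncio.run(main())
```

Using result length as the SPA signal is deliberately crude: it keeps the router independent of either backend, at the cost of an extra browser fetch for pages that are genuinely short.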

Benchmarks are reproducible: https://github.com/plasmate-labs/plasmate-benchmarks

Not asking for anything - just flagging it as a potentially useful tool for the project. Apache 2.0, free, `pip install plasmate`.
