Description
Hey, big fan of Crawl4AI. The "LLM Friendly" framing is exactly right.
Wanted to share Plasmate as a potential extraction backend. It's an open-source browser engine (Rust, Apache 2.0) that compiles HTML into a Semantic Object Model (SOM) instead of markdown or raw HTML.
The key difference: instead of cleaning up Chrome's DOM output after the fact, Plasmate skips rendering entirely and goes straight to semantic structure. In our 49-URL benchmark, this produces 16.6x fewer tokens on average than raw HTML.
One possible integration path: an alternative `CrawlerStrategy` that uses Plasmate for static or server-rendered pages and falls back to Chrome for heavy SPAs.
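A rough sketch of that routing idea, under loose assumptions. It does not use Crawl4AI's real strategy interface or any Plasmate API. Both backends are plain callables standing in for a Plasmate wrapper and a Chrome-based crawler. The names `RoutingStrategy` and `looks_like_spa`, the mount-point regex, and the 200-character threshold are all hypothetical choices for illustration.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical backend signature: (url, fetched_html) -> extracted content.
# These are placeholders, NOT real Plasmate or Crawl4AI APIs.
Backend = Callable[[str, str], str]

# Assumption: an empty mount point like <div id="root"></div> suggests a
# client-side app shell whose content only appears after JavaScript runs.
SPA_ROOT_PATTERN = re.compile(
    r'<div[^>]+id=["\'](root|app|__next)["\'][^>]*>\s*</div>', re.IGNORECASE
)


def looks_like_spa(html: str, min_text_chars: int = 200) -> bool:
    """Heuristic: an empty mount point, or very little text outside <script> tags."""
    if SPA_ROOT_PATTERN.search(html):
        return True
    # Drop inline/external script elements, then strip remaining tags.
    without_scripts = re.sub(
        r"<script\b.*?</script>", "", html, flags=re.IGNORECASE | re.DOTALL
    )
    text = re.sub(r"<[^>]+>", " ", without_scripts)
    visible = " ".join(text.split())
    return len(visible) < min_text_chars


@dataclass
class RoutingStrategy:
    static_backend: Backend   # hypothetical: e.g. a Plasmate-based extractor
    browser_backend: Backend  # hypothetical: e.g. a headless-Chrome crawler

    def extract(self, url: str, html: str) -> Dict[str, str]:
        """Route to the browser only when the fetched HTML looks like an SPA shell."""
        if looks_like_spa(html):
            return {"backend": "browser", "content": self.browser_backend(url, html)}
        return {"backend": "static", "content": self.static_backend(url, html)}


if __name__ == "__main__":
    strategy = RoutingStrategy(
        static_backend=lambda url, html: "static:" + url,
        browser_backend=lambda url, html: "browser:" + url,
    )
    shell = '<html><body><div id="root"></div><script src="app.js"></script></body></html>'
    article = "<html><body><article>" + "word " * 100 + "</article></body></html>"
    print(strategy.extract("https://a.example", shell)["backend"])
    print(strategy.extract("https://b.example", article)["backend"])
```

The design choice here is to fetch the raw HTML cheaply first and pay for a full browser only when that HTML carries too little content to extract from. A real heuristic would likely need tuning against something like the benchmark URL set.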
Benchmarks are reproducible: https://github.com/plasmate-labs/plasmate-benchmarks
Not asking for anything; just flagging it as a potentially useful tool for the project. Apache 2.0, free: `pip install plasmate`.