
fix: run extraction strategy on cache hits (#1455) #1866

Open
hafezparast wants to merge 1 commit into unclecode:develop from hafezparast:fix/maysam-cache-extraction-strategy-1455


@hafezparast
Contributor

Summary

  • When cache_mode=CacheMode.ENABLED and a URL is cached, the cache-hit path returned the old CrawlResult directly without calling aprocess_html()
  • This meant extraction strategies (LLM, CSS, etc.) were never applied to cached content — extracted_content was empty or stale
  • Fix: on cache hit, if config.extraction_strategy is set, run aprocess_html() on the cached HTML so the extraction pipeline executes
  • Cache hits without an extraction strategy continue to return immediately (no behavior change)
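The control flow described above can be sketched roughly as follows. This is a self-contained toy, not the actual crawl4ai code: `ToyCrawler`, `CachedResult`, `fetch`, and the callable-based `extraction_strategy` are hypothetical stand-ins, and only the name `aprocess_html` is borrowed from the description.

```python
import asyncio
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class CachedResult:
    # Simplified stand-in for CrawlResult; only the fields this sketch needs.
    html: str
    extracted_content: Optional[str] = None


class ToyCrawler:
    """Hypothetical miniature of the cache-hit branch (not the real crawler)."""

    def __init__(self) -> None:
        self.cache: Dict[str, CachedResult] = {}
        self.fetch_count = 0

    async def fetch(self, url: str) -> str:
        # Placeholder for the real network fetch.
        self.fetch_count += 1
        return f"<html><h1>{url}</h1></html>"

    async def aprocess_html(
        self, html: str, extraction_strategy: Optional[Callable[[str], str]]
    ) -> CachedResult:
        # Run the (toy) extraction step on the given HTML, if one is configured.
        extracted = extraction_strategy(html) if extraction_strategy else None
        return CachedResult(html=html, extracted_content=extracted)

    async def arun(
        self, url: str, extraction_strategy: Optional[Callable[[str], str]] = None
    ) -> CachedResult:
        cached = self.cache.get(url)
        if cached is not None:
            if extraction_strategy is None:
                # Unchanged behavior: return the cached result immediately.
                return cached
            # The fix: re-run processing on the cached HTML so extraction executes.
            return await self.aprocess_html(cached.html, extraction_strategy)

        # Cache miss: fetch, process, and store.
        html = await self.fetch(url)
        result = await self.aprocess_html(html, extraction_strategy)
        self.cache[url] = result
        return result
```

In this sketch, a second `arun` call for the same URL with a strategy set reuses the cached HTML (no second fetch) but still produces fresh `extracted_content`.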

Root cause identified by @SohamKukreti in the issue comments.

Fixes #1455

Test plan

  • test_extraction_runs_on_cache_hit — warms cache without extraction, then hits cache with JsonCssExtractionStrategy and verifies extracted_content is populated
  • test_cache_without_extraction_still_works — cache hit without extraction strategy returns normally
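The warm-then-hit pattern of these two tests might look something like the sketch below. It does not reproduce the PR's test code: `stub_crawl` and `list_items_as_json` are hypothetical stand-ins for the crawler and for a CSS-based strategy such as JsonCssExtractionStrategy, and the HTML is made up for illustration.

```python
import json
import re

_cache = {}  # url -> {"html": ..., "extracted_content": ...}


def stub_crawl(url, extract=None):
    """Hypothetical stand-in for a crawler with the fix applied."""
    entry = _cache.get(url)
    if entry is None:
        # Simulated fetch; the page content is invented for the example.
        entry = {"html": "<ul><li>alpha</li><li>beta</li></ul>", "extracted_content": None}
        _cache[url] = entry
    if extract is None:
        return entry  # cache hit (or fresh entry) without extraction
    # Extraction runs against the cached HTML.
    return {**entry, "extracted_content": extract(entry["html"])}


def list_items_as_json(html):
    # Toy replacement for a CSS-selector strategy: collect <li> text as JSON.
    return json.dumps([{"item": text} for text in re.findall(r"<li>(.*?)</li>", html)])


def test_extraction_runs_on_cache_hit():
    _cache.clear()
    stub_crawl("https://example.com")  # warm the cache without extraction
    result = stub_crawl("https://example.com", extract=list_items_as_json)
    assert json.loads(result["extracted_content"]) == [{"item": "alpha"}, {"item": "beta"}]


def test_cache_without_extraction_still_works():
    _cache.clear()
    warm = stub_crawl("https://example.com")
    hit = stub_crawl("https://example.com")
    assert hit is warm
    assert hit["extracted_content"] is None
```

Warming the cache without a strategy first is what makes the first test meaningful: before the fix, the second call would have returned the stored result with no extracted content.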

🤖 Generated with Claude Code

…#1455)

When cache_mode=ENABLED and a URL was already cached, the cache-hit
path returned the old CrawlResult directly without calling
aprocess_html(). This meant extraction strategies (LLM, CSS, etc.)
were never applied to cached content — extracted_content was empty
or stale.

Now, when a cache hit occurs and config.extraction_strategy is set,
the processing pipeline runs on the cached HTML so the extraction
strategy is applied. Cache hits without an extraction strategy
continue to return immediately (no behavior change).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
