feat: Migrate to Scrapy's native AsyncCrawlerRunner by vdusek · Pull Request #793 · apify/apify-sdk-python

vdusek · 2026-02-16T15:40:54Z

Description

Adopt Scrapy 2.14's AsyncCrawlerRunner to eliminate the deferred_to_future conversion layer.
The run_scrapy_actor function now takes care of the install_reactor call internally, removing that boilerplate from user code.

Issue

Closes #638: Utilize Scrapy's native async runners - AsyncCrawlerRunner and/or AsyncCrawlerProcess

Test plan

CI passes

codecov · 2026-02-16T15:42:27Z

Codecov Report

❌ Patch coverage is 7.69231% with 12 lines in your changes missing coverage. Please review.
✅ Project coverage is 85.24%. Comparing base (e1bdbc9) to head (8bfc41c).
⚠️ Report is 4 commits behind head on master.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| `src/apify/scrapy/_actor_runner.py` | 0.00% | 12 Missing ⚠️ |

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##           master     #793      +/-   ##
==========================================
- Coverage   85.47%   85.24%   -0.23%
==========================================
  Files          46       46
  Lines        2691     2697       +6
==========================================
- Hits         2300     2299       -1
- Misses        391      398       +7
```

| Flag | Coverage Δ |
| --- | --- |
| e2e | `35.40% <0.00%> (?)` |
| integration | `57.50% <0.00%> (-0.13%)` ⬇️ |
| unit | `72.19% <7.69%> (-0.20%)` ⬇️ |

Flags with carried forward coverage won't be shown.
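
The figures in this report are consistent with each other: coverage is hits divided by coverable lines, misses are the complement, and a 7.69231% patch coverage with 12 missing lines implies 13 coverable changed lines, 1 of them hit. A short sketch of that arithmetic follows; round-to-nearest is an assumption about how Codecov formats the numbers (truncating to two decimals gives the same project figures here).

```python
def coverage_pct(hits: int, lines: int, digits: int = 2) -> float:
    """Coverage as a percentage of coverable lines (round-to-nearest assumed)."""
    return round(100 * hits / lines, digits)


# Project figures from the Coverage Diff table.
base_hits, base_lines = 2300, 2691  # master (e1bdbc9)
head_hits, head_lines = 2299, 2697  # PR head (8bfc41c)

base_cov = coverage_pct(base_hits, base_lines)
head_cov = coverage_pct(head_hits, head_lines)
print(base_cov, head_cov, round(head_cov - base_cov, 2))  # 85.47 85.24 -0.23

# Misses are the coverable lines that were not hit.
print(base_lines - base_hits, head_lines - head_hits)  # 391 398

# Patch coverage: 12 missing lines at 7.69231% implies 13 coverable changed lines.
patch_lines, patch_missing = 13, 12
print(coverage_pct(patch_lines - patch_missing, patch_lines, digits=5))  # 7.69231
```

The -0.23% project delta therefore comes from one fewer hit line and seven more missed lines on a base that grew by six lines.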

Adopt Scrapy 2.14's AsyncCrawlerRunner to eliminate the Deferred conversion layer (deferred_to_future). The run_scrapy_actor function now handles asyncio reactor installation internally, removing boilerplate from user code.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Scrapy 2.14+ deprecated the spider argument in process_item(), and newer versions no longer pass it, which caused a TypeError in PriceCleanerPipeline.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
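
The commit above is about a signature change, so a minimal sketch of the two pipeline shapes may help. PriceCleanerPipeline's real cleaning logic is not part of this PR, so the `price` field and the parsing rules below are assumptions; LegacyPriceCleanerPipeline is a hypothetical stand-in for the old `(item, spider)` signature.

```python
from typing import Any


class LegacyPriceCleanerPipeline:
    """Hypothetical old-style pipeline that still requires a `spider` argument."""

    def process_item(self, item: dict[str, Any], spider: Any) -> dict[str, Any]:
        return item


class PriceCleanerPipeline:
    """Hypothetical pipeline using the item-only process_item() signature."""

    def process_item(self, item: dict[str, Any]) -> dict[str, Any]:
        raw = item.get('price')
        if isinstance(raw, str):
            # Keep only digits and the decimal point, e.g. '$1,299.00' -> '1299.00'.
            cleaned = ''.join(ch for ch in raw if ch.isdigit() or ch == '.')
            try:
                item['price'] = float(cleaned)
            except ValueError:
                # Nothing parseable (e.g. 'N/A' or '1.2.3'); drop the value.
                item['price'] = None
        return item


# A caller that passes only the item works with the new signature...
print(PriceCleanerPipeline().process_item({'price': '$1,299.00'}))  # {'price': 1299.0}

# ...but raises TypeError with the legacy one, as described in the commit.
try:
    LegacyPriceCleanerPipeline().process_item({'price': '$5'})
except TypeError as exc:
    print(f'TypeError: {exc}')
```

The sketch only shows why a required `spider` parameter breaks once the argument is no longer supplied; how a given Scrapy version handles the deprecation is best checked in its release notes.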

Copilot

Pull request overview

This PR migrates the Apify-Scrapy integration to Scrapy 2.14’s native async APIs (AsyncCrawlerRunner) and moves Twisted reactor installation into run_scrapy_actor to reduce user boilerplate when running Scrapy inside an Actor.

Changes:

- Bump Scrapy minimum version to >=2.14.0 and update e2e fixtures/tests accordingly.
- Switch sample actors/docs to AsyncCrawlerRunner and remove deferred_to_future usage.
- Refactor run_scrapy_actor to install the asyncio-compatible Twisted reactor internally.

Reviewed changes

Copilot reviewed 17 out of 18 changed files in this pull request and generated 10 comments.

| File | Description |
| --- | --- |
| `uv.lock` | Updates locked Scrapy constraint to `>=2.14.0`. |
| `pyproject.toml` | Raises `scrapy` optional extra minimum version and adjusts Ruff per-file ignore list. |
| `src/apify/scrapy/_actor_runner.py` | Moves reactor installation into `run_scrapy_actor` and simplifies coroutine bridging. |
| `src/apify/scrapy/pipelines/actor_dataset_push.py` | Adjusts pipeline signature/logging for dataset pushes. |
| `tests/unit/scrapy/pipelines/test_actor_dataset_push.py` | Updates unit test expectations/calls for pipeline behavior. |
| `tests/e2e/test_actor_scrapy.py` | Updates actor e2e to require Scrapy `>=2.14.0`. |
| `tests/e2e/test_scrapy/test_basic_spider.py` | Updates Scrapy requirement for e2e spider fixture. |
| `tests/e2e/test_scrapy/test_cb_kwargs_spider.py` | Updates Scrapy requirement for e2e spider fixture. |
| `tests/e2e/test_scrapy/test_crawl_spider.py` | Updates Scrapy requirement for e2e spider fixture. |
| `tests/e2e/test_scrapy/test_custom_pipeline_spider.py` | Updates Scrapy requirement for e2e spider fixture. |
| `tests/e2e/test_scrapy/test_itemloader_spider.py` | Updates Scrapy requirement for e2e spider fixture. |
| `tests/e2e/test_scrapy/actor_source/__main__.py` | Updates actor entrypoint to rely on `run_scrapy_actor` for reactor setup. |
| `tests/e2e/test_scrapy/actor_source/main.py` | Switches to `AsyncCrawlerRunner` in e2e actor fixture code. |
| `tests/e2e/test_scrapy/actor_source/main_custom_pipeline.py` | Switches to `AsyncCrawlerRunner` in custom-pipeline e2e actor fixture code. |
| `tests/e2e/test_scrapy/actor_source/pipelines.py` | Updates e2e actor pipeline fixture signature. |
| `docs/03_guides/code/scrapy_project/src/__main__.py` | Removes manual `install_reactor` from docs example entrypoint. |
| `docs/03_guides/code/scrapy_project/src/main.py` | Switches docs example to `AsyncCrawlerRunner` and awaits `crawl()` directly. |
| `docs/03_guides/06_scrapy.mdx` | Updates guide text to reflect reactor installation handled by `run_scrapy_actor`. |

Comments suppressed due to low confidence (2)

docs/03_guides/code/scrapy_project/src/main.py:12

This example still imports Scrapy (AsyncCrawlerRunner) and the spider module at import time. Since run_scrapy_actor() installs the asyncio reactor only when called from __main__.py, these module-level Scrapy imports can happen before reactor installation and can prevent switching to AsyncioSelectorReactor. Consider moving Scrapy/spider imports inside main() (or otherwise ensuring no Scrapy/Twisted reactor import occurs before run_scrapy_actor runs).

```python
from scrapy.crawler import AsyncCrawlerRunner

from apify import Actor
from apify.scrapy import apply_apify_settings

# Import your Scrapy spider here.
from .spiders import TitleSpider as Spider
```

docs/03_guides/code/scrapy_project/src/__main__.py:9

run_scrapy_actor() installs the reactor when it is called, but this module imports .main (which imports Scrapy) before that happens. If importing .main triggers Twisted reactor initialization, reactor installation inside run_scrapy_actor can fail. One way to avoid this is to keep Scrapy imports out of module top-level in main.py (import them inside main()), so importing .main doesn’t touch Twisted/Scrapy before run_scrapy_actor runs.

```python
from apify.scrapy import initialize_logging, run_scrapy_actor

# Import your main Actor coroutine here.
from .main import main
```
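
Both suppressed comments propose the same fix: keep Scrapy and the spider module out of main.py's top level. Below is a minimal sketch of that suggestion, not the code merged in this PR. The `startUrls` input shape is an assumption, and `.spiders`/`TitleSpider` are carried over from the snippet above. The apify imports are also moved inside `main()` here only so the sketch can be imported without dependencies; Copilot's comment concerns the Scrapy and spider imports. Because the imports live in the function body, `main()` in __main__.py only creates a coroutine object, and nothing Scrapy-related is imported until run_scrapy_actor() starts executing it, presumably after it has installed the reactor.

```python
async def main() -> None:
    """Actor entry coroutine with Scrapy imports deferred (illustrative sketch)."""
    # These imports run only when the coroutine starts executing, not when this
    # module is imported by __main__.py, so Scrapy is not loaded before
    # run_scrapy_actor() gets a chance to install the asyncio reactor.
    from scrapy.crawler import AsyncCrawlerRunner

    from apify import Actor
    from apify.scrapy import apply_apify_settings

    # Import your Scrapy spider here (module and class name as in the docs snippet).
    from .spiders import TitleSpider as Spider

    async with Actor:
        actor_input = await Actor.get_input() or {}
        # Assumed input shape: {"startUrls": [{"url": "https://..."}]}.
        start_urls = [entry['url'] for entry in actor_input.get('startUrls', [])]

        # Apply the Apify integration settings to the Scrapy settings.
        settings = apply_apify_settings()

        # AsyncCrawlerRunner.crawl() is awaited directly; no deferred_to_future().
        runner = AsyncCrawlerRunner(settings)
        await runner.crawl(Spider, start_urls=start_urls)
```

Whether module-level imports actually touch the reactor depends on what the imported modules pull in, which is why Copilot marked these comments as low confidence.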

tests/e2e/test_scrapy/actor_source/__main__.py

src/apify/scrapy/pipelines/actor_dataset_push.py

tests/unit/scrapy/pipelines/test_actor_dataset_push.py

tests/e2e/test_scrapy/actor_source/pipelines.py

tests/e2e/test_scrapy/actor_source/main.py

docs/03_guides/06_scrapy.mdx

src/apify/scrapy/_actor_runner.py

tests/e2e/test_scrapy/actor_source/main.py

tests/e2e/test_scrapy/actor_source/main_custom_pipeline.py

vdusek self-assigned this Feb 16, 2026

github-actions bot added this to the 134th sprint - Tooling team milestone Feb 16, 2026

github-actions bot added the t-tooling (Issues with this label are in the ownership of the tooling team.) and tested (Temporary label used only programmatically for some analytics.) labels Feb 16, 2026

vdusek added the adhoc label (Ad-hoc unplanned task added during the sprint.) Feb 16, 2026

vdusek changed the title ~~fix: migrate to Scrapy's native AsyncCrawlerRunner~~ fix: Migrate to Scrapy's native AsyncCrawlerRunner Feb 16, 2026

vdusek changed the title ~~fix: Migrate to Scrapy's native AsyncCrawlerRunner~~ feat: Migrate to Scrapy's native AsyncCrawlerRunner Feb 16, 2026

vdusek force-pushed the fix/scrapy-async-crawler-runner branch from dd6317e to f831b18 February 18, 2026 08:03

vdusek and others added 2 commits February 18, 2026 18:28

fix: Remove deprecated spider arg from pipeline process_item methods

c013e33

vdusek force-pushed the fix/scrapy-async-crawler-runner branch from f831b18 to c013e33 February 18, 2026 17:29

vdusek marked this pull request as ready for review February 18, 2026 17:29

vdusek requested review from Pijukatel and Copilot February 18, 2026 17:29

Copilot started reviewing on behalf of vdusek February 18, 2026 17:30

Copilot AI reviewed Feb 18, 2026

Pijukatel approved these changes Feb 19, 2026

Address feedback

8bfc41c

vdusek wants to merge 3 commits into master from fix/scrapy-async-crawler-runner

2 participants

Comments

Conversation

vdusek commented Feb 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Issue

Test plan

Uh oh!

codecov bot commented Feb 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Comments

vdusek commented Feb 16, 2026 •

edited

Loading

codecov bot commented Feb 16, 2026 •

edited

Loading