Skip to content

feat: Support Actor schema storages with Alias mechanism#797

Open
Pijukatel wants to merge 10 commits intomasterfrom
alias-storage-configuration-mapping
Open

feat: Support Actor schema storages with Alias mechanism#797
Pijukatel wants to merge 10 commits intomasterfrom
alias-storage-configuration-mapping

Conversation

@Pijukatel
Copy link
Contributor

Description

  • Update Configuration to include actor_storages that is loaded from actor_storages_json env variable.
  • Update AliasResolver to be able to extend alias mapping from Configuration.
  • If the Actor is running on the Apify platform, it will use AliasResolver to extend alias mapping.

Issues

Testing

  • Added E2E test
  • Manual Actor test

Checklist

  • CI passed

@github-actions github-actions bot added this to the 134th sprint - Tooling team milestone Feb 17, 2026
@github-actions github-actions bot added t-tooling Issues with this label are in the ownership of the tooling team. tested Temporary label used only programatically for some analytics. labels Feb 17, 2026
@codecov
Copy link

codecov bot commented Feb 17, 2026

Codecov Report

❌ Patch coverage is 94.44444% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 85.96%. Comparing base (e1bdbc9) to head (aa5624f).
⚠️ Report is 1 commits behind head on master.

Files with missing lines Patch % Lines
src/apify/_actor.py 50.00% 1 Missing ⚠️
...c/apify/storage_clients/_apify/_alias_resolving.py 95.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #797      +/-   ##
==========================================
+ Coverage   85.47%   85.96%   +0.49%     
==========================================
  Files          46       46              
  Lines        2691     2722      +31     
==========================================
+ Hits         2300     2340      +40     
+ Misses        391      382       -9     
Flag Coverage Δ
e2e 35.48% <36.11%> (?)
integration 58.30% <88.88%> (+0.66%) ⬆️
unit 72.18% <55.55%> (-0.20%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@Pijukatel Pijukatel requested a review from vdusek February 17, 2026 09:01
@Pijukatel Pijukatel force-pushed the alias-storage-configuration-mapping branch from d9c7349 to 3b36459 Compare February 17, 2026 09:11
@Pijukatel Pijukatel marked this pull request as ready for review February 18, 2026 12:47
@Pijukatel Pijukatel requested a review from valekjo February 18, 2026 12:47
@vdusek vdusek requested a review from Copilot February 18, 2026 13:03
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds support for Actor “schema storages” / multiple pre-created storages by loading storage IDs into Configuration and pre-registering alias→ID mappings so open_* (alias=...) resolves to those platform-provided storages when running on Apify.

Changes:

  • Introduces Configuration.actor_storages (parsed from an env-provided JSON object) via new ActorStorages model.
  • Extends AliasResolver with register_aliases() and calls it during Actor initialization on the Apify platform to seed alias mappings.
  • Adds unit/integration/E2E tests validating configuration parsing and alias resolution behavior.

Reviewed changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
src/apify/_configuration.py Adds ActorStorages + env/validator wiring for Configuration.actor_storages.
src/apify/storage_clients/_apify/_alias_resolving.py Adds AliasResolver.register_aliases() to bulk write alias mappings into default KVS + in-memory cache.
src/apify/_actor.py Calls register_aliases() on Actor startup when running on-platform.
tests/unit/actor/test_configuration.py Unit test for parsing storages JSON env var into actor_storages.
tests/integration/test_storages.py Integration test asserting alias mapping is preserved/extended in default KVS.
tests/e2e/test_schema_storages/test_schema_storages.py E2E test ensuring schema-defined storages are usable via alias at runtime.
tests/e2e/test_schema_storages/actor_source/main.py Actor code used by the E2E test to open a dataset by alias and validate ID.
tests/e2e/test_schema_storages/actor_source/actor.json Actor schema defining storages for the E2E scenario.
tests/e2e/test_schema_storages/__init__.py Marks the new E2E package.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +295 to +301
client = await cls._get_default_kvs_client(configuration=configuration)
existing_mapping = ((await client.get_record(cls._ALIAS_MAPPING_KEY)) or {'value': {}}).get('value', {})

# Update the existing mapping with the configuration mapping.
existing_mapping.update(configuration_mapping)
# Store the updated mapping back in the KVS and in memory.
await client.set_record(cls._ALIAS_MAPPING_KEY, existing_mapping)
Copy link

Copilot AI Feb 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

existing_mapping = ((await client.get_record(...)) or {'value': {}}).get('value', {}) assumes the record always has a value key containing a dict. However, this module already documents/handles get_record sometimes returning the mapping dict directly (without value). In that case, this code will treat the mapping as missing and overwrite it with only configuration_mapping. Also, if value is present but not a dict (e.g. None), existing_mapping.update(...) will raise. Please mirror the normalization logic used in _get_alias_map/store_mapping: normalize record into a dict[str, str] whether it comes wrapped in {key,value} or as a raw mapping, otherwise default to {}.

Copilot uses AI. Check for mistakes.
Comment on lines +473 to +480
actor_storages: Annotated[
ActorStorages | None,
Field(
alias='actor_storages_json',
description='Storage IDs for the actor',
),
BeforeValidator(_load_storage_keys),
] = None
Copy link

Copilot AI Feb 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change closes #762, but the issue description specifies the platform-provided env var name as ACTOR_STORAGE_IDS (object with datasets, keyValueStores, requestQueues). The new field only declares alias='actor_storages_json' (env ACTOR_STORAGES_JSON). If the platform actually uses ACTOR_STORAGE_IDS, configuration loading will silently miss the mapping. Consider supporting ACTOR_STORAGE_IDS via validation_alias=AliasChoices(...) (keeping backward compatibility if ACTOR_STORAGES_JSON is intentional).

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation on the platform is ACTOR_STORAGES_JSON now, if it changes the code should reflect it as well

Copy link
Contributor

@vdusek vdusek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a few comments

return apify_client_async.key_value_store(key_value_store_id=configuration.default_key_value_store_id)

@classmethod
async def register_aliases(cls, configuration: Configuration) -> None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. register_aliases lacks error handling

src/apify/storage_clients/_apify/_alias_resolving.py:265-302 — The existing store_mapping method wraps KVS operations in try/except (line 236), but register_aliases does not. If get_record or set_record fails (e.g., network issue during Actor init), the entire Actor startup will crash.

# register_aliases — no error handling:
existing_mapping = ((await client.get_record(...)) or {'value': {}}).get('value', {})
await client.set_record(cls._ALIAS_MAPPING_KEY, existing_mapping)

Consider wrapping in try/except with a warning, consistent with store_mapping.


@classmethod
async def register_aliases(cls, configuration: Configuration) -> None:
"""Load alias mapping from configuration to the default kvs."""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

register_aliases doesn't acquire the alias lock

src/apify/storage_clients/_apify/_alias_resolving.py:268 — The register_aliases classmethod reads and writes both the KVS record and cls._alias_map without acquiring _alias_init_lock. Meanwhile, open_by_alias (which calls store_mapping) does acquire the lock via the async context manager. If a user calls open_dataset(alias='custom') concurrently with Actor init, you could get a race condition on both the in-memory map and the KVS record.

(configuration.actor_storages.datasets, 'Dataset'),
(configuration.actor_storages.request_queues, 'RequestQueue'),
):
for storage_alias, storage_id in mapping.items():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Redundant AliasResolver instantiation in the loop

src/apify/storage_clients/_apify/_alias_resolving.py:285-293 — A new AliasResolver instance is created for each (storage_alias, storage_id) pair just to access _storage_key. Each instantiation calls hash_api_base_url_and_token(configuration), which is the same every time. Consider either:

  • Extracting _storage_key as a @staticmethod / @classmethod that takes components directly, or
  • Computing _additional_cache_key once outside the loop
# Current — creates N instances just for the key:
for storage_alias, storage_id in mapping.items():
    configuration_mapping[
        cls(
            storage_type=storage_type,
            alias='__default__' if storage_alias == 'default' else storage_alias,
            configuration=configuration,
        )._storage_key
    ] = storage_id


configuration_mapping = {}

if configuration.default_dataset_id != configuration.actor_storages.datasets.get('default'):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Conflict check is only for datasets

src/apify/storage_clients/_apify/_alias_resolving.py:274-278 — There's a warning log for conflicting default dataset IDs, but no equivalent check for default KVS or default RQ. Either check all three storage types or drop the check entirely (since the platform should be authoritative).

if configuration.default_dataset_id != configuration.actor_storages.datasets.get('default'):
    logger.warning(...)
# No similar check for KVS or RQ

configuration.default_key_value_store_id = default_kvs.id
await AliasResolver.register_aliases(configuration=configuration)
assert await default_kvs.get_value(AliasResolver._ALIAS_MAPPING_KEY) == expected_mapping
finally:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Integration test doesn't clean up _alias_map

tests/integration/test_storages.py:176-180 — The finally block only drops the KVS but doesn't reset AliasResolver._alias_map. While the test isolation fixture does this between tests, it's good practice to clean up what you dirty — especially since _alias_map is a class variable that persists across the process.

@@ -0,0 +1,24 @@
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spaces please

Pijukatel and others added 2 commits February 19, 2026 08:44
Co-authored-by: Vlada Dusek <v.dusek96@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

t-tooling Issues with this label are in the ownership of the tooling team. tested Temporary label used only programatically for some analytics.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Adapt to Apify's "multiple storages" functionality

2 participants

Comments