feat: Support Actor schema storages with Alias mechanism#797
feat: Support Actor schema storages with Alias mechanism#797
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #797 +/- ##
==========================================
+ Coverage 85.47% 85.96% +0.49%
==========================================
Files 46 46
Lines 2691 2722 +31
==========================================
+ Hits 2300 2340 +40
+ Misses 391 382 -9
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
d9c7349 to
3b36459
Compare
TODO - how should it behave locally?
There was a problem hiding this comment.
Pull request overview
Adds support for Actor “schema storages” / multiple pre-created storages by loading storage IDs into Configuration and pre-registering alias→ID mappings so open_* (alias=...) resolves to those platform-provided storages when running on Apify.
Changes:
- Introduces
Configuration.actor_storages(parsed from an env-provided JSON object) via newActorStoragesmodel. - Extends
AliasResolverwithregister_aliases()and calls it duringActorinitialization on the Apify platform to seed alias mappings. - Adds unit/integration/E2E tests validating configuration parsing and alias resolution behavior.
Reviewed changes
Copilot reviewed 8 out of 9 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
src/apify/_configuration.py |
Adds ActorStorages + env/validator wiring for Configuration.actor_storages. |
src/apify/storage_clients/_apify/_alias_resolving.py |
Adds AliasResolver.register_aliases() to bulk write alias mappings into default KVS + in-memory cache. |
src/apify/_actor.py |
Calls register_aliases() on Actor startup when running on-platform. |
tests/unit/actor/test_configuration.py |
Unit test for parsing storages JSON env var into actor_storages. |
tests/integration/test_storages.py |
Integration test asserting alias mapping is preserved/extended in default KVS. |
tests/e2e/test_schema_storages/test_schema_storages.py |
E2E test ensuring schema-defined storages are usable via alias at runtime. |
tests/e2e/test_schema_storages/actor_source/main.py |
Actor code used by the E2E test to open a dataset by alias and validate ID. |
tests/e2e/test_schema_storages/actor_source/actor.json |
Actor schema defining storages for the E2E scenario. |
tests/e2e/test_schema_storages/__init__.py |
Marks the new E2E package. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| client = await cls._get_default_kvs_client(configuration=configuration) | ||
| existing_mapping = ((await client.get_record(cls._ALIAS_MAPPING_KEY)) or {'value': {}}).get('value', {}) | ||
|
|
||
| # Update the existing mapping with the configuration mapping. | ||
| existing_mapping.update(configuration_mapping) | ||
| # Store the updated mapping back in the KVS and in memory. | ||
| await client.set_record(cls._ALIAS_MAPPING_KEY, existing_mapping) |
There was a problem hiding this comment.
existing_mapping = ((await client.get_record(...)) or {'value': {}}).get('value', {}) assumes the record always has a value key containing a dict. However, this module already documents/handles get_record sometimes returning the mapping dict directly (without value). In that case, this code will treat the mapping as missing and overwrite it with only configuration_mapping. Also, if value is present but not a dict (e.g. None), existing_mapping.update(...) will raise. Please mirror the normalization logic used in _get_alias_map/store_mapping: normalize record into a dict[str, str] whether it comes wrapped in {key,value} or as a raw mapping, otherwise default to {}.
| actor_storages: Annotated[ | ||
| ActorStorages | None, | ||
| Field( | ||
| alias='actor_storages_json', | ||
| description='Storage IDs for the actor', | ||
| ), | ||
| BeforeValidator(_load_storage_keys), | ||
| ] = None |
There was a problem hiding this comment.
This change closes #762, but the issue description specifies the platform-provided env var name as ACTOR_STORAGE_IDS (object with datasets, keyValueStores, requestQueues). The new field only declares alias='actor_storages_json' (env ACTOR_STORAGES_JSON). If the platform actually uses ACTOR_STORAGE_IDS, configuration loading will silently miss the mapping. Consider supporting ACTOR_STORAGE_IDS via validation_alias=AliasChoices(...) (keeping backward compatibility if ACTOR_STORAGES_JSON is intentional).
There was a problem hiding this comment.
The implementation on the platform is ACTOR_STORAGES_JSON now, if it changes the code should reflect it as well
| return apify_client_async.key_value_store(key_value_store_id=configuration.default_key_value_store_id) | ||
|
|
||
| @classmethod | ||
| async def register_aliases(cls, configuration: Configuration) -> None: |
There was a problem hiding this comment.
register_aliaseslacks error handling
src/apify/storage_clients/_apify/_alias_resolving.py:265-302 — The existing store_mapping method wraps KVS operations in try/except (line 236), but register_aliases does not. If get_record or set_record fails (e.g., network issue during Actor init), the entire Actor startup will crash.
# register_aliases — no error handling:
existing_mapping = ((await client.get_record(...)) or {'value': {}}).get('value', {})
await client.set_record(cls._ALIAS_MAPPING_KEY, existing_mapping)Consider wrapping in try/except with a warning, consistent with store_mapping.
|
|
||
| @classmethod | ||
| async def register_aliases(cls, configuration: Configuration) -> None: | ||
| """Load alias mapping from configuration to the default kvs.""" |
There was a problem hiding this comment.
register_aliases doesn't acquire the alias lock
src/apify/storage_clients/_apify/_alias_resolving.py:268 — The register_aliases classmethod reads and writes both the KVS record and cls._alias_map without acquiring _alias_init_lock. Meanwhile, open_by_alias (which calls store_mapping) does acquire the lock via the async context manager. If a user calls open_dataset(alias='custom') concurrently with Actor init, you could get a race condition on both the in-memory map and the KVS record.
| (configuration.actor_storages.datasets, 'Dataset'), | ||
| (configuration.actor_storages.request_queues, 'RequestQueue'), | ||
| ): | ||
| for storage_alias, storage_id in mapping.items(): |
There was a problem hiding this comment.
Redundant AliasResolver instantiation in the loop
src/apify/storage_clients/_apify/_alias_resolving.py:285-293 — A new AliasResolver instance is created for each (storage_alias, storage_id) pair just to access _storage_key. Each instantiation calls hash_api_base_url_and_token(configuration), which is the same every time. Consider either:
- Extracting
_storage_keyas a@staticmethod/@classmethodthat takes components directly, or - Computing
_additional_cache_keyonce outside the loop
# Current — creates N instances just for the key:
for storage_alias, storage_id in mapping.items():
configuration_mapping[
cls(
storage_type=storage_type,
alias='__default__' if storage_alias == 'default' else storage_alias,
configuration=configuration,
)._storage_key
] = storage_id|
|
||
| configuration_mapping = {} | ||
|
|
||
| if configuration.default_dataset_id != configuration.actor_storages.datasets.get('default'): |
There was a problem hiding this comment.
Conflict check is only for datasets
src/apify/storage_clients/_apify/_alias_resolving.py:274-278 — There's a warning log for conflicting default dataset IDs, but no equivalent check for default KVS or default RQ. Either check all three storage types or drop the check entirely (since the platform should be authoritative).
if configuration.default_dataset_id != configuration.actor_storages.datasets.get('default'):
logger.warning(...)
# No similar check for KVS or RQ| configuration.default_key_value_store_id = default_kvs.id | ||
| await AliasResolver.register_aliases(configuration=configuration) | ||
| assert await default_kvs.get_value(AliasResolver._ALIAS_MAPPING_KEY) == expected_mapping | ||
| finally: |
There was a problem hiding this comment.
Integration test doesn't clean up _alias_map
tests/integration/test_storages.py:176-180 — The finally block only drops the KVS but doesn't reset AliasResolver._alias_map. While the test isolation fixture does this between tests, it's good practice to clean up what you dirty — especially since _alias_map is a class variable that persists across the process.
| @@ -0,0 +1,24 @@ | |||
| { | |||
Co-authored-by: Vlada Dusek <v.dusek96@gmail.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Description
Configurationto includeactor_storagesthat is loaded fromactor_storages_jsonenv variable.AliasResolverto be able to extend alias mapping fromConfiguration.AliasResolverto extend alias mapping.Issues
Testing
Checklist