perf: optimize engine hot paths — 5x-7x event throughput improvement#592
Merged
State.__hash__ was computing hash(repr(self)) on every call, generating a ~120-char f-string each time. Since State identity (name + id) is immutable after _set_id(), the hash is now precomputed once and cached. InstanceState.__hash__ also cached at construction time, eliminating both the weakref dereference and the repr() call.

Also expands test_profiling.py with 14 benchmarks covering v3 features (compound, parallel, guards, history, deep history) and adds a benchmark workflow guide in tests/benchmarks/README.md.

Measured impact vs baseline (pytest-benchmark, pedantic mode):
- parallel_region_events: -11%
- compound_enter_exit: -7%
- history_pause_resume: -9%
- deep_history_cycle: -6%
- flat_self_transition: -1%
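The hash caching described above can be sketched as follows. This is a minimal illustration, not the library's code: `CachedHashState` is a hypothetical stand-in, and hashing the `(name, id)` tuple is an assumption about what "identity" means here.

```python
class CachedHashState:
    """Hypothetical stand-in for a state whose identity is fixed by _set_id()."""

    def __init__(self, name: str):
        self.name = name
        self.id = ""
        self._hash = None  # not hashable until _set_id() has run

    def _set_id(self, id_: str) -> None:
        # Identity (name + id) is treated as immutable from here on,
        # so the hash is computed exactly once and stored.
        self.id = id_
        self._hash = hash((self.name, self.id))

    def __eq__(self, other) -> bool:
        return (
            isinstance(other, CachedHashState)
            and (self.name, self.id) == (other.name, other.id)
        )

    def __hash__(self) -> int:
        # A plain attribute read: no repr(), no string formatting.
        return self._hash


idle = CachedHashState("Idle")
idle._set_id("idle")
active = {idle}  # set membership now costs one attribute lookup per hash
```

The trade-off is that the cached value is only valid while `name` and `id` stay unchanged, which is why the sketch computes it in `_set_id()` rather than in `__init__`.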
…tion

Move the dual scalar/OrderedSet representation, caching, validation, and incremental mutation logic from StateChart and BaseEngine into a dedicated Configuration class (Information Expert pattern).

- StateChart properties (configuration, current_state_value, etc.) now delegate to self._config
- Engine add/remove methods reduced to one-line _config.add/discard calls
- Remove vestigial list handling in current_state (was a v3 dev artifact)
- Cache uses identity check against raw value to detect external bypasses
…ct__

Remove the State descriptor protocol (__get__/__set__) and for_instance() cache. InstanceState objects are now created eagerly in StateChart.__init__ and stored directly in sm.__dict__, making sm.<state_id> a plain dict lookup instead of a descriptor call + cache lookup.

Configuration no longer holds a weakref to the machine; it receives a dedicated instance_states dict and resolves active states via direct dict lookup.

The __setattr__ guard on StateChart preserves the existing protection against accidental state overriding. States whose id collides with an event name are kept out of __dict__ to preserve Event descriptor priority.

Fix type mismatches in engines/base.py and event_data.py that were previously masked by the State.__set__ descriptor.
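A rough sketch of the "eager objects in `__dict__` plus `__setattr__` guard" idea follows. `BoundState`, `MachineSketch`, the `_state_ids` attribute, and the error message are all hypothetical names chosen for illustration; the real StateChart is more involved (for example, it handles event-name collisions, which this sketch ignores).

```python
class BoundState:
    """Hypothetical per-instance state object."""

    def __init__(self, state_id: str):
        self.id = state_id


class MachineSketch:
    # Assumed: the set of state ids is known at class-definition time.
    _state_ids = frozenset({"idle", "running"})

    def __init__(self):
        # Writing through vars(self) bypasses __setattr__, so the guard
        # below does not fire during construction.
        instance_dict = vars(self)
        for state_id in self._state_ids:
            instance_dict[state_id] = BoundState(state_id)

    def __setattr__(self, name, value):
        # Reject any later attempt to overwrite a state by name.
        if name in self._state_ids:
            raise AttributeError(f"cannot override state {name!r}")
        super().__setattr__(name, value)


sm = MachineSketch()
print(sm.idle.id)  # plain instance-dict lookup, no descriptor involved
```

Because the objects live in the instance dict, `sm.idle` resolves through ordinary attribute lookup; the guard only affects assignment.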
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## develop #592 +/- ##
=========================================
Coverage 100.00% 100.00%
=========================================
Files 38 39 +1
Lines 4555 4591 +36
Branches 724 732 +8
=========================================
+ Hits 4555 4591 +36
DRY up identical InstanceState + Configuration setup in __init__ and __setstate__.
…rhead

logger.debug() consumed 65-75% of event processing time even with DEBUG disabled, due to frame inspection, string formatting, and lock acquisition on every call.

- Store `self._debug` on BaseEngine at init: real logger.debug when DEBUG is enabled, no-op lambda otherwise
- Add BaseEngine.stop() to encapsulate external engine shutdown (replaces _stop_child_machine's inline engine manipulation)
- InvokeManager delegates _debug/_log_id to engine for consistent prefixes
- Module-level _debug in io/scxml/actions.py for parse-time callables
- Remove unused logging imports from sync/async engines and invoke module
DEBUG forced all loggers to emit during tests, bypassing the cached no-op optimization and inflating benchmark numbers. WARNING is the sensible default; pass `-o log_cli_level=DEBUG` when debug logs are needed.
…sition_that_matches

- Replace weakref + property with direct reference: eliminates 145k weakref deref + assert calls per benchmark cycle. The engine lifetime is tied to the SM, so there is no leak risk with CPython's cyclic GC (PEP 442).
- Extract `first_transition_that_matches` closure into a proper method `_first_transition_that_matches` on both BaseEngine and AsyncEngine, avoiding re-creation of the function object on every _select_transitions call.
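To illustrate the closure-to-method extraction, here is a small before/after sketch. The classes and the `(event, target)` tuple representation of a transition are hypothetical; the point is only that the inner `def` in the first version builds a new function object on every call, while the method in the second version is defined once.

```python
class ClosureSelector:
    def select(self, transitions, event):
        # A fresh function object is created each time select() runs.
        def first_match(candidates):
            for transition in candidates:
                if transition[0] == event:
                    return transition
            return None

        return first_match(transitions)


class MethodSelector:
    def _first_match(self, candidates, event):
        # Defined once on the class; `event` becomes an explicit argument
        # instead of a captured variable.
        for transition in candidates:
            if transition[0] == event:
                return transition
        return None

    def select(self, transitions, event):
        return self._first_match(transitions, event)


table = [("start", "running"), ("stop", "idle")]
print(MethodSelector().select(table, "stop"))
```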
Replace 20 boilerplate @property declarations and weakref _ref() with __getattr__ delegation to the underlying State. Cache delegated values in __dict__ on first access so subsequent lookups are direct dict hits. Eliminates 268k __getattr__/weakref calls per benchmark cycle, yielding ~7% improvement on parallel_region_events (207µs → 193µs) and 61% reduction in ancestors() tottime.
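A minimal sketch of `__getattr__` delegation with caching in `__dict__` is shown below. `StateDef` and `InstanceView` are hypothetical stand-ins, and the guard on underscore-prefixed names is an extra precaution added for the sketch rather than something stated in the commit.

```python
class StateDef:
    """Hypothetical shared state definition."""

    def __init__(self, id_: str, name: str):
        self.id = id_
        self.name = name


class InstanceView:
    """Hypothetical per-machine wrapper that delegates to a StateDef."""

    def __init__(self, state: StateDef):
        self._state = state  # stored in __dict__, found by normal lookup

    def __getattr__(self, name):
        # Python calls __getattr__ only when normal lookup fails,
        # which here means the first access to a delegated attribute.
        if name.startswith("_"):
            raise AttributeError(name)  # avoid recursion if _state is missing
        value = getattr(self._state, name)
        self.__dict__[name] = value  # later reads hit the instance dict
        return value


definition = StateDef("idle", "Idle")
view = InstanceView(definition)
first = view.name   # goes through __getattr__, then caches
second = view.name  # served from view.__dict__ directly
```

The cache never refreshes, so this approach assumes the delegated attributes do not change after the first read.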
Add TestThreadSafety with stress tests exercising real contention — multiple threads sending events to the same SM simultaneously via barriers. Tests verify no lost events, state consistency, correct callback counts, and safe concurrent reads. Document thread safety guarantees in docs/processing_model.md (linking to atomic_configuration_update for transient None behavior) and AGENTS.md, noting the PriorityQueue-based event queue must remain thread-safe.
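The barrier-based stress pattern might look roughly like this. `CounterMachine` is a hypothetical stand-in that serializes `send()` with a lock; the real state machine relies on its own event queue, so this only shows the shape of the test, not the library's mechanism.

```python
import threading


class CounterMachine:
    """Hypothetical stand-in: counts events, guarded by a lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self.count = 0

    def send(self, event: str) -> None:
        with self._lock:
            self.count += 1


def stress(machine, n_threads: int = 8, events_per_thread: int = 500) -> int:
    barrier = threading.Barrier(n_threads)

    def worker():
        # All threads block here and are released together,
        # which maximizes contention on the first sends.
        barrier.wait()
        for _ in range(events_per_thread):
            machine.send("tick")

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return machine.count


total = stress(CounterMachine(), n_threads=4, events_per_thread=200)
```

A "no lost events" check then reduces to comparing the final count with `n_threads * events_per_thread`.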
Replace two set operations (- and |) that each allocate an intermediate OrderedSet with a single-pass generator + update. Eliminates 2 allocations per microstep.
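The allocation saving can be illustrated with built-in sets. This is a sketch under assumptions: the library uses its own OrderedSet, and the exact expression being replaced is not shown in the commit message, so the function and parameter names here are hypothetical.

```python
def merge_with_intermediates(entered, candidates, already_active):
    # `candidates - already_active` allocates one temporary set,
    # and `|` allocates a second one for the result.
    return entered | (candidates - already_active)


def merge_single_pass(entered, candidates, already_active):
    # One pass over `candidates`; `entered` is extended in place,
    # so no temporary set is created.
    entered.update(s for s in candidates if s not in already_active)
    return entered


entered = {"a"}
candidates = {"b", "c", "d"}
already_active = {"c"}

expected = merge_with_intermediates(set(entered), candidates, already_active)
result = merge_single_pass(set(entered), candidates, already_active)
```

Note that the single-pass version mutates its first argument, so it is only a drop-in replacement where the caller owns that set.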
…anch

The isinstance(value, InstanceState) check was unreachable because _build_configuration uses vars(self) directly, bypassing __setattr__. Simplify to always reject assignments to state names via __setattr__. Restores 100% branch coverage.



Summary
Systematic, measurement-driven optimization of the state machine engine's hot paths.
Each optimization is a separate commit, benchmarked and validated independently.
Event throughput: 4.7x–7.7x faster. Setup: 1.9x–2.6x faster.
Benchmark results (develop vs this branch)
- flat_self_transition
- many_transitions_full_cycle
- many_transitions_reset
- guarded_transitions
- deep_history_cycle
- history_pause_resume
- compound_enter_exit
- parallel_region_events
- flat_machine (setup)
- compound_machine (setup)
- guarded_machine (setup)
- history_machine (setup)
- deep_history_machine (setup)
- parallel_machine (setup)

Optimizations (by commit)

1. Cache `State.__hash__` (246037f)
Cache hash at init time instead of calling `repr()` on every hash. States are used as dict keys and set members throughout the engine.

2. Extract `Configuration` class (f1e07d0, 4a2ac88)
Encapsulate state representation (model field read/write, configuration caching) into a dedicated `Configuration` class, following Information Expert. Removes `for_instance()` cache indirection: `InstanceState` is stored directly in `sm.__dict__`.

3. Cached no-op for `logger.debug` (65e46e4, 9708a09)
Replace all `logger.debug()` calls with `self._debug()`, a cached reference that points to either `logger.debug` or a no-op lambda depending on whether DEBUG is enabled. Eliminates logging overhead (frame inspection, string formatting, lock acquisition) when DEBUG is disabled, which was consuming 65-75% of event processing time.

4. Remove weakref on `engine.sm` (68f8825)
Replace `weakref.ref` with a direct reference. The engine's lifetime equals the SM's lifetime, and CPython's cyclic GC handles the circular reference. Also extracted `_first_transition_that_matches` from a closure (recreated per call) to a method on `BaseEngine`.

5. `InstanceState`: `__getattr__` with caching (d62d650)
Replace 20 boilerplate `@property` declarations and `_ref()` weakref with `__getattr__` delegation to the underlying `State`. Cache delegated values in `__dict__` on first access; subsequent lookups are direct dict hits. Eliminates 268k weakref deref calls per benchmark cycle.

6. Avoid intermediate `OrderedSet` allocations (63bc62b)
Replace two set operations (`-` and `|`) that each allocate an intermediate `OrderedSet` with a single-pass generator + update in `_prepare_entry_states`.

Thread safety
Added stress tests (`tests/test_threading.py::TestThreadSafety`) exercising real contention: multiple threads sending events to the same SM via barriers. Documented thread safety guarantees in `docs/processing_model.md` and `AGENTS.md`.

No public API changes
All optimizations are internal. No changes to the public interface.