perf: optimize engine hot paths — 5x-7x event throughput improvement#592

Merged
fgmacedo merged 13 commits into develop from perf/optimize-hot-paths on Mar 8, 2026

@fgmacedo fgmacedo commented Mar 7, 2026

Summary

Systematic, measurement-driven optimization of the state machine engine's hot paths.
Each optimization is a separate commit, benchmarked and validated independently.

Event throughput: 4.7x–7.7x faster. Setup: 1.9x–2.6x faster.

Benchmark results (develop vs this branch)

| Benchmark | develop (µs) | optimized (µs) | Speedup |
|---|---:|---:|---:|
| flat_self_transition | 260 | 47 | 5.6x |
| many_transitions_full_cycle | 1,261 | 166 | 7.6x |
| many_transitions_reset | 1,009 | 132 | 7.7x |
| guarded_transitions | 547 | 75 | 7.3x |
| deep_history_cycle | 711 | 108 | 6.6x |
| history_pause_resume | 629 | 101 | 6.2x |
| compound_enter_exit | 1,025 | 217 | 4.7x |
| parallel_region_events | 1,294 | 192 | 6.7x |
| flat_machine (setup) | 187 | 95 | 2.0x |
| compound_machine (setup) | 163 | 63 | 2.6x |
| guarded_machine (setup) | 160 | 71 | 2.3x |
| history_machine (setup) | 154 | 64 | 2.4x |
| deep_history_machine (setup) | 159 | 82 | 1.9x |
| parallel_machine (setup) | 160 | 75 | 2.1x |
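For reference, the Speedup column is the ratio of the develop timing to the optimized timing. The sketch below recomputes it for three rows from the rounded microsecond values shown in the table. The variable and function names are illustrative, and other rows may differ in the last digit if the published speedups were derived from unrounded timings.

```python
# (develop µs, optimized µs) copied from three rows of the benchmark table.
timings_us = {
    "many_transitions_full_cycle": (1261, 166),
    "compound_enter_exit": (1025, 217),
    "compound_machine (setup)": (163, 63),
}


def speedup(before_us: float, after_us: float) -> float:
    """Return how many times faster the optimized run is."""
    return before_us / after_us


speedups = {
    name: round(speedup(before, after), 1)
    for name, (before, after) in timings_us.items()
}
```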

Optimizations (by commit)

1. Cache State.__hash__ (246037f)

Cache hash at init time instead of calling repr() on every hash. States are used as dict keys and set members throughout the engine.
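A minimal sketch of the idea, assuming a simplified stand-in `State` class (not the library's real one) whose identity is fixed at construction. The hash is computed once and `__hash__` only returns the stored value:

```python
class State:
    """Hypothetical stand-in for the library's State; only hashing is modeled."""

    def __init__(self, name: str, id: str):
        self.name = name
        self.id = id
        # Identity is assumed immutable from here on, so hash exactly once.
        self._hash = hash((type(self).__name__, name, id))

    def __repr__(self) -> str:
        return f"{type(self).__name__}({self.name!r}, id={self.id!r})"

    def __eq__(self, other: object) -> bool:
        return isinstance(other, State) and (self.name, self.id) == (other.name, other.id)

    def __hash__(self) -> int:
        # Return the precomputed value; no repr() or string formatting here.
        return self._hash


idle = State("Idle", "idle")
visits = {idle: 0}
visits[State("Idle", "idle")] += 1  # an equal state hashes alike, so it hits the same key
```

The trade-off is that the cached hash is only valid while the fields it was derived from never change after construction.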

2. Extract Configuration class (f1e07d0, 4a2ac88)

Encapsulate state representation (model field read/write, configuration caching) in a dedicated Configuration class, following the Information Expert pattern. This removes the for_instance() cache indirection: InstanceState is now stored directly in sm.__dict__.

3. Cached no-op for logger.debug (65e46e4, 9708a09)

Replace all logger.debug() calls with self._debug(), a cached reference that points to either logger.debug or a no-op lambda depending on whether DEBUG is enabled. This eliminates logging overhead (frame inspection, string formatting, lock acquisition) when DEBUG is disabled. Before this change, that overhead consumed 65-75% of event processing time even with DEBUG off.

4. Remove weakref on engine.sm (68f8825)

Replace weakref.ref with a direct reference. The engine's lifetime equals the SM's lifetime, and CPython's cyclic GC handles the circular reference. Also extracted _first_transition_that_matches from a closure (recreated per call) to a method on BaseEngine.
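The closure extraction can be illustrated with two hypothetical engine classes (names and matching logic are invented for this sketch). The first rebuilds the helper function on every call, the second defines it once as a method:

```python
from typing import Iterable, Optional


class ClosureEngine:
    def select(self, transitions: Iterable[str], event: str) -> Optional[str]:
        # A new function object is created every time select() runs.
        def first_transition_that_matches(candidates):
            return next((t for t in candidates if t.startswith(event)), None)

        return first_transition_that_matches(transitions)


class MethodEngine:
    def _first_transition_that_matches(
        self, candidates: Iterable[str], event: str
    ) -> Optional[str]:
        # Defined once at class creation; each call only pays for the lookup.
        return next((t for t in candidates if t.startswith(event)), None)

    def select(self, transitions: Iterable[str], event: str) -> Optional[str]:
        return self._first_transition_that_matches(transitions, event)


transitions = ["start->running", "stop->idle"]
closure_result = ClosureEngine().select(transitions, "stop")
method_result = MethodEngine().select(transitions, "stop")
```

Both variants return the same result. The difference is only in how often the helper is constructed.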

5. InstanceState.__getattr__ with caching (d62d650)

Replace 20 boilerplate @property declarations and _ref() weakref with __getattr__ delegation to the underlying State. Cache delegated values in __dict__ on first access — subsequent lookups are direct dict hits. Eliminates 268k weakref deref calls per benchmark cycle.
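A minimal sketch of delegation with first-access caching, using simplified stand-in classes rather than the library's real ones. Python calls `__getattr__` only when normal attribute lookup fails, so once a value is stored in `__dict__` the hook is never invoked for that name again:

```python
class State:
    """Hypothetical stand-in for the shared, class-level state definition."""

    def __init__(self, id: str, name: str):
        self.id = id
        self.name = name


class InstanceState:
    """Hypothetical per-machine view that delegates to the underlying State."""

    def __init__(self, state: State):
        self._state = state

    def __getattr__(self, attr: str):
        # Guard private names so a missing _state cannot recurse forever.
        if attr.startswith("_"):
            raise AttributeError(attr)
        value = getattr(self._state, attr)
        self.__dict__[attr] = value  # cache: later lookups skip __getattr__
        return value


view = InstanceState(State("idle", "Idle"))
cached_before = "name" in vars(view)
first_read = view.name
cached_after = "name" in vars(view)
```

This assumes the delegated attributes do not change after the first read. A cached value would otherwise go stale.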

6. Avoid intermediate OrderedSet allocations (63bc62b)

Replace two set operations (- and |) that each allocate an intermediate OrderedSet with a single-pass generator + update in _prepare_entry_states.
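To make the allocation difference concrete, here is a sketch with a minimal dict-backed `OrderedSet` written for this example. The library's own OrderedSet and the exact expression in `_prepare_entry_states` are not shown in this PR text, so the function names and the combination below are assumptions:

```python
class OrderedSet:
    """Minimal insertion-ordered set backed by a dict (illustrative only)."""

    def __init__(self, items=()):
        self._items = dict.fromkeys(items)

    def __iter__(self):
        return iter(self._items)

    def __contains__(self, item):
        return item in self._items

    def __sub__(self, other):
        return OrderedSet(x for x in self if x not in other)  # new object

    def __or__(self, other):
        result = OrderedSet(self)  # new object
        result.update(other)
        return result

    def update(self, items):
        for item in items:
            self._items[item] = None


def with_intermediates(entered, new_states, exited):
    # Allocates one OrderedSet for the difference and another for the union.
    return entered | (new_states - exited)


def single_pass(entered, new_states, exited):
    # Filters lazily and mutates `entered` in place: no intermediate sets.
    entered.update(s for s in new_states if s not in exited)
    return entered


a = with_intermediates(OrderedSet("ab"), OrderedSet("bcd"), OrderedSet("c"))
b = single_pass(OrderedSet("ab"), OrderedSet("bcd"), OrderedSet("c"))
```

Note that the single-pass version mutates its first argument, which is only safe when the caller owns that set.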

Thread safety

Added stress tests (tests/test_threading.py::TestThreadSafety) exercising real contention — multiple threads sending events to the same SM via barriers. Documented thread safety guarantees in docs/processing_model.md and AGENTS.md.
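The barrier technique can be sketched independently of the library. In the example below, a `queue.PriorityQueue` stands in for the engine's event queue and the thread and event counts are arbitrary. The barrier holds every sender until all are ready, so the `put` calls genuinely overlap:

```python
import queue
import threading

N_THREADS = 4
EVENTS_PER_THREAD = 50

events = queue.PriorityQueue()           # stand-in for a thread-safe event queue
barrier = threading.Barrier(N_THREADS)


def sender(thread_id: int) -> None:
    barrier.wait()                       # release all senders at the same moment
    for i in range(EVENTS_PER_THREAD):
        events.put((thread_id, i))


threads = [threading.Thread(target=sender, args=(n,)) for n in range(N_THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Drain after all senders have finished; nothing should be lost or duplicated.
received = []
while not events.empty():
    received.append(events.get_nowait())
```

A real stress test would send the events to a state machine and assert on its final configuration and callback counts. This sketch only checks that no event is lost under contention.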

No public API changes

All optimizations are internal. No changes to the public interface.

fgmacedo added 3 commits March 7, 2026 09:09
State.__hash__ was computing hash(repr(self)) on every call, generating
a ~120-char f-string each time. Since State identity (name + id) is
immutable after _set_id(), the hash is now precomputed once and cached.

InstanceState.__hash__ also cached at construction time, eliminating
both the weakref dereference and the repr() call.

Also expands test_profiling.py with 14 benchmarks covering v3 features
(compound, parallel, guards, history, deep history) and adds a benchmark
workflow guide in tests/benchmarks/README.md.

Measured impact vs baseline (pytest-benchmark, pedantic mode):
- parallel_region_events: -11%
- compound_enter_exit: -7%
- history_pause_resume: -9%
- deep_history_cycle: -6%
- flat_self_transition: -1%

…tion

Move the dual scalar/OrderedSet representation, caching, validation,
and incremental mutation logic from StateChart and BaseEngine into a
dedicated Configuration class (Information Expert pattern).

- StateChart properties (configuration, current_state_value, etc.) now
  delegate to self._config
- Engine add/remove methods reduced to one-line _config.add/discard calls
- Remove vestigial list handling in current_state (was a v3 dev artifact)
- Cache uses identity check against raw value to detect external bypasses

…ct__

Remove the State descriptor protocol (__get__/__set__) and for_instance()
cache. InstanceState objects are now created eagerly in StateChart.__init__
and stored directly in sm.__dict__, making sm.<state_id> a plain dict
lookup instead of a descriptor call + cache lookup.

Configuration no longer holds a weakref to the machine; it receives a
dedicated instance_states dict and resolves active states via direct
dict lookup.

The __setattr__ guard on StateChart preserves the existing protection
against accidental state overriding. States whose id collides with an
event name are kept out of __dict__ to preserve Event descriptor
priority.

Fix type mismatches in engines/base.py and event_data.py that were
previously masked by the State.__set__ descriptor.
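The combination of eager `__dict__` storage and a `__setattr__` guard could look roughly like the sketch below. The class bodies, the `state_ids` attribute, and the exception type are assumptions made for illustration and are not taken from the library:

```python
class InstanceState:
    """Hypothetical per-instance view of a state."""

    def __init__(self, state_id: str):
        self.id = state_id


class StateChart:
    state_ids = ("idle", "running")  # hypothetical declaration of state names

    def __init__(self):
        for state_id in self.state_ids:
            # vars(self) writes straight into __dict__, bypassing __setattr__,
            # so sm.idle later resolves as a plain instance-dict lookup.
            vars(self)[state_id] = InstanceState(state_id)

    def __setattr__(self, name, value):
        if name in self.state_ids:
            # Assumed error type; the guard rejects accidental overrides.
            raise AttributeError(f"State {name!r} cannot be overridden")
        super().__setattr__(name, value)


sm = StateChart()
try:
    sm.idle = "oops"
    overridden = True
except AttributeError:
    overridden = False

sm.counter = 1  # non-state attributes are still assignable
```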

codecov bot commented Mar 7, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (11ffa95) to head (1847088).
⚠️ Report is 1 commits behind head on develop.

Additional details and impacted files
@@            Coverage Diff            @@
##           develop      #592   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           38        39    +1     
  Lines         4555      4591   +36     
  Branches       724       732    +8     
=========================================
+ Hits          4555      4591   +36     
Flag Coverage Δ
unittests 100.00% <100.00%> (ø)

fgmacedo added 7 commits March 7, 2026 17:32
DRY up identical InstanceState + Configuration setup in __init__ and
__setstate__.

…rhead

logger.debug() consumed 65-75% of event processing time even with DEBUG
disabled, due to frame inspection, string formatting, and lock acquisition
on every call.

- Store `self._debug` on BaseEngine at init: real logger.debug when DEBUG
  is enabled, no-op lambda otherwise
- Add BaseEngine.stop() to encapsulate external engine shutdown (replaces
  _stop_child_machine's inline engine manipulation)
- InvokeManager delegates _debug/_log_id to engine for consistent prefixes
- Module-level _debug in io/scxml/actions.py for parse-time callables
- Remove unused logging imports from sync/async engines and invoke module

DEBUG forced all loggers to emit during tests, bypassing the cached no-op
optimization and inflating benchmark numbers. WARNING is the sensible
default; pass `-o log_cli_level=DEBUG` when debug logs are needed.

…sition_that_matches

- Replace weakref + property with direct reference: eliminates 145k
  weakref deref + assert calls per benchmark cycle. The engine lifetime
  is tied to the SM — no leak risk with CPython's cyclic GC (PEP 442).
- Extract `first_transition_that_matches` closure into a proper method
  `_first_transition_that_matches` on both BaseEngine and AsyncEngine,
  avoiding re-creation of the function object on every _select_transitions
  call.

Replace 20 boilerplate @property declarations and weakref _ref() with
__getattr__ delegation to the underlying State. Cache delegated values
in __dict__ on first access so subsequent lookups are direct dict hits.

Eliminates 268k __getattr__/weakref calls per benchmark cycle, yielding
~7% improvement on parallel_region_events (207µs → 193µs) and 61%
reduction in ancestors() tottime.

Add TestThreadSafety with stress tests exercising real contention:
multiple threads sending events to the same SM simultaneously via
barriers. Tests verify no lost events, state consistency, correct
callback counts, and safe concurrent reads.

Document thread safety guarantees in docs/processing_model.md (linking
to atomic_configuration_update for transient None behavior) and
AGENTS.md, noting the PriorityQueue-based event queue must remain
thread-safe.

Replace two set operations (- and |) that each allocate an intermediate
OrderedSet with a single-pass generator + update. Eliminates 2
allocations per microstep.
@fgmacedo fgmacedo changed the title perf: optimize hot paths — hash caching, Configuration extraction, InstanceState in __dict__ perf: optimize engine hot paths — 5x-7x event throughput improvement Mar 8, 2026
fgmacedo added 3 commits March 7, 2026 23:05
…anch

The isinstance(value, InstanceState) check was unreachable because
_build_configuration uses vars(self) directly, bypassing __setattr__.
Simplify to always reject assignments to state names via __setattr__.

Restores 100% branch coverage.

sonarqubecloud bot commented Mar 8, 2026

@fgmacedo fgmacedo merged commit 5a85209 into develop Mar 8, 2026
14 checks passed
@fgmacedo fgmacedo deleted the perf/optimize-hot-paths branch March 8, 2026 02:16