perf: optimize engine hot paths — 5x-7x event throughput improvement by fgmacedo · Pull Request #592 · fgmacedo/python-statemachine

fgmacedo · 2026-03-07T19:45:38Z

Summary

Systematic, measurement-driven optimization of the state machine engine's hot paths.
Each optimization is a separate commit, benchmarked and validated independently.

Event throughput: 4.7x–7.7x faster. Setup: 1.9x–2.6x faster.

Benchmark results (develop vs this branch)

| Benchmark | develop (µs) | optimized (µs) | Speedup |
| --- | ---: | ---: | ---: |
| `flat_self_transition` | 260 | 47 | 5.6x |
| `many_transitions_full_cycle` | 1,261 | 166 | 7.6x |
| `many_transitions_reset` | 1,009 | 132 | 7.7x |
| `guarded_transitions` | 547 | 75 | 7.3x |
| `deep_history_cycle` | 711 | 108 | 6.6x |
| `history_pause_resume` | 629 | 101 | 6.2x |
| `compound_enter_exit` | 1,025 | 217 | 4.7x |
| `parallel_region_events` | 1,294 | 192 | 6.7x |
| `flat_machine` (setup) | 187 | 95 | 2.0x |
| `compound_machine` (setup) | 163 | 63 | 2.6x |
| `guarded_machine` (setup) | 160 | 71 | 2.3x |
| `history_machine` (setup) | 154 | 64 | 2.4x |
| `deep_history_machine` (setup) | 159 | 82 | 1.9x |
| `parallel_machine` (setup) | 160 | 75 | 2.1x |

Optimizations (by commit)

1. Cache `State.__hash__` (`246037f`)

Cache the hash at init time instead of calling repr() on every __hash__ call. States are used as dict keys and set members throughout the engine.
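A minimal sketch of the hash-caching idea, not the library's actual code: the `State` class below is a hypothetical stand-in, and hashing a tuple of `name` and `id` is an assumption made for illustration. It relies on the identity fields not changing after construction.

```python
class State:
    """Hypothetical stand-in for the library's State; only hashing is shown."""

    def __init__(self, name: str, state_id: str) -> None:
        self.name = name
        self.id = state_id
        # Identity is treated as immutable from here on, so hash exactly once.
        self._hash = hash((type(self).__name__, self.name, self.id))

    def __repr__(self) -> str:
        return f"{type(self).__name__}({self.name!r}, id={self.id!r})"

    def __eq__(self, other: object) -> bool:
        return isinstance(other, State) and (self.name, self.id) == (other.name, other.id)

    def __hash__(self) -> int:
        # Return the precomputed value: no repr() or string formatting per call.
        return self._hash


a = State("Green", "green")
b = State("Green", "green")
lookup = {a: "active"}  # dict and set operations call __hash__ on every lookup
```

Because dict and set lookups call `__hash__` each time, moving the work to construction turns every later lookup into a plain attribute read.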

2. Extract `Configuration` class (`f1e07d0`, `4a2ac88`)

Encapsulate state representation (model field read/write, configuration caching) into a dedicated Configuration class, following Information Expert. Removes for_instance() cache indirection — InstanceState is stored directly in sm.__dict__.

3. Cached no-op for `logger.debug` (`65e46e4`, `9708a09`)

Replace all logger.debug() calls with self._debug(), a cached reference that points to either logger.debug or a no-op lambda depending on whether DEBUG is enabled. Eliminates logging overhead (frame inspection, string formatting, lock acquisition) when DEBUG is disabled — which was consuming 65-75% of event processing time.

4. Remove weakref on `engine.sm` (`68f8825`)

Replace weakref.ref with a direct reference. The engine's lifetime equals the SM's lifetime, and CPython's cyclic GC handles the circular reference. Also extracted _first_transition_that_matches from a closure (recreated per call) to a method on BaseEngine.
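To illustrate why the direct reference does not leak on CPython, here is a small sketch with hypothetical `Machine` and `Engine` classes. The `weakref.ref` probe is used only to observe collection; it is not part of the pattern.

```python
import gc
import weakref


class Engine:
    """Hypothetical engine holding a direct (strong) reference to its machine."""

    def __init__(self, sm: "Machine") -> None:
        self.sm = sm  # forms a reference cycle with Machine.engine


class Machine:
    """Hypothetical state machine that owns its engine."""

    def __init__(self) -> None:
        self.engine = Engine(self)


machine = Machine()
probe = weakref.ref(machine)  # observation only
del machine                   # refcounts stay above zero because of the cycle
gc.collect()                  # the cycle collector reclaims the pair
collected = probe() is None
```

The cycle means reference counting alone never frees the pair, so reclamation waits for a collector pass rather than happening as soon as the last external reference goes away.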

5. `InstanceState`: `__getattr__` with caching (`d62d650`)

Replace 20 boilerplate @property declarations and _ref() weakref with __getattr__ delegation to the underlying State. Cache delegated values in __dict__ on first access — subsequent lookups are direct dict hits. Eliminates 268k weakref deref calls per benchmark cycle.
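A hedged sketch of the delegation-with-caching technique. Both classes are simplified, hypothetical stand-ins, and the `_misses` counter exists only to make the caching visible. It assumes the delegated attributes do not change after first access.

```python
class State:
    """Hypothetical stand-in for the shared, class-level state definition."""

    def __init__(self, name: str, value: str, final: bool = False) -> None:
        self.name = name
        self.value = value
        self.final = final


class InstanceState:
    """Hypothetical per-machine wrapper that delegates to a State."""

    def __init__(self, state: State) -> None:
        self._state = state
        self._misses = 0  # counts __getattr__ fallbacks, for illustration only

    def __getattr__(self, name: str):
        # Python calls __getattr__ only when normal lookup fails.
        if name.startswith("_"):
            # Avoid recursion if a private attribute is not set yet.
            raise AttributeError(name)
        self._misses += 1
        value = getattr(self._state, name)
        # Cache in the instance dict: later lookups never reach __getattr__.
        self.__dict__[name] = value
        return value


inst = InstanceState(State("Green", "green"))
first = inst.name   # falls back to __getattr__, then caches
second = inst.name  # served directly from inst.__dict__
```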

6. Avoid intermediate `OrderedSet` allocations (`63bc62b`)

Replace two set operations (- and |) that each allocate an intermediate OrderedSet with a single-pass generator + update in _prepare_entry_states.
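The shape of this change can be sketched as below. The `OrderedSet` here is a minimal hypothetical stand-in, and the exact expression in `_prepare_entry_states` is not shown in this PR text, so the function and argument names are assumptions.

```python
from itertools import chain


class OrderedSet:
    """Minimal insertion-ordered set; hypothetical stand-in for the library's."""

    def __init__(self, iterable=()):
        self._items = dict.fromkeys(iterable)

    def __iter__(self):
        return iter(self._items)

    def __contains__(self, item):
        return item in self._items

    def __sub__(self, other):
        return OrderedSet(x for x in self if x not in other)  # allocates

    def __or__(self, other):
        result = OrderedSet(self)  # allocates
        result.update(other)
        return result

    def update(self, iterable):
        for item in iterable:
            self._items[item] = None


def merge_with_intermediates(target, new_states, existing, extra):
    # Two temporary OrderedSet objects are built and then thrown away.
    target.update((new_states - existing) | extra)


def merge_single_pass(target, new_states, existing, extra):
    # One lazy pass over the inputs; no temporary OrderedSet is created.
    target.update(chain((s for s in new_states if s not in existing), extra))


new_states = OrderedSet(["a", "b", "c"])
existing = OrderedSet(["b"])
extra = OrderedSet(["d", "a"])

before = OrderedSet(["root"])
merge_with_intermediates(before, new_states, existing, extra)
after = OrderedSet(["root"])
merge_single_pass(after, new_states, existing, extra)
```

Both versions add the same elements in the same order; only the number of temporary containers differs.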

Thread safety

Added stress tests (tests/test_threading.py::TestThreadSafety) exercising real contention — multiple threads sending events to the same SM via barriers. Documented thread safety guarantees in docs/processing_model.md and AGENTS.md.
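A barrier-based contention test might look like the sketch below. `ToyMachine` is a hypothetical, deliberately simple queue-plus-lock machine, not the library's engine, and `run_stress` is an illustrative helper rather than the code in `tests/test_threading.py`.

```python
import queue
import threading


class ToyMachine:
    """Hypothetical machine: events are queued and drained by one thread at a time."""

    def __init__(self) -> None:
        self._queue: "queue.Queue[str]" = queue.Queue()
        self._lock = threading.Lock()
        self.count = 0

    def send(self, event: str) -> None:
        self._queue.put(event)
        with self._lock:  # only one thread processes events at any moment
            while True:
                try:
                    self._queue.get_nowait()
                except queue.Empty:
                    break
                self.count += 1  # stands in for running a transition


def run_stress(n_threads: int = 8, events_per_thread: int = 500) -> int:
    sm = ToyMachine()
    barrier = threading.Barrier(n_threads)

    def worker() -> None:
        barrier.wait()  # release all threads together to maximize contention
        for _ in range(events_per_thread):
            sm.send("tick")

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sm.count


total = run_stress(4, 200)
```

The barrier matters because threads started in a loop would otherwise run mostly one after another, and the test would rarely exercise overlapping `send` calls.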

No public API changes

All optimizations are internal. No changes to the public interface.

State.__hash__ was computing hash(repr(self)) on every call, generating a ~120-char f-string each time. Since State identity (name + id) is immutable after _set_id(), the hash is now precomputed once and cached. InstanceState.__hash__ is also cached at construction time, eliminating both the weakref dereference and the repr() call.

Also expands test_profiling.py with 14 benchmarks covering v3 features (compound, parallel, guards, history, deep history) and adds a benchmark workflow guide in tests/benchmarks/README.md.

Measured impact vs baseline (pytest-benchmark, pedantic mode):

- parallel_region_events: -11%
- compound_enter_exit: -7%
- history_pause_resume: -9%
- deep_history_cycle: -6%
- flat_self_transition: -1%

…tion

Move the dual scalar/OrderedSet representation, caching, validation, and incremental mutation logic from StateChart and BaseEngine into a dedicated Configuration class (Information Expert pattern).

- StateChart properties (configuration, current_state_value, etc.) now delegate to self._config
- Engine add/remove methods reduced to one-line _config.add/discard calls
- Remove vestigial list handling in current_state (was a v3 dev artifact)
- Cache uses identity check against raw value to detect external bypasses

…ct__ Remove the State descriptor protocol (__get__/__set__) and for_instance() cache. InstanceState objects are now created eagerly in StateChart.__init__ and stored directly in sm.__dict__, making sm.<state_id> a plain dict lookup instead of a descriptor call + cache lookup. Configuration no longer holds a weakref to the machine; it receives a dedicated instance_states dict and resolves active states via direct dict lookup. The __setattr__ guard on StateChart preserves the existing protection against accidental state overriding. States whose id collides with an event name are kept out of __dict__ to preserve Event descriptor priority. Fix type mismatches in engines/base.py and event_data.py that were previously masked by the State.__set__ descriptor.
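The combination of eager storage in `__dict__` and a `__setattr__` guard can be sketched as follows. Everything here is hypothetical: the class name, the placeholder tuples standing in for InstanceState objects, and the choice of `AttributeError` are assumptions, since this text does not show the library's actual guard.

```python
class StateChartSketch:
    """Hypothetical machine showing __dict__ storage plus a __setattr__ guard."""

    _state_ids = frozenset({"green", "red"})

    def __init__(self) -> None:
        # Writing through vars(self) bypasses __setattr__, so the guard
        # below never sees these initial assignments.
        namespace = vars(self)
        for state_id in self._state_ids:
            namespace[state_id] = ("instance-state", state_id)  # placeholder object

    def __setattr__(self, name: str, value: object) -> None:
        # Reject any later assignment to a state name.
        if name in self._state_ids:
            raise AttributeError(f"cannot override state {name!r}")
        super().__setattr__(name, value)


sm = StateChartSketch()
sm.label = "ok"  # ordinary attributes are unaffected
try:
    sm.green = "oops"
except AttributeError:
    rejected = True
else:
    rejected = False
```

With this layout `sm.green` resolves through the instance dict, with no descriptor call on the way.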

codecov · 2026-03-07T19:49:22Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (11ffa95) to head (1847088).
⚠️ Report is 1 commits behind head on develop.

Additional details and impacted files

@@            Coverage Diff            @@
##           develop      #592   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           38        39    +1     
  Lines         4555      4591   +36     
  Branches       724       732    +8     
=========================================
+ Hits          4555      4591   +36

Flag	Coverage Δ
unittests	`100.00% <100.00%> (ø)`

…rhead

logger.debug() consumed 65-75% of event processing time even with DEBUG disabled, due to frame inspection, string formatting, and lock acquisition on every call.

- Store `self._debug` on BaseEngine at init: real logger.debug when DEBUG is enabled, no-op lambda otherwise
- Add BaseEngine.stop() to encapsulate external engine shutdown (replaces _stop_child_machine's inline engine manipulation)
- InvokeManager delegates _debug/_log_id to engine for consistent prefixes
- Module-level _debug in io/scxml/actions.py for parse-time callables
- Remove unused logging imports from sync/async engines and invoke module

…sition_that_matches

- Replace weakref + property with direct reference: eliminates 145k weakref deref + assert calls per benchmark cycle. The engine lifetime is tied to the SM, so there is no leak risk with CPython's cyclic GC (PEP 442).
- Extract `first_transition_that_matches` closure into a proper method `_first_transition_that_matches` on both BaseEngine and AsyncEngine, avoiding re-creation of the function object on every _select_transitions call.

Replace 20 boilerplate @property declarations and weakref _ref() with __getattr__ delegation to the underlying State. Cache delegated values in __dict__ on first access so subsequent lookups are direct dict hits. Eliminates 268k __getattr__/weakref calls per benchmark cycle, yielding ~7% improvement on parallel_region_events (207µs → 193µs) and a 61% reduction in ancestors() tottime.

Add TestThreadSafety with stress tests exercising real contention — multiple threads sending events to the same SM simultaneously via barriers. Tests verify no lost events, state consistency, correct callback counts, and safe concurrent reads. Document thread safety guarantees in docs/processing_model.md (linking to atomic_configuration_update for transient None behavior) and AGENTS.md, noting the PriorityQueue-based event queue must remain thread-safe.

…anch The isinstance(value, InstanceState) check was unreachable because _build_configuration uses vars(self) directly, bypassing __setattr__. Simplify to always reject assignments to state names via __setattr__. Fixes 100% branch coverage.

sonarqubecloud · 2026-03-08T02:14:57Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

fgmacedo added 3 commits March 7, 2026 09:09

fgmacedo added 7 commits March 7, 2026 17:32

refactor: extract _build_configuration to remove duplication

4a2ac88

DRY up identical InstanceState + Configuration setup in __init__ and __setstate__.

perf: change log_cli_level from DEBUG to WARNING

9708a09

DEBUG forced all loggers to emit during tests, bypassing the cached no-op optimization and inflating benchmark numbers. WARNING is the sensible default; pass `-o log_cli_level=DEBUG` when debug logs are needed.

perf: avoid intermediate OrderedSet allocations in _prepare_entry_states

63bc62b

Replace two set operations (- and |) that each allocate an intermediate OrderedSet with a single-pass generator + update. Eliminates 2 allocations per microstep.

fgmacedo changed the title ~~perf: optimize hot paths — hash caching, Configuration extraction, InstanceState in __dict__~~ perf: optimize engine hot paths — 5x-7x event throughput improvement Mar 8, 2026

fgmacedo added 3 commits March 7, 2026 23:05

docs: add performance and thread safety entries to 3.1.0 release notes

8ee011e

chore: remove tests/benchmarks/README.md

1847088

fgmacedo merged commit 5a85209 into develop Mar 8, 2026
14 checks passed

fgmacedo deleted the perf/optimize-hot-paths branch March 8, 2026 02:16

fgmacedo merged 13 commits into develop from perf/optimize-hot-paths