Problem
In v3, when a state chart uses compound or parallel states, the Configuration stores all active state values on the model as an OrderedSet. This creates an impedance mismatch with relational database columns:
| Topology | model.state value |
|---|---|
| Flat SM | "draft" (scalar string ✓) |
| Compound SM | OrderedSet(["idle", "sub_a"]) |
| Parallel SM | OrderedSet(["war", "region_a", "a1", "region_b", "b1"]) |
Users coming from v2 are accustomed to model.state always being a scalar string that maps directly to a DB column. With compound/parallel states, this is no longer the case — the model receives an OrderedSet, which requires custom serialization/deserialization on the model's state property.
Why parent states are redundant
In any hierarchical state machine, the full configuration can be deterministically reconstructed from just the leaf (atomic) states by walking State.ancestors(). Storing parent states is redundant information:
- "sub_a" → reconstruct → {idle, sub_a} (compound parent idle is derived)
- OrderedSet(["a1", "b1"]) → reconstruct → {war, region_a, a1, region_b, b1} (all parents derived)
This means that for flat and compound state machines (the vast majority of use cases), model.state could remain a single scalar string — even with hierarchical states.
Only truly parallel regions would require multiple values.
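The reconstruction described above can be sketched without the library. This is a minimal illustration, not the library's implementation: the `PARENT` map and the `ancestors` and `reconstruct` helpers are hypothetical stand-ins for the state tree and `State.ancestors()`, and a plain `dict` stands in for `OrderedSet`.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical child -> parent map mirroring the parallel example in the table.
PARENT: Dict[str, Optional[str]] = {
    "war": None,
    "region_a": "war",
    "a1": "region_a",
    "region_b": "war",
    "b1": "region_b",
}


def ancestors(state: str) -> List[str]:
    """Return the ancestors of `state`, outermost first."""
    chain: List[str] = []
    parent = PARENT[state]
    while parent is not None:
        chain.append(parent)
        parent = PARENT[parent]
    return chain[::-1]


def reconstruct(leaves: Iterable[str]) -> List[str]:
    """Rebuild the full configuration from the stored leaf states."""
    seen: Dict[str, None] = {}  # dict keeps insertion order and drops duplicates
    for leaf in leaves:
        for state in [*ancestors(leaf), leaf]:
            seen.setdefault(state, None)
    return list(seen)


full = reconstruct(["a1", "b1"])
print(full)  # ['war', 'region_a', 'a1', 'region_b', 'b1']
```

Feeding the full configuration back into `reconstruct` returns the same list, so the sketch tolerates values that were stored before any leaf filtering.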
Current workaround
Users can implement serialization/deserialization on their model's state property today (see SQLite example). This works, but:
- Requires boilerplate on every model
- Users need to know about OrderedSet internals
- No standard format (CSV? JSON? custom?)
- The library doesn't help the model make informed decisions about what to store
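For readers who have not seen the workaround, the boilerplate looks roughly like the following. It is only a sketch under assumptions: `Document` and `state_column` are hypothetical names, the CSV format is one arbitrary choice, and a plain list stands in for the library's `OrderedSet` so the snippet runs on its own.

```python
from typing import Iterable, List, Optional, Union


class Document:
    """Toy model: `state_column` plays the role of a single text DB column."""

    def __init__(self) -> None:
        self.state_column: Optional[str] = None

    @property
    def state(self) -> Union[str, List[str], None]:
        raw = self.state_column
        if raw is None:
            return None
        parts = raw.split(",")
        # A single value stays scalar, as it would for a flat state machine.
        return parts[0] if len(parts) == 1 else parts

    @state.setter
    def state(self, value: Union[str, Iterable[str], None]) -> None:
        if value is None or isinstance(value, str):
            self.state_column = value
        else:
            # Multi-valued configuration (an OrderedSet in the real library).
            self.state_column = ",".join(str(v) for v in value)


doc = Document()
doc.state = ["idle", "sub_a"]
print(doc.state_column)  # idle,sub_a
```

Every model that backs a hierarchical chart would need an equivalent property pair, which is the repetition the list above refers to.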
Design constraints
- atomic_configuration_update: both modes must be supported
- Flat vs HSM: flat SMs must remain scalar (zero change for v2 users)
- Backward compatible: getattr/setattr protocol on the model stays unchanged
- Direct model manipulation: setting model.state directly must still work
- Low overhead: any reconstruction must be cached; avoid allocations in hot paths
- Model owns persistence: the library should not impose a serialization format — the model (and ultimately the user) decides how to persist
- Simple DB path: the common case (single DB column) should require minimal code
- Flexible: other persistence models (multiple fields, denormalization, event sourcing) must remain possible
- Semantic versioning: any behavioral change to what the model receives must be opt-in in 3.x and can become default in 4.x
Possible approaches
Approach 1: Class-level opt-in flag
A boolean flag on the state chart class that tells Configuration to store only leaf states:

    class MyWorkflow(StateChart):
        persist_leaf_states_only = True  # opt-in

When True:
- states.setter (atomic mode) filters to leaf states before writing to model
- add()/discard() (incremental mode) only write leaf states to model
- states getter always reconstructs the full configuration from stored leaves (this reconstruction is idempotent; it also works correctly with full-config values for backward compat)
Pros: simple, one-line opt-in, no changes to the model
Cons: the library decides the storage strategy; model has no say in the format
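To make the flag's intended behavior concrete, here is a library-free sketch of the two halves: filtering on write and cached, idempotent reconstruction on read. The `PARENT` map, `leaves_only`, and `full_configuration` are hypothetical names, tuples stand in for `OrderedSet` so results can be cached, and `functools.lru_cache` is just one possible way to meet the low-overhead constraint.

```python
from functools import lru_cache
from typing import Dict, Optional, Tuple

# Hypothetical compound chart: `idle` contains `sub_a` and `sub_b`.
PARENT: Dict[str, Optional[str]] = {"idle": None, "sub_a": "idle", "sub_b": "idle"}
HAS_CHILDREN = {parent for parent in PARENT.values() if parent is not None}


def leaves_only(config: Tuple[str, ...]) -> Tuple[str, ...]:
    """Write side: drop every state that has children."""
    return tuple(s for s in config if s not in HAS_CHILDREN)


@lru_cache(maxsize=None)
def full_configuration(stored: Tuple[str, ...]) -> Tuple[str, ...]:
    """Read side: rebuild parents; cached per distinct stored value."""
    ordered: Dict[str, None] = {}
    for state in stored:
        chain = [state]
        while PARENT[chain[-1]] is not None:
            chain.append(PARENT[chain[-1]])
        for s in reversed(chain):  # outermost ancestor first
            ordered.setdefault(s, None)
    return tuple(ordered)


stored = leaves_only(("idle", "sub_a"))
print(stored)                       # ('sub_a',)
print(full_configuration(stored))   # ('idle', 'sub_a')
```

Calling `full_configuration(("idle", "sub_a"))` with an already complete value returns the same tuple, which is the idempotence that keeps previously stored full configurations readable.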
Approach 2: Model Protocol (serialize/deserialize delegation)
The model implements an opt-in Protocol that the Configuration detects:

    from typing import Any, Dict, Protocol


    class StatePersistence(Protocol):
        def serialize_state(self, value: Any, states_map: Dict[Any, "State"]) -> Any:
            """Convert state value before writing to model.

            Receives the raw value (scalar or OrderedSet) and the full states_map
            for context (e.g., to filter leaf states, check hierarchy, etc.).
            """
            ...

        def deserialize_state(self, raw: Any, states_map: Dict[Any, "State"]) -> Any:
            """Convert stored value back to what Configuration expects.

            Returns scalar or OrderedSet.
            """
            ...

If the model implements this protocol, Configuration delegates serialization to the model, passing enough context (states_map, possibly instance_states) for the model to make informed decisions.
For example, a model that stores only leaf states as CSV:

    # OrderedSet is the library's ordered-set type and must be imported from it.
    class Document:
        def serialize_state(self, value, states_map):
            # Store only leaf states as CSV
            if isinstance(value, OrderedSet):
                leaves = [v for v in value if states_map[v].is_atomic]
                return ",".join(str(v) for v in leaves)
            return str(value) if value is not None else None

        def deserialize_state(self, raw, states_map):
            if raw is None:
                return None
            parts = raw.split(",")
            # One stored value stays scalar; several become an OrderedSet
            return parts[0] if len(parts) == 1 else OrderedSet(parts)

Pros: model owns persistence completely, maximum flexibility, can implement any format (CSV, JSON, multiple fields, etc.), the library provides context for informed decisions
Cons: more boilerplate per model (though a base mixin could be provided), model needs to understand states_map
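How Configuration would detect the protocol is left open in this proposal. One possible mechanism, shown purely as an assumption, is `typing.runtime_checkable`, which lets `isinstance` check that both methods are present. `write_state`, `PlainModel`, and `UpperModel` are hypothetical names used only for this sketch.

```python
from typing import Any, Dict, Protocol, runtime_checkable


@runtime_checkable
class StatePersistence(Protocol):
    def serialize_state(self, value: Any, states_map: Dict[Any, Any]) -> Any: ...
    def deserialize_state(self, raw: Any, states_map: Dict[Any, Any]) -> Any: ...


def write_state(model: Any, value: Any, states_map: Dict[Any, Any]) -> None:
    """Hypothetical write path: delegate only when the model opts in."""
    if isinstance(model, StatePersistence):
        value = model.serialize_state(value, states_map)
    setattr(model, "state", value)


class PlainModel:
    """Does not implement the protocol; receives the raw value."""
    state = None


class UpperModel:
    """Implements the protocol with a deliberately trivial format."""
    state = None

    def serialize_state(self, value, states_map):
        return str(value).upper()

    def deserialize_state(self, raw, states_map):
        return raw.lower()


plain, upper = PlainModel(), UpperModel()
write_state(plain, "draft", {})
write_state(upper, "draft", {})
print(plain.state, upper.state)  # draft DRAFT
```

Models that do not define the two methods are untouched, which matches the backward-compatibility constraint: the existing getattr/setattr path stays the default.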
Approach 3: Codec/transform parameter
A transform object passed to the state chart constructor:

    sm = MyWorkflow(model=doc, state_codec=LeafOnlyCSVCodec())

The codec intercepts reads/writes between Configuration and the model:

    class LeafOnlyCSVCodec:
        def encode(self, value, states_map):
            """Configuration → model"""
            ...

        def decode(self, raw, states_map):
            """model → Configuration"""
            ...

Pros: reusable codecs, decoupled from both SM and model
Cons: new parameter on the constructor, another concept to learn
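A filled-in version of such a codec might look like the sketch below. It is an illustration under assumptions: `ToyState` is a stand-in exposing only `is_atomic`, a plain list replaces `OrderedSet`, and the `state_codec` constructor parameter itself does not exist in the library today.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Union


@dataclass(frozen=True)
class ToyState:
    """Stand-in for the library's State; only `is_atomic` is used here."""
    is_atomic: bool


class LeafOnlyCSVCodec:
    def encode(
        self, value: Union[str, Iterable[str], None], states_map: Dict[str, ToyState]
    ) -> Optional[str]:
        """Configuration -> model: keep leaf states and join them as CSV."""
        if value is None:
            return None
        values = [value] if isinstance(value, str) else list(value)
        return ",".join(v for v in values if states_map[v].is_atomic)

    def decode(
        self, raw: Optional[str], states_map: Dict[str, ToyState]
    ) -> Union[str, List[str], None]:
        """model -> Configuration: scalar for one leaf, list otherwise."""
        if raw is None:
            return None
        parts = raw.split(",")
        return parts[0] if len(parts) == 1 else parts


states_map = {
    "war": ToyState(False),
    "region_a": ToyState(False),
    "a1": ToyState(True),
    "region_b": ToyState(False),
    "b1": ToyState(True),
}
codec = LeafOnlyCSVCodec()
stored = codec.encode(["war", "region_a", "a1", "region_b", "b1"], states_map)
print(stored)  # a1,b1
```

`decode` returns only the leaves here; rebuilding the parent states would remain the job of Configuration, as in Approach 1.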
Approach 4: Combine flag + protocol
- The flag (persist_leaf_states_only = True) handles the simple case (leaf filtering)
- The protocol handles advanced cases (custom serialization)
- If both are present, the protocol takes precedence
This gives users a progressive path: start with the flag for zero-effort DB compat, graduate to the protocol for custom needs.
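The precedence rule can be written down in a few lines. This is a hypothetical sketch of the decision only: `storage_strategy` and the returned labels are invented for illustration, and duck typing via `getattr` is one assumed way to detect the protocol.

```python
def storage_strategy(chart: object, model: object) -> str:
    """Pick how state values reach the model, protocol first."""
    has_protocol = callable(getattr(model, "serialize_state", None)) and callable(
        getattr(model, "deserialize_state", None)
    )
    if has_protocol:
        return "protocol"   # model-defined serialization wins
    if getattr(chart, "persist_leaf_states_only", False):
        return "leaf_only"  # library filters to leaf states
    return "full"           # current v3 behavior: full configuration


class FlaggedChart:
    persist_leaf_states_only = True


class DefaultChart:
    pass


class PlainModel:
    pass


class CustomModel:
    def serialize_state(self, value, states_map):
        return value

    def deserialize_state(self, raw, states_map):
        return raw


print(storage_strategy(FlaggedChart(), CustomModel()))  # protocol
print(storage_strategy(FlaggedChart(), PlainModel()))   # leaf_only
print(storage_strategy(DefaultChart(), PlainModel()))   # full
```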
What we'd like feedback on
- Which approach feels most natural for your use case?
- Is the leaf-only flag sufficient for most persistence needs, or do you need the flexibility of a protocol/codec?
- For the protocol approach: what information would your model need from the library to make good serialization decisions? Is states_map enough, or would you need more context?
- Are there persistence patterns we haven't considered? (Event sourcing, CQRS, multi-table, etc.)
Any feedback is welcome — especially from users who are persisting state to databases in production.