RFC: Improve persistence ergonomics for compound/parallel state machines

## Problem

In v3, when a state chart uses compound or parallel states, the `Configuration` stores **all active state values** on the model as an `OrderedSet`. This creates an impedance mismatch with relational database columns:

| Topology | `model.state` value |
|---|---|
| Flat SM | `"draft"` (scalar string ✓) |
| Compound SM | `OrderedSet(["idle", "sub_a"])` |
| Parallel SM | `OrderedSet(["war", "region_a", "a1", "region_b", "b1"])` |

Users coming from v2 are accustomed to `model.state` always being a scalar string that maps directly to a DB column. With compound/parallel states, this is no longer the case — the model receives an `OrderedSet`, which requires custom serialization/deserialization on the model's state property.

### Why parent states are redundant

In any hierarchical state machine, the **full configuration can be deterministically reconstructed** from just the leaf (atomic) states by walking `State.ancestors()`. Storing parent states is redundant information:

- `"sub_a"` → reconstruct → `{idle, sub_a}` (compound parent `idle` is derived)
- `OrderedSet(["a1", "b1"])` → reconstruct → `{war, region_a, a1, region_b, b1}` (all parents derived)
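A minimal sketch of that reconstruction. It assumes a hypothetical `PARENT` map (state value to parent value, `None` for top-level states) in place of the library's real hierarchy, and its local `ancestors()` helper only mimics what `State.ancestors()` is described as doing here. Each leaf contributes its ancestor chain, outermost first, and an insertion-ordered `dict` serves as the ordered set:

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical hierarchy for the Compound and Parallel rows of the table:
# `idle` is compound (child `sub_a`); `war` is parallel with two regions.
PARENT: Dict[str, Optional[str]] = {
    "idle": None,
    "sub_a": "idle",
    "war": None,
    "region_a": "war",
    "a1": "region_a",
    "region_b": "war",
    "b1": "region_b",
}


def ancestors(state: str) -> List[str]:
    """Return the ancestors of `state`, nearest parent first (stand-in helper)."""
    chain: List[str] = []
    parent = PARENT[state]
    while parent is not None:
        chain.append(parent)
        parent = PARENT[parent]
    return chain


def reconstruct(leaves: Iterable[str]) -> List[str]:
    """Rebuild the full active configuration from the active leaf states."""
    config: Dict[str, None] = {}  # insertion-ordered dict used as an ordered set
    for leaf in leaves:
        # Outermost ancestor first, so parents precede their children.
        for state in reversed([leaf, *ancestors(leaf)]):
            config.setdefault(state, None)  # no-op if already present
    return list(config)
```

With this map, `reconstruct(["a1", "b1"])` yields `["war", "region_a", "a1", "region_b", "b1"]`, the same order as the Parallel SM row in the table, and `reconstruct(["sub_a"])` yields `["idle", "sub_a"]`.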

This means that for **flat and compound state machines** (the vast majority of use cases), `model.state` could remain a **single scalar string** — even with hierarchical states.

Only truly parallel regions would require multiple values.

### Current workaround

Users can implement serialization/deserialization on their model's state property today (see [SQLite example](https://github.com/fgmacedo/python-statemachine/blob/develop/tests/examples/sqlite_persistent_model_machine.py)). This works, but:

- Requires boilerplate on every model
- Users need to know about `OrderedSet` internals
- No standard format (CSV? JSON? custom?)
- The library doesn't help the model make informed decisions about what to store
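For reference, the per-model boilerplate looks roughly like the following minimal sketch (not the linked SQLite example): a hypothetical `Document` whose `state` property is backed by a single text column, `_state_column`. Because the standard library has no `OrderedSet`, the getter returns a `tuple` as a stand-in, and every active value, parents included, ends up in the column:

```python
from collections.abc import Iterable
from typing import Any, Optional, Tuple, Union

StateValue = Union[str, Tuple[str, ...], None]


class Document:
    """Hypothetical model whose `state` is backed by one text column."""

    def __init__(self) -> None:
        self._state_column: Optional[str] = None  # what the DB column would hold

    @property
    def state(self) -> StateValue:
        if not self._state_column:
            return None
        parts = self._state_column.split(",")
        # One value reads back as a scalar; several as a tuple (OrderedSet stand-in).
        return parts[0] if len(parts) == 1 else tuple(parts)

    @state.setter
    def state(self, value: Any) -> None:
        if value is None:
            self._state_column = None
        elif isinstance(value, Iterable) and not isinstance(value, str):
            # e.g. an OrderedSet of active states: parents get stored too
            self._state_column = ",".join(str(v) for v in value)
        else:
            self._state_column = str(value)  # flat SM: a single scalar
```

Every model repeats some variant of this with its own ad-hoc format, and in this one a value containing a comma, or a non-string value such as `3`, does not survive the round trip unchanged.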

## Design constraints

1. **`atomic_configuration_update`**: both update modes (atomic replacement of the whole configuration, and incremental `add()`/`discard()`) must be supported
2. **Flat vs HSM**: flat SMs must remain scalar (zero change for v2 users)
3. **Backward compatible**: `getattr`/`setattr` protocol on the model stays unchanged
4. **Direct model manipulation**: setting `model.state` directly must still work
5. **Low overhead**: any reconstruction must be cached; avoid allocations in hot paths
6. **Model owns persistence**: the library should not impose a serialization format — the model (and ultimately the user) decides how to persist
7. **Simple DB path**: the common case (single DB column) should require minimal code
8. **Flexible**: other persistence models (multiple fields, denormalization, event sourcing) must remain possible
9. **Semantic versioning**: any behavioral change to what the model receives must be opt-in in 3.x and can become default in 4.x
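Constraint 5 looks achievable because, assuming the hierarchy is fixed once the class is defined, the mapping from a set of leaves to the full configuration is a pure function and can be memoised. A minimal sketch using `functools.lru_cache`, keyed on a hashable tuple of leaf values; the `PARENT` map is hypothetical, and the cached result is an immutable tuple so that callers cannot corrupt the shared cache entry:

```python
from functools import lru_cache
from typing import Dict, List, Optional, Tuple

# Hypothetical hierarchy: `war` is a parallel state with two regions.
PARENT: Dict[str, Optional[str]] = {
    "war": None,
    "region_a": "war",
    "a1": "region_a",
    "region_b": "war",
    "b1": "region_b",
}


@lru_cache(maxsize=256)
def full_configuration(leaves: Tuple[str, ...]) -> Tuple[str, ...]:
    """Leaves -> full configuration; repeated lookups are served from the cache."""
    config: Dict[str, None] = {}  # insertion-ordered set
    for leaf in leaves:
        chain: List[str] = []
        state: Optional[str] = leaf
        while state is not None:  # collect leaf, parent, grandparent, ...
            chain.append(state)
            state = PARENT[state]
        for s in reversed(chain):  # outermost ancestor first
            config.setdefault(s, None)
    return tuple(config)  # immutable, so the cached value is safe to share
```

A second call with `("a1", "b1")` is a cache hit, as `full_configuration.cache_info()` reports. The key is order-sensitive, so `("b1", "a1")` would be cached separately.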

## Possible approaches

### Approach 1: Class-level opt-in flag

A boolean flag on the state chart class that tells `Configuration` to store only leaf states:

```python
class MyWorkflow(StateChart):
    persist_leaf_states_only = True  # opt-in
```

When `True`:
- `states.setter` (atomic mode) filters to leaf states before writing to model
- `add()`/`discard()` (incremental mode) only write leaf states to model
- `states` getter always reconstructs the full configuration from stored leaves (this reconstruction is idempotent — it also works correctly with full-config values for backward compat)
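The atomic-mode write and the getter described above can be sketched as two plain functions, assuming (hypothetically) that the hierarchy is available as a `PARENT` map. `to_storage` keeps only atomic states and collapses a single leaf to a scalar, so flat and compound machines still write one string; `from_storage` rebuilds the configuration and returns the same result whether it receives leaves or a full legacy configuration, which is the idempotence claimed above:

```python
from typing import Dict, Iterable, List, Optional, Union

# Hypothetical hierarchy covering the three rows of the table above.
PARENT: Dict[str, Optional[str]] = {
    "draft": None,
    "idle": None,
    "sub_a": "idle",
    "war": None,
    "region_a": "war",
    "a1": "region_a",
    "region_b": "war",
    "b1": "region_b",
}
_HAS_CHILDREN = set(PARENT.values())
ATOMIC = {s for s in PARENT if s not in _HAS_CHILDREN}  # leaves: no children

Stored = Union[str, List[str]]


def to_storage(config: Iterable[str]) -> Stored:
    """Atomic-mode write: filter the configuration down to leaf states."""
    leaves = [s for s in config if s in ATOMIC]
    return leaves[0] if len(leaves) == 1 else leaves


def from_storage(stored: Stored) -> List[str]:
    """Getter: rebuild the full configuration (also accepts legacy full configs)."""
    values = [stored] if isinstance(stored, str) else stored
    config: Dict[str, None] = {}
    for value in values:
        chain: List[str] = []
        state: Optional[str] = value
        while state is not None:
            chain.append(state)
            state = PARENT[state]
        for s in reversed(chain):
            config.setdefault(s, None)  # parents already present are skipped
    return list(config)
```

For example, `to_storage(["idle", "sub_a"])` returns the scalar `"sub_a"`, and both `from_storage(["a1", "b1"])` and `from_storage(["war", "region_a", "a1", "region_b", "b1"])` return the full parallel configuration.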

**Pros**: simple, one-line opt-in, no changes to the model
**Cons**: the library decides the storage strategy; model has no say in the format

### Approach 2: Model Protocol (serialize/deserialize delegation)

The model implements an opt-in `Protocol` that the `Configuration` detects:

```python
from typing import Protocol, Any, Dict

class StatePersistence(Protocol):
    def serialize_state(self, value: Any, states_map: Dict[Any, "State"]) -> Any:
        """Convert state value before writing to model.
        
        Receives the raw value (scalar or OrderedSet) and the full states_map
        for context (e.g., to filter leaf states, check hierarchy, etc.).
        """
        ...
    
    def deserialize_state(self, raw: Any, states_map: Dict[Any, "State"]) -> Any:
        """Convert stored value back to what Configuration expects.
        
        Returns scalar or OrderedSet.
        """
        ...
```

If the model implements this protocol, `Configuration` delegates serialization to the model, passing enough context (`states_map`, possibly `instance_states`) for the model to make informed decisions.

```python
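class Document:
    # Sketch only: `OrderedSet` is the library's ordered-set type (import omitted),
    # and `states_map` is assumed to map each state value to its `State`.

    def serialize_state(self, value, states_map):
        # Store only leaf (atomic) states as CSV; parents can be re-derived.
        if isinstance(value, OrderedSet):
            leaves = [v for v in value if states_map[v].is_atomic]
            return ",".join(str(v) for v in leaves)
        return str(value) if value is not None else None

    def deserialize_state(self, raw, states_map):
        # Treat both None and "" (an empty leaf list) as "no state".
        if not raw:
            return None
        # Returns leaves only (as strings); the full configuration must still
        # be rebuilt from them, e.g. via `State.ancestors()`.
        parts = raw.split(",")
        return parts[0] if len(parts) == 1 else OrderedSet(parts)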
```
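The detection step could rely on `typing.runtime_checkable`, which lets `isinstance()` test whether a model provides the protocol's methods. A minimal sketch, where `write_state` and `read_state` are hypothetical names for the points at which `Configuration` touches the model; models that do not opt in keep the plain `getattr`/`setattr` path:

```python
from typing import Any, Dict, Protocol, runtime_checkable


@runtime_checkable
class StatePersistence(Protocol):
    def serialize_state(self, value: Any, states_map: Dict[Any, Any]) -> Any: ...

    def deserialize_state(self, raw: Any, states_map: Dict[Any, Any]) -> Any: ...


def write_state(model: Any, attr: str, value: Any, states_map: Dict[Any, Any]) -> None:
    """Hypothetical write path: delegate to the model only if it opts in."""
    if isinstance(model, StatePersistence):
        value = model.serialize_state(value, states_map)
    setattr(model, attr, value)


def read_state(model: Any, attr: str, states_map: Dict[Any, Any]) -> Any:
    """Hypothetical read path, mirroring write_state."""
    raw = getattr(model, attr, None)
    if isinstance(model, StatePersistence):
        return model.deserialize_state(raw, states_map)
    return raw
```

An `isinstance()` check against a runtime-checkable protocol only verifies that the methods exist, not their signatures, and it is comparatively slow. To respect the low-overhead constraint, the check could run once when the model is bound rather than on every read and write.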

**Pros**: model owns persistence completely, maximum flexibility, can implement any format (CSV, JSON, multiple fields, etc.), the library provides context for informed decisions
**Cons**: more boilerplate per model (though a base mixin could be provided), model needs to understand `states_map`

### Approach 3: Codec/transform parameter

A transform object passed to the state chart constructor:

```python
sm = MyWorkflow(model=doc, state_codec=LeafOnlyCSVCodec())
```

The codec intercepts reads/writes between Configuration and the model:

```python
class LeafOnlyCSVCodec:
    def encode(self, value, states_map):
        """Configuration → model"""
        ...
    
    def decode(self, raw, states_map):
        """model → Configuration"""
        ...
```
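To make the codec idea concrete, here is a minimal sketch of a leaf-only codec that stores a JSON list instead of CSV. As in the protocol example, `states_map` is assumed to map each state value to an object with an `is_atomic` attribute; `StubState` is a hypothetical stand-in for the library's `State`, and `LeafOnlyJSONCodec` is an illustrative name, not an existing class:

```python
import json
from collections.abc import Iterable
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class StubState:
    """Hypothetical stand-in for the library's State; only `is_atomic` is used."""

    is_atomic: bool


class LeafOnlyJSONCodec:
    """Hypothetical codec: persists only leaf state values as a JSON list."""

    def encode(self, value: Any, states_map: Dict[Any, StubState]) -> Optional[str]:
        """Configuration -> model"""
        if value is None:
            return None
        if isinstance(value, Iterable) and not isinstance(value, str):
            values = list(value)
        else:
            values = [value]  # a scalar (flat SM) is a one-element configuration
        leaves = [v for v in values if states_map[v].is_atomic]
        return json.dumps(leaves)

    def decode(self, raw: Optional[str], states_map: Dict[Any, StubState]) -> Any:
        """model -> Configuration (leaves only; parents are derived from them)"""
        if raw is None:
            return None
        leaves = json.loads(raw)
        return leaves[0] if len(leaves) == 1 else leaves
```

Unlike the CSV example in Approach 2, JSON round-trips non-string state values such as integers unchanged. Storing even a single leaf as a list keeps decoding unambiguous, at the cost of a flat machine's column holding `["draft"]` rather than a bare `draft`.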

**Pros**: reusable codecs, decoupled from both SM and model
**Cons**: new parameter on the constructor, another concept to learn

### Approach 4: Combine flag + protocol

- The flag (`persist_leaf_states_only = True`) handles the simple case (leaf filtering)
- The protocol handles advanced cases (custom serialization)
- If both are present, the protocol takes precedence

This gives users a progressive path: start with the flag for zero-effort DB compat, graduate to the protocol for custom needs.
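The precedence rule can be resolved once, when the state chart binds to its model, so that the hot path only calls a pre-selected function. A minimal sketch; `resolve_serializer` and its helpers are hypothetical, and `states_map` is again assumed to expose `is_atomic` per value:

```python
from collections.abc import Iterable
from typing import Any, Callable, Dict

Serializer = Callable[[Any, Dict[Any, Any]], Any]


def _identity(value: Any, states_map: Dict[Any, Any]) -> Any:
    return value  # current v3 behaviour: store what Configuration hands over


def _leaves_only(value: Any, states_map: Dict[Any, Any]) -> Any:
    if isinstance(value, str) or not isinstance(value, Iterable):
        return value  # None or a single flat state: nothing to filter
    leaves = [v for v in value if states_map[v].is_atomic]
    return leaves[0] if len(leaves) == 1 else leaves


def resolve_serializer(model: Any, chart_class: type) -> Serializer:
    """Pick the write path once: protocol > class flag > raw value."""
    custom = getattr(model, "serialize_state", None)
    if callable(custom):
        return custom  # the model's protocol implementation takes precedence
    if getattr(chart_class, "persist_leaf_states_only", False):
        return _leaves_only
    return _identity
```

A symmetric resolver for the read side would pair the model's `deserialize_state`, the flag's reconstruction, or the identity in the same order.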

## What we'd like feedback on

1. **Which approach** feels most natural for your use case?
2. **Is the leaf-only flag sufficient** for most persistence needs, or do you need the flexibility of a protocol/codec?
3. **For the protocol approach**: what information would your model need from the library to make good serialization decisions? Is `states_map` enough, or would you need more context?
4. **Are there persistence patterns** we haven't considered? (Event sourcing, CQRS, multi-table, etc.)

Any feedback is welcome — especially from users who are persisting state to databases in production.


RFC: Improve persistence ergonomics for compound/parallel state machines #597
