Framework-agnostic async memory compression engine for long-running LLM chats and autonomous agent loops.
agcompress-core keeps request-time context bounded, keeps latest truth ahead of stale history, and makes memory state durable across crashes/restarts when SQLite persistence is enabled.
- Why This Exists
- Feature Highlights
- Design Principles
- Installation
- Quickstart
- Integration Pattern (Host App)
- Architecture
- Context Assembly and Precedence
- Compression Lifecycle
- Persistence and Recovery
- Configuration Guide
- Running Benchmarks (3B End-to-End)
- Project Layout
- Known Constraints
- Development
- Contributing
- License
Long-running conversations and agent runs degrade without memory controls:
- context windows overflow;
- stale facts can reappear;
- compression latency can block throughput;
- restart events can break revision/watermark integrity.
agcompress-core addresses this with revisioned memory state, bounded raw context windows, deterministic supersession rules, progressive compression, and optional crash-safe SQLite persistence.
- Async compression off the send path via ThreadPoolExecutor.
- Deterministic context precedence:
  - raw sliding window (newest-first selection)
  - active slot truth
  - compressed summary
- Temporal correctness behavior:
  - latest correction wins
  - older values remain auditable but not active truth
- Progressive compression with token-budget-aware chunk planning.
- Branch isolation and fork/copy-on-write semantics.
- Backoff, retry spacing, and cooldown guards for invalid output storms.
- Optional SQLite durability for turns, slots, logs, summaries, runs, and metadata.
- Startup reconciliation and auto-rerun scheduling for dirty branches.
- End-to-end 3B benchmark harness with release gates for latency/quality.
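The temporal correctness rule in the list above (latest correction wins, older values stay auditable) can be illustrated with a minimal sketch. `SlotStore` and `SlotValue` are hypothetical names invented for this example; they are not the library's resolver API, and the sketch assumes turn ids only increase.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SlotValue:
    value: str
    turn_id: int
    superseded_by: Optional[int] = None  # turn id of the correction, if any


@dataclass
class SlotStore:
    """Hypothetical store: one active value per slot, full history retained."""
    history: dict = field(default_factory=dict)

    def set(self, slot: str, value: str, turn_id: int) -> None:
        entries = self.history.setdefault(slot, [])
        if entries:
            # Mark the previous active value as superseded instead of deleting it.
            entries[-1].superseded_by = turn_id
        entries.append(SlotValue(value, turn_id))

    def active(self, slot: str) -> Optional[str]:
        entries = self.history.get(slot)
        return entries[-1].value if entries else None

    def audit(self, slot: str) -> list:
        return [entry.value for entry in self.history.get(slot, [])]


store = SlotStore()
store.set("budget", "$20,000", turn_id=1)
store.set("budget", "$25,000", turn_id=3)
print(store.active("budget"))  # $25,000
print(store.audit("budget"))   # ['$20,000', '$25,000']
```

Keeping superseded entries in the history rather than overwriting them is what lets the old value remain inspectable without ever being served as current truth.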
- Temporal truth over static summaries: newer user corrections must override older compressed artifacts.
- Bounded context over unbounded recall: raw window and compression budgets are hard-capped.
- Deterministic merges over fuzzy state: revision checks and patch acceptance gates protect invariants.
- Branch-local correctness over global convenience: state, retries, and leases are branch-scoped.
- Recoverability over happy-path speed: persistent metadata enables restart-safe continuation.
Requirements:
- Python >=3.10
- Optional: local Ollama server for model-backed compression and benchmark runs
Install package:
python -m pip install -e .
Install development extras:
python -m pip install -e '.[dev]'

from agcompress_core import MemoryConfig, MemoryEngine
cfg = MemoryConfig(
model_context_window=128000,
ollama_model="qwen2.5:3b-instruct",
enable_async_compression=True,
persistence_db_path="./agcompress.sqlite3", # optional
)
engine = MemoryEngine(cfg)
engine.add_turn("main", "user", "Budget is $20,000")
engine.add_turn("main", "assistant", "Acknowledged.")
engine.add_turn("main", "user", "Correction: budget is $25,000")
ctx = engine.get_context("main", "budget and planning")
print(ctx.system_appendix)
engine.close()
Typical app loop:
- engine.add_turn(branch_id, "user", user_text)
- ctx = engine.get_context(branch_id, current_query)
- host model call with ctx.messages + ctx.system_appendix
- engine.on_assistant_final(branch_id, assistant_text, current_query)
on_assistant_final(...) appends assistant output and schedules compression asynchronously.
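The loop can be sketched against a stand-in engine so that it runs without the package or a model. Only the method names and argument order come from the steps above; `FakeEngine`, `FakeContext`, `call_model`, and `handle_user_message` are hypothetical, and the shape of `ctx.messages` (a list of role/content dicts) is an assumption. The sketch also reuses the user text as the query, which a real host may not want.

```python
from dataclasses import dataclass, field


@dataclass
class FakeContext:
    # Assumption: messages are role/content dicts and the appendix is a string.
    messages: list
    system_appendix: str


@dataclass
class FakeEngine:
    """Stand-in mirroring the call names in the loop; not the real engine."""
    turns: dict = field(default_factory=dict)

    def add_turn(self, branch_id, role, text):
        self.turns.setdefault(branch_id, []).append({"role": role, "content": text})

    def get_context(self, branch_id, query):
        history = list(self.turns.get(branch_id, []))
        return FakeContext(messages=history, system_appendix=f"query: {query}")

    def on_assistant_final(self, branch_id, assistant_text, query):
        # The real engine also schedules compression here; the stand-in only appends.
        self.add_turn(branch_id, "assistant", assistant_text)


def call_model(messages, system_appendix):
    """Hypothetical host model call; echoes the last message it was given."""
    return f"echo: {messages[-1]['content']}"


def handle_user_message(engine, branch_id, user_text):
    engine.add_turn(branch_id, "user", user_text)
    ctx = engine.get_context(branch_id, user_text)
    reply = call_model(ctx.messages, ctx.system_appendix)
    engine.on_assistant_final(branch_id, reply, user_text)
    return reply


engine = FakeEngine()
reply = handle_user_message(engine, "main", "Budget is $25,000")
print(reply)
```

Wrapping the four calls in one function keeps the ordering in a single place, so the host cannot call the model before the user turn has been recorded.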
Core runtime paths:
- src/agcompress_core/engine.py: public API and orchestration
- src/agcompress_core/state.py: branch state, revisions, run lifecycle, merge rules
- src/agcompress_core/compressor.py: prompt construction + JSON patch extraction
- src/agcompress_core/resolver.py: slot supersession/canonicalization logic
- src/agcompress_core/assembler.py: context assembly with precedence and caps
- src/agcompress_core/persistence.py: SQLite schema, commits, leases, reconciliation
- src/agcompress_core/config.py: tuning knobs
High-level flow:
add_turn -> mark branch dirty -> maybe_compress_async
-> prepare_run (plan chunk, lock run) -> model compression patch
-> complete_run (accept/reject + merge + watermark update)
-> get_context assembles prompt appendix (raw > slots > summary)
Send-time appendix precedence is fixed:
- Sliding raw window (highest priority, newest-first selected under token cap)
- Active slot truth
- Compressed summary (lowest priority)
Conflict behavior:
- latest raw/slot values win;
- superseded older values are retained for audit but not exposed as active truth.
This prevents stale values from overriding newer corrections when the token budget is tight.
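A minimal sketch of budget filling in that fixed order follows. `assemble_appendix` and `estimate_tokens` are hypothetical helpers, the word count is a crude stand-in for a real tokenizer, and the two-cap scheme (`raw_cap` inside `total_cap`) is an assumption made for illustration rather than the assembler's actual logic.

```python
def estimate_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: counts whitespace-separated words."""
    return len(text.split())


def assemble_appendix(raw_turns, slots, summary, raw_cap, total_cap):
    """Fill the budget in precedence order: raw window, then slots, then summary."""
    used = 0
    raw_selected = []
    # Newest-first selection: walk backwards and stop at the first turn that
    # does not fit, so the selected window stays contiguous.
    for turn in reversed(raw_turns):
        cost = estimate_tokens(turn)
        if used + cost > min(raw_cap, total_cap):
            break
        raw_selected.append(turn)
        used += cost
    raw_selected.reverse()  # restore chronological order for the prompt

    slot_lines = []
    for name, value in slots.items():
        line = f"{name}: {value}"
        cost = estimate_tokens(line)
        if used + cost > total_cap:
            break
        slot_lines.append(line)
        used += cost

    # The summary is lowest priority, so it is the first thing dropped.
    fits = used + estimate_tokens(summary) <= total_cap
    return {"raw": raw_selected, "slots": slot_lines, "summary": summary if fits else ""}


result = assemble_appendix(
    raw_turns=["budget is 20000", "ok noted", "correction budget is 25000"],
    slots={"budget": "25000"},
    summary="user is planning a project with a fixed budget",
    raw_cap=6,
    total_cap=10,
)
print(result)
```

With these caps the oldest raw turn and the summary are dropped, while the newest correction and the active slot survive, which is the behavior the precedence order is meant to guarantee.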
- Trigger policy computes token ratio against effective context window.
- prepare_run(...) selects older turns using turn-count plus token budgets.
- Single-pass compression returns structured patch (facts, fact_candidates, summary, compressed_turn_ids).
- Patch acceptance gate validates branch/shape/coverage constraints.
- complete_run(...) merges accepted updates and advances watermark only by accepted coverage.
- Failures use exponential backoff + jitter, branch retry interval, and cooldown pause.
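The acceptance gate and watermark steps might look roughly like the sketch below. The patch keys `facts`, `summary`, and `compressed_turn_ids` come from the lifecycle description; the `branch_id` key, the specific checks, and the function names `accept_patch` and `advance_watermark` are assumptions, as is the choice to advance only over a contiguous covered prefix.

```python
def accept_patch(patch: dict, branch_id: str, planned_turn_ids: list) -> bool:
    """Hypothetical gate: branch match, expected shape, coverage inside the planned chunk."""
    if patch.get("branch_id") != branch_id:
        return False
    if not isinstance(patch.get("facts"), list):
        return False
    if not isinstance(patch.get("summary"), str):
        return False
    covered = patch.get("compressed_turn_ids")
    if not isinstance(covered, list) or not covered:
        return False
    # Reject patches that claim to cover turns outside the planned chunk.
    return set(covered) <= set(planned_turn_ids)


def advance_watermark(watermark: int, planned_turn_ids: list, covered_ids: list) -> int:
    """Advance only across the contiguous prefix of planned turns that was covered."""
    covered = set(covered_ids)
    for turn_id in sorted(planned_turn_ids):
        if turn_id <= watermark:
            continue
        if turn_id not in covered:
            break  # a gap: later turns stay uncompressed for the next run
        watermark = turn_id
    return watermark


planned = [4, 5, 6, 7]
patch = {
    "branch_id": "main",
    "facts": [{"slot": "budget", "value": "$25,000"}],
    "fact_candidates": [],
    "summary": "User corrected the budget.",
    "compressed_turn_ids": [4, 5, 7],
}
accepted = accept_patch(patch, "main", planned)
new_watermark = advance_watermark(3, planned, patch["compressed_turn_ids"]) if accepted else 3
print(accepted, new_watermark)  # True 5
```

Stopping at the first uncovered turn means turn 6 is never silently skipped: it remains above the watermark and is eligible for the next compression run.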
Enable with MemoryConfig.persistence_db_path.
Persisted entities include:
- branches + branch metadata
- conversation turns
- slot state
- memory facts
- evidence log / change log
- summaries
- compression runs + leases
- dynamic attribute alias decisions/events
Startup behavior:
- reconciles expired/stale leases;
- repairs dirty/in_flight metadata consistency;
- optionally auto-queues reruns for dirty branches.
SQLite runtime settings:
- journal_mode=WAL
- synchronous=NORMAL
- foreign_keys=ON
- configurable busy_timeout
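For reference, these settings can be applied with the standard library's `sqlite3` module as sketched below. `open_connection` and `read_settings` are hypothetical helpers, and the 5000 ms default is an illustrative value, not the library's default.

```python
import os
import sqlite3
import tempfile


def open_connection(db_path: str, busy_timeout_ms: int = 5000) -> sqlite3.Connection:
    """Open a connection and apply the runtime settings listed above."""
    conn = sqlite3.connect(db_path)
    conn.execute("PRAGMA journal_mode=WAL")    # readers do not block the writer
    conn.execute("PRAGMA synchronous=NORMAL")  # fewer fsyncs than FULL
    conn.execute("PRAGMA foreign_keys=ON")     # off by default, set per connection
    conn.execute(f"PRAGMA busy_timeout={int(busy_timeout_ms)}")
    return conn


def read_settings(conn: sqlite3.Connection) -> dict:
    """Read the pragmas back so the effective values can be checked."""
    names = ("journal_mode", "synchronous", "foreign_keys", "busy_timeout")
    return {name: conn.execute(f"PRAGMA {name}").fetchone()[0] for name in names}


# WAL needs a file-backed database, so the demo uses a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    conn = open_connection(os.path.join(tmp, "demo.sqlite3"))
    settings = read_settings(conn)
    conn.close()

print(settings)
```

`foreign_keys` and `busy_timeout` are per-connection settings, so they have to be reapplied every time a connection is opened; only the WAL journal mode persists in the database file.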
Primary knobs in src/agcompress_core/config.py:
- Context + triggers:
  - model_context_window
  - light_trigger_ratio, aggressive_trigger_ratio
  - recent_turns_keep
- Compression chunking:
  - light_old_turns, max_old_turns_aggressive
  - light_old_turn_token_budget, aggressive_old_turn_token_budget
- Raw window:
  - raw_window_ratio
  - raw_window_token_limit
- Retry/cooldown:
  - retry_backoff_base_ms, retry_backoff_max_ms, retry_jitter_ratio
  - retry_min_interval_ms, max_consecutive_failures, failure_cooldown_ms
- Mapping/supersession:
  - attribute_similarity_threshold, attribute_similarity_margin
- Retention controls:
  - max_summaries_retained, max_raw_facts_retained
- Persistence:
  - persistence_db_path
  - persistence_lease_ttl_ms, persistence_busy_timeout_ms, persistence_lease_owner
  - enable_startup_auto_rerun
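To give a feel for how the retry/cooldown knobs could interact, here is one plausible combination. The formula (doubling per failure, capped, symmetric jitter, floored at the minimum interval) is an assumption, not the library's documented behavior, and every default value below is illustrative only. `retry_delay_ms` and `in_cooldown` are hypothetical helpers.

```python
import random


def retry_delay_ms(
    consecutive_failures: int,
    retry_backoff_base_ms: int = 500,
    retry_backoff_max_ms: int = 30_000,
    retry_jitter_ratio: float = 0.2,
    retry_min_interval_ms: int = 1_000,
    rng=random,
) -> int:
    """Exponential backoff with symmetric jitter, never below the minimum interval."""
    exponent = max(consecutive_failures - 1, 0)
    delay = min(retry_backoff_base_ms * (2 ** exponent), retry_backoff_max_ms)
    jitter = delay * retry_jitter_ratio * rng.uniform(-1.0, 1.0)
    return max(int(delay + jitter), retry_min_interval_ms)


def in_cooldown(
    consecutive_failures: int,
    last_failure_ms: int,
    now_ms: int,
    max_consecutive_failures: int = 5,
    failure_cooldown_ms: int = 60_000,
) -> bool:
    """Pause all retries for a branch once it has failed too many times in a row."""
    if consecutive_failures < max_consecutive_failures:
        return False
    return now_ms - last_failure_ms < failure_cooldown_ms


# Deterministic view of the schedule with jitter disabled.
schedule = [retry_delay_ms(n, retry_jitter_ratio=0.0) for n in range(1, 9)]
print(schedule)  # [1000, 1000, 2000, 4000, 8000, 16000, 30000, 30000]
```

The jitter spreads retries from many branches so they do not hit the model at the same moment, while the cooldown guards against a model that keeps returning invalid output.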
Run benchmark suite:
python tools/benchmark_3b.py \
--model qwen2.5:3b-instruct \
--iterations 2 \
--warmup 1
Benchmark gates:
- latency: p95 <= max_p95_latency_ms
- structured output quality: json_validity_rate >= min_json_validity_rate
- temporal quality: stale_truth_rate <= max_stale_truth_rate
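The three gate comparisons can be sketched as follows. The threshold names match the gates above; the nearest-rank percentile method, the helper names `p95` and `evaluate_gates`, and all numeric values are assumptions for illustration and do not reflect the harness's real thresholds or results.

```python
def p95(values):
    """Nearest-rank 95th percentile, using integer arithmetic for the rank."""
    if not values:
        raise ValueError("p95 requires at least one sample")
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) without float error
    return ordered[rank - 1]


def evaluate_gates(latencies_ms, json_validity_rate, stale_truth_rate, thresholds):
    """Return a per-gate pass/fail map; a release passes only if every gate passes."""
    return {
        "latency": p95(latencies_ms) <= thresholds["max_p95_latency_ms"],
        "structured_output": json_validity_rate >= thresholds["min_json_validity_rate"],
        "temporal": stale_truth_rate <= thresholds["max_stale_truth_rate"],
    }


# Illustrative numbers only.
thresholds = {
    "max_p95_latency_ms": 2000,
    "min_json_validity_rate": 0.98,
    "max_stale_truth_rate": 0.02,
}
latencies = [100 * i for i in range(1, 21)]  # 100, 200, ..., 2000
gates = evaluate_gates(latencies, json_validity_rate=0.99, stale_truth_rate=0.05,
                       thresholds=thresholds)
print(p95(latencies), gates, all(gates.values()))
```

In this made-up run the latency and JSON gates pass but the stale-truth gate fails, so the overall result is a failed release check.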
Latest benchmark report path:
benchmark_results/benchmark_3b_latest.json
src/agcompress_core/
  engine.py        # public runtime API
  state.py         # revisions, run planning, merge/watermark logic
  assembler.py     # prompt-context construction + precedence
  compressor.py    # LLM patch generation and parse path
  resolver.py      # slot update/supersession/canonicalization
  persistence.py   # SQLite persistence, leases, startup reconciliation
  config.py        # runtime configuration
  types.py         # datamodels
tests/
  test_*.py        # behavior, regression, and fault-injection coverage
tools/
  benchmark_3b.py  # E2E benchmark and release gates
- Compression quality and latency depend on model behavior and hardware.
- If model_context_window exceeds serving ollama_num_ctx, effective budget is capped to serving context.
- SQLite persistence is single-node/local; distributed coordination is out of scope.
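The context cap in the second constraint, and its effect on when compression triggers, can be shown with a short sketch. `effective_context_window` and `trigger_level` are hypothetical helpers; the parameter names reuse the config knob names, but the ratio values and the three-level outcome are assumptions.

```python
from typing import Optional


def effective_context_window(model_context_window: int,
                             ollama_num_ctx: Optional[int] = None) -> int:
    """Cap the configured window to what the serving side actually provides."""
    if ollama_num_ctx is None:
        return model_context_window
    return min(model_context_window, ollama_num_ctx)


def trigger_level(used_tokens: int, window: int,
                  light_trigger_ratio: float, aggressive_trigger_ratio: float) -> str:
    """Compare the token ratio against the two trigger thresholds."""
    ratio = used_tokens / window
    if ratio >= aggressive_trigger_ratio:
        return "aggressive"
    if ratio >= light_trigger_ratio:
        return "light"
    return "none"


window = effective_context_window(128_000, ollama_num_ctx=8_192)
print(window)                                  # 8192
print(trigger_level(7_000, window, 0.6, 0.85))   # aggressive
print(trigger_level(7_000, 128_000, 0.6, 0.85))  # none
```

The same 7,000 tokens are harmless against a 128k window but already past the aggressive threshold at 8,192, which is why a mismatch between the configured and the serving context size matters.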
Run tests:
pytest -q
Run type checks:
python -m mypy src
Recommended workflow:
- Open an issue describing risk/regression/bug.
- Add a failing test first where practical.
- Implement fix while preserving revision and temporal invariants.
- Run pytest -q and python -m mypy src.
MIT. See LICENSE.