agcompress-core

Framework-agnostic async memory compression engine for long-running LLM chats and autonomous agent loops.

agcompress-core keeps request-time context bounded, keeps the latest truth ahead of stale history, and, when SQLite persistence is enabled, makes memory state durable across crashes and restarts.

Why This Exists
Feature Highlights
Design Principles
Installation
Quickstart
Integration Pattern (Host App)
Architecture
Context Assembly and Precedence
Compression Lifecycle
Persistence and Recovery
Configuration Guide
Running Benchmarks (3B End-to-End)
Project Layout
Known Constraints
Development
Contributing
License

Why This Exists

Long-running conversations and agent runs degrade without memory controls:

context windows overflow;
stale facts can reappear;
compression latency can block throughput;
restart events can break revision/watermark integrity.

agcompress-core addresses this with revisioned memory state, bounded raw context windows, deterministic supersession rules, progressive compression, and optional crash-safe SQLite persistence.

Feature Highlights

Async compression off the send path via ThreadPoolExecutor.
Deterministic context precedence:
- raw sliding window (newest-first selection)
- active slot truth
- compressed summary
Temporal correctness behavior:
- latest correction wins
- older values remain auditable but not active truth
Progressive compression with token-budget-aware chunk planning.
Branch isolation and fork/copy-on-write semantics.
Backoff, retry spacing, and cooldown guards for invalid output storms.
Optional SQLite durability for turns, slots, logs, summaries, runs, and metadata.
Startup reconciliation and auto-rerun scheduling for dirty branches.
End-to-end 3B benchmark harness with release gates for latency/quality.

Design Principles

Temporal truth over static summaries: newer user corrections must override older compressed artifacts.
Bounded context over unbounded recall: raw window and compression budgets are hard-capped.
Deterministic merges over fuzzy state: revision checks and patch acceptance gates protect invariants.
Branch-local correctness over global convenience: state, retries, and leases are branch-scoped.
Recoverability over happy-path speed: persistent metadata enables restart-safe continuation.

Installation

Requirements:

Python >=3.10
Optional: local Ollama server for model-backed compression and benchmark runs

Install package:

python -m pip install -e .

Install development extras:

python -m pip install -e '.[dev]'

Quickstart

from agcompress_core import MemoryConfig, MemoryEngine

cfg = MemoryConfig(
    model_context_window=128000,
    ollama_model="qwen2.5:3b-instruct",
    enable_async_compression=True,
    persistence_db_path="./agcompress.sqlite3",  # optional
)

engine = MemoryEngine(cfg)

engine.add_turn("main", "user", "Budget is $20,000")
engine.add_turn("main", "assistant", "Acknowledged.")
engine.add_turn("main", "user", "Correction: budget is $25,000")

ctx = engine.get_context("main", "budget and planning")
print(ctx.system_appendix)

engine.close()

Integration Pattern (Host App)

Typical app loop:

engine.add_turn(branch_id, "user", user_text)
ctx = engine.get_context(branch_id, current_query)
Call the host model with ctx.messages + ctx.system_appendix.
engine.on_assistant_final(branch_id, assistant_text, current_query)

on_assistant_final(...) appends assistant output and schedules compression asynchronously.
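The four steps above can be wrapped in a single helper. The following is a minimal sketch, not the library's own code: `EngineLike`, `run_turn`, and `call_model` are hypothetical names, the user text is reused as the query for simplicity, and how `ctx.messages` and `ctx.system_appendix` are combined into a model request is left to the host.

```python
from typing import Any, Callable, Protocol


class EngineLike(Protocol):
    """Only the engine methods used by the loop described above."""

    def add_turn(self, branch_id: str, role: str, text: str) -> Any: ...
    def get_context(self, branch_id: str, query: str) -> Any: ...
    def on_assistant_final(self, branch_id: str, text: str, query: str) -> Any: ...


def run_turn(
    engine: EngineLike,
    branch_id: str,
    user_text: str,
    call_model: Callable[[Any, Any], str],  # hypothetical host-side model client
) -> str:
    """Run one request/response cycle in the documented order."""
    # 1. Record the user turn before assembling context.
    engine.add_turn(branch_id, "user", user_text)
    # 2. Assemble bounded context (assumption: the user text doubles as the query).
    ctx = engine.get_context(branch_id, user_text)
    # 3. The host decides how messages and appendix are merged into a request.
    assistant_text = call_model(ctx.messages, ctx.system_appendix)
    # 4. Hand the final assistant output back to the engine.
    engine.on_assistant_final(branch_id, assistant_text, user_text)
    return assistant_text
```

Because `run_turn` only depends on the three method names, it can be exercised in tests with a stub engine and a stub model client.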

Architecture

Core runtime paths:

src/agcompress_core/engine.py: public API and orchestration
src/agcompress_core/state.py: branch state, revisions, run lifecycle, merge rules
src/agcompress_core/compressor.py: prompt construction + JSON patch extraction
src/agcompress_core/resolver.py: slot supersession/canonicalization logic
src/agcompress_core/assembler.py: context assembly with precedence and caps
src/agcompress_core/persistence.py: SQLite schema, commits, leases, reconciliation
src/agcompress_core/config.py: tuning knobs

High-level flow:

add_turn -> mark branch dirty -> maybe_compress_async
        -> prepare_run (plan chunk, lock run) -> model compression patch
        -> complete_run (accept/reject + merge + watermark update)
        -> get_context assembles prompt appendix (raw > slots > summary)

Context Assembly and Precedence

Send-time appendix precedence is fixed:

Sliding raw window (highest priority, newest-first selected under token cap)
Active slot truth
Compressed summary (lowest priority)
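Newest-first selection under a token cap can be illustrated with a short sketch. This is an assumption about the general technique, not the contents of assembler.py: `select_raw_window` is a hypothetical name and the whitespace word count stands in for a real tokenizer.

```python
def select_raw_window(turns: list[str], token_cap: int) -> list[str]:
    """Return the newest turns that fit under token_cap, in chronological order."""

    def count_tokens(text: str) -> int:
        # Stand-in tokenizer (assumption): one token per whitespace-separated word.
        return len(text.split())

    selected: list[str] = []
    used = 0
    for turn in reversed(turns):  # walk from newest to oldest
        cost = count_tokens(turn)
        if used + cost > token_cap:
            # Stop at the first turn that does not fit so the window stays contiguous.
            break
        selected.append(turn)
        used += cost
    selected.reverse()  # restore chronological order for the prompt
    return selected
```

Stopping at the first turn that does not fit, rather than skipping it and continuing, keeps the window a contiguous suffix of the conversation.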

Conflict behavior:

latest raw/slot values win;
superseded older values are retained for audit but not exposed as active truth.

This prevents stale values from overriding newer ones when budgets are tight.
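The conflict rule can be sketched as a slot that tracks one active value plus an audit trail. This is a simplified illustration under assumed names (`Slot`, `SlotValue`, and an integer `revision`), not the logic in resolver.py.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SlotValue:
    value: str
    revision: int  # assumption: a monotonically increasing revision number


@dataclass
class Slot:
    active: Optional[SlotValue] = None
    superseded: list[SlotValue] = field(default_factory=list)  # audit trail only

    def update(self, value: str, revision: int) -> bool:
        """Apply an update; return True if it became the active value."""
        incoming = SlotValue(value, revision)
        if self.active is None:
            self.active = incoming
            return True
        if revision <= self.active.revision:
            # Stale write (for example from an older summary): keep it for
            # audit, but never let it replace the newer active value.
            self.superseded.append(incoming)
            return False
        # Newer write wins; the previous value moves to the audit trail.
        self.superseded.append(self.active)
        self.active = incoming
        return True
```

With the Quickstart data, a budget of $20,000 at revision 1 followed by a correction to $25,000 at a later revision leaves $25,000 active and $20,000 in the audit trail.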

Compression Lifecycle

The trigger policy computes the token ratio against the effective context window.
prepare_run(...) selects older turns using turn-count and token budgets.
Single-pass compression returns a structured patch (facts, fact_candidates, summary, compressed_turn_ids).
The patch acceptance gate validates branch, shape, and coverage constraints.
complete_run(...) merges accepted updates and advances the watermark only as far as the accepted coverage.
Failures trigger exponential backoff with jitter, a branch retry interval, and a cooldown pause.
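Exponential backoff with jitter, as used in the failure step, generally looks like the sketch below. The function name `retry_delay_ms` and the default values are illustrative assumptions; the parameters are modeled loosely on the retry_backoff_base_ms, retry_backoff_max_ms, and retry_jitter_ratio knobs, whose real defaults and exact formula are not documented here.

```python
import random
from typing import Optional


def retry_delay_ms(
    consecutive_failures: int,
    base_ms: int = 500,          # illustrative default, not the library's
    max_ms: int = 30_000,        # illustrative default, not the library's
    jitter_ratio: float = 0.2,   # illustrative default, not the library's
    rng: Optional[random.Random] = None,
) -> float:
    """Delay before the next attempt: capped doubling plus symmetric jitter."""
    rng = rng or random.Random()
    # Double the delay for each failure after the first, capped at max_ms.
    capped = min(max_ms, base_ms * 2 ** max(0, consecutive_failures - 1))
    # Spread retries by up to +/- jitter_ratio of the capped delay.
    # Note: jitter is applied after the cap, so the result can slightly exceed max_ms.
    jitter = capped * jitter_ratio * rng.uniform(-1.0, 1.0)
    return max(0.0, capped + jitter)
```

Jitter keeps several failing branches from retrying in lockstep; a separate minimum interval and cooldown would sit on top of this delay.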

Persistence and Recovery

Enable with MemoryConfig.persistence_db_path.

Persisted entities include:

branches + branch metadata
conversation turns
slot state
memory facts
evidence log / change log
summaries
compression runs + leases
dynamic attribute alias decisions/events

Startup behavior:

reconciles expired/stale leases;
repairs dirty/in_flight metadata consistency;
optionally auto-queues reruns for dirty branches.

SQLite runtime settings:

journal_mode=WAL
synchronous=NORMAL
foreign_keys=ON
configurable busy_timeout
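For reference, these settings correspond to standard SQLite PRAGMA statements. The sketch below applies them with Python's built-in sqlite3 module; `open_connection` is a hypothetical helper and the 5000 ms default is an illustrative value, not necessarily what persistence.py uses.

```python
import sqlite3


def open_connection(db_path: str, busy_timeout_ms: int = 5000) -> sqlite3.Connection:
    """Open a SQLite connection with the runtime settings listed above."""
    conn = sqlite3.connect(db_path)
    # WAL lets readers proceed while a writer is active (file-backed databases only).
    conn.execute("PRAGMA journal_mode=WAL")
    # NORMAL syncs less often than FULL; with WAL the database stays consistent.
    conn.execute("PRAGMA synchronous=NORMAL")
    # Foreign-key enforcement is off by default and must be enabled per connection.
    conn.execute("PRAGMA foreign_keys=ON")
    # How long to wait on a locked database before raising an error.
    conn.execute(f"PRAGMA busy_timeout={int(busy_timeout_ms)}")
    return conn
```

journal_mode is stored in the database file, while synchronous, foreign_keys, and busy_timeout apply only to the connection that sets them, so they need to be reapplied on every new connection.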

Configuration Guide

Primary knobs in src/agcompress_core/config.py:

Context + triggers:
- model_context_window
- light_trigger_ratio, aggressive_trigger_ratio
- recent_turns_keep
Compression chunking:
- light_old_turns, max_old_turns_aggressive
- light_old_turn_token_budget, aggressive_old_turn_token_budget
Raw window:
- raw_window_ratio
- raw_window_token_limit
Retry/cooldown:
- retry_backoff_base_ms, retry_backoff_max_ms, retry_jitter_ratio
- retry_min_interval_ms, max_consecutive_failures, failure_cooldown_ms
Mapping/supersession:
- attribute_similarity_threshold, attribute_similarity_margin
Retention controls:
- max_summaries_retained, max_raw_facts_retained
Persistence:
- persistence_db_path
- persistence_lease_ttl_ms, persistence_busy_timeout_ms, persistence_lease_owner
- enable_startup_auto_rerun

Running Benchmarks (3B End-to-End)

Run benchmark suite:

python tools/benchmark_3b.py \
  --model qwen2.5:3b-instruct \
  --iterations 2 \
  --warmup 1

Benchmark gates:

latency: p95 <= max_p95_latency_ms
structured output quality: json_validity_rate >= min_json_validity_rate
temporal quality: stale_truth_rate <= max_stale_truth_rate
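The three gates amount to simple threshold comparisons. The sketch below shows the shape of such a check; `evaluate_gates` and the metric key `p95_latency_ms` are hypothetical names, and the layout of the real benchmark report may differ.

```python
def evaluate_gates(metrics: dict, thresholds: dict) -> dict:
    """Return a pass/fail flag per gate, mirroring the three comparisons above."""
    return {
        # Latency gate: p95 must not exceed the configured ceiling.
        "latency": metrics["p95_latency_ms"] <= thresholds["max_p95_latency_ms"],
        # Structured-output gate: enough responses must parse as valid JSON.
        "json_validity": metrics["json_validity_rate"] >= thresholds["min_json_validity_rate"],
        # Temporal gate: stale values must rarely surface as current truth.
        "stale_truth": metrics["stale_truth_rate"] <= thresholds["max_stale_truth_rate"],
    }


def release_ok(metrics: dict, thresholds: dict) -> bool:
    """A release passes only if every gate passes."""
    return all(evaluate_gates(metrics, thresholds).values())
```

Returning per-gate flags rather than a single boolean makes it clear which gate blocked a release.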

Latest benchmark report path:

benchmark_results/benchmark_3b_latest.json

Project Layout

src/agcompress_core/
  engine.py         # public runtime API
  state.py          # revisions, run planning, merge/watermark logic
  assembler.py      # prompt-context construction + precedence
  compressor.py     # LLM patch generation and parse path
  resolver.py       # slot update/supersession/canonicalization
  persistence.py    # SQLite persistence, leases, startup reconciliation
  config.py         # runtime configuration
  types.py          # datamodels

tests/
  test_*.py         # behavior, regression, and fault-injection coverage

tools/
  benchmark_3b.py   # E2E benchmark and release gates

Known Constraints

Compression quality and latency depend on model behavior and hardware.
If model_context_window exceeds the serving context size (ollama_num_ctx), the effective budget is capped to the serving context.
SQLite persistence is single-node/local; distributed coordination is out of scope.

Development

Run tests:

pytest -q

Run type checks:

python -m mypy src

Contributing

Recommended workflow:

Open an issue describing risk/regression/bug.
Add a failing test first where practical.
Implement fix while preserving revision and temporal invariants.
Run pytest -q and python -m mypy src.

License

MIT. See LICENSE.
