rohithputha/AgCompress

agcompress-core

Framework-agnostic async memory compression engine for long-running LLM chats and autonomous agent loops.

agcompress-core keeps request-time context bounded, keeps latest truth ahead of stale history, and makes memory state durable across crashes/restarts when SQLite persistence is enabled.

Why This Exists

Long-running conversations and agent runs degrade without memory controls:

  • context windows overflow;
  • stale facts can reappear;
  • compression latency can block throughput;
  • restart events can break revision/watermark integrity.

agcompress-core addresses this with revisioned memory state, bounded raw context windows, deterministic supersession rules, progressive compression, and optional crash-safe SQLite persistence.

Feature Highlights

  • Async compression off the send path via ThreadPoolExecutor.
  • Deterministic context precedence:
    • raw sliding window (newest-first selection)
    • active slot truth
    • compressed summary
  • Temporal correctness behavior:
    • latest correction wins
    • older values remain auditable but not active truth
  • Progressive compression with token-budget-aware chunk planning.
  • Branch isolation and fork/copy-on-write semantics.
  • Backoff, retry spacing, and cooldown guards for invalid output storms.
  • Optional SQLite durability for turns, slots, logs, summaries, runs, and metadata.
  • Startup reconciliation and auto-rerun scheduling for dirty branches.
  • End-to-end 3B benchmark harness with release gates for latency/quality.
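
As a rough illustration of the first bullet, the sketch below shows one way to keep compression off the send path with a ThreadPoolExecutor while allowing at most one in-flight job per branch. The class name AsyncCompressionScheduler and the compress_fn callback are hypothetical and are not taken from the package; only the name maybe_compress_async echoes the flow described in this README.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class AsyncCompressionScheduler:
    """Hypothetical helper: runs at most one compression job per branch."""

    def __init__(self, compress_fn, max_workers=2):
        self._compress_fn = compress_fn          # assumed callable(branch_id)
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._lock = threading.Lock()
        self._in_flight = {}                     # branch_id -> Future

    def maybe_compress_async(self, branch_id):
        """Schedule a job unless one is still running for this branch."""
        with self._lock:
            existing = self._in_flight.get(branch_id)
            if existing is not None and not existing.done():
                return None                      # caller returns immediately
            future = self._pool.submit(self._compress_fn, branch_id)
            self._in_flight[branch_id] = future
            return future

    def close(self):
        # Wait for queued jobs so no work is silently dropped on shutdown.
        self._pool.shutdown(wait=True)
```

The caller never waits on the returned future, so a slow model call only delays the next compression for that branch, not the next request.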

Design Principles

  • Temporal truth over static summaries: newer user corrections must override older compressed artifacts.
  • Bounded context over unbounded recall: raw window and compression budgets are hard-capped.
  • Deterministic merges over fuzzy state: revision checks and patch acceptance gates protect invariants.
  • Branch-local correctness over global convenience: state, retries, and leases are branch-scoped.
  • Recoverability over happy-path speed: persistent metadata enables restart-safe continuation.

Installation

Requirements:

  • Python >=3.10
  • Optional: local Ollama server for model-backed compression and benchmark runs

Install package:

python -m pip install -e .

Install development extras:

python -m pip install -e '.[dev]'

Quickstart

from agcompress_core import MemoryConfig, MemoryEngine

cfg = MemoryConfig(
    model_context_window=128000,
    ollama_model="qwen2.5:3b-instruct",
    enable_async_compression=True,
    persistence_db_path="./agcompress.sqlite3",  # optional; enables SQLite persistence
)

engine = MemoryEngine(cfg)
try:
    # Record turns on the "main" branch; the correction is the latest truth.
    engine.add_turn("main", "user", "Budget is $20,000")
    engine.add_turn("main", "assistant", "Acknowledged.")
    engine.add_turn("main", "user", "Correction: budget is $25,000")

    # Assemble bounded context for the next model call.
    ctx = engine.get_context("main", "budget and planning")
    print(ctx.system_appendix)
finally:
    # Always release engine resources, even if a call above raises.
    engine.close()

Integration Pattern (Host App)

Typical app loop:

  1. engine.add_turn(branch_id, "user", user_text)
  2. ctx = engine.get_context(branch_id, current_query)
  3. Call the host model with ctx.messages plus ctx.system_appendix.
  4. engine.on_assistant_final(branch_id, assistant_text, current_query)

on_assistant_final(...) appends assistant output and schedules compression asynchronously.
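
The four steps above can be wrapped in a single helper. The sketch below is a minimal, hedged version: run_turn, call_model, FakeEngine, and FakeContext are hypothetical names introduced here, and the shape of ctx.messages (a list of role/content dicts) is an assumption, not something this README specifies. The stand-in engine exists only so the example runs without the package installed; a real host would pass a MemoryEngine instead.

```python
from dataclasses import dataclass


def run_turn(engine, call_model, branch_id, user_text):
    """One request/response cycle following the four-step host loop."""
    engine.add_turn(branch_id, "user", user_text)
    ctx = engine.get_context(branch_id, user_text)
    reply = call_model(ctx.messages, ctx.system_appendix)
    engine.on_assistant_final(branch_id, reply, user_text)
    return reply


@dataclass
class FakeContext:
    messages: list          # assumed shape: [{"role": ..., "content": ...}]
    system_appendix: str


class FakeEngine:
    """Stand-in exposing only the methods the loop needs."""

    def __init__(self):
        self.turns = []

    def add_turn(self, branch_id, role, text):
        self.turns.append((branch_id, role, text))

    def get_context(self, branch_id, query):
        msgs = [{"role": r, "content": t}
                for b, r, t in self.turns if b == branch_id]
        return FakeContext(messages=msgs, system_appendix="")

    def on_assistant_final(self, branch_id, text, query):
        self.turns.append((branch_id, "assistant", text))


def echo_model(messages, appendix):
    # Hypothetical host model: echoes the last user message.
    return "echo:" + messages[-1]["content"]


demo_engine = FakeEngine()
demo_reply = run_turn(demo_engine, echo_model, "main", "hi")
```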

Architecture

Core runtime paths:

High-level flow:

add_turn -> mark branch dirty -> maybe_compress_async
        -> prepare_run (plan chunk, lock run) -> model compression patch
        -> complete_run (accept/reject + merge + watermark update)
        -> get_context assembles prompt appendix (raw > slots > summary)

Context Assembly and Precedence

Send-time appendix precedence is fixed:

  1. Sliding raw window (highest priority, newest-first selected under token cap)
  2. Active slot truth
  3. Compressed summary (lowest priority)

Conflict behavior:

  • latest raw/slot values win;
  • superseded older values are retained for audit but not exposed as active truth.

This prevents stale values from overriding newer ones when token budgets are tight.
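
To make the precedence order concrete, here is a small sketch of newest-first raw-window selection followed by slot and summary sections. It is an illustration under assumptions, not the code in assembler.py: count_tokens is a crude whitespace tokenizer, and the function names and section labels are hypothetical.

```python
def count_tokens(text):
    """Crude stand-in tokenizer: counts whitespace-separated words."""
    return len(text.split())


def select_raw_window(turns, token_cap):
    """Pick turns newest-first under the cap, then restore chronological order."""
    picked, used = [], 0
    for turn in reversed(turns):
        cost = count_tokens(turn)
        if used + cost > token_cap:
            break                      # older turns are left to slots/summary
        picked.append(turn)
        used += cost
    return list(reversed(picked))


def assemble_appendix(turns, active_slots, summary, raw_token_cap):
    """Emit sections in fixed precedence: raw window, active slots, summary."""
    sections = []
    raw = select_raw_window(turns, raw_token_cap)
    if raw:
        sections.append("RECENT TURNS:\n" + "\n".join(raw))
    if active_slots:
        lines = [f"{key}: {value}" for key, value in active_slots.items()]
        sections.append("ACTIVE FACTS:\n" + "\n".join(lines))
    if summary:
        sections.append("SUMMARY:\n" + summary)
    return "\n\n".join(sections)
```

With a cap of five tokens and the turns "budget is 20000", "ok", "correction budget is 25000", only the last two turns fit, so the stale budget never reaches the raw window.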

Compression Lifecycle

  1. Trigger policy computes token ratio against effective context window.
  2. prepare_run(...) selects older turns using turn-count plus token budgets.
  3. Single-pass compression returns a structured patch (facts, fact_candidates, summary, compressed_turn_ids).
  4. Patch acceptance gate validates branch/shape/coverage constraints.
  5. complete_run(...) merges accepted updates and advances watermark only by accepted coverage.
  6. Failures use exponential backoff + jitter, branch retry interval, and cooldown pause.
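
Steps 4 and 5 can be sketched as a gate followed by a watermark update. The patch keys facts, fact_candidates, summary, and compressed_turn_ids come from the list above; everything else here is an assumption made for illustration, including the branch_id key on the patch, integer turn ids, and the rule that the watermark moves only across a contiguous run of accepted ids.

```python
REQUIRED_KEYS = {"facts", "fact_candidates", "summary", "compressed_turn_ids"}


def accept_patch(patch, branch_id, planned_turn_ids):
    """Return the accepted turn ids, or None if the patch is rejected."""
    if patch.get("branch_id") != branch_id:       # branch check (assumed key)
        return None
    if not REQUIRED_KEYS.issubset(patch):         # shape check
        return None
    covered = patch["compressed_turn_ids"]
    if not covered or not set(covered).issubset(planned_turn_ids):
        return None                               # coverage check
    return sorted(covered)


def advance_watermark(watermark, accepted_ids):
    """Advance only while the next turn id was actually covered."""
    accepted = set(accepted_ids)
    while watermark + 1 in accepted:
        watermark += 1
    return watermark
```

For example, a patch that covers turns 1, 2, and 4 out of a planned 1 to 4 moves the watermark from 0 to 2; turn 3 stays eligible for a later run.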

Persistence and Recovery

Enable with MemoryConfig.persistence_db_path.

Persisted entities include:

  • branches + branch metadata
  • conversation turns
  • slot state
  • memory facts
  • evidence log / change log
  • summaries
  • compression runs + leases
  • dynamic attribute alias decisions/events

Startup behavior:

  • reconciles expired/stale leases;
  • repairs dirty/in_flight metadata consistency;
  • optionally auto-queues reruns for dirty branches.

SQLite runtime settings:

  • journal_mode=WAL
  • synchronous=NORMAL
  • foreign_keys=ON
  • configurable busy_timeout

Configuration Guide

Primary knobs in src/agcompress_core/config.py:

  • Context + triggers:
    • model_context_window
    • light_trigger_ratio, aggressive_trigger_ratio
    • recent_turns_keep
  • Compression chunking:
    • light_old_turns, max_old_turns_aggressive
    • light_old_turn_token_budget, aggressive_old_turn_token_budget
  • Raw window:
    • raw_window_ratio
    • raw_window_token_limit
  • Retry/cooldown:
    • retry_backoff_base_ms, retry_backoff_max_ms, retry_jitter_ratio
    • retry_min_interval_ms, max_consecutive_failures, failure_cooldown_ms
  • Mapping/supersession:
    • attribute_similarity_threshold, attribute_similarity_margin
  • Retention controls:
    • max_summaries_retained, max_raw_facts_retained
  • Persistence:
    • persistence_db_path
    • persistence_lease_ttl_ms, persistence_busy_timeout_ms, persistence_lease_owner
    • enable_startup_auto_rerun
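
One plausible reading of the retry and cooldown knobs is sketched below. The parameter names match the configuration fields listed above, but the formula, the default values, and the function names are assumptions for illustration only; check config.py and state.py for the actual behavior and defaults.

```python
import random


def retry_delay_ms(consecutive_failures,
                   retry_backoff_base_ms=500,
                   retry_backoff_max_ms=30_000,
                   retry_jitter_ratio=0.2,
                   retry_min_interval_ms=1_000,
                   rng=random):
    """Exponential backoff with symmetric jitter, floored at the minimum interval."""
    exponent = max(consecutive_failures - 1, 0)
    capped = min(retry_backoff_base_ms * (2 ** exponent), retry_backoff_max_ms)
    jitter = capped * retry_jitter_ratio * rng.uniform(-1.0, 1.0)
    return max(int(capped + jitter), retry_min_interval_ms)


def in_cooldown(consecutive_failures, ms_since_last_failure,
                max_consecutive_failures=5, failure_cooldown_ms=60_000):
    """Pause compression for a branch after too many failures in a row."""
    return (consecutive_failures >= max_consecutive_failures
            and ms_since_last_failure < failure_cooldown_ms)
```

Jitter spreads retries from several branches so they do not hit the model at the same instant, and the cooldown stops a branch from retrying indefinitely when the model keeps returning invalid output.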

Running Benchmarks (3B End-to-End)

Run benchmark suite:

python tools/benchmark_3b.py \
  --model qwen2.5:3b-instruct \
  --iterations 2 \
  --warmup 1

Benchmark gates:

  • latency: p95 <= max_p95_latency_ms
  • structured output quality: json_validity_rate >= min_json_validity_rate
  • temporal quality: stale_truth_rate <= max_stale_truth_rate
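
The three gates amount to simple threshold checks. The sketch below shows one way to evaluate them; the threshold names come from the list above, while the nearest-rank percentile method and the function names p95 and evaluate_gates are assumptions and may differ from what tools/benchmark_3b.py does.

```python
def p95(values):
    """Nearest-rank 95th percentile, using integer arithmetic for the rank."""
    if not values:
        raise ValueError("p95 requires at least one sample")
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100      # ceil(0.95 * n)
    return ordered[rank - 1]


def evaluate_gates(latencies_ms, json_validity_rate, stale_truth_rate,
                   max_p95_latency_ms, min_json_validity_rate,
                   max_stale_truth_rate):
    """Return a pass/fail flag per gate; a release passes only if all are True."""
    return {
        "latency": p95(latencies_ms) <= max_p95_latency_ms,
        "json_validity": json_validity_rate >= min_json_validity_rate,
        "stale_truth": stale_truth_rate <= max_stale_truth_rate,
    }
```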

Latest benchmark report path:

  • benchmark_results/benchmark_3b_latest.json

Project Layout

src/agcompress_core/
  engine.py         # public runtime API
  state.py          # revisions, run planning, merge/watermark logic
  assembler.py      # prompt-context construction + precedence
  compressor.py     # LLM patch generation and parse path
  resolver.py       # slot update/supersession/canonicalization
  persistence.py    # SQLite persistence, leases, startup reconciliation
  config.py         # runtime configuration
  types.py          # datamodels

tests/
  test_*.py         # behavior, regression, and fault-injection coverage

tools/
  benchmark_3b.py   # E2E benchmark and release gates

Known Constraints

  • Compression quality and latency depend on model behavior and hardware.
  • If model_context_window exceeds the serving ollama_num_ctx, the effective budget is capped to the serving context.
  • SQLite persistence is single-node/local; distributed coordination is out of scope.
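
The context-window cap interacts with the trigger ratios, so a short worked sketch may help. The field names model_context_window, ollama_num_ctx, light_trigger_ratio, and aggressive_trigger_ratio appear in this README; the function names, the mode labels, and the 0.6 and 0.85 thresholds are illustrative assumptions, not the package defaults.

```python
def effective_context_window(model_context_window, ollama_num_ctx=None):
    """The usable budget is the smaller of the configured and served windows."""
    if ollama_num_ctx is None:
        return model_context_window
    return min(model_context_window, ollama_num_ctx)


def compression_mode(used_tokens, model_context_window, ollama_num_ctx=None,
                     light_trigger_ratio=0.6, aggressive_trigger_ratio=0.85):
    """Map the token ratio against the effective window to a trigger tier."""
    window = effective_context_window(model_context_window, ollama_num_ctx)
    ratio = used_tokens / window
    if ratio >= aggressive_trigger_ratio:
        return "aggressive"
    if ratio >= light_trigger_ratio:
        return "light"
    return "none"
```

With model_context_window=128000 and a server running ollama_num_ctx=8192, a 6,000-token conversation already sits at roughly 73% of the effective window, so under these assumed thresholds it would trigger light compression even though it is under 5% of the configured window.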

Development

Run tests:

pytest -q

Run type checks:

python -m mypy src

Contributing

Recommended workflow:

  1. Open an issue describing the risk, regression, or bug.
  2. Add a failing test first where practical.
  3. Implement the fix while preserving revision and temporal invariants.
  4. Run pytest -q and python -m mypy src.

License

MIT. See LICENSE.
