Problem
The agent backend runs on a single uvicorn worker process with an in-memory checkpointer on a 512MB Render starter instance. This is a global bottleneck — not per-user. All concurrent users share the same event loop, the same memory pool, and the same 200-thread checkpoint limit.
Currently the app breaks at ~3 concurrent connections (#63). At production scale (100-1000 users), it would be effectively unusable.
Architecture bottlenecks
1. Single worker process
- uvicorn runs with 1 worker (default) — all requests share one Python event loop
- Each GPT-5.4 visualization call takes 10-30s
- LangGraph has synchronous sections that block the event loop
- Throughput: ~2-6 visualization requests/minute
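The effect of a synchronous section on a single event loop can be demonstrated in a few lines of asyncio. This is a standalone illustration, not the actual LangGraph code; the handler names and sleep durations are invented:

```python
import asyncio
import time

async def blocking_handler():
    time.sleep(0.1)  # synchronous work: monopolizes the one event loop

async def async_handler():
    await asyncio.sleep(0.1)  # cooperative: yields the loop while waiting

async def run_concurrently(handler, n=3):
    # Fire n "requests" at once and time how long all of them take.
    start = time.monotonic()
    await asyncio.gather(*(handler() for _ in range(n)))
    return time.monotonic() - start

blocked = asyncio.run(run_concurrently(blocking_handler))
cooperative = asyncio.run(run_concurrently(async_handler))
print(f"blocking: {blocked:.2f}s, cooperative: {cooperative:.2f}s")
# Blocking handlers serialize (~0.3s for 3 requests); async ones overlap (~0.1s)
```

With one worker, every request in the process pays this serialization cost, which is why a 10-30s synchronous model call caps throughput at a handful of requests per minute.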
2. In-memory checkpointer (BoundedMemorySaver)
- All conversation state stored in RAM — shared global pool of 200 threads
- FIFO eviction: after 200 conversations across ALL users, oldest threads are silently deleted
- Users lose conversation context mid-session with no error
- Not thread-safe — designed for single-process async only
- On 512MB starter plan, memory pressure builds well before 200 threads
3. No backpressure or error surfacing
- When the backend is saturated, requests hang silently — no timeout, no error, no retry
- Frontend shows no indication that the agent is overloaded
- Health check at `/health` returns 200 even when the event loop is blocked
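A health check that actually detects a blocked event loop can measure scheduling lag rather than just returning 200. A minimal sketch (names invented, framework wiring omitted):

```python
import asyncio
import time

class LoopLagMonitor:
    """Tracks how late the event loop wakes up from a fixed-interval sleep.

    If synchronous work blocks the loop, the observed lag spikes, and a
    health endpoint can report 503 instead of a blind 200.
    """

    def __init__(self, interval: float = 0.05):
        self.interval = interval
        self.lag = 0.0  # worst lag observed so far

    async def run(self, cycles: int) -> None:
        for _ in range(cycles):
            start = time.monotonic()
            await asyncio.sleep(self.interval)
            # Anything beyond `interval` is time the loop spent blocked.
            self.lag = max(self.lag, time.monotonic() - start - self.interval)

    def healthy(self, threshold: float = 0.2) -> bool:
        return self.lag < threshold

async def demo():
    monitor = LoopLagMonitor()
    task = asyncio.create_task(monitor.run(cycles=4))
    await asyncio.sleep(0.01)
    time.sleep(0.3)  # simulate a synchronous section blocking the loop
    await task
    return monitor

monitor = asyncio.run(demo())
print(f"worst lag: {monitor.lag:.2f}s, healthy: {monitor.healthy()}")
```

In production the monitor would run for the process lifetime and the health handler would return 503 when `healthy()` is false, letting Render's health checks see the stall.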
Scale projections
| Concurrent users | Behavior |
|---|---|
| 1-5 | Works fine |
| 10-20 | Noticeable latency, requests queue |
| 50+ | Requests timeout, SSE connections drop |
| 100+ | Effectively down, health checks fail, Render restarts |
Proposed solution
Phase 1 — Quick wins (config changes only)
- Add `--workers 4` to the uvicorn `startCommand` in `render.yaml` — multiplies throughput ~4x
- Upgrade the agent service from starter (512MB) to standard (1GB+) in `render.yaml`
- Enable rate limiting (`RATE_LIMIT_ENABLED=true`) with reasonable limits (e.g. 20 req/min per IP)
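Together, these Phase 1 changes might look roughly like this in `render.yaml`. This is a sketch: the service name, module path, and exact plan tier are assumptions, not the repo's actual values:

```yaml
services:
  - type: web
    name: agent                     # assumed service name
    plan: standard                  # was: starter (512MB)
    startCommand: uvicorn main:app --host 0.0.0.0 --port $PORT --workers 4
    envVars:
      - key: RATE_LIMIT_ENABLED
        value: "true"
```

Note that with multiple workers, the in-memory checkpointer stops working across requests (each worker has its own pool), which is another reason Phase 2 should follow quickly.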
Phase 2 — Persistent checkpointer
- Replace `BoundedMemorySaver` with a PostgreSQL or SQLite async checkpointer
- Conversation state survives restarts and doesn't consume RAM
- No more silent thread eviction — threads persist until explicitly cleaned up
- Render already supports managed Postgres — can be added as a service in `render.yaml`
Phase 3 — Error handling and backpressure
- Add frontend timeout — show error after ~30s of no response instead of hanging forever
- Add backend concurrency limit — return 503 "busy" when at capacity rather than queuing indefinitely
- Add connection health monitoring — detect dropped SSE connections and surface to user
- Reuse thread IDs per browser tab (`sessionStorage`) to avoid creating unnecessary threads
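The backend side of this backpressure (return 503 at capacity instead of queuing indefinitely) can be sketched with a non-blocking semaphore check. The FastAPI/uvicorn wiring is omitted and all names here are invented:

```python
import asyncio

class CapacityLimiter:
    """Reject work immediately with a 'busy' signal instead of queuing forever."""

    def __init__(self, max_concurrent: int):
        self._slots = asyncio.Semaphore(max_concurrent)

    async def run(self, handler):
        if self._slots.locked():   # all slots taken: fail fast with 503
            return 503, "busy"
        async with self._slots:    # otherwise occupy a slot for the duration
            return 200, await handler()

async def slow_visualization():
    await asyncio.sleep(0.1)       # stand-in for a 10-30s model call
    return "chart"

async def demo():
    limiter = CapacityLimiter(max_concurrent=2)
    # Four simultaneous requests against two slots: two served, two rejected.
    return await asyncio.gather(
        *(limiter.run(slow_visualization) for _ in range(4))
    )

results = asyncio.run(demo())
print(results)
```

A fast 503 lets the frontend show a clear "agent is busy, retry shortly" state, which pairs with the ~30s frontend timeout above.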
Phase 4 — Horizontal scaling
- Use Gunicorn with uvicorn workers for proper process management
- Verify Render auto-scaling (1-3 instances) works correctly with the persistent checkpointer
- Add Redis or Postgres for shared state across instances
- Load test at target concurrency (100+ users) to validate
Related issues
- bug: agent stops responding after multiple concurrent tabs #63 — Agent stops responding after multiple concurrent tabs (symptom of this)
- Quality regression: generated visualizations are subpar and broken #58 — Quality regression (long-running visualization calls exacerbate the single-worker bottleneck)
- feat: add planning step before visualization generation #62 — Planning step before visualization (adds an extra round-trip, making concurrency even more critical)
Key files
- `apps/agent/main.py` — uvicorn config, `BoundedMemorySaver(max_threads=200)`
- `apps/agent/src/bounded_memory_saver.py` — FIFO eviction logic
- `render.yaml` — Render service config (starter plan, no worker config)
- `apps/app/src/app/api/copilotkit/route.ts` — Frontend → agent connection