The flight recorder for AI — record every decision, replay every incident, enforce every policy.
AI agents are making real decisions — calling APIs, executing code, moving money, accessing databases. But there is no standard infrastructure for:
- Auditing what an agent actually did
- Enforcing policies before an action executes
- Shutting down a runaway agent in real time
- Replaying an incident after something goes wrong
- Redacting secrets before they hit your logging stack
Every team reinvents this differently. Secrets leak. Budgets burn. Regulators ask questions nobody can answer.
AIR Blackbox is the missing layer between your AI agents and your infrastructure.
```
Your Agent ──→ Gateway ──→ Policy Engine ──→ LLM Provider
                  │              │
                  ▼              ▼
           OTel Collector   Kill Switches
                  │         Trust Scoring
                  ▼         Risk Tiers
            Episode Store
         Jaeger · Prometheus
```
One-line change: swap your `base_url`, and every agent call flows through AIR Blackbox automatically. No SDK changes, no code refactoring.
```bash
git clone https://github.com/airblackbox/air-platform.git && cd air-platform
cp .env.example .env   # add your OPENAI_API_KEY
make up                # 6 services running in ~8 seconds
```

Then point any OpenAI-compatible client at localhost:8080. That's it.
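For example, with the official OpenAI Python SDK the swap is a single constructor argument. A minimal sketch, assuming the gateway mirrors OpenAI's `/v1` URL layout:

```python
from openai import OpenAI

# Same client, same code -- only base_url changes.
client = OpenAI(
    base_url="http://localhost:8080/v1",  # assumes the gateway mirrors OpenAI's /v1 layout
    api_key="sk-...",                     # your OPENAI_API_KEY; forwarded upstream
)

reply = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "ping"}],
)
print(reply.choices[0].message.content)
```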
Everything the stack records is immediately inspectable:

- Traces → localhost:16686 (Jaeger)
- Metrics → localhost:9091 (Prometheus)
- Episodes → localhost:8081 (Episode Store API)
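Prometheus also serves its standard HTTP query API on that port, so the metrics are scriptable. A minimal sketch; the metric name is an assumption derived from OTel GenAI naming, so check `localhost:9091/graph` for what your deployment actually exports:

```python
import json
import urllib.parse
import urllib.request

# The metric name below is an assumption based on OTel GenAI conventions;
# browse http://localhost:9091/graph for what the collector really exports.
params = urllib.parse.urlencode({"query": "gen_ai_client_token_usage_sum"})
with urllib.request.urlopen(f"http://localhost:9091/api/v1/query?{params}") as resp:
    payload = json.load(resp)

for series in payload["data"]["result"]:
    print(series["metric"], series["value"])
```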
Explore each component without installing anything:
| Component | Try It |
|---|---|
| Platform Orchestration | Launch Demo → |
| Policy Engine | Launch Demo → |
| Episode Store | Launch Demo → |
| Gateway | Launch Demo → |
| OTel Collector | Launch Demo → |
Core platform:

| Repo | What It Does |
|---|---|
| air-platform | Full stack in one command — Docker Compose orchestration |
| gateway | OpenAI-compatible reverse proxy — traces every LLM call |
| agent-episode-store | Groups traces into replayable task-level episodes |
| agent-policy-engine | Risk tiers, kill switches, trust scoring (sketched below) |
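To make those concepts concrete, here is a purely illustrative sketch of a risk-tier check. None of these names come from the actual agent-policy-engine API; they only show how kill switches, risk tiers, and trust scores can compose into an allow/deny decision:

```python
from dataclasses import dataclass

# Illustrative only -- these names are NOT the agent-policy-engine API.
@dataclass
class ToolCall:
    tool: str
    risk_tier: int      # e.g. 0 = read-only ... 3 = moves money
    trust_score: float  # 0.0-1.0, accumulated per agent over time

KILL_SWITCH_ENGAGED = False
MIN_TRUST_FOR_TIER = {0: 0.0, 1: 0.3, 2: 0.6, 3: 0.9}

def allow(call: ToolCall) -> bool:
    """Deny everything while the kill switch is engaged; otherwise
    require more accumulated trust for higher-risk actions."""
    if KILL_SWITCH_ENGAGED:
        return False
    return call.trust_score >= MIN_TRUST_FOR_TIER[call.risk_tier]

assert allow(ToolCall("db.read", risk_tier=0, trust_score=0.1))
assert not allow(ToolCall("payments.send", risk_tier=3, trust_score=0.5))
```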
Telemetry and runtime components:

| Repo | What It Does |
|---|---|
| otel-collector-genai | PII redaction, cost metrics, loop detection |
| otel-prompt-vault | Encrypted prompt storage with pre-signed URL retrieval |
| otel-semantic-normalizer | Normalizes gen_ai.* attributes to a standard schema (example span below) |
| agent-tool-sandbox | Sandboxed execution for agent tool calls |
| runtime-aibom-emitter | AI Bill of Materials generation at runtime |
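The normalizer's target schema is OpenTelemetry's GenAI semantic conventions. A minimal sketch of a normalized span emitted by hand with the standard OTel Python SDK; the `gen_ai.*` attribute names follow the published conventions, while the service wiring is illustrative:

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

# Standard OTel SDK wiring; a real deployment would export OTLP to the
# collector instead of printing spans to the console.
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("demo")

# Attribute names follow the OTel GenAI semantic conventions -- the
# common schema the semantic normalizer converges vendor spans onto.
with tracer.start_as_current_span("chat gpt-4o-mini") as span:
    span.set_attribute("gen_ai.system", "openai")
    span.set_attribute("gen_ai.request.model", "gpt-4o-mini")
    span.set_attribute("gen_ai.usage.input_tokens", 42)
    span.set_attribute("gen_ai.usage.output_tokens", 128)
```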
SDKs and framework integrations:

| Repo | What It Does |
|---|---|
| python-sdk | Python SDK — wraps OpenAI, Anthropic, and other LLM clients |
| trust-crewai | Trust plugin for CrewAI |
| trust-langchain | Trust plugin for LangChain / LangGraph |
| trust-autogen | Trust plugin for Microsoft AutoGen |
| trust-openai-agents | Trust plugin for OpenAI Agents SDK |
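Even without a trust plugin, any framework that exposes an OpenAI-compatible base URL can route through the gateway. A sketch with langchain-openai, again assuming the `/v1` layout from the quickstart:

```python
from langchain_openai import ChatOpenAI

# Route a LangChain model through the gateway instead of calling OpenAI
# directly; every call is then traced and policy-checked in one place.
llm = ChatOpenAI(
    model="gpt-4o-mini",
    base_url="http://localhost:8080/v1",  # assumes the OpenAI-style /v1 layout
    api_key="sk-...",                     # forwarded upstream by the gateway
)
print(llm.invoke("ping").content)
```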
Testing and security tooling:

| Repo | What It Does |
|---|---|
| eval-harness | Replay and score episodes against policies |
| trace-regression-harness | Detect behavioral regressions across agent versions |
| agent-vcr | Record and replay agent interactions for testing (sketched below) |
| mcp-security-scanner | Scan MCP server configs for vulnerabilities |
| mcp-policy-gateway | Policy enforcement for Model Context Protocol |
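The pattern behind agent-vcr is cassette-style record/replay. A purely illustrative sketch (not agent-vcr's actual API) of how recorded responses let tests run without live LLM calls:

```python
import hashlib
import json
from pathlib import Path

CASSETTES = Path("cassettes")

# Illustrative only -- not agent-vcr's actual API.
def cached_call(llm, prompt: str) -> str:
    """Replay a recorded response when one exists; record it otherwise."""
    key = hashlib.sha256(prompt.encode()).hexdigest()[:16]
    tape = CASSETTES / f"{key}.json"
    if tape.exists():                      # replay path: no live API call
        return json.loads(tape.read_text())["response"]
    response = llm(prompt)                 # record path: hit the real model once
    CASSETTES.mkdir(exist_ok=True)
    tape.write_text(json.dumps({"prompt": prompt, "response": response}))
    return response
```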
For the same reason you don't implement TLS differently in every microservice, agent safety needs to be a standardized layer, not something each team builds ad hoc. AIR Blackbox operates in the OTel pipeline, as a reverse proxy, and as a policy engine, so it works across any framework, any model, and any deployment.
We're looking for contributors interested in AI safety, observability, and governance. See our Contributing Guide to get started.
Current priorities:
- New framework connectors (Haystack, DSPy, Semantic Kernel)
- Policy templates for common compliance scenarios
- Documentation and integration examples
Apache 2.0 · Built on OpenTelemetry · 21 repositories
