The flight recorder for AI — record every decision, replay every incident, enforce every policy.
AI agents are making real decisions — calling APIs, executing code, moving money, accessing databases. But there is no standard infrastructure for:
- Auditing what an agent actually did
- Enforcing policies before an action executes
- Shutting down a runaway agent in real time
- Replaying an incident after something goes wrong
- Redacting secrets before they hit your logging stack
Every team reinvents this differently. Secrets leak. Budgets burn. Regulators ask questions nobody can answer.
AIR Blackbox is the missing layer between your AI agents and your infrastructure.
```
Your Agent ──→ Gateway ──→ Policy Engine ──→ LLM Provider
                  │              │
                  ▼              ▼
           OTel Collector   Kill Switches
                  │         Trust Scoring
                  ▼         Risk Tiers
            Episode Store
         Jaeger · Prometheus
```
One-line change: swap your `base_url`, and every agent call flows through AIR Blackbox automatically. No SDK changes, no code refactoring.
```bash
git clone https://github.com/airblackbox/air-platform.git && cd air-platform
cp .env.example .env   # add your OPENAI_API_KEY
make up                # 6 services running in ~8 seconds
```

Then point any OpenAI-compatible client at localhost:8080. That's it.
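For example, with the official OpenAI Python SDK the swap is a single constructor argument. A minimal sketch, assuming the gateway mirrors OpenAI's `/v1` URL layout:

```python
from openai import OpenAI

# Same client, same code -- only base_url changes.
client = OpenAI(
    base_url="http://localhost:8080/v1",  # assumes the gateway mirrors OpenAI's /v1 layout
    api_key="sk-...",                     # your OPENAI_API_KEY; forwarded upstream
)

reply = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "ping"}],
)
print(reply.choices[0].message.content)
```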
Everything the stack records is immediately inspectable:

- Traces → localhost:16686 (Jaeger)
- Metrics → localhost:9091 (Prometheus)
- Episodes → localhost:8081 (Episode Store API)
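Prometheus also serves its standard HTTP query API on that port, so the metrics are scriptable. A minimal sketch; the metric name is an assumption derived from OTel GenAI naming, so check `localhost:9091/graph` for what your deployment actually exports:

```python
import json
import urllib.parse
import urllib.request

# The metric name below is an assumption based on OTel GenAI conventions;
# browse http://localhost:9091/graph for what the collector really exports.
params = urllib.parse.urlencode({"query": "gen_ai_client_token_usage_sum"})
with urllib.request.urlopen(f"http://localhost:9091/api/v1/query?{params}") as resp:
    payload = json.load(resp)

for series in payload["data"]["result"]:
    print(series["metric"], series["value"])
```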
Explore each component without installing anything:
| Component | Try It |
|---|---|
| Platform Orchestration | Launch Demo → |
| Policy Engine | Launch Demo → |
| Episode Store | Launch Demo → |
| Gateway | Launch Demo → |
| OTel Collector | Launch Demo → |
Core platform:

| Repo | What It Does |
|---|---|
| air-platform | Full stack in one command — Docker Compose orchestration |
| gateway | OpenAI-compatible reverse proxy — traces every LLM call |
| agent-episode-store | Groups traces into replayable task-level episodes |
| agent-policy-engine | Risk tiers, kill switches, trust scoring (sketched below) |
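To make those concepts concrete, here is a purely illustrative sketch of a risk-tier check. None of these names come from the actual agent-policy-engine API; they only show how kill switches, risk tiers, and trust scores can compose into an allow/deny decision:

```python
from dataclasses import dataclass

# Illustrative only -- these names are NOT the agent-policy-engine API.
@dataclass
class ToolCall:
    tool: str
    risk_tier: int      # e.g. 0 = read-only ... 3 = moves money
    trust_score: float  # 0.0-1.0, accumulated per agent over time

KILL_SWITCH_ENGAGED = False
MIN_TRUST_FOR_TIER = {0: 0.0, 1: 0.3, 2: 0.6, 3: 0.9}

def allow(call: ToolCall) -> bool:
    """Deny everything while the kill switch is engaged; otherwise
    require more accumulated trust for higher-risk actions."""
    if KILL_SWITCH_ENGAGED:
        return False
    return call.trust_score >= MIN_TRUST_FOR_TIER[call.risk_tier]

assert allow(ToolCall("db.read", risk_tier=0, trust_score=0.1))
assert not allow(ToolCall("payments.send", risk_tier=3, trust_score=0.5))
```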
Telemetry and runtime components:

| Repo | What It Does |
|---|---|
| otel-collector-genai | PII redaction, cost metrics, loop detection |
| otel-prompt-vault | Encrypted prompt storage with pre-signed URL retrieval |
| otel-semantic-normalizer | Normalizes gen_ai.* attributes to a standard schema (example span below) |
| agent-tool-sandbox | Sandboxed execution for agent tool calls |
| runtime-aibom-emitter | AI Bill of Materials generation at runtime |
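The normalizer's target schema is OpenTelemetry's GenAI semantic conventions. A minimal sketch of a normalized span emitted by hand with the standard OTel Python SDK; the `gen_ai.*` attribute names follow the published conventions, while the service wiring is illustrative:

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

# Standard OTel SDK wiring; a real deployment would export OTLP to the
# collector instead of printing spans to the console.
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("demo")

# Attribute names follow the OTel GenAI semantic conventions -- the
# common schema the semantic normalizer converges vendor spans onto.
with tracer.start_as_current_span("chat gpt-4o-mini") as span:
    span.set_attribute("gen_ai.system", "openai")
    span.set_attribute("gen_ai.request.model", "gpt-4o-mini")
    span.set_attribute("gen_ai.usage.input_tokens", 42)
    span.set_attribute("gen_ai.usage.output_tokens", 128)
```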
SDKs and framework integrations:

| Repo | What It Does |
|---|---|
| python-sdk | Python SDK — wraps OpenAI, Anthropic, and other LLM clients |
| trust-crewai | Trust plugin for CrewAI |
| trust-langchain | Trust plugin for LangChain / LangGraph |
| trust-autogen | Trust plugin for Microsoft AutoGen |
| trust-openai-agents | Trust plugin for OpenAI Agents SDK |
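Even without a trust plugin, any framework that exposes an OpenAI-compatible base URL can route through the gateway. A sketch with langchain-openai, again assuming the `/v1` layout from the quickstart:

```python
from langchain_openai import ChatOpenAI

# Route a LangChain model through the gateway instead of calling OpenAI
# directly; every call is then traced and policy-checked in one place.
llm = ChatOpenAI(
    model="gpt-4o-mini",
    base_url="http://localhost:8080/v1",  # assumes the OpenAI-style /v1 layout
    api_key="sk-...",                     # forwarded upstream by the gateway
)
print(llm.invoke("ping").content)
```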
Testing and security tooling:

| Repo | What It Does |
|---|---|
| eval-harness | Replay and score episodes against policies |
| trace-regression-harness | Detect behavioral regressions across agent versions |
| agent-vcr | Record and replay agent interactions for testing (sketched below) |
| mcp-security-scanner | Scan MCP server configs for vulnerabilities |
| mcp-policy-gateway | Policy enforcement for Model Context Protocol |
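The pattern behind agent-vcr is cassette-style record/replay. A purely illustrative sketch (not agent-vcr's actual API) of how recorded responses let tests run without live LLM calls:

```python
import hashlib
import json
from pathlib import Path

CASSETTES = Path("cassettes")

# Illustrative only -- not agent-vcr's actual API.
def cached_call(llm, prompt: str) -> str:
    """Replay a recorded response when one exists; record it otherwise."""
    key = hashlib.sha256(prompt.encode()).hexdigest()[:16]
    tape = CASSETTES / f"{key}.json"
    if tape.exists():                      # replay path: no live API call
        return json.loads(tape.read_text())["response"]
    response = llm(prompt)                 # record path: hit the real model once
    CASSETTES.mkdir(exist_ok=True)
    tape.write_text(json.dumps({"prompt": prompt, "response": response}))
    return response
```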
For the same reason you don't implement TLS differently in every microservice, agent safety needs to be a standardized layer, not something each team builds ad hoc. AIR Blackbox operates in the OTel pipeline, as a reverse proxy, and as a policy engine, so it works across any framework, any model, and any deployment.
We're looking for contributors interested in AI safety, observability, and governance. See our Contributing Guide to get started.
Current priorities:
- New framework connectors (Haystack, DSPy, Semantic Kernel)
- Policy templates for common compliance scenarios
- Documentation and integration examples
Apache 2.0 · Built on OpenTelemetry · 21 repositories
