OpenAI-compatible API server wrapping the Gemini CLI
| Feature | Description |
|---|---|
| OpenAI-compatible API | Works with any client that speaks the OpenAI chat completions format |
| Drop-in Ollama replacement | Change your base URL and you're done |
| True streaming | Real-time SSE streaming: tokens are emitted as the CLI generates them |
| Multiple models | Supports all Gemini models available through the CLI |
| Working directory control | Point the CLI at any directory for context-aware code exploration |
| Large prompt handling | Prompts over 100 KB are automatically piped via stdin |
| Multimodal support | Base64-encoded images via OpenAI's vision format |
| Privacy-first logging | User prompts are never logged |
| Zero configuration | Sensible defaults, everything configurable via env vars |
Tip
For image analysis tasks, use `gemini-2.5-pro` or higher. `gemini-2.5-flash` may hallucinate that it cannot read image files through the CLI.
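Images travel in OpenAI's vision message format as base64 data URLs. A minimal sketch of building such a request body; the placeholder bytes stand in for a real image file, and the commented send assumes the `openai` client from the examples below:

```python
import base64

# Placeholder bytes standing in for real image file contents
# (in practice: open("photo.png", "rb").read()).
image_bytes = b"\x89PNG\r\n\x1a\n" + b"\x00" * 16  # illustration only, not a valid image
b64 = base64.b64encode(image_bytes).decode()

message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Describe this image."},
        {
            "type": "image_url",
            "image_url": {"url": f"data:image/png;base64,{b64}"},
        },
    ],
}
# client.chat.completions.create(model="gemini-2.5-pro", messages=[message])
```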
- Python 3.10+
- Gemini CLI installed and authenticated
```bash
# Install
pip install -e .

# Or with dev dependencies
pip install -e ".[dev]"
```

```bash
# Default: http://0.0.0.0:11435
python -m gemini_cli_server

# Or use the CLI command
gemini-cli-server
```

```bash
# List models
curl http://localhost:11435/v1/models
```
```bash
# Chat completion
curl -X POST http://localhost:11435/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-2.5-pro",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

# With a working directory (for code exploration)
curl -X POST http://localhost:11435/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-2.5-pro",
    "messages": [{"role": "user", "content": "Explain the architecture of this project"}],
    "working_dir": "/path/to/your/project"
  }'

# Streaming
curl -N -X POST http://localhost:11435/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-2.5-flash",
    "messages": [{"role": "user", "content": "Write a poem"}],
    "stream": true
  }'
```

```python
from openai import OpenAI

# Before (Ollama)
client = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")

# After (Gemini CLI Server): just change the port
client = OpenAI(base_url="http://localhost:11435/v1", api_key="unused")

response = client.chat.completions.create(
    model="gemini-2.5-pro",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```

```python
import litellm

response = litellm.completion(
    model="openai/gemini-2.5-pro",
    messages=[{"role": "user", "content": "Hello!"}],
    api_base="http://localhost:11435/v1",
    api_key="unused",
)
```

| Method | Path | Description |
|---|---|---|
| GET | `/` | Server info |
| GET | `/health` | Health check (verifies CLI is responsive) |
| GET | `/v1/models` | List available models |
| GET | `/v1/models/{id}` | Get model details |
| POST | `/v1/chat/completions` | Create chat completion |
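With `"stream": true`, the completions endpoint emits OpenAI-style SSE events: `data: {...}` chunk lines terminated by `data: [DONE]`. A minimal sketch of extracting token deltas from such a stream, assuming the standard chunk shape (the sample payload is illustrative):

```python
import json

def extract_deltas(sse_text: str) -> str:
    """Concatenate content deltas from an OpenAI-style SSE stream."""
    out = []
    for line in sse_text.splitlines():
        if not line.startswith("data: "):
            continue
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"].get("content")
        if delta:
            out.append(delta)
    return "".join(out)

sample = (
    'data: {"choices": [{"delta": {"content": "Hel"}}]}\n'
    'data: {"choices": [{"delta": {"content": "lo"}}]}\n'
    'data: [DONE]\n'
)
print(extract_deltas(sample))  # Hello
```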
The `working_dir` field sets the working directory for the CLI subprocess, enabling context-aware responses for code exploration. It is also accepted via the `X-Working-Dir` HTTP header.
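For clients that cannot add non-standard body fields, the header route can be used instead. A sketch with Python's standard library, assuming the default server address (the actual send is commented out since it needs a running server):

```python
import json
import urllib.request

req = urllib.request.Request(
    "http://localhost:11435/v1/chat/completions",
    data=json.dumps({
        "model": "gemini-2.5-pro",
        "messages": [{"role": "user", "content": "Summarize this repo"}],
    }).encode(),
    headers={
        "Content-Type": "application/json",
        "X-Working-Dir": "/path/to/your/project",
    },
)
# response = urllib.request.urlopen(req)  # requires a running server
```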
| Model ID | Description |
|---|---|
| `gemini-3-pro-preview` | Gemini 3 Pro Preview |
| `gemini-3-flash-preview` | Gemini 3 Flash Preview |
| `gemini-2.5-pro` | Gemini 2.5 Pro (default) |
| `gemini-2.5-flash` | Gemini 2.5 Flash |
| `gemini-2.5-flash-lite` | Gemini 2.5 Flash Lite |
Any model identifier accepted by the Gemini CLI can be used; the server passes it through directly.
All settings are configurable via environment variables:
| Variable | Default | Description |
|---|---|---|
| `GEMINI_CLI_HOST` | `0.0.0.0` | Server bind address |
| `GEMINI_CLI_PORT` | `11435` | Server port |
| `GEMINI_CLI_DEFAULT_MODEL` | `gemini-2.5-pro` | Default model |
| `GEMINI_CLI_COMMAND` | `gemini` | Path to the Gemini CLI binary |
| `GEMINI_CLI_WORKING_DIR` | (cwd) | Default working directory |
| `GEMINI_CLI_TIMEOUT` | `300` | CLI timeout in seconds |
| `GEMINI_CLI_MAX_RETRIES` | `2` | Max retries on transient errors |
| `GEMINI_CLI_LOG_LEVEL` | `info` | Log level |
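For example, to bind to localhost only and allow longer-running CLI calls (the values here are illustrative, not recommendations):

```shell
export GEMINI_CLI_HOST=127.0.0.1
export GEMINI_CLI_PORT=8000
export GEMINI_CLI_TIMEOUT=600
python -m gemini_cli_server
```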
Working directory access
The `working_dir` parameter is passed directly to the CLI subprocess; there is no path validation or sandboxing. Any path accessible to the server process can be specified.
This is by design for local and trusted-network use. If exposing to untrusted clients:
- Run the server in a container or restricted environment
- Add a reverse proxy with path validation
- Implement an allowlist of permitted directories
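An allowlist check can be as simple as resolving the requested path and testing it against permitted roots. A hedged sketch of the idea (not part of this server; the roots are examples, and symlink edge cases would need more care in production):

```python
from pathlib import Path

# Example roots; replace with directories you actually want to expose.
ALLOWED_ROOTS = [Path("/srv/projects"), Path("/home/ci/repos")]

def is_allowed(working_dir: str) -> bool:
    """True if working_dir resolves to a path under an allowed root."""
    resolved = Path(working_dir).resolve()
    return any(resolved.is_relative_to(root) for root in ALLOWED_ROOTS)

# is_allowed("/srv/projects/myapp")  -> True
# is_allowed("/etc")                 -> False
```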
Prompt size limits
- Prompts under 100 KB are passed as CLI arguments (fast, simple)
- Prompts over 100 KB are automatically piped via stdin (avoids `ARG_MAX` limits)
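The dispatch amounts to a size check on the encoded prompt. A sketch of the idea (not the server's actual code; the 100 KB threshold comes from the limits above):

```python
STDIN_THRESHOLD = 100 * 1024  # 100 KB, per the limits above

def prompt_transport(prompt: str) -> str:
    """Decide how a prompt reaches the CLI subprocess."""
    if len(prompt.encode("utf-8")) > STDIN_THRESHOLD:
        return "stdin"  # piped, avoids ARG_MAX limits
    return "argv"       # passed as a CLI argument

# prompt_transport("hi")          -> "argv"
# prompt_transport("x" * 200_000) -> "stdin"
```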
Privacy
User prompts are never logged. Only metadata is recorded (model name, working directory, prompt size, timing).
```bash
make install-dev   # Install dev dependencies
make test          # Run unit tests
make test-cov      # Run with coverage
make test-e2e      # Run E2E tests (requires gemini CLI)
make lint          # Lint
make format        # Format
```

```
gemini-cli-server/
├── gemini_cli_server/
│   ├── __init__.py             # Package version
│   ├── __main__.py             # CLI entry point
│   ├── api_types.py            # OpenAI-compatible Pydantic models
│   ├── cli_runner.py           # Subprocess executor + streaming
│   ├── config.py               # Configuration with env overrides
│   ├── models.py               # Model registry
│   └── server.py               # FastAPI application + SSE streaming
├── tests/
│   ├── conftest.py             # Shared fixtures & mock runner
│   ├── test_api_types.py       # API type validation
│   ├── test_cli_runner.py      # CLI runner unit tests
│   ├── test_config.py          # Configuration tests
│   ├── test_e2e.py             # End-to-end tests with real CLI
│   ├── test_models.py          # Model registry tests
│   ├── test_real_subprocess.py # Real subprocess tests
│   └── test_server.py          # Server endpoint tests
├── Makefile
├── pyproject.toml
└── README.md
```
Test suite: 74 unit/integration tests + 25 E2E tests, with 86% code coverage.
This project wraps the Gemini CLI, which is subject to Google's usage limits, quotas, and Terms of Service. By using this server, you agree to comply with all applicable Google Terms of Service and Gemini API Terms. It is your responsibility to review and adhere to these policies, including any rate limits or usage restrictions that may apply.
```jsonc
{
  "model": "gemini-2.5-pro",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"}
  ],
  "stream": false,                    // Set to true to enable SSE streaming
  "working_dir": "/path/to/project"   // Non-standard extension
}
```