# Gemini CLI Server

OpenAI-compatible API server wrapping the Gemini CLI.



## ✨ Features

| Feature | Description |
| --- | --- |
| OpenAI-compatible API | Works with any client that speaks the OpenAI chat completions format |
| Drop-in Ollama replacement | Change your base URL and you're done |
| True streaming | Real-time SSE streaming; tokens are emitted as the CLI generates them |
| Multiple models | Supports all Gemini models available through the CLI |
| Working directory control | Point the CLI at any directory for context-aware code exploration |
| Large prompt handling | Prompts over 100 KB are automatically piped via stdin |
| Multimodal support | Base64-encoded images via OpenAI's vision format |
| Privacy-first logging | User prompts are never logged |
| Zero configuration | Sensible defaults, everything configurable via env vars |

> [!TIP]
> For image analysis tasks, use `gemini-2.5-pro` or higher. `gemini-2.5-flash` may hallucinate that it cannot read image files through the CLI.


## 🚀 Quick Start

### Prerequisites

- Python 3.10+
- Gemini CLI installed and authenticated

### Installation

```bash
# Install
pip install -e .

# Or with dev dependencies
pip install -e ".[dev]"
```

### Start the Server

```bash
# Default: http://0.0.0.0:11435
python -m gemini_cli_server

# Or use the CLI command
gemini-cli-server
```

### Make a Request

```bash
# List models
curl http://localhost:11435/v1/models

# Chat completion
curl -X POST http://localhost:11435/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-2.5-pro",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

# With a working directory (for code exploration)
curl -X POST http://localhost:11435/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-2.5-pro",
    "messages": [{"role": "user", "content": "Explain the architecture of this project"}],
    "working_dir": "/path/to/your/project"
  }'

# Streaming
curl -N -X POST http://localhost:11435/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-2.5-flash",
    "messages": [{"role": "user", "content": "Write a poem"}],
    "stream": true
  }'
```
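Streamed responses arrive as OpenAI-style SSE lines (`data: {...}` chunks terminated by `data: [DONE]`). A minimal client-side parser sketch; the sample lines below are illustrative of the standard `chat.completion.chunk` shape, not captured server output:

```python
import json

def extract_deltas(sse_lines):
    """Yield content fragments from OpenAI-style SSE data lines."""
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines and comments
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            yield delta["content"]

# Illustrative chunk lines following the OpenAI streaming format
sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
print("".join(extract_deltas(sample)))  # -> Hello
```

In practice the OpenAI SDK (shown below) handles this parsing for you; the sketch is only useful if you consume the stream with a raw HTTP client.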

## 🔄 Drop-in Replacement

### OpenAI Python SDK

```python
# Before (Ollama)
from openai import OpenAI
client = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")

# After (Gemini CLI Server): just change the port
client = OpenAI(base_url="http://localhost:11435/v1", api_key="unused")

response = client.chat.completions.create(
    model="gemini-2.5-pro",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```

### LiteLLM

```python
import litellm

response = litellm.completion(
    model="openai/gemini-2.5-pro",
    messages=[{"role": "user", "content": "Hello!"}],
    api_base="http://localhost:11435/v1",
    api_key="unused",
)
```

## 📡 API Reference

### Endpoints

| Method | Path | Description |
| --- | --- | --- |
| GET | `/` | Server info |
| GET | `/health` | Health check (verifies the CLI is responsive) |
| GET | `/v1/models` | List available models |
| GET | `/v1/models/{id}` | Get model details |
| POST | `/v1/chat/completions` | Create a chat completion |

### Chat Completion Request

```jsonc
{
  "model": "gemini-2.5-pro",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"}
  ],
  "stream": false,                    // Enable SSE streaming
  "working_dir": "/path/to/project"   // Non-standard extension
}
```

The `working_dir` field sets the working directory for the CLI subprocess, enabling context-aware responses for code exploration. It is also accepted via the `X-Working-Dir` HTTP header.
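The two ways of supplying the working directory produce equivalent requests. A sketch of both request shapes (the path is a placeholder):

```python
base = {
    "model": "gemini-2.5-pro",
    "messages": [{"role": "user", "content": "Explain this codebase"}],
}

# Variant 1: non-standard "working_dir" field in the JSON body
body_request = {**base, "working_dir": "/path/to/project"}

# Variant 2: standard body plus the X-Working-Dir header
header_request = {
    "json": base,
    "headers": {"X-Working-Dir": "/path/to/project"},
}

print(body_request["working_dir"])  # -> /path/to/project
```

With the OpenAI Python SDK, non-standard body fields like this can be sent via the `extra_body` keyword of `chat.completions.create`.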

### Available Models

| Model ID | Description |
| --- | --- |
| `gemini-3-pro-preview` | Gemini 3 Pro Preview |
| `gemini-3-flash-preview` | Gemini 3 Flash Preview |
| `gemini-2.5-pro` | Gemini 2.5 Pro (default) |
| `gemini-2.5-flash` | Gemini 2.5 Flash |
| `gemini-2.5-flash-lite` | Gemini 2.5 Flash Lite |

Any model identifier accepted by the Gemini CLI can be used; the server passes it through directly.


βš™οΈ Configuration

All settings are configurable via environment variables:

| Variable | Default | Description |
| --- | --- | --- |
| `GEMINI_CLI_HOST` | `0.0.0.0` | Server bind address |
| `GEMINI_CLI_PORT` | `11435` | Server port |
| `GEMINI_CLI_DEFAULT_MODEL` | `gemini-2.5-pro` | Default model |
| `GEMINI_CLI_COMMAND` | `gemini` | Path to the Gemini CLI binary |
| `GEMINI_CLI_WORKING_DIR` | (cwd) | Default working directory |
| `GEMINI_CLI_TIMEOUT` | `300` | CLI timeout in seconds |
| `GEMINI_CLI_MAX_RETRIES` | `2` | Max retries on transient errors |
| `GEMINI_CLI_LOG_LEVEL` | `info` | Log level |
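Each variable in the table maps to one setting. A minimal sketch of how such env overrides can be read, using the variable names and defaults from the table (the function name is illustrative, not the project's actual API):

```python
import os

def load_settings(env=os.environ):
    """Read server settings, falling back to the documented defaults."""
    return {
        "host": env.get("GEMINI_CLI_HOST", "0.0.0.0"),
        "port": int(env.get("GEMINI_CLI_PORT", "11435")),
        "default_model": env.get("GEMINI_CLI_DEFAULT_MODEL", "gemini-2.5-pro"),
        "command": env.get("GEMINI_CLI_COMMAND", "gemini"),
        "working_dir": env.get("GEMINI_CLI_WORKING_DIR", os.getcwd()),
        "timeout": float(env.get("GEMINI_CLI_TIMEOUT", "300")),
        "max_retries": int(env.get("GEMINI_CLI_MAX_RETRIES", "2")),
        "log_level": env.get("GEMINI_CLI_LOG_LEVEL", "info"),
    }

print(load_settings({"GEMINI_CLI_PORT": "8080"})["port"])  # -> 8080
```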

πŸ›‘οΈ Security Considerations

### Working directory access

The `working_dir` parameter is passed directly to the CLI subprocess; there is no path validation or sandboxing. Any path accessible to the server process can be specified.

This is by design for local and trusted-network use. If exposing the server to untrusted clients:

- Run the server in a container or restricted environment
- Add a reverse proxy with path validation
- Implement an allowlist of permitted directories

### Prompt size limits

- Prompts under 100 KB are passed as CLI arguments (fast, simple)
- Prompts over 100 KB are automatically piped via stdin (avoids `ARG_MAX` limits)
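The argv/stdin split can be sketched as follows; the function, flag, and threshold names are illustrative, not the project's actual code:

```python
THRESHOLD_BYTES = 100 * 1024  # 100 KB

def plan_invocation(prompt: str) -> dict:
    """Decide how to hand a prompt to the CLI subprocess."""
    if len(prompt.encode("utf-8")) <= THRESHOLD_BYTES:
        # Small prompt: pass as a command-line argument
        return {"args": ["gemini", "-p", prompt], "stdin": None}
    # Large prompt: pipe via stdin to stay under the OS ARG_MAX limit
    return {"args": ["gemini"], "stdin": prompt}

print(plan_invocation("hi")["stdin"] is None)  # -> True
```

Measuring the encoded byte length (rather than `len(prompt)`) matters because `ARG_MAX` is a byte limit and multi-byte characters inflate the encoded size.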
### Privacy

User prompts are never logged. Only metadata is recorded (model name, working directory, prompt size, timing).


## 🧪 Development

```bash
make install-dev     # Install dev dependencies
make test            # Run unit tests
make test-cov        # Run with coverage
make test-e2e        # Run E2E tests (requires gemini CLI)
make lint            # Lint
make format          # Format
```

### Project Structure

```text
gemini-cli-server/
├── gemini_cli_server/
│   ├── __init__.py             # Package version
│   ├── __main__.py             # CLI entry point
│   ├── api_types.py            # OpenAI-compatible Pydantic models
│   ├── cli_runner.py           # Subprocess executor + streaming
│   ├── config.py               # Configuration with env overrides
│   ├── models.py               # Model registry
│   └── server.py               # FastAPI application + SSE streaming
├── tests/
│   ├── conftest.py             # Shared fixtures & mock runner
│   ├── test_api_types.py       # API type validation
│   ├── test_cli_runner.py      # CLI runner unit tests
│   ├── test_config.py          # Configuration tests
│   ├── test_e2e.py             # End-to-end tests with real CLI
│   ├── test_models.py          # Model registry tests
│   ├── test_real_subprocess.py # Real subprocess tests
│   └── test_server.py          # Server endpoint tests
├── Makefile
├── pyproject.toml
└── README.md
```

Test suite: 74 unit/integration tests + 25 E2E tests, with 86% code coverage.


## ⚠️ Disclaimer

This project wraps the Gemini CLI, which is subject to Google's usage limits, quotas, and Terms of Service. By using this server, you agree to comply with all applicable Google Terms of Service and Gemini API Terms. It is your responsibility to review and adhere to these policies, including any rate limits or usage restrictions that may apply.


## 📄 License

MIT
