Document-powered learning platform — Extract, index, and learn from your documents. Upload PDFs, slides, or textbooks; get vision-backed extraction, config-driven RAG, and a full study workspace: chat, AI notes, flashcards, and AI-generated assessments — all grounded in your corpus.
I learn by asking questions. Not surface-level ones. The deep, obsessive "why"s that most materials never bother to answer. When my peers were studying from slides and PDFs, I sat there stuck. I couldn't absorb content I wasn't allowed to interrogate. Documents don't talk back. They don't explain the intuition, the connections, the why. Tools like NotebookLM couldn't help either: they don't understand images inside the data source, so those parts show up blank. Most of my slides were visual or text as screenshots. I was left with nothing.
So I built something for myself. A platform that extracts content from any document — slides, papers, textbooks — and lets me use AI to actually ask. Why does this work? What's the reasoning here? How does this connect to that thing from last week? It grew from "extract and query" into a full study environment: converse over the corpus, generate notes and flashcards, and create or take AI-generated assessments with automatic grading. For the first time, static documents became something I could learn from. Not by re-reading. By conversing, noting, and testing.
I'm open-sourcing it because I'm probably not the only one who learns this way.
Document processing & RAG
- Full content extraction — Native PDF/DOCX/PPTX/XLSX text plus vision for every embedded image (equations, diagrams, labels).
- Azure AI Vision — Computer Vision Describe + Read (OCR) for images when Azure OpenAI vision isn’t available.
- LLM refinement — Optional pass that cleans extracted text (markdown formatting, LaTeX math, boilerplate removed) before indexing.
- Config-driven — Single docproc.yaml: one vector store, multiple AI providers.
- Stores — PgVector, Qdrant, Chroma, FAISS, or in-memory.
- Providers — OpenAI, Azure, Anthropic, Ollama, LiteLLM.
- RAG — Embedding-based or Apple CLaRa.
- Async upload — Background processing with per-file progress bar; parallel image extraction.
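For the embedding backend, retrieval boils down to ranking stored document vectors by similarity to the query vector. A minimal sketch of that step (illustrative only — `top_k` and the plain-list store are assumptions, not docproc's actual code):

```python
import math

def top_k(query_vec: list[float], doc_vecs: list[list[float]], k: int = 5) -> list[int]:
    """Return indices of the k document vectors most similar to the query,
    ranked by cosine similarity. A sketch of embedding-based retrieval;
    a real store (PgVector, Qdrant, ...) does this with an index, not a scan."""
    def cos(a: list[float], b: list[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb)

    scored = sorted(enumerate(doc_vecs), key=lambda p: cos(query_vec, p[1]), reverse=True)
    return [i for i, _ in scored[:k]]
```

The `top_k` value here corresponds to `rag.top_k` in the config below.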
Study workspace (API + React UI)
- Projects — Organize documents per project; all study features are scoped to the current project's corpus.
- Converse — RAG chat over uploaded documents with source citations.
- Notes — AI-generated study notes plus your own; export to PDF.
- Flashcards — Decks generated from documents or text; flip-card review.
- Assessments — Create AI-generated quizzes from a document; take them and get auto-graded results (conceptual, derivation, formula, multi-select).
- Open WebUI — OpenAI-compatible chat routes so Open WebUI can use your indexed documents.
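The actual grading pipeline is described in docs/ASSESSMENTS_AI.md; as a hypothetical illustration of how one question type could be auto-graded, a multi-select answer might be scored by the fraction of options classified correctly (function name and scoring rule are assumptions, not docproc's implementation):

```python
def grade_multi_select(selected: set[str], correct: set[str], options: set[str]) -> float:
    """Score a multi-select answer: an option counts as correct when it is
    picked if and only if it is in the answer key. Illustrative scoring rule."""
    right = sum(1 for o in options if (o in selected) == (o in correct))
    return right / len(options)
```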
Upload (PDF/DOCX/PPTX/XLSX)
→ Extract (native text + vision for images)
→ Refine (LLM: markdown, LaTeX, no boilerplate) [optional]
→ Sanitize & dedupe
→ Index into vector store
→ Query via RAG + study features (chat, notes, flashcards, assessments)
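The sanitize-and-dedupe stage above can be sketched as plain functions. Names here (`sanitize`, `dedupe`, `ingest`) are illustrative, not docproc's actual API; a real run would embed each chunk and upsert it into the configured vector store.

```python
import hashlib
import re

def sanitize(text: str) -> str:
    """Collapse runs of whitespace and strip leading/trailing space."""
    return re.sub(r"\s+", " ", text).strip()

def dedupe(chunks: list[str]) -> list[str]:
    """Drop exact-duplicate chunks by content hash, keeping the first occurrence."""
    seen: set[str] = set()
    out: list[str] = []
    for c in chunks:
        h = hashlib.sha256(c.encode()).hexdigest()
        if h not in seen:
            seen.add(h)
            out.append(c)
    return out

def ingest(pages: list[str], index: list[str]) -> int:
    """Sanitize, dedupe, and hand chunks to an index (a plain list stands in
    for the embed-and-upsert call). Returns the number of chunks indexed."""
    chunks = dedupe([sanitize(p) for p in pages if p.strip()])
    index.extend(chunks)
    return len(chunks)
```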
- Config — docproc.yaml selects one database and one primary AI provider.
- Vision — PDFs use the native text layer; embedded images go to Azure Vision (Describe + Read) or a vision LLM.
- Refinement — With ingest.use_llm_refine: true, extracted text is cleaned and formatted before storage.
See docs/CONFIGURATION.md for the full schema.
# 1. Clone and install
git clone https://github.com/rithulkamesh/docproc.git && cd docproc
uv sync --python 3.12
# 2. Config and env
cp docproc.example.yaml docproc.yaml
cp .env.example .env
# Edit docproc.yaml (database + primary_ai) and .env (API keys, DATABASE_URL)
# 3. Start vector DB (e.g. Qdrant)
docker run -d -p 6333:6333 qdrant/qdrant
# 4. Run API
docproc-serve
# 5. Run React frontend (another terminal)
cd web
npm install
VITE_DOCPROC_API_URL=http://localhost:8000 npm run dev
Open http://localhost:3000 — upload a PDF, watch per-file progress, then chat, take notes, review flashcards, or create and take AI-generated assessments in the document-powered study workspace.
Create docproc.yaml in the project root (see docs/CONFIGURATION.md):
database:
  provider: pgvector # pgvector | qdrant | chroma | faiss | memory
  # connection_string from DATABASE_URL or set here
ai_providers:
  - provider: azure # or openai, anthropic, ollama, litellm
primary_ai: azure
rag:
  backend: embedding
  top_k: 5
  chunk_size: 512
ingest:
  use_vision: true # PDF: extract text + vision for images
  use_llm_refine: true # Clean markdown, LaTeX, remove boilerplate
Secrets (API keys, endpoints) come from environment variables or .env. See .env.example.
# With uv (recommended — isolated install, adds docproc to PATH)
uv tool install git+https://github.com/rithulkamesh/docproc.git
# Or with pip
pip install git+https://github.com/rithulkamesh/docproc.git
Then run docproc --file input.pdf -o output.md. The CLI uses the same config and providers as the server (OpenAI, Azure, Anthropic, Ollama, LiteLLM). For Ollama: ollama pull llava && ollama serve, then use docproc.cli.yaml or set primary_ai: ollama.
uv tool install 'docproc[server] @ git+https://github.com/rithulkamesh/docproc.git'
# or pip install docproc[server]
git clone https://github.com/rithulkamesh/docproc.git && cd docproc
uv sync --python 3.12
# Run: uv run docproc --file input.pdf -o output.md
# Or install: uv pip install -e .
DOCPROC_CONFIG=docproc.yaml docproc-serve
# API at http://localhost:8000
Endpoints: POST /documents/upload, GET /documents/, GET /documents/{id}, POST /query, GET /models. Upload returns immediately with a document ID; processing runs in the background. Poll GET /documents/{id} for status and progress (page/total/message).
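A client can wrap that poll loop in a small helper. The sketch below takes the status fetch as a callable so it stays transport-agnostic; field names (`status`, the `"processing"` state) follow the README's description and are assumptions about the real payload:

```python
import time
from typing import Callable

def wait_until_processed(get_status: Callable[[], dict],
                         timeout: float = 300.0,
                         interval: float = 2.0) -> dict:
    """Poll GET /documents/{id} (via the injected get_status callable) until
    the document leaves the 'processing' state, then return the last payload.
    Raises TimeoutError if processing does not finish within `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        doc = get_status()
        if doc.get("status") != "processing":
            return doc
        time.sleep(interval)
    raise TimeoutError("document still processing")
```

Injecting the fetcher also makes the loop easy to test without a running server.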
The React frontend lives in web/ and talks to the FastAPI backend via VITE_DOCPROC_API_URL:
cd web
npm install
VITE_DOCPROC_API_URL=http://localhost:8000 npm run dev
- Notebook Guide — Corpus-level summary, suggested questions, and one-click report templates.
- Converse — RAG chat over all uploaded documents with source snippets.
- Sources — Per-project document library; upload, track processing, select docs for notes/flashcards/assessments.
- Notes — AI-generated study notes + your own notes; export to PDF.
- Flashcards — Decks generated from documents or text, with a flip-card review UI.
- Assessments — Create AI-generated quizzes from a document; take them and view auto-graded results (conceptual, derivation, formula, multi-select).
From GHCR (recommended):
docker run -p 8000:8000 -e OPENAI_API_KEY=sk-xxx ghcr.io/rithulkamesh/docproc:latest
Build locally (standalone, in-memory DB):
docker build -t docproc:2.0 .
docker run -p 8000:8000 -e OPENAI_API_KEY=sk-xxx docproc:2.0
Full stack (API + React UI + Postgres + Qdrant):
cp docproc.example.yaml docproc.yaml
# In docproc.yaml set database.provider: pgvector (compose sets DATABASE_URL to Postgres)
# Optionally copy .env.example to .env and set OPENAI_API_KEY
docker compose up --build api web postgres qdrant
# UI: http://localhost:3000  API: http://localhost:8000
See docs/DOCKER.md for detailed Docker run instructions.
Point Open WebUI to http://localhost:8000/api for OpenAI-compatible chat backed by your documents.
# Requires Ollama + vision model (ollama pull llava)
cp docproc.cli.yaml docproc.yaml
docproc --file input.pdf -o output.md
| Doc | Description |
|---|---|
| docs/README.md | Documentation index |
| docs/CONFIGURATION.md | Config schema, database options, AI providers, ingest, RAG |
| docs/ARCHITECTURE.md | Pipeline flow, modules, CLI vs API |
| docs/AZURE_SETUP.md | Azure OpenAI + Azure AI Vision (Describe + Read), credentials |
| docs/ASSESSMENTS_AI.md | AI-generated assessments, grading pipeline, question types |
- DOCPROC_CONFIG — Path to config file (default: docproc.yaml).
- VITE_DOCPROC_API_URL — API base URL for the React frontend (default in dev: http://localhost:8000).
- DATABASE_URL — Overrides database.connection_string (e.g. Postgres).
- Provider-specific: OPENAI_API_KEY, AZURE_OPENAI_API_KEY, AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_DEPLOYMENT, AZURE_VISION_ENDPOINT, etc. See .env.example and docs/CONFIGURATION.md.
Pull requests welcome. Ensure tests pass.
MIT. See LICENSE.md.