docproc

Document-powered learning platform — Extract, index, and learn from your documents. Upload PDFs, slides, or textbooks; get vision-backed extraction, config-driven RAG, and a full study workspace: chat, AI notes, flashcards, and AI-generated assessments — all grounded in your corpus.

Motivation

I learn by asking questions. Not surface-level ones. The deep, obsessive "why"s that most materials never bother to answer. When my peers were studying from slides and PDFs, I sat there stuck. I couldn't absorb content I wasn't allowed to interrogate. Documents don't talk back. They don't explain the intuition, the connections, the why. Tools like NotebookLM couldn't help either: they don't understand images inside the data source, so those parts show up blank. Most of my slides were visual, or plain text embedded as screenshots. I was left with nothing.

So I built something for myself. A platform that extracts content from any document — slides, papers, textbooks — and lets me use AI to actually ask. Why does this work? What's the reasoning here? How does this connect to that thing from last week? It grew from "extract and query" into a full study environment: converse over the corpus, generate notes and flashcards, and create or take AI-generated assessments with automatic grading. For the first time, static documents became something I could learn from. Not by re-reading. By conversing, noting, and testing.

I'm open-sourcing it because I'm probably not the only one who learns this way.


Features

Document processing & RAG

  • Full content extraction — Native PDF/DOCX/PPTX/XLSX text plus vision for every embedded image (equations, diagrams, labels).
  • Azure AI Vision — Computer Vision Describe + Read (OCR) for images when Azure OpenAI vision isn’t available.
  • LLM refinement — Optional LLM pass that cleans extracted text before indexing: markdown formatting, LaTeX math, boilerplate removed.
  • Config-driven — Single docproc.yaml: one vector store, multiple AI providers.
  • Stores — PgVector, Qdrant, Chroma, FAISS, or in-memory.
  • Providers — OpenAI, Azure, Anthropic, Ollama, LiteLLM.
  • RAG — Embedding-based or Apple CLaRa.
  • Async upload — Background processing with per-file progress bar; parallel image extraction.

Study workspace (API + React UI)

  • Projects — Organize documents per project; all study features are scoped to the current project's corpus.
  • Converse — RAG chat over uploaded documents with source citations.
  • Notes — AI-generated study notes plus your own; export to PDF.
  • Flashcards — Decks generated from documents or text; flip-card review.
  • Assessments — Create AI-generated quizzes from a document; take them and get auto-graded results (conceptual, derivation, formula, multi-select).
  • Open WebUI — OpenAI-compatible chat routes so Open WebUI can use your indexed documents.

Architecture

Upload (PDF/DOCX/PPTX/XLSX)
    → Extract (native text + vision for images)
    → Refine (LLM: markdown, LaTeX, no boilerplate) [optional]
    → Sanitize & dedupe
    → Index into vector store
    → Query via RAG + study features (chat, notes, flashcards, assessments)
  • Config: docproc.yaml selects one database and one primary AI provider.
  • Vision: PDFs use native text layer; embedded images go to Azure Vision (Describe + Read) or a vision LLM.
  • Refinement: With ingest.use_llm_refine: true, extracted text is cleaned and formatted before storage.

See docs/CONFIGURATION.md for the full schema.

Quick Start

# 1. Clone and install
git clone https://github.com/rithulkamesh/docproc.git && cd docproc
uv sync --python 3.12

# 2. Config and env
cp docproc.example.yaml docproc.yaml
cp .env.example .env
# Edit docproc.yaml (database + primary_ai) and .env (API keys, DATABASE_URL)

# 3. Start vector DB (e.g. Qdrant)
docker run -d -p 6333:6333 qdrant/qdrant

# 4. Run API
docproc-serve

# 5. Run React frontend (another terminal)
cd web
npm install
VITE_DOCPROC_API_URL=http://localhost:8000 npm run dev

Open http://localhost:3000 — upload a PDF, watch per-file progress, then chat, take notes, review flashcards, or create and take AI-generated assessments in the document-powered study workspace.

Configuration

Create docproc.yaml in the project root (see docs/CONFIGURATION.md):

database:
  provider: pgvector   # pgvector | qdrant | chroma | faiss | memory
  # connection_string from DATABASE_URL or set here

ai_providers:
  - provider: azure    # or openai, anthropic, ollama, litellm
primary_ai: azure

rag:
  backend: embedding
  top_k: 5
  chunk_size: 512

ingest:
  use_vision: true      # PDF: extract text + vision for images
  use_llm_refine: true   # Clean markdown, LaTeX, remove boilerplate

Secrets (API keys, endpoints) come from environment variables or .env. See .env.example.

Installation

CLI

# With uv (recommended — isolated install, adds docproc to PATH)
uv tool install git+https://github.com/rithulkamesh/docproc.git

# Or with pip
pip install git+https://github.com/rithulkamesh/docproc.git

Then run docproc --file input.pdf -o output.md. The CLI uses the same config and providers as the server (OpenAI, Azure, Anthropic, Ollama, LiteLLM). For Ollama, run ollama pull llava && ollama serve, then use docproc.cli.yaml or set primary_ai: ollama.

Server (API + RAG + frontend)

uv tool install 'docproc[server] @ git+https://github.com/rithulkamesh/docproc.git'
# or: pip install 'docproc[server] @ git+https://github.com/rithulkamesh/docproc.git'

From source (dev)

git clone https://github.com/rithulkamesh/docproc.git && cd docproc
uv sync --python 3.12
# Run: uv run docproc --file input.pdf -o output.md
# Or install: uv pip install -e .

Usage

API

DOCPROC_CONFIG=docproc.yaml docproc-serve
# API at http://localhost:8000

Endpoints: POST /documents/upload, GET /documents/, GET /documents/{id}, POST /query, GET /models. Upload returns immediately with a document ID; processing runs in the background. Poll GET /documents/{id} for status and progress (page/total/message).
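
The upload-then-poll flow can be sketched in Python. This is a sketch, not a shipped client: the field names (status, page, total, message) follow the description above, and the exact response schema and status values may differ.

```python
import json
import time
import urllib.request

def wait_for_document(doc_id, base_url="http://localhost:8000",
                      fetch=None, poll_seconds=2.0):
    """Poll GET /documents/{id} until background processing finishes.

    `fetch` takes a URL and returns the parsed JSON body; it is
    injectable so the loop can be exercised without a running server.
    """
    if fetch is None:
        def fetch(url):
            with urllib.request.urlopen(url) as resp:
                return json.load(resp)
    while True:
        doc = fetch(f"{base_url}/documents/{doc_id}")
        # Assumption: any status other than "processing" means the
        # background job is done (completed or failed).
        if doc.get("status") != "processing":
            return doc
        print(f"page {doc.get('page')}/{doc.get('total')}: {doc.get('message')}")
        time.sleep(poll_seconds)
```

Upload via POST /documents/upload first, then pass the returned document ID to a loop like this.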

Frontend

The React frontend lives in web/ and talks to the FastAPI backend via VITE_DOCPROC_API_URL:

cd web
npm install
VITE_DOCPROC_API_URL=http://localhost:8000 npm run dev

  • Notebook Guide — Corpus-level summary, suggested questions, and one-click report templates.
  • Converse — RAG chat over all uploaded documents with source snippets.
  • Sources — Per-project document library; upload, track processing, select docs for notes/flashcards/assessments.
  • Notes — AI-generated study notes + your own notes; export to PDF.
  • Flashcards — Decks generated from documents or text, with a flip-card review UI.
  • Assessments — Create AI-generated quizzes from a document; take them and view auto-graded results (conceptual, derivation, formula, multi-select).

Docker

From GHCR (recommended):

docker run -p 8000:8000 -e OPENAI_API_KEY=sk-xxx ghcr.io/rithulkamesh/docproc:latest

Build locally (standalone, in-memory DB):

docker build -t docproc:2.0 .
docker run -p 8000:8000 -e OPENAI_API_KEY=sk-xxx docproc:2.0

Full stack (API + React UI + Postgres + Qdrant):

cp docproc.example.yaml docproc.yaml
# In docproc.yaml set database.provider: pgvector (compose sets DATABASE_URL to Postgres)
# Optionally copy .env.example to .env and set OPENAI_API_KEY
docker compose up --build api web postgres qdrant
# UI: http://localhost:3000  API: http://localhost:8000

See docs/DOCKER.md for detailed Docker run instructions.

Open WebUI

Point Open WebUI to http://localhost:8000/api for OpenAI-compatible chat backed by your documents.
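
Because the routes are OpenAI-compatible, any OpenAI-style client can query the corpus directly. A minimal sketch, with the caveat that the exact /chat/completions path and the model name "docproc" are assumptions based on the standard OpenAI API shape, not confirmed routes:

```python
import json
import urllib.request

def ask_corpus(question, base_url="http://localhost:8000/api",
               model="docproc", opener=urllib.request.urlopen):
    """POST an OpenAI-style chat completion request.

    `opener` is injectable (defaults to urllib.request.urlopen) so the
    request/response handling can be tested without a live server.
    """
    payload = {"model": model,
               "messages": [{"role": "user", "content": question}]}
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with opener(req) as resp:
        body = json.load(resp)
    # Standard OpenAI response shape: first choice's message content.
    return body["choices"][0]["message"]["content"]
```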

CLI

# Requires Ollama + vision model (ollama pull llava)
cp docproc.cli.yaml docproc.yaml
docproc --file input.pdf -o output.md

Documentation

  • docs/README.md — Documentation index
  • docs/CONFIGURATION.md — Config schema, database options, AI providers, ingest, RAG
  • docs/ARCHITECTURE.md — Pipeline flow, modules, CLI vs API
  • docs/AZURE_SETUP.md — Azure OpenAI + Azure AI Vision (Describe + Read), credentials
  • docs/ASSESSMENTS_AI.md — AI-generated assessments, grading pipeline, question types

Environment

  • DOCPROC_CONFIG — Path to config file (default: docproc.yaml).
  • VITE_DOCPROC_API_URL — API base URL for the React frontend (default in dev: http://localhost:8000).
  • DATABASE_URL — Overrides database.connection_string (e.g. Postgres).
  • Provider-specific: OPENAI_API_KEY, AZURE_OPENAI_API_KEY, AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_DEPLOYMENT, AZURE_VISION_ENDPOINT, etc. See .env.example and docs/CONFIGURATION.md.
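
A minimal .env sketch for an OpenAI + Postgres setup. All values are placeholders; the variable names follow the list above and .env.example, and the connection string is illustrative, not a required format:

```shell
# .env — placeholder values; see .env.example for the full variable list
DOCPROC_CONFIG=docproc.yaml
DATABASE_URL=postgresql://user:pass@localhost:5432/docproc
OPENAI_API_KEY=sk-xxx
```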

Contributing

Pull requests welcome. Ensure tests pass.

License

MIT. See LICENSE.md.

Contact

hi@rithul.dev
