
Added single PDF end-to-end: ingest, query, streamed answer #24

Open
markgewhite wants to merge 6 commits into main from feature/4-single-pdf-end-to-end


@markgewhite
Owner

Summary

  • Added DocumentLoader, Chunker, VectorStore, and Answerer modules implementing the first end-to-end RAG slice
  • Wired Chainlit on_message handler: query → semantic search → streamed LLM answer
  • LangChain used only for PDF loading (PyPDFLoader) and text splitting (RecursiveCharacterTextSplitter), with boundary comments throughout
  • All other components (embeddings, vector store, prompt construction, LLM calls) use direct library calls (ollama, chromadb)
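The prompt-construction step in the summary above can be pictured with a small standard-library sketch. It is not the code in this PR: `RetrievedChunk`, `build_prompt`, the `source` and `page` fields, and the prompt wording are all hypothetical names chosen for illustration, assuming that each search result carries enough metadata to cite where it came from.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrievedChunk:
    """Hypothetical search result: chunk text plus the metadata needed to cite it."""

    text: str
    source: str  # assumed metadata key: file the chunk came from
    page: int    # assumed metadata key: page number within that file


def build_prompt(query: str, chunks: list[RetrievedChunk]) -> str:
    """Number each retrieved chunk so the model can cite it as [1], [2], ..."""
    context = "\n\n".join(
        f"[{i}] ({chunk.source}, p. {chunk.page})\n{chunk.text}"
        for i, chunk in enumerate(chunks, start=1)
    )
    return (
        "Answer the question using only the numbered sources below. "
        "Cite sources by number.\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {query}"
    )


# Stand-in for what a semantic search might return for one query.
chunks = [
    RetrievedChunk("Chroma stores embeddings.", "guide.pdf", 3),
    RetrievedChunk("Ollama serves local models.", "guide.pdf", 7),
]
prompt = build_prompt("Where are embeddings stored?", chunks)
print(prompt)
```

Keeping prompt construction as a pure function of the query and the retrieved chunks means it can be tested without a running model or vector store.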

Closes #4

Test plan

  • Tracer bullet test: Document → chunk → store → search → prompt (end-to-end path)
  • Chunker tests: short/long docs, size limits, metadata preservation, chunk_index, doc_hash
  • DocumentLoader tests: PDF loading, metadata, empty folder, recursive/non-recursive
  • VectorStore tests: add, search, get_all_texts, has_document (in-memory ChromaDB)
  • Answerer tests: prompt construction with source citations
  • Manual: uv run chainlit run app.py with a configured document folder
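As a rough picture of what the Chunker checks in the test plan cover, here is a standard-library sketch. It swaps in a plain fixed-size sliding window rather than LangChain's RecursiveCharacterTextSplitter, and `chunk_text`, its parameters, and the dictionary layout are hypothetical. Only the `chunk_index` and `doc_hash` metadata names come from the test plan; hashing the full text with SHA-256 is an assumption about how `doc_hash` might be derived.

```python
import hashlib


def chunk_text(text: str, metadata: dict, chunk_size: int = 200, overlap: int = 20) -> list[dict]:
    """Split text into overlapping fixed-size windows (a simplified stand-in splitter)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    # Assumption: doc_hash identifies the whole document, so every chunk shares it.
    doc_hash = hashlib.sha256(text.encode("utf-8")).hexdigest()
    step = chunk_size - overlap
    chunks: list[dict] = []
    start = 0
    while start < len(text):
        chunks.append(
            {
                "text": text[start:start + chunk_size],
                # Copy the caller's metadata and add per-chunk fields.
                "metadata": {**metadata, "chunk_index": len(chunks), "doc_hash": doc_hash},
            }
        )
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks


short_chunks = chunk_text("short document", {"source": "a.pdf"})
long_chunks = chunk_text("x" * 500, {"source": "a.pdf"}, chunk_size=200, overlap=20)

# The kinds of properties the listed tests describe:
assert len(short_chunks) == 1
assert all(len(c["text"]) <= 200 for c in long_chunks)
assert [c["metadata"]["chunk_index"] for c in long_chunks] == list(range(len(long_chunks)))
assert len({c["metadata"]["doc_hash"] for c in long_chunks}) == 1
assert all(c["metadata"]["source"] == "a.pdf" for c in long_chunks)
```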

🤖 Generated with Claude Code

markgewhite and others added 6 commits April 1, 2026 14:44
…-to-end test

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…hunk_index, doc_hash

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…th test isolation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

Single PDF end-to-end: ingest → query → streamed answer
