Buildable, machine-checkable scientific knowledge.
Quick start · Documentation · Repository status · Contributing
Scientific Memory turns mathematically structured science from prose into machine-checkable, executable, composable artifacts with full provenance. It is not a paper summarizer or a theorem leaderboard: it is a knowledge-upgrading pipeline built for durable scientific inheritance.
| You get | How |
|---|---|
| Traceable claims | Every claim anchored with `source_span` and schema-valid JSON |
| Formal layer | Lean 4 + mathlib, linked from the corpus via mapping and theorem cards |
| Executable witnesses | Kernels with explicit verification boundaries |
| Inspectable output | Portal rendered only from canonical manifests and exports |
| Reproducible gates | Unified validation, CI, benchmarks, and signed releases |
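As a rough illustration of what "traceable claims" means in practice, here is a minimal sketch of a claim record carrying a source span. The field names (`claim_id`, `source_span`, `char_start`, `char_end`) are hypothetical stand-ins; the real shapes are defined by the JSON Schemas under `schemas/`.

```python
# Hypothetical shape of a traceable claim record; the real field names
# are defined by the schemas in schemas/ and may differ.
claim = {
    "claim_id": "langmuir_1918_adsorption.claim_01",
    "statement": "Surface coverage follows theta = K*p / (1 + K*p).",
    "source_span": {"page": 3, "char_start": 1180, "char_end": 1245},
    "assumptions": ["monolayer adsorption", "equivalent, independent sites"],
}

def span_is_valid(record: dict) -> bool:
    """Minimal check: every claim must carry a well-ordered source span."""
    span = record.get("source_span")
    return (
        isinstance(span, dict)
        and span.get("char_start", -1) >= 0
        and span.get("char_end", -1) > span["char_start"]
    )

print(span_is_valid(claim))  # True
```

The point is that the span, not the prose, is the anchor: a claim without a valid span cannot pass the gates.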
The project optimizes for:
| Pillar | Meaning |
|---|---|
| Explicit claims & assumptions | No silent hand-waving between text and formal code |
| Formal declarations | Machine-checked where the project commits to it |
| Executable kernels | Where numerical or computational alignment matters |
| Versioned provenance | Artifacts you can audit and rebuild |
| Reproducible builds | Lean, Python, and portal all part of one bar |
| Area | Role |
|---|---|
| `corpus/` | Schema-first papers: metadata, claims, assumptions, symbols, manifests |
| `formal/` | Lean 4 library (`ScientificMemory`) linked to the corpus |
| `schemas/` | Canonical JSON Schema for all public artifacts |
| `pipeline/` | `sm_pipeline`: ingest, extract, validate (gate engine), publish, portal export |
| `kernels/` | Executable kernels + shared `kernels/conformance/` test helpers |
| `portal/` | Next.js UI rendered from `corpus-export.json` and corpus data |
| `benchmarks/` | Regression tasks, gold labels, thresholds, proof-success trends |
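To make "executable kernels with explicit verification boundaries" concrete, here is a hedged sketch using the Langmuir isotherm (the corpus includes `langmuir_1918_adsorption`). The `Witness` wrapper and its boundary labels are illustrative only, not the repo's actual kernel API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Witness:
    value: float
    boundary: str  # "proof" | "witness" | "heuristic" -- illustrative labels

def langmuir_coverage(K: float, p: float) -> Witness:
    """Langmuir isotherm theta = K*p / (1 + K*p).

    Numeric evaluation is a *witness*, not a proof: the formal statement
    about the isotherm lives in the Lean layer, and this kernel only
    provides a checkable numerical counterpart."""
    if K < 0 or p < 0:
        raise ValueError("K and p must be non-negative")
    return Witness(value=(K * p) / (1.0 + K * p), boundary="witness")

theta = langmuir_coverage(K=2.0, p=1.0)
print(theta.value)  # 2/3 coverage at K*p = 2
```

Tagging every result with its boundary is what keeps "proof vs witness vs heuristic" visible downstream.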
```bash
git clone https://github.com/fraware/scientific-memory.git
cd scientific-memory
just bootstrap   # toolchains and dependencies
just build       # Lean + portal + Python tests
just validate    # full corpus / schema / graph gates
just portal      # local dev server (see terminal for URL)
```

| Situation | Command |
|---|---|
| Something failed early | `just doctor` (uv, pnpm, Lean, Lake) |
| Lean only | `just lake-build` or `just lake-build-verbose LOG=lake-build.log` |
| Full pre-PR sweep | `just check` |
| Without `just` (or Git Bash missing) | Contributor playbook – Local CI |
Use this as the single local path before opening a PR:
```bash
just bootstrap
just check
just benchmark
```
If you cannot use `just`, run the equivalent uv/lake/pnpm commands from Contributor playbook – Local CI; that playbook section is the canonical non-`just` path. On Windows, `just` uses Git for Windows Bash (see the same section).
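When something fails early, the first question is whether the required executables are even on PATH. This is a rough sketch of the kind of check an environment diagnostic like `just doctor` might perform; the real recipe may also check versions and configuration, so treat the tool list and behavior as assumptions.

```python
import shutil

def doctor(tools=("uv", "pnpm", "lean", "lake")) -> dict:
    """Report which required executables are found on PATH.

    A sketch of what an environment check such as `just doctor` might
    cover; the actual recipe is defined in the repo's justfile."""
    return {tool: shutil.which(tool) is not None for tool in tools}

for tool, found in sorted(doctor().items()):
    print(f"{tool}: {'ok' if found else 'MISSING'}")
```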
Current tree (corpus, pipeline, CI, metrics)
- Corpus: Eight indexed papers in `corpus/index.json`: six formalized core slices plus two hard-dimension stress scaffolds (`stress_units_dimensional_2024`, `stress_approx_asymptotic_2024`). Scaffold `test_new_paper` was retired from the index. Per-paper machine-checked counts and manifest `build_hash_version` / dependency-graph edge counts are generated in docs/status/repo-snapshot.md (`just repo-snapshot`); do not treat README prose as the live source of those numbers.
- Pipeline: Ingest through publish and portal export; unified validation in `gate_engine`. Trust boundary (canonical JSON vs LLM/suggestion sidecars, publish integrity, build hash v2): docs/reference/trust-boundary-and-extraction.md. Tests: run `just test` or `uv run pytest --collect-only -q` for current counts (pipeline + kernel packages).
- CI: All seven gates in place (Lean build; schema + graph + migration checks; provenance; coverage; portal build + smoke test; benchmark regression with proof-success snapshot, per-paper slices, trend history, runtime budgets, minimum thresholds and `tasks_ceiling` upper bounds in `benchmarks/baseline_thresholds.json` (e.g. source-span alignment error rate on `tasks.gold`); release integrity). Gate 7: checksums plus Sigstore (cosign) keyless signing; tagged releases publish a GitHub Release with changelog, checksums, signatures, and `release-bundle.zip`. Verify script: `scripts/verify_release_checksums.sh`. Quality: Ruff on pipeline; tests as above.
- Infra: Policy docs in `docs/infra/` (README, cache-policy, release-policy); CI and release under `.github/workflows/` and repo root.
- Contributor tooling: `just doctor` for environment diagnostics; stage banners in `just check`; `just lake-build` / `just lake-build-verbose LOG=...` for Lean build logs. SPEC and playbook use the real paper ID `langmuir_1918_adsorption` in examples.
- Metrics: `just metrics` (median intake, dependency, symbol conflict, proof completion, axiom count, research-value including literature_errors, claims_with_clarified_assumptions, kernels_with_formally_linked_invariants, source-span alignment, normalization visibility, assumption-suggestions, dimension-visibility, dimension-suggestions). `just benchmark` writes `benchmarks/reports/latest.json` with `proof_success_snapshot`, `proof_success_summary.md`, and task outputs including `tasks.gold` (precision/recall/F1, `papers_with_gold`, and source-span alignment fields), `tasks.llm_suggestions` / `tasks.llm_lean_suggestions` (optional LLM sidecar footprint metrics), `tasks.llm_eval` (reviewed reference bundles under `benchmarks/llm_eval/`), and `llm_prompt_templates` (declared prompt SHA-256 map). Gate 6 compares against `benchmarks/baseline_thresholds.json` (`tasks` minima and `tasks_ceiling`). Use `just scaffold-gold <paper_id>` when admitting a paper (all indexed papers currently have gold).
- LLM integration: Optional Prime Intellect inference for claims, mapping, and Lean proposals (suggest-only, human-gated apply). Full end-to-end pipeline run validated on `math_sum_evens` with `allenai/olmo-3.1-32b-instruct`; evaluation infrastructure includes prompt versioning, reference fixtures, benchmark task `llm_eval`, and a human review rubric. See docs/tooling/prime-intellect-llm.md and docs/testing/llm-lean-live-test-matrix.md.
- Optional: Blueprint check (`just check-paper-blueprint`), check-tooling (pandoc), extract-from-source, build-verso, mcp-server. Blueprints under `docs/blueprints/` cover Langmuir, Freundlich, `temkin_1941_adsorption`, and `physics_kinematics_uniform` (mapping mirror where present). Role playbooks: `docs/playbooks/` (formalizer, reviewer, domain-expander, release-manager). Portal dependencies pinned (Next.js ^14.2, React ^18.3) for reproducible builds. Also: Hypothesis-based property tests for adsorption kernels; shared kernel test helpers (`kernels/conformance/` workspace package); theorem-card reviewer lifecycle (contributor-playbook.md); `batch-admit --dry-run`; snapshot baseline quality validation; `validate-all --report-json` for gate reports; `just repo-snapshot` for docs/status/repo-snapshot.md.
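Gate 6's comparison of a benchmark report against minima and `tasks_ceiling` upper bounds can be sketched as follows. The metric names here (`gold_f1`, `span_alignment_error_rate`) are illustrative; the real keys and layout are defined by `benchmarks/baseline_thresholds.json` and `benchmarks/reports/latest.json`.

```python
def gate6_check(report: dict, minima: dict, ceilings: dict) -> list:
    """Compare benchmark metrics against minimum thresholds and
    ceiling (upper-bound) thresholds; return human-readable failures.

    Metric names are illustrative placeholders, not the repo's schema."""
    failures = []
    for metric, floor in minima.items():
        value = report.get(metric, 0.0)
        if value < floor:
            failures.append(f"{metric}={value} below minimum {floor}")
    for metric, cap in ceilings.items():
        value = report.get(metric, 0.0)
        if value > cap:
            failures.append(f"{metric}={value} above ceiling {cap}")
    return failures

report = {"gold_f1": 0.91, "span_alignment_error_rate": 0.03}
print(gate6_check(report,
                  minima={"gold_f1": 0.85},
                  ceilings={"span_alignment_error_rate": 0.05}))  # []
```

Minima catch regressions (e.g. F1 dropping); ceilings catch the inverse failure mode, such as an error rate that should stay bounded.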
```mermaid
flowchart TD
  subgraph intake [Intake]
    Add[Add paper]
    Extract[Extract claims and context]
  end
  subgraph optional [Optional LLM assistance]
    LLM[LLM proposals]
    Review[Human review]
    Apply[Apply after review]
  end
  subgraph canonical [Canonical work]
    Norm[Normalize and link]
    Map[Map to Lean]
    Formal[Formalize in Lean]
  end
  subgraph validation [Validation and publish]
    Gates[Gate engine validate-all]
    Publish[Publish manifests and theorem cards]
    Export[Portal export]
  end
  subgraph outputs [Outputs]
    Portal[Portal pages]
    Bench[Benchmarks and regression]
    Manifests[Published artifacts]
  end
  Add --> Extract
  Extract --> Norm
  Norm --> Map
  Map --> Formal
  Norm --> Gates
  Formal --> Gates
  Gates --> Publish
  Publish --> Manifests
  Publish --> Export
  Export --> Portal
  Publish --> Bench
  Formal --> Bench
  Extract -.-> LLM
  LLM --> Review
  Review --> Apply
  Apply --> Norm
  Apply --> Map
  Apply --> Formal
  Apply --> Gates
```
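The flow above can be sketched as composable stages, with the gate engine as the single choke point before publish. The function names mirror the diagram, not the real `sm_pipeline` API, and the stage bodies are placeholder transforms.

```python
# Stage names mirror the diagram; the real sm_pipeline API differs,
# so treat this as a shape sketch, not the actual interface.
def add(paper):         return {**paper, "status": "added"}
def extract(paper):     return {**paper, "claims": ["claim_01"]}
def normalize(paper):   return {**paper, "normalized": True}
def map_to_lean(paper): return {**paper, "mapping": {"claim_01": "thm_claim_01"}}
def formalize(paper):   return {**paper, "lean_checked": True}
def validate(paper):    return bool(paper.get("normalized") and paper.get("lean_checked"))
def publish(paper):     return {**paper, "status": "published"}

def run_pipeline(paper):
    """Nothing reaches publish or portal export without passing the gates."""
    staged = formalize(map_to_lean(normalize(extract(add(paper)))))
    if not validate(staged):
        raise RuntimeError("gate engine failed; nothing is published")
    return publish(staged)

print(run_pipeline({"paper_id": "langmuir_1918_adsorption"})["status"])  # published
```

The design point encoded here is that publish depends on validation, never the reverse: a paper that fails the gates leaves no published artifacts behind.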
| Topic | Link |
|---|---|
| Index | docs/README.md |
| Contributor playbook (setup, paper workflow, local CI, reuse, review, verification, Verso, schema migrations, Gate 7) | docs/contributor-playbook.md |
| Architecture | docs/architecture.md |
| Roadmap | ROADMAP.md |
| Paper intake (SPEC 8.1) | docs/paper-intake.md |
| Metrics (SPEC 12) | docs/metrics.md |
| ADRs | docs/adr/README.md |
| Infra / CI policy | docs/infra/README.md |
| Repo snapshot | docs/status/repo-snapshot.md (just repo-snapshot) |
| Maintainers (public push, CI, triage, launch) | docs/maintainers.md |
| MCP tooling (optional) | docs/tooling/mcp-lean-tooling.md |
| Prime Intellect LLM (optional, suggest-only) | docs/tooling/prime-intellect-llm.md |
| Trust boundary and manual E2E scenarios | docs/reference/trust-boundary-and-extraction.md · docs/testing/trust-hardening-e2e-scenarios.md · LLM Lean live test matrix |
| Pandoc / LaTeX (optional) | docs/tooling/pandoc-latex-integration.md |
| Resource | Link |
|---|---|
| How to contribute | CONTRIBUTING.md |
| Step-by-step playbook | docs/contributor-playbook.md |
| Pipeline extension points | docs/pipeline-extension-points.md |
- Artifact-first, model-second — durable JSON and Lean, not one-off prose.
- Provenance is mandatory — claims and cards stay tied to sources.
- Verification boundaries are explicit — proof vs witness vs heuristic is visible.
- Claim bundles are the core unit — not isolated theorems in a void.
- Full buildability is the minimum bar — no merge without the agreed gates.
Licensed under Apache-2.0 — see LICENSE.