feat: MCP Server Architecture for Checkpoint-Based Workflow Execution #200

nhorton · 2026-02-04T21:16:55Z

Summary

This PR introduces a major architectural shift from skill-file-based workflow execution to a Model Context Protocol (MCP) server that guides agents through workflows via checkpoint calls with quality gate enforcement.

Key changes:

New MCP Server (deepwork serve) with three tools: get_workflows, start_workflow, finished_step
Quality Gates that evaluate step outputs against criteria using a Claude Code subprocess
Nested Workflow Support with stack-based execution and abort_workflow capability
Simplified Skill Generation - single /deepwork entry point instead of per-step skills
Rules System Removed - entire rules subsystem (parser, queue, pattern matcher, hooks) deleted
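
The checkpoint flow described above can be pictured with a minimal sketch. It is plain Python, not the FastMCP server from this PR; the `Session` class, the `"example"` workflow name, the `outputs_ok` flag, and the status strings are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical catalogue; real workflows come from job definitions.
WORKFLOWS = {"example": ["define", "implement", "test", "iterate"]}


@dataclass
class Session:
    """Hypothetical in-memory session (the real server persists state)."""
    workflow: str
    steps: list
    index: int = 0

    @property
    def current_step(self) -> Optional[str]:
        return self.steps[self.index] if self.index < len(self.steps) else None


def get_workflows() -> list:
    """List the workflows an agent may start."""
    return sorted(WORKFLOWS)


def start_workflow(name: str) -> Session:
    """Open a session positioned at the first step."""
    return Session(workflow=name, steps=list(WORKFLOWS[name]))


def finished_step(session: Session, outputs_ok: bool) -> dict:
    """Checkpoint call: advance only when the quality gate passes.

    `outputs_ok` stands in for the quality gate verdict.
    """
    if not outputs_ok:
        return {"status": "needs_work", "step": session.current_step}
    session.index += 1
    if session.current_step is None:
        return {"status": "workflow_complete", "step": None}
    return {"status": "next_step", "step": session.current_step}
```

The point of the shape is that the agent never decides on its own which step comes next; every transition goes through a `finished_step` checkpoint.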

Why This Change?

The previous architecture relied heavily on skill files with embedded instructions and rules-based hooks. This had several limitations:

Complex rules evaluation at every agent stop event
Difficult to track workflow state across steps
No structured quality enforcement
Hard to resume or debug workflows

The MCP approach provides:

Centralized state - Session state persisted and visible in .deepwork/tmp/
Quality gates - Automated validation before proceeding to next step
Structured checkpoints - Clear handoff points between steps
Resumability - Sessions can be loaded and resumed
Observability - All state changes logged and inspectable
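
As a rough illustration of persisted, resumable state, the sketch below writes a session to a JSON file and reads it back. Only the `.deepwork/tmp/` directory comes from this PR; the `session_<id>.json` file name, the function names, and the state fields are assumptions.

```python
import json
from pathlib import Path


def save_session(state_dir: Path, session_id: str, state: dict) -> Path:
    """Write session state as JSON so it can be inspected and resumed."""
    state_dir.mkdir(parents=True, exist_ok=True)
    path = state_dir / f"session_{session_id}.json"  # hypothetical file name
    path.write_text(json.dumps(state, indent=2))
    return path


def load_session(state_dir: Path, session_id: str) -> dict:
    """Load a previously saved session by ID."""
    path = state_dir / f"session_{session_id}.json"
    return json.loads(path.read_text())
```

Usage would look like `save_session(Path(".deepwork/tmp"), "abc123", {"workflow": "example", "step": "define"})`, followed later by `load_session(Path(".deepwork/tmp"), "abc123")` to resume.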

Changes by Area

New MCP Module (`src/deepwork/mcp/`)

server.py - FastMCP server definition
tools.py - MCP tool implementations
state.py - Workflow session state management
schemas.py - Pydantic models for I/O
quality_gate.py - Quality gate with review agent

New CLI Command

deepwork serve - Starts MCP server (stdio or SSE transport)

Updated `deepwork_jobs` Standard Job

New steps: iterate, errata, test, fix_jobs, fix_settings
Streamlined define, implement, learn steps

Removed Components

Entire rules system (rules_parser.py, rules_queue.py, pattern_matcher.py, rules_check.py)
Command executor (command_executor.py)
deepwork_rules standard job
Per-step skill templates
Many hook scripts
commit and manual_tests jobs

Documentation

New doc/mcp_interface.md - MCP tool reference
New doc/reference/calling_claude_in_print_mode.md - Claude CLI subprocess guide
Updated doc/architecture.md with Part 4: MCP Server Architecture
Updated README.md to remove rules references

Test plan

Run deepwork install --platform claude in a test project
Verify MCP server starts with deepwork serve
Test workflow execution via /deepwork skill
Verify quality gate evaluation works
Run existing test suite: uv run pytest

🤖 Generated with Claude Code

- Add configurable quality_gate settings to config.yml (agent_review_command, default_timeout, default_max_attempts)
- Update installer to create quality_gate config section with defaults
- Refactor QualityGate to separate system instructions from user payload
- Use -s flag to pass instructions as system prompt to review agent
- Change file separator format to 20 dashes for clearer delineation
- Remove step_instructions from QualityGate interface (not useful for review)
- Add quality_review_override_reason to finished_step to skip quality gate
- Add JSON schema validation for quality gate responses
- Add comprehensive integration tests with mock review agent subprocess
- Remove block_bash_with_instructions hook (commit skill not available)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
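
A minimal sketch of two ideas from this commit: joining output files into a user payload with a 20-dash separator, and validating the review agent's JSON reply before trusting it. The exact separator layout, the `passed` and `feedback` field names, and both function names are assumptions; the hand-written checks stand in for real JSON schema validation.

```python
import json

SEPARATOR = "-" * 20  # the commit mentions a 20-dash file separator


def build_payload(files: dict) -> str:
    """Join file contents into one user payload, one delimited block per file."""
    blocks = []
    for path, content in files.items():
        # Hypothetical layout: dashes, path, dashes, then the file body.
        blocks.append(f"{SEPARATOR} {path} {SEPARATOR}\n{content}")
    return "\n".join(blocks)


def validate_response(raw: str) -> dict:
    """Reject replies that are not an object with the expected fields."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("review response must be a JSON object")
    if not isinstance(data.get("passed"), bool):
        raise ValueError("'passed' must be a boolean")
    if not isinstance(data.get("feedback"), str):
        raise ValueError("'feedback' must be a string")
    return data
```

Keeping the instructions out of the payload means the payload carries only the material under review, which is easier to test with a mock review agent.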

- Update e2e tests for Claude Code integration
- Add quality_criteria to fruits job fixture
- Fix test assertions for updated install flow
- Minor sync.py adjustments

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

The rules system was removed in commit 6b3e1a2. This cleans up stale documentation references to rules_check in hook-related code. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- StateManager now uses a session stack instead of single active session
- Starting a workflow while one is active pushes onto the stack
- Completing a workflow pops from stack and resumes parent
- Added abort_workflow tool with explanation parameter
- All tool responses include stack field [{workflow, step}, ...]
- Added logging to all MCP tool calls with stack info
- Updated server instructions to document nesting and abort

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
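
The push/pop behavior described in this commit can be sketched roughly as follows. `SessionStackSketch` and its method names are hypothetical and are not the project's StateManager; only the `[{workflow, step}, ...]` shape of the stack field is taken from the commit message.

```python
class SessionStackSketch:
    """Hypothetical stand-in illustrating nested workflow sessions."""

    def __init__(self) -> None:
        self._stack = []

    def snapshot(self) -> list:
        """Return the stack field included in every tool response."""
        return [{"workflow": s["workflow"], "step": s["step"]} for s in self._stack]

    def start(self, workflow: str, first_step: str) -> list:
        """Starting a workflow while one is active pushes onto the stack."""
        self._stack.append({"workflow": workflow, "step": first_step})
        return self.snapshot()

    def complete(self) -> list:
        """Completing the top workflow pops it, so the parent resumes."""
        self._stack.pop()
        return self.snapshot()
```

After `start("parent", "define")` and `start("child", "review")`, a call to `complete()` leaves only the parent entry, which is how the parent workflow picks up where it left off.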

- Add `from None` to raise in except clause (B904)
- Remove unused variables in tests (F841)
- Rename unused loop variable to underscore prefix (B007)
- Apply ruff formatting to 14 files

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
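
For readers unfamiliar with the B904 fix, here is a small generic example of `raise ... from None` inside an `except` clause. The `parse_timeout` function is hypothetical and unrelated to the project's code.

```python
def parse_timeout(text: str) -> int:
    """Convert a config value to an int, hiding the low-level error."""
    try:
        return int(text)
    except ValueError:
        # `from None` suppresses the implicit "During handling of the above
        # exception, another exception occurred" chaining in the traceback.
        raise RuntimeError(f"invalid timeout: {text!r}") from None
```

The alternative that also satisfies B904 is `raise ... from err`, which keeps the original exception as an explicit cause when it is useful for debugging.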

Replace flake-utils with uv2nix/pyproject-nix for proper Python dependency management in Nix. This provides hermetic builds directly from uv.lock and supports editable installs for development.

Key changes:
- Use uv2nix to generate Python package set from uv.lock
- Add pyproject-build-systems for build dependency resolution
- Add editables to build-system requires (needed by hatchling for editable wheel builds)
- Remove .venv management from shell hook (Nix handles it now)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Fix quality_gate.py to handle Claude CLI --output-format json wrapper objects by extracting the 'result' field before parsing
- Add tests for wrapper object handling with strong comments explaining the mock design
- Remove deprecated 'exposed' field from learn step in deepwork_jobs
- Add 'learn' workflow to make orphaned step accessible via MCP
- Add 'update' workflow to update job for MCP compatibility
- Migrate stop_hooks to quality_criteria in update job
- Clean up settings.json by removing obsolete Skill permissions

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
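
The wrapper handling in the first item can be sketched as below. The function name is hypothetical, and the sketch assumes the `result` field holds the review JSON as a string, falling back gracefully when the output is already the bare object.

```python
import json


def extract_review(stdout: str) -> dict:
    """Unwrap a CLI JSON wrapper, if present, before parsing the review."""
    outer = json.loads(stdout)
    if isinstance(outer, dict) and "result" in outer:
        inner = outer["result"]
        # Assumption: the inner payload is usually a JSON-encoded string.
        return json.loads(inner) if isinstance(inner, str) else inner
    # No wrapper: the output was already the review object.
    return outer
```

Handling both shapes in one place keeps the rest of the quality gate independent of how the subprocess was invoked.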

Document the major architectural changes including:
- New MCP server with checkpoint-based workflow execution
- Removal of the rules system
- Simplified skill generation
- New deepwork_jobs steps

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Mark 0.7.0 as alpha prerelease so that `uv add deepwork` continues to install the stable 0.5.1 by default, requiring explicit version specification for the new alpha. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

…work into mcp-variant

When multiple workflows are active on the stack concurrently, callers can now pass session_id to finished_step and abort_workflow to target the correct session instead of always operating on the top-of-stack. This prevents logical corruption when sub-agents run workflows in parallel.

Changes:
- Add optional session_id field to FinishedStepInput and AbortWorkflowInput
- Add _resolve_session() helper to StateManager for ID-based lookup
- Thread session_id through all StateManager methods and WorkflowTools
- Use filter-based stack removal instead of pop() for mid-stack operations
- Add session_id parameter to MCP server tool registrations
- Add v1.4.0 changelog entry to mcp_interface.md
- Add tests for session_id routing in test_state, test_tools, test_async

Fully backward compatible: omitting session_id preserves top-of-stack behavior.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
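
A rough sketch of ID-based lookup with a top-of-stack fallback, plus filter-based removal. These free functions and the dict-based session records are assumptions for illustration; the project's `_resolve_session()` helper may differ in signature and error handling.

```python
from typing import Optional


def resolve_session(stack: list, session_id: Optional[str] = None) -> dict:
    """Return the targeted session, or the top of the stack when no ID is given."""
    if not stack:
        raise LookupError("no active workflow session")
    if session_id is None:
        return stack[-1]  # backward-compatible default
    for session in stack:
        if session["session_id"] == session_id:
            return session
    raise LookupError(f"unknown session_id: {session_id}")


def remove_session(stack: list, session_id: str) -> list:
    """Filter-based removal: works for mid-stack entries, unlike pop()."""
    return [s for s in stack if s["session_id"] != session_id]
```

With two parallel sub-agent sessions on the stack, removing the lower one by ID leaves the upper one untouched, which a plain `pop()` could not do.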

The _load_all_jobs method was catching ParseError and continuing with no indication of failure, making schema validation errors invisible to users (e.g. get_workflows returning empty with no explanation). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
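
One way to make such failures visible is to log each parse error and return it alongside the jobs that did load. This is a sketch of the general pattern, not the actual fix: `ParseError` here is a local stand-in, and `parse_job` and `load_all_jobs` are hypothetical.

```python
import logging

logger = logging.getLogger("deepwork.sketch")


class ParseError(Exception):
    """Local stand-in for the project's ParseError."""


def parse_job(text: str) -> dict:
    """Hypothetical parser: a job is valid only if it declares a name."""
    if not text.startswith("name:"):
        raise ParseError("missing 'name' field")
    return {"name": text.split(":", 1)[1].strip()}


def load_all_jobs(sources: dict) -> tuple:
    """Load every job that parses, and report the ones that do not."""
    jobs, errors = [], []
    for path, text in sources.items():
        try:
            jobs.append(parse_job(text))
        except ParseError as exc:
            message = f"{path}: {exc}"
            logger.warning("Skipping job: %s", message)  # no longer silent
            errors.append(message)
    return jobs, errors
```

Returning the error list lets a caller such as a workflow-listing tool explain why the result is empty instead of returning nothing.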

When a workflow needs to run a multi-step process on many independent items, the define step now guides users to split the repeated process into a separate workflow and fan out via parallel sub-agents. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Update claude-code-test.yml validate-generation job: replace pytest test file with inline Python validation of fruits fixture parsing
- Check for deepwork/SKILL.md instead of per-step skills
- Update e2e test to use /deepwork skill instead of /deepwork_jobs and /fruits

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The DeepWork MCP server is registered in .mcp.json with the bare "deepwork" command. In CI, deepwork is installed via uv sync and only exists in .venv/bin/, which isn't on PATH. Claude Code fails to start the MCP server subprocess, causing it to fall back to ad-hoc file creation instead of using MCP tools. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The previous approach overwrote settings.json with only generic permissions (Bash, Read, Write, etc.), which removed the MCP tool permissions (mcp__deepwork__get_workflows, start_workflow, finished_step, abort_workflow) that `deepwork install` had synced. Without these, Claude silently fails to call DeepWork MCP tools and returns empty output. Now merges CI-specific permissions into the existing settings.json, preserving the MCP tool permissions that the install step wrote. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
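
The merge-instead-of-overwrite idea can be sketched like this. The `permissions.allow` layout and the function name are assumptions about the settings file; only the `mcp__deepwork__get_workflows` permission name and the generic `Bash`, `Read`, `Write` entries come from the commit message.

```python
def merge_permissions(existing: dict, extra_allow: list) -> dict:
    """Add permissions to a settings dict without dropping what is already there."""
    merged = dict(existing)
    permissions = dict(merged.get("permissions", {}))
    allow = list(permissions.get("allow", []))
    for permission in extra_allow:
        if permission not in allow:  # keep order, avoid duplicates
            allow.append(permission)
    permissions["allow"] = allow
    merged["permissions"] = permissions
    return merged
```

Because the function copies before modifying, the settings written by the install step are left intact and the CI-specific entries are simply appended.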

- Add --debug flag to claude invocation to capture detailed logs
- Add failure step that dumps debug.log, .mcp.json, settings.json, and session state when the job creation step fails
- Instruct Claude to stop after define+implement steps instead of running the full 4-step workflow (test+iterate are unnecessary for CI)
- Increase timeout from 6 to 10 minutes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

nhorton and others added 25 commits February 3, 2026 12:14

Removed rules

6b3e1a2

Port theoretically done

26a9911

mcp loads now

9b633b0

chore: Update tests and sync for MCP variant

cd2ae63

Cleaned up MCP rules

a3fae18

Remove old jobs

c3754f6

chore: Remove dead rules_check references from docstrings

fd0d348

async

443e13e

repair added but not run

0b3d666

cleaned up

f3af9d6

Version bump

c5c9f97

Merge branch 'main' into mcp-variant

9e63119

Fix ruff lint errors and apply formatting

88477a4

cleanups

897535c

MCP command updated

18043b3

add_job improved

fa40407

tighter instructions

3e21805

stop backing up rules

e122265

make_new_job.sh preserved, parallel execution, no dupe quality criteria

0000c17

formatting

d570baf

nhorton changed the title ~~Mcp variant~~ feat: MCP Server Architecture for Checkpoint-Based Workflow Execution Feb 5, 2026

remove update job

b561e2a

nhorton temporarily deployed to pypi February 5, 2026 18:40 — with GitHub Actions Inactive

Fix release version to prerelease (0.7.0a1)

48e23fe

nhorton temporarily deployed to pypi February 5, 2026 20:05 — with GitHub Actions Inactive

nhorton and others added 9 commits February 6, 2026 11:44

Merge branch 'mcp-variant' of https://github.com/Unsupervisedcom/deep…

cff723f

…work into mcp-variant

ready to test

b96d22a

Manual test added

8b8b6ed

Bump version to 0.7.0

4c8b60b

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Apply ruff formatting fixes across MCP source and tests

06fcf19

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Merge branch 'main' into mcp-variant

740a962

nhorton marked this pull request as ready for review February 9, 2026 19:30