@nhorton nhorton commented Feb 4, 2026

Summary

This PR introduces a major architectural shift from skill-file-based workflow execution to a Model Context Protocol (MCP) server that guides agents through workflows via checkpoint calls with quality gate enforcement.

Key changes:

  • New MCP Server (deepwork serve) with three tools: get_workflows, start_workflow, finished_step
  • Quality Gates that evaluate step outputs against criteria using a Claude Code subprocess
  • Nested Workflow Support with stack-based execution and abort_workflow capability
  • Simplified Skill Generation - single /deepwork entry point instead of per-step skills
  • Rules System Removed - entire rules subsystem (parser, queue, pattern matcher, hooks) deleted

Why This Change?

The previous architecture relied heavily on skill files with embedded instructions and rules-based hooks. This had several limitations:

  1. Complex rules evaluation at every agent stop event
  2. Difficult to track workflow state across steps
  3. No structured quality enforcement
  4. Hard to resume or debug workflows

The MCP approach provides:

  1. Centralized state - Session state persisted and visible in .deepwork/tmp/
  2. Quality gates - Automated validation before proceeding to next step
  3. Structured checkpoints - Clear handoff points between steps
  4. Resumability - Sessions can be loaded and resumed
  5. Observability - All state changes logged and inspectable
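
The checkpoint flow above can be pictured with a minimal in-memory sketch. Only the tool names `start_workflow` and `finished_step` come from this PR; the `Session` class, the `Gate` callable, the status strings, and the return shapes are hypothetical stand-ins for illustration, not the code in src/deepwork/mcp/.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional, Tuple

# Hypothetical gate signature: (step name, outputs) -> (passed, feedback).
Gate = Callable[[str, dict], Tuple[bool, str]]


@dataclass
class Session:
    """Illustrative stand-in for a workflow session, not the real state model."""
    workflow: str
    steps: list
    index: int = 0
    log: list = field(default_factory=list)

    @property
    def current_step(self) -> Optional[str]:
        return self.steps[self.index] if self.index < len(self.steps) else None


def start_workflow(workflow: str, steps: list) -> Session:
    session = Session(workflow, steps)
    session.log.append(f"started {workflow}")
    return session


def finished_step(session: Session, outputs: dict, gate: Gate) -> dict:
    """Checkpoint call: advance to the next step only if the gate passes."""
    step = session.current_step
    if step is None:
        return {"status": "workflow_complete"}
    passed, feedback = gate(step, outputs)
    session.log.append(f"{step}: {'passed' if passed else 'failed'}")
    if not passed:
        # The agent stays on the same step and receives the reviewer feedback.
        return {"status": "needs_work", "step": step, "feedback": feedback}
    session.index += 1
    next_step = session.current_step
    if next_step is None:
        return {"status": "workflow_complete"}
    return {"status": "next_step", "step": next_step}


def require_summary(step: str, outputs: dict) -> Tuple[bool, str]:
    """Toy gate (assumption): a step passes when it reports a non-empty summary."""
    ok = bool(outputs.get("summary"))
    return ok, ("" if ok else "summary is missing")
```

In this sketch the session log plays the role of the inspectable state: every gate decision is appended before the step pointer moves, so a failed checkpoint leaves a trace without advancing the workflow.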

Changes by Area

New MCP Module (src/deepwork/mcp/)

  • server.py - FastMCP server definition
  • tools.py - MCP tool implementations
  • state.py - Workflow session state management
  • schemas.py - Pydantic models for I/O
  • quality_gate.py - Quality gate with review agent

New CLI Command

  • deepwork serve - Starts MCP server (stdio or SSE transport)

Updated deepwork_jobs Standard Job

  • New steps: iterate, errata, test, fix_jobs, fix_settings
  • Streamlined define, implement, learn steps

Removed Components

  • Entire rules system (rules_parser.py, rules_queue.py, pattern_matcher.py, rules_check.py)
  • Command executor (command_executor.py)
  • deepwork_rules standard job
  • Per-step skill templates
  • Many hook scripts
  • commit and manual_tests jobs

Documentation

  • New doc/mcp_interface.md - MCP tool reference
  • New doc/reference/calling_claude_in_print_mode.md - Claude CLI subprocess guide
  • Updated doc/architecture.md with Part 4: MCP Server Architecture
  • Updated README.md to remove rules references

Test plan

  • Run deepwork install --platform claude in a test project
  • Verify MCP server starts with deepwork serve
  • Test workflow execution via /deepwork skill
  • Verify quality gate evaluation works
  • Run existing test suite: uv run pytest

🤖 Generated with Claude Code

nhorton and others added 25 commits February 3, 2026 12:14
- Add configurable quality_gate settings to config.yml (agent_review_command,
  default_timeout, default_max_attempts)
- Update installer to create quality_gate config section with defaults
- Refactor QualityGate to separate system instructions from user payload
- Use -s flag to pass instructions as system prompt to review agent
- Change file separator format to 20 dashes for clearer delineation
- Remove step_instructions from QualityGate interface (not useful for review)
- Add quality_review_override_reason to finished_step to skip quality gate
- Add JSON schema validation for quality gate responses
- Add comprehensive integration tests with mock review agent subprocess
- Remove block_bash_with_instructions hook (commit skill not available)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
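
As a rough illustration of validating a review agent's response before trusting it, the sketch below checks a decoded JSON object against a two-field shape. The field names `passed` and `feedback` are assumptions made for this example; the actual schema used by the quality gate is not shown in this commit message.

```python
def validate_review_response(data: object) -> dict:
    """Check a decoded reviewer response against a hypothetical two-field shape."""
    if not isinstance(data, dict):
        raise ValueError("response must be a JSON object")
    # Assumed field: a strict boolean verdict (not a truthy string or number).
    if not isinstance(data.get("passed"), bool):
        raise ValueError("'passed' must be a boolean")
    # Assumed field: optional free-text feedback.
    if not isinstance(data.get("feedback", ""), str):
        raise ValueError("'feedback' must be a string when present")
    return data
```

Rejecting malformed responses early means a reviewer that answers in prose, or with the wrong types, is reported as an error rather than silently treated as a pass or a fail.
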
- Update e2e tests for Claude Code integration
- Add quality_criteria to fruits job fixture
- Fix test assertions for updated install flow
- Minor sync.py adjustments

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The rules system was removed in commit 6b3e1a2. This cleans up
stale documentation references to rules_check in hook-related code.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- StateManager now uses a session stack instead of single active session
- Starting a workflow while one is active pushes onto the stack
- Completing a workflow pops from stack and resumes parent
- Added abort_workflow tool with explanation parameter
- All tool responses include stack field [{workflow, step}, ...]
- Added logging to all MCP tool calls with stack info
- Updated server instructions to document nesting and abort

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
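
The push/pop behaviour described in this commit can be sketched as follows. This is a simplified model under assumptions: `SessionStack` and its method names are hypothetical and do not reproduce the real `StateManager` API; only the `[{workflow, step}, ...]` shape of the stack field is taken from the commit message.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Session:
    session_id: str
    workflow: str
    step: str


class SessionStack:
    """Illustrative only; the real StateManager is not reproduced here."""

    def __init__(self) -> None:
        self._stack: List[Session] = []

    def start(self, session: Session) -> None:
        # Starting a workflow while another is active nests it on top.
        self._stack.append(session)

    def complete(self) -> Optional[Session]:
        # Completing pops the top session; the parent, if any, becomes active again.
        if not self._stack:
            raise LookupError("no active workflow session")
        self._stack.pop()
        return self._stack[-1] if self._stack else None

    def stack_field(self) -> List[dict]:
        # Bottom-to-top view, matching the "[{workflow, step}, ...]" description.
        return [{"workflow": s.workflow, "step": s.step} for s in self._stack]
```
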
- Add `from None` to raise in except clause (B904)
- Remove unused variables in tests (F841)
- Rename unused loop variable to underscore prefix (B007)
- Apply ruff formatting to 14 files

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
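
For readers unfamiliar with the B904 fix, here is a small generic example of `raise ... from None` inside an `except` clause. The function name and error message are made up for illustration and are not taken from the DeepWork code.

```python
def parse_timeout(raw: str) -> int:
    """Convert a config value to seconds, hiding the low-level ValueError."""
    try:
        return int(raw)
    except ValueError:
        # `from None` suppresses the implicit exception chain, so the caller sees
        # only the RuntimeError. This is one of the two forms B904 accepts; the
        # other is `raise ... from err`, which keeps the original as the cause.
        raise RuntimeError(f"invalid timeout: {raw!r}") from None
```
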
Replace flake-utils with uv2nix/pyproject-nix for proper Python
dependency management in Nix. This provides hermetic builds directly
from uv.lock and supports editable installs for development.

Key changes:
- Use uv2nix to generate Python package set from uv.lock
- Add pyproject-build-systems for build dependency resolution
- Add editables to build-system requires (needed by hatchling for
  editable wheel builds)
- Remove .venv management from shell hook (Nix handles it now)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Fix quality_gate.py to handle Claude CLI --output-format json wrapper
  objects by extracting the 'result' field before parsing
- Add tests for wrapper object handling with strong comments explaining
  the mock design
- Remove deprecated 'exposed' field from learn step in deepwork_jobs
- Add 'learn' workflow to make orphaned step accessible via MCP
- Add 'update' workflow to update job for MCP compatibility
- Migrate stop_hooks to quality_criteria in update job
- Clean up settings.json by removing obsolete Skill permissions

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
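
The wrapper handling described in the first bullet might look roughly like the sketch below. It assumes only what the commit message states: that the CLI's JSON output can be an object carrying the reviewer's answer under a `result` field. The function name and the tolerance for an already-decoded `result` are assumptions for this example, not the code in quality_gate.py.

```python
import json
from typing import Any


def parse_review_output(stdout: str) -> Any:
    """Parse reviewer stdout, unwrapping a {"result": ...} wrapper if present."""
    data = json.loads(stdout)
    if isinstance(data, dict) and "result" in data:
        inner = data["result"]
        # Assumption: the wrapped result is normally a JSON-encoded string, but an
        # already-decoded object is tolerated as well.
        return json.loads(inner) if isinstance(inner, str) else inner
    # No wrapper: the output is already the reviewer's JSON answer.
    return data
```

Note that this sketch would misread a bare reviewer response that itself contains a top-level `result` key, so a real implementation would need a more specific check for the wrapper.
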
Document the major architectural changes including:
- New MCP server with checkpoint-based workflow execution
- Removal of the rules system
- Simplified skill generation
- New deepwork_jobs steps

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@nhorton nhorton changed the title from "Mcp variant" to "feat: MCP Server Architecture for Checkpoint-Based Workflow Execution" Feb 5, 2026
Mark 0.7.0 as alpha prerelease so that `uv add deepwork` continues
to install the stable 0.5.1 by default, requiring explicit version
specification for the new alpha.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
nhorton and others added 9 commits February 6, 2026 11:44
When multiple workflows are active on the stack concurrently, callers can
now pass session_id to finished_step and abort_workflow to target the
correct session instead of always operating on the top-of-stack. This
prevents logical corruption when sub-agents run workflows in parallel.

Changes:
- Add optional session_id field to FinishedStepInput and AbortWorkflowInput
- Add _resolve_session() helper to StateManager for ID-based lookup
- Thread session_id through all StateManager methods and WorkflowTools
- Use filter-based stack removal instead of pop() for mid-stack operations
- Add session_id parameter to MCP server tool registrations
- Add v1.4.0 changelog entry to mcp_interface.md
- Add tests for session_id routing in test_state, test_tools, test_async

Fully backward compatible — omitting session_id preserves top-of-stack behavior.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
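
A minimal sketch of ID-based lookup with a top-of-stack default, plus filter-based removal, is shown below. The function names and the `Session` fields are hypothetical; the commit only names a `_resolve_session()` helper on `StateManager` without showing its body.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Session:
    session_id: str
    workflow: str


def resolve_session(stack: List[Session], session_id: Optional[str] = None) -> Session:
    """Return the targeted session, defaulting to top-of-stack when no ID is given."""
    if not stack:
        raise LookupError("no active workflow session")
    if session_id is None:
        return stack[-1]
    for session in stack:
        if session.session_id == session_id:
            return session
    raise LookupError(f"unknown session_id: {session_id}")


def remove_session(stack: List[Session], session_id: Optional[str] = None) -> List[Session]:
    """Filter-based removal: also works for mid-stack sessions, unlike pop()."""
    target = resolve_session(stack, session_id)
    return [s for s in stack if s.session_id != target.session_id]
```

With three sessions on the stack, removing the middle one by ID leaves the other two in their original order, which a plain `pop()` could not do without disturbing the top entry.
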
The _load_all_jobs method was catching ParseError and continuing with
no indication of failure, making schema validation errors invisible
to users (e.g. get_workflows returning empty with no explanation).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
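
One way to make such failures visible is to log each parse error and return the failures alongside the jobs that did load. The sketch below is an assumption about the general shape of a fix, not the change made in this commit: `ParseError` is a local stand-in for the project's exception, and `load_all_jobs` and `parse_job` are hypothetical names.

```python
import logging

logger = logging.getLogger("deepwork.example")


class ParseError(Exception):
    """Local stand-in for the project's ParseError."""


def load_all_jobs(paths, parse_job):
    """Load every job that parses and report the ones that do not."""
    jobs, errors = [], []
    for path in paths:
        try:
            jobs.append(parse_job(path))
        except ParseError as exc:
            # Surface the failure instead of continuing silently.
            logger.warning("Skipping job %s: %s", path, exc)
            errors.append((path, str(exc)))
    return jobs, errors
```

Returning the error list lets a caller such as a workflow-listing tool explain why its result is empty instead of returning nothing.
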
When a workflow needs to run a multi-step process on many independent
items, the define step now guides users to split the repeated process
into a separate workflow and fan out via parallel sub-agents.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@nhorton nhorton marked this pull request as ready for review February 9, 2026 19:30
@nhorton nhorton added this pull request to the merge queue Feb 9, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Feb 9, 2026
- Update claude-code-test.yml validate-generation job: replace pytest
  test file with inline Python validation of fruits fixture parsing
- Check for deepwork/SKILL.md instead of per-step skills
- Update e2e test to use /deepwork skill instead of /deepwork_jobs and /fruits

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@nhorton nhorton added this pull request to the merge queue Feb 9, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Feb 9, 2026
The DeepWork MCP server is registered in .mcp.json with the bare
"deepwork" command. In CI, deepwork is installed via uv sync and
only exists in .venv/bin/, which isn't on PATH. Claude Code fails
to start the MCP server subprocess, causing it to fall back to
ad-hoc file creation instead of using MCP tools.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@nhorton nhorton enabled auto-merge February 9, 2026 20:45
@nhorton nhorton added this pull request to the merge queue Feb 9, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Feb 9, 2026
The previous approach overwrote settings.json with only generic permissions
(Bash, Read, Write, etc.), which removed the MCP tool permissions
(mcp__deepwork__get_workflows, start_workflow, finished_step, abort_workflow)
that `deepwork install` had synced. Without these, Claude silently fails
to call DeepWork MCP tools and returns empty output.

Now merges CI-specific permissions into the existing settings.json, preserving
the MCP tool permissions that the install step wrote.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
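
The merge-instead-of-overwrite approach can be sketched as below. The sketch assumes the permissions live in a `permissions.allow` list inside settings.json; the helper name is hypothetical, and the CI workflow itself may implement the merge differently (for example in shell).

```python
import json
from pathlib import Path


def merge_allowed_permissions(settings_path: Path, extra: list) -> dict:
    """Add permissions to settings.json without dropping existing entries."""
    settings = json.loads(settings_path.read_text()) if settings_path.exists() else {}
    # Assumed layout: {"permissions": {"allow": [...]}}.
    allow = settings.setdefault("permissions", {}).setdefault("allow", [])
    for entry in extra:
        if entry not in allow:  # keep existing entries and avoid duplicates
            allow.append(entry)
    settings_path.write_text(json.dumps(settings, indent=2) + "\n")
    return settings
```

Because existing entries are kept in place and new ones are only appended, the MCP tool permissions written by the install step survive the CI-specific additions.
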
@nhorton nhorton enabled auto-merge February 9, 2026 20:53
@nhorton nhorton added this pull request to the merge queue Feb 9, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Feb 9, 2026
- Add --debug flag to claude invocation to capture detailed logs
- Add failure step that dumps debug.log, .mcp.json, settings.json, and
  session state when the job creation step fails
- Instruct Claude to stop after define+implement steps instead of running
  the full 4-step workflow (test+iterate are unnecessary for CI)
- Increase timeout from 6 to 10 minutes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@nhorton nhorton enabled auto-merge February 9, 2026 21:09
@nhorton nhorton added this pull request to the merge queue Feb 9, 2026
Merged via the queue into main with commit f1cabce Feb 9, 2026
4 checks passed
@nhorton nhorton deleted the mcp-variant branch February 9, 2026 21:19