feat: MCP Server Architecture for Checkpoint-Based Workflow Execution #200
Merged
- Add configurable quality_gate settings to config.yml (agent_review_command, default_timeout, default_max_attempts)
- Update installer to create quality_gate config section with defaults
- Refactor QualityGate to separate system instructions from user payload
- Use -s flag to pass instructions as system prompt to review agent
- Change file separator format to 20 dashes for clearer delineation
- Remove step_instructions from QualityGate interface (not useful for review)
- Add quality_review_override_reason to finished_step to skip quality gate
- Add JSON schema validation for quality gate responses
- Add comprehensive integration tests with mock review agent subprocess
- Remove block_bash_with_instructions hook (commit skill not available)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
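As an illustration of the system/user split and the 20-dash separator, here is a minimal sketch. The function names, the per-file header layout and the prompt wording are hypothetical, and the rule that a blank override reason does not skip the gate is an assumption, not taken from DeepWork's code.

```python
from typing import Dict, List, Optional, Tuple

# 20 dashes, matching the separator width described in the commit message.
SEPARATOR = "-" * 20


def build_review_request(
    quality_criteria: List[str], files: Dict[str, str]
) -> Tuple[str, str]:
    """Return (system_prompt, user_payload) for a review agent (hypothetical helper)."""
    criteria = "\n".join(f"- {criterion}" for criterion in quality_criteria)
    # Review instructions go into the system prompt...
    system_prompt = (
        "Review the files below against these quality criteria:\n" + criteria
    )
    # ...while the user payload carries only the files under review,
    # each introduced by a header line built from the 20-dash separator.
    sections = [
        f"{SEPARATOR} {path} {SEPARATOR}\n{content}" for path, content in files.items()
    ]
    return system_prompt, "\n".join(sections)


def should_run_quality_gate(override_reason: Optional[str]) -> bool:
    """Skip the review only for a non-blank override reason (assumed rule)."""
    return not (override_reason and override_reason.strip())
```

Keeping file contents out of the system prompt means the reviewer's instructions stay fixed while the payload varies per step.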
- Update e2e tests for Claude Code integration
- Add quality_criteria to fruits job fixture
- Fix test assertions for updated install flow
- Minor sync.py adjustments
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The rules system was removed in commit 6b3e1a2. This cleans up stale documentation references to rules_check in hook-related code. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- StateManager now uses a session stack instead of single active session
- Starting a workflow while one is active pushes onto the stack
- Completing a workflow pops from stack and resumes parent
- Added abort_workflow tool with explanation parameter
- All tool responses include stack field [{workflow, step}, ...]
- Added logging to all MCP tool calls with stack info
- Updated server instructions to document nesting and abort
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
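The stack behavior above can be sketched in a few lines. This is a hypothetical stand-in, not the real `StateManager`: it only tracks a workflow name and current step per session, and `SessionStack`, `complete` and `stack_field` are illustrative names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Session:
    workflow: str
    step: str


class SessionStack:
    """Hypothetical stand-in for the stack-based StateManager."""

    def __init__(self) -> None:
        self._stack: List[Session] = []
        self.aborted: List[Tuple[str, str]] = []

    def start(self, workflow: str, first_step: str) -> Session:
        # Starting while another workflow is active nests the new one on top.
        session = Session(workflow, first_step)
        self._stack.append(session)
        return session

    def complete(self) -> Optional[Session]:
        # Pop the finished workflow; the new top (if any) is the parent that resumes.
        self._stack.pop()
        return self._stack[-1] if self._stack else None

    def abort(self, explanation: str) -> Optional[Session]:
        # Abort the top workflow, record the explanation, and resume the parent.
        session = self._stack.pop()
        self.aborted.append((session.workflow, explanation))
        return self._stack[-1] if self._stack else None

    def stack_field(self) -> List[Dict[str, str]]:
        # Shape of the `stack` field included in every tool response.
        return [{"workflow": s.workflow, "step": s.step} for s in self._stack]
```

For example, starting `fruits` while `define_job` is active yields a two-entry `stack_field()`, and `complete()` then returns the `define_job` session as the resumed parent.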
- Add `from None` to raise in except clause (B904)
- Remove unused variables in tests (F841)
- Rename unused loop variable to underscore prefix (B007)
- Apply ruff formatting to 14 files
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
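For readers unfamiliar with B904: ruff flags a bare `raise` inside an `except` block and asks for an explicit `from err` or `from None`. A small illustration, where `ConfigError` and `read_timeout` are hypothetical and only borrow the `quality_gate.default_timeout` key from the config described earlier:

```python
class ConfigError(Exception):
    """Hypothetical error type used only for this illustration."""


def read_timeout(config: dict) -> int:
    try:
        return int(config["quality_gate"]["default_timeout"])
    except (KeyError, ValueError):
        # `from None` satisfies B904 and hides the internal KeyError/ValueError
        # from the traceback, so users see only the actionable message.
        raise ConfigError("quality_gate.default_timeout must be an integer") from None
```

Use `raise ... from err` instead when the original exception is useful context for debugging.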
Replace flake-utils with uv2nix/pyproject-nix for proper Python dependency management in Nix. This provides hermetic builds directly from uv.lock and supports editable installs for development.
Key changes:
- Use uv2nix to generate Python package set from uv.lock
- Add pyproject-build-systems for build dependency resolution
- Add editables to build-system requires (needed by hatchling for editable wheel builds)
- Remove .venv management from shell hook (Nix handles it now)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Fix quality_gate.py to handle Claude CLI --output-format json wrapper objects by extracting the 'result' field before parsing
- Add tests for wrapper object handling with strong comments explaining the mock design
- Remove deprecated 'exposed' field from learn step in deepwork_jobs
- Add 'learn' workflow to make orphaned step accessible via MCP
- Add 'update' workflow to update job for MCP compatibility
- Migrate stop_hooks to quality_criteria in update job
- Clean up settings.json by removing obsolete Skill permissions
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
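A minimal sketch of the unwrapping step, combined with a lightweight schema check. `claude -p --output-format json` returns an envelope object whose `result` field holds the model's reply as a string; the function names and the two-field `REVIEW_SCHEMA` (`passed`, `feedback`) are hypothetical placeholders, not DeepWork's actual response schema.

```python
import json
from typing import Any, Dict, List

# Hypothetical response schema for illustration: field name -> required type.
REVIEW_SCHEMA: Dict[str, type] = {"passed": bool, "feedback": str}


def parse_review_output(stdout: str) -> Any:
    """Parse review-agent stdout, unwrapping the CLI's JSON envelope if present."""
    data = json.loads(stdout)
    # With `--output-format json`, the model's reply arrives as a string in "result".
    if isinstance(data, dict) and isinstance(data.get("result"), str):
        data = json.loads(data["result"])
    return data


def review_errors(data: Any) -> List[str]:
    """Return schema violations; an empty list means the response is usable."""
    if not isinstance(data, dict):
        return ["response must be a JSON object"]
    errors = []
    for key, expected in REVIEW_SCHEMA.items():
        if not isinstance(data.get(key), expected):
            errors.append(f"{key!r} must be of type {expected.__name__}")
    return errors
```

Unwrapping before validation lets the same check accept both the real CLI envelope and a mock agent that prints the bare JSON reply. The trade-off is that a bare reply with its own string `result` field would be misread as an envelope.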
Document the major architectural changes, including:
- New MCP server with checkpoint-based workflow execution
- Removal of the rules system
- Simplified skill generation
- New deepwork_jobs steps
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Mark 0.7.0 as alpha prerelease so that `uv add deepwork` continues to install the stable 0.5.1 by default, requiring explicit version specification for the new alpha. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When multiple workflows are active on the stack concurrently, callers can now pass session_id to finished_step and abort_workflow to target the correct session instead of always operating on the top of the stack. This prevents logical corruption when sub-agents run workflows in parallel.
Changes:
- Add optional session_id field to FinishedStepInput and AbortWorkflowInput
- Add _resolve_session() helper to StateManager for ID-based lookup
- Thread session_id through all StateManager methods and WorkflowTools
- Use filter-based stack removal instead of pop() for mid-stack operations
- Add session_id parameter to MCP server tool registrations
- Add v1.4.0 changelog entry to mcp_interface.md
- Add tests for session_id routing in test_state, test_tools, test_async
Fully backward compatible: omitting session_id preserves top-of-stack behavior.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
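A minimal sketch of ID-based routing, assuming each session on the stack carries a string `session_id`. The class and method names here are hypothetical; the real helper is `_resolve_session()` on `StateManager`, and its body may differ. Omitting the ID falls back to the top of the stack, and removal filters the list so a session in the middle can be dropped without disturbing the others.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Session:
    session_id: str
    workflow: str


class StateManagerSketch:
    """Hypothetical illustration of session_id routing on a workflow stack."""

    def __init__(self) -> None:
        self.stack: List[Session] = []

    def resolve_session(self, session_id: Optional[str] = None) -> Session:
        # No ID: keep the old top-of-stack behavior.
        if session_id is None:
            if not self.stack:
                raise LookupError("no active workflow session")
            return self.stack[-1]
        for session in self.stack:
            if session.session_id == session_id:
                return session
        raise LookupError(f"unknown session_id: {session_id!r}")

    def remove_session(self, session_id: Optional[str] = None) -> Session:
        target = self.resolve_session(session_id)
        # Filter instead of pop(): works even when the target is mid-stack.
        self.stack = [s for s in self.stack if s.session_id != target.session_id]
        return target
```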
The _load_all_jobs method was catching ParseError and continuing with no indication of failure, making schema validation errors invisible to users (e.g. get_workflows returning empty with no explanation). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
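One way to make such failures visible, sketched under assumptions: keep loading the remaining jobs, but log each parse failure and return the messages so a caller such as `get_workflows` could report why a job is missing. `ParseError` below is a local stand-in for DeepWork's exception, and `load_all_jobs`, its `parse` callback and the logger name are hypothetical.

```python
import logging
from pathlib import Path
from typing import Callable, List, Tuple

logger = logging.getLogger("deepwork.jobs")  # hypothetical logger name


class ParseError(Exception):
    """Local stand-in for DeepWork's ParseError."""


def load_all_jobs(
    job_dirs: List[Path], parse: Callable[[Path], dict]
) -> Tuple[List[dict], List[str]]:
    """Load every job that parses; collect (rather than swallow) the failures."""
    jobs: List[dict] = []
    errors: List[str] = []
    for job_dir in job_dirs:
        try:
            jobs.append(parse(job_dir))
        except ParseError as exc:
            message = f"{job_dir.name}: {exc}"
            logger.warning("Skipping job that failed to parse: %s", message)
            errors.append(message)
    return jobs, errors
```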
When a workflow needs to run a multi-step process on many independent items, the define step now guides users to split the repeated process into a separate workflow and fan out via parallel sub-agents. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Update claude-code-test.yml validate-generation job: replace pytest test file with inline Python validation of fruits fixture parsing
- Check for deepwork/SKILL.md instead of per-step skills
- Update e2e test to use /deepwork skill instead of /deepwork_jobs and /fruits
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The DeepWork MCP server is registered in .mcp.json with the bare "deepwork" command. In CI, deepwork is installed via uv sync and only exists in .venv/bin/, which isn't on PATH. Claude Code fails to start the MCP server subprocess, causing it to fall back to ad-hoc file creation instead of using MCP tools. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
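A quick diagnostic for this failure mode, as a sketch: read `.mcp.json` (Claude Code keeps project MCP servers under the `mcpServers` key) and report any server whose `command` does not resolve via `shutil.which`. The function name is hypothetical, and the check only detects the problem rather than fixing it.

```python
import json
import shutil
from pathlib import Path
from typing import Dict, Optional


def unresolvable_mcp_commands(
    mcp_json: Path, search_path: Optional[str] = None
) -> Dict[str, str]:
    """Map server name -> command for entries whose command cannot be found.

    `search_path` defaults to the current PATH (shutil.which's own default).
    """
    config = json.loads(mcp_json.read_text())
    missing: Dict[str, str] = {}
    for name, server in config.get("mcpServers", {}).items():
        command = server.get("command", "")
        if shutil.which(command, path=search_path) is None:
            missing[name] = command
    return missing
```

Run in the CI job before invoking Claude, a non-empty result would flag that `deepwork` is only in `.venv/bin/` and not on PATH.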
The previous approach overwrote settings.json with only generic permissions (Bash, Read, Write, etc.), which removed the MCP tool permissions (mcp__deepwork__get_workflows, start_workflow, finished_step, abort_workflow) that `deepwork install` had synced. Without these, Claude silently fails to call DeepWork MCP tools and returns empty output. Now merges CI-specific permissions into the existing settings.json, preserving the MCP tool permissions that the install step wrote. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
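The merge can be sketched as follows, assuming the standard Claude Code layout where allowed tools live in `permissions.allow`. The function name is hypothetical; the point is that existing entries (such as the MCP tool permissions written by `deepwork install`) are kept and CI rules are appended only if absent.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def merge_allowed_permissions(settings_path: Path, extra_rules: List[str]) -> Dict[str, Any]:
    """Add CI permission rules to permissions.allow without dropping existing ones."""
    settings = json.loads(settings_path.read_text()) if settings_path.exists() else {}
    allow = settings.setdefault("permissions", {}).setdefault("allow", [])
    for rule in extra_rules:
        if rule not in allow:  # keep existing order, skip duplicates
            allow.append(rule)
    # Other top-level keys in settings.json are written back untouched.
    settings_path.write_text(json.dumps(settings, indent=2) + "\n")
    return settings
```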
- Add --debug flag to claude invocation to capture detailed logs
- Add failure step that dumps debug.log, .mcp.json, settings.json, and session state when the job creation step fails
- Instruct Claude to stop after define+implement steps instead of running the full 4-step workflow (test+iterate are unnecessary for CI)
- Increase timeout from 6 to 10 minutes
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
This PR introduces a major architectural shift from skill-file-based workflow execution to a Model Context Protocol (MCP) server that guides agents through workflows via checkpoint calls with quality gate enforcement.
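To make the checkpoint idea concrete, here is a toy, in-memory sketch of the loop an agent follows: start a workflow, do the step's work, report it via `finished_step`, and either advance or redo the step when the quality gate rejects it. The class, the status strings (`next_step`, `needs_work`, `workflow_complete`) and the response fields are hypothetical illustrations, not DeepWork's actual schemas.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ToyWorkflowServer:
    """In-memory stand-in for the MCP server (hypothetical, for illustration)."""

    steps: List[str]
    # Step name -> predicate standing in for the quality gate review.
    gates: Dict[str, Callable[[List[str]], bool]] = field(default_factory=dict)
    index: int = 0

    def start_workflow(self) -> dict:
        self.index = 0
        return {"status": "next_step", "step": self.steps[0]}

    def finished_step(self, outputs: List[str]) -> dict:
        step = self.steps[self.index]
        gate = self.gates.get(step, lambda _outputs: True)
        if not gate(outputs):
            # Quality gate rejected the outputs: the agent must redo this step.
            return {"status": "needs_work", "step": step}
        self.index += 1
        if self.index == len(self.steps):
            return {"status": "workflow_complete"}
        return {"status": "next_step", "step": self.steps[self.index]}


def run_agent(server: ToyWorkflowServer, max_attempts: int = 3) -> List[str]:
    """Drive the workflow through its checkpoints and return a call log."""
    log: List[str] = []
    response = server.start_workflow()
    attempts = 0
    while response["status"] != "workflow_complete":
        step = response["step"]
        # Count retries of the same step; reset when a new step begins.
        attempts = attempts + 1 if response["status"] == "needs_work" else 1
        if attempts > max_attempts:
            raise RuntimeError(f"step {step!r} never passed the quality gate")
        log.append(f"{step}:{response['status']}")
        # A real agent would do the step's work here; the toy adds one output per attempt.
        outputs = [f"{step}.md"] * attempts
        response = server.finished_step(outputs)
    return log
```

With `steps=["define", "implement"]` and a gate that only accepts `implement` on its second attempt, the log reads `define:next_step`, `implement:next_step`, `implement:needs_work`. The agent never decides on its own that a step is done; each transition goes through a server checkpoint.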
Key changes:
- MCP server (`deepwork serve`) with three tools: `get_workflows`, `start_workflow`, `finished_step`
- `abort_workflow` capability
- `/deepwork` entry point instead of per-step skills
Why This Change?
The previous architecture relied heavily on skill files with embedded instructions and rules-based hooks. This had several limitations:
The MCP approach provides:
- `.deepwork/tmp/`
Changes by Area
New MCP Module (`src/deepwork/mcp/`)
- `server.py` - FastMCP server definition
- `tools.py` - MCP tool implementations
- `state.py` - Workflow session state management
- `schemas.py` - Pydantic models for I/O
- `quality_gate.py` - Quality gate with review agent
New CLI Command
- `deepwork serve` - Starts MCP server (stdio or SSE transport)
Updated `deepwork_jobs` Standard Job
- `iterate`, `errata`, `test`, `fix_jobs`, `fix_settings`
- `define`, `implement`, `learn` steps
Removed Components
- `rules_parser.py`, `rules_queue.py`, `pattern_matcher.py`, `rules_check.py`
- `command_executor.py`
- `deepwork_rules` standard job
- `commit` and `manual_tests` jobs
Documentation
- `doc/mcp_interface.md` - MCP tool reference
- `doc/reference/calling_claude_in_print_mode.md` - Claude CLI subprocess guide
- `doc/architecture.md` with Part 4: MCP Server Architecture
- `README.md` to remove rules references
Test plan
- `deepwork install --platform claude` in a test project
- `deepwork serve`
- `/deepwork` skill
- `uv run pytest`
🤖 Generated with Claude Code