
Add external runner support for quality gate reviews #206

Open
nhorton wants to merge 3 commits into main from claude/add-external-runner-mcp-ZNa9C


@nhorton nhorton commented Feb 10, 2026

Summary

This PR adds configurable quality gate review modes to DeepWork. Users can choose between reviews run by a Claude CLI subprocess (external runner) and agent self-review driven by a generated instructions file.

Key Changes

  • New --external-runner CLI option: Added to the deepwork serve command, which accepts "claude" as its value. When omitted, it defaults to None, which selects self-review mode.

  • QualityGate refactoring:

    • Made ClaudeCLI optional (can be None for self-review mode)
    • Added configurable max_inline_files parameter (5 for external runner, 0 for self-review)
    • New build_review_instructions_file() method generates markdown instructions for agent self-review
    • Updated evaluate() to raise an error if the CLI is required but not provided
  • WorkflowTools dual-path quality gate handling:

    • Self-review mode: Generates a review instructions file and returns guidance telling the agent to spawn a subagent for verification
    • External runner mode: Uses existing Claude CLI subprocess evaluation path
    • Both paths share common review dict and output spec building logic
  • Server configuration:

    • create_server() now accepts external_runner parameter
    • Instantiates QualityGate with appropriate settings based on runner mode
    • Updated .mcp.json to use --external-runner claude by default
  • File I/O: Added aiofiles import for async file writing of review instructions
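
As a rough illustration of the CLI and server configuration changes above, here is a minimal sketch using the standard library's argparse. The PR does not say which CLI library DeepWork uses, so the parser layout, `build_parser`, and `quality_gate_settings` are assumptions made for illustration. Only the option name, its "claude" value, the None default, and the 5/0 inline-file limits come from the description.

```python
import argparse
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical stand-in for the `deepwork serve` command line."""
    parser = argparse.ArgumentParser(prog="deepwork")
    subcommands = parser.add_subparsers(dest="command", required=True)
    serve = subcommands.add_parser("serve", help="Run the server")
    serve.add_argument(
        "--external-runner",
        choices=["claude"],  # any other value is rejected by argparse
        default=None,        # None selects agent self-review
        help="Runner for quality gate reviews; omit for self-review.",
    )
    return parser


def quality_gate_settings(external_runner: Optional[str]) -> dict:
    """Map the runner mode to the settings described in the PR (assumed shape)."""
    if external_runner == "claude":
        # Subprocess mode: a CLI is needed, up to 5 files embedded inline.
        return {"needs_cli": True, "max_inline_files": 5}
    # Self-review mode: no CLI, files are always referenced by path.
    return {"needs_cli": False, "max_inline_files": 0}
```

Usage: `build_parser().parse_args(["serve", "--external-runner", "claude"]).external_runner` yields `"claude"`, and the result can be passed straight to `quality_gate_settings`.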

Implementation Details

  • Self-review instructions include output listings, quality criteria, guidance, and clear evaluation guidelines
  • File embedding strategy is configurable: external runner embeds up to 5 files inline for efficiency, self-review always lists paths only to keep instructions concise
  • Review instructions are written to .deepwork/tmp/quality_review_<session>_<step>.md
  • Agent receives detailed feedback with instructions to spawn a subagent, review findings, fix issues, and retry until all criteria pass
  • Backward compatible: existing code using external runner continues to work unchanged
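
The instructions file described above could be assembled roughly as follows. This is a sketch, not the PR's build_review_instructions_file(): the function names, section headings, and wording are hypothetical, and it only renders the text, whereas the PR writes the file asynchronously with aiofiles. The path pattern is the one stated above.

```python
from pathlib import Path
from typing import List


def review_instructions_path(project_root: Path, session_id: str, step_id: str) -> Path:
    """Build .deepwork/tmp/quality_review_<session>_<step>.md under the project root."""
    name = f"quality_review_{session_id}_{step_id}.md"
    return project_root / ".deepwork" / "tmp" / name


def render_review_instructions(
    criteria: List[str], output_paths: List[str], guidance: str = ""
) -> str:
    """Render markdown with output listings, criteria, guidance, and guidelines."""
    lines = ["# Quality Review Instructions", "", "## Outputs to review", ""]
    # Self-review mode lists paths only; the subagent reads the files itself.
    lines += [f"- {path}" for path in output_paths]
    lines += ["", "## Quality criteria", ""]
    lines += [f"{i}. {text}" for i, text in enumerate(criteria, start=1)]
    if guidance:
        lines += ["", "## Guidance", "", guidance]
    lines += [
        "",
        "## Evaluation guidelines",
        "",
        "Read each listed file, then mark every criterion as pass or fail with a short reason.",
    ]
    return "\n".join(lines) + "\n"
```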

Testing

Updated unit tests to reflect new initialization behavior and added external_runner parameter to test fixtures.

https://claude.ai/code/session_015Lub1RgLErD6kC6k8cEmSV

…e execution

Introduces --external-runner CLI param to the serve command that controls
how quality gate reviews are executed:

- external_runner=None (default): Agent self-review mode. finished_step
  dumps review instructions to .deepwork/tmp/ and returns guidance for
  the agent to verify its own work via a subagent, then call finished_step
  again with quality_review_override_reason once passing.

- external_runner="claude": Claude CLI subprocess mode (existing behavior).
  Quality reviews are evaluated by spawning Claude as a subprocess.
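
The two modes above can be pictured as a small dispatch. This is only a sketch under assumptions: `handle_quality_gate`, the `run_cli_review` callback, and the returned dict shape are hypothetical, and the status strings are illustrative rather than DeepWork's actual values.

```python
from typing import Callable, Optional


def handle_quality_gate(
    external_runner: Optional[str],
    instructions_path: str,
    run_cli_review: Optional[Callable[[], bool]] = None,
) -> dict:
    """Choose the review path based on the runner mode (illustrative only)."""
    if external_runner is None:
        # Self-review: hand the agent a file and ask it to verify via a subagent.
        return {
            "status": "needs_work",
            "feedback": (
                f"Spawn a subagent to review your work using {instructions_path}. "
                "Fix any issues, then call finished_step again with "
                "quality_review_override_reason once all criteria pass."
            ),
        }
    if external_runner == "claude":
        if run_cli_review is None:
            raise ValueError("claude mode requires a CLI review callback")
        # Subprocess mode: the callback stands in for the Claude CLI evaluation.
        passed = run_cli_review()
        return {"status": "complete" if passed else "needs_work", "feedback": ""}
    raise ValueError(f"unknown external_runner: {external_runner!r}")
```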

Also makes the max_inline_files threshold configurable per QualityGate
instance (was hard-coded as MAX_INLINE_FILES=5). Claude subprocess mode
uses 5 (embed up to 5 files inline), self-review mode uses 0 (always
reference files by path so the subagent reads them directly).
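
A minimal sketch of how such a threshold might behave, assuming files are embedded only when their count fits within the limit. The function name and the section formatting are hypothetical, and the actual payload format in QualityGate may differ.

```python
from typing import Dict


def build_file_section(files: Dict[str, str], max_inline_files: int) -> str:
    """Embed file contents when few enough files exist, else list paths only.

    `files` maps each path to its contents.
    """
    if 0 < len(files) <= max_inline_files:
        # Within the threshold: include each file's contents inline.
        parts = [f"--- {path} ---\n{body}" for path, body in files.items()]
        return "\n\n".join(parts)
    # Over the threshold, or threshold 0: reference files by path only.
    return "\n".join(f"- {path}" for path in files)
```

With `max_inline_files=0` the first branch can never run, which matches the self-review behavior of always referencing files by path.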

The installer now generates .mcp.json with --external-runner claude so
Claude Code users get the subprocess review behavior by default.
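
For illustration, an installer step of this kind might look like the sketch below. The server name "deepwork", the command, and the exact args list are assumptions, since the PR only states that --external-runner claude is included. `mcpServers` is the top-level key used by Claude Code's .mcp.json format.

```python
import json
from pathlib import Path


def register_mcp_server(config_path: Path) -> dict:
    """Write or update .mcp.json, leaving unrelated server entries untouched."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    # Overwriting the entry keeps the call idempotent and upgrades old configs.
    servers["deepwork"] = {
        "command": "deepwork",  # assumed executable name
        "args": ["serve", "--external-runner", "claude"],  # assumed arg list
    }
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```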

https://claude.ai/code/session_015Lub1RgLErD6kC6k8cEmSV
Adds 45 new tests across 4 files covering the external_runner feature:

- TestConfigurableMaxInlineFiles (7 tests): QualityGate constructor with
  max_inline_files=0, 5, 10, None; payload behavior at each threshold
- TestEvaluateWithoutCli (2 tests): evaluate() raises without CLI,
  empty criteria still auto-passes
- TestBuildReviewInstructionsFile (9 tests): file structure, criteria,
  numbered reviews, notes, guidance, per-file listings, path-only mode
- TestExternalRunnerSelfReview (9 tests): NEEDS_WORK status, feedback
  content, instructions file written, criteria in file, path-only refs,
  file naming, override-then-complete flow, skip for reviewless steps,
  notes propagation
- TestExternalRunnerClaude (4 tests): evaluate_reviews called, no
  instructions file written, failing gate feedback, attempt tracking
- TestExternalRunnerInit (3 tests): default None, explicit value, no-gate
- TestClaudeAdapterMCPRegistration (7 tests): creates .mcp.json, includes
  --external-runner claude, full args, idempotent, updates old config,
  preserves other servers
- TestServeExternalRunnerOption (4 tests): default None, claude passthrough,
  invalid choice rejected, help output

https://claude.ai/code/session_015Lub1RgLErD6kC6k8cEmSV