Manual end-to-end testing: MCP server (pred mcp) #94

@GiggleLiu

Context

We added an MCP server to the pred CLI. Automated tests pass, but we need manual end-to-end testing to verify everything works correctly across platforms.

Setup

  1. Build and install:

    make cli
    pred list       # verify it works

  2. Configure your editor:

    Claude Code (recommended):

    claude mcp add --scope user problemreductions -- pred mcp

    Cursor (.cursor/mcp.json):

    { "mcpServers": { "problemreductions": { "command": "pred", "args": ["mcp"] } } }

    Windsurf (~/.codeium/windsurf/mcp_config.json):

    { "mcpServers": { "problemreductions": { "command": "pred", "args": ["mcp"] } } }

    OpenCode (opencode.json):

    { "mcp": { "problemreductions": { "type": "local", "command": ["pred", "mcp"] } } }

  3. Restart your editor / Claude Code
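
Before involving an editor, it can help to confirm that the server answers at all. The sketch below is not part of the pred tooling. It assumes `pred mcp` speaks MCP over stdio with newline-delimited JSON-RPC (which the editor configs above imply), that the server accepts protocol version `2024-11-05`, and that it flushes its replies before exiting when stdin closes. The function names and the client name are made up for illustration.

```python
import json
import subprocess


def jsonrpc(method, params=None, msg_id=None):
    """Build one newline-terminated JSON-RPC 2.0 message.

    With msg_id=None the message is a notification (no reply expected).
    """
    msg = {"jsonrpc": "2.0", "method": method}
    if msg_id is not None:
        msg["id"] = msg_id
    if params is not None:
        msg["params"] = params
    return json.dumps(msg) + "\n"


def handshake_messages():
    """Minimal MCP session: initialize, confirm, then ask for the tool list."""
    init = jsonrpc(
        "initialize",
        {
            "protocolVersion": "2024-11-05",  # assumption: accepted by the server
            "capabilities": {},
            "clientInfo": {"name": "manual-smoke-test", "version": "0.0.0"},
        },
        msg_id=1,
    )
    ready = jsonrpc("notifications/initialized")
    list_tools = jsonrpc("tools/list", msg_id=2)
    return [init, ready, list_tools]


def smoke_test(command=("pred", "mcp"), timeout=10):
    """Pipe the handshake into the server and return the parsed JSON replies.

    Assumption: the server handles all queued requests before exiting on EOF.
    Lines that do not look like JSON objects are skipped.
    """
    proc = subprocess.run(
        list(command),
        input="".join(handshake_messages()),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return [
        json.loads(line)
        for line in proc.stdout.splitlines()
        if line.lstrip().startswith("{")
    ]
```

Calling `smoke_test()` after `make cli` should return one reply for the `initialize` request and one for `tools/list`, if the assumptions above hold. An empty list points to a setup problem rather than an editor problem.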

Test 1: Graph exploration

Enter each prompt below in turn. Verify that the AI calls the MCP tools and returns meaningful results.

List all available problem types
Show me details about MaximumIndependentSet — what are its variants and size fields?
What problems can MIS reduce to?
What problems can be reduced TO QUBO? Check 2 hops.
Find the cheapest reduction path from MIS to QUBO
Find ALL reduction paths from Satisfiability to SpinGlass
Export the full reduction graph
  • All responses contain correct data
  • The AI calls the right MCP tools automatically

Test 2: Create and solve problems

Create a MaximumIndependentSet problem on a triangle graph (3 vertices, edges 0-1, 1-2, 2-0)
Create a random MIS instance with 8 vertices and edge probability 0.4, then solve it
Create a 3-SAT problem with 4 variables and clauses (1,2,3), (-1,-2,4), (2,-3,-4), then find a satisfying assignment
Create a QUBO with matrix [[1, -0.5], [-0.5, 2]] and solve it
  • All problems are created successfully
  • Solutions are valid
  • ILP solver is used by default (not brute-force)
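
To judge whether the returned solutions are valid, testers may want independent reference values for the small instances above. The following sketch brute-forces them without using pred. It assumes QUBO means minimizing x^T Q x over binary x, which may differ from pred's convention, and all function names are illustrative.

```python
from itertools import product


def satisfies(clauses, assignment):
    """Check a CNF formula. `assignment` maps a 1-based variable index to a bool;
    a literal -k means NOT x_k."""
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )


def brute_force_qubo(Q):
    """Return (minimum of x^T Q x, first minimizing x) over all binary vectors x."""
    n = len(Q)
    best = None
    for x in product((0, 1), repeat=n):
        value = sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))
        if best is None or value < best[0]:
            best = (value, x)
    return best


def max_independent_set_size(n, edges):
    """Size of a largest independent set, by exhaustive search."""
    return max(
        sum(x)
        for x in product((0, 1), repeat=n)
        if all(not (x[u] and x[v]) for u, v in edges)
    )


# Instances taken from the prompts above.
triangle = [(0, 1), (1, 2), (2, 0)]
clauses = [(1, 2, 3), (-1, -2, 4), (2, -3, -4)]
Q = [[1, -0.5], [-0.5, 2]]

print(max_independent_set_size(3, triangle))
print(satisfies(clauses, {1: True, 2: False, 3: False, 4: False}))
print(brute_force_qubo(Q))
```

Under these assumptions the triangle has a maximum independent set of size 1, the assignment x1 = true, x2 = x3 = x4 = false satisfies the 3-SAT instance, and the QUBO minimum is 0 at x = (0, 0). The random 8-vertex instance has no fixed reference value, but `max_independent_set_size` can be applied to whatever edge list the tool reports.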

Test 3: Evaluate configurations

Create MIS on edges 0-1, 1-2, 2-3. Then evaluate the configuration [1, 0, 1, 0] — is it a valid independent set?
For the same problem, evaluate [1, 1, 0, 0] — that should be invalid since vertices 0 and 1 are connected
  • Valid configs return a positive objective value
  • Invalid configs are correctly identified as infeasible
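
The two expected verdicts can be checked by hand or with a few lines of code. This is a minimal sketch, independent of pred's own evaluator, and the function name is illustrative.

```python
def evaluate_independent_set(edges, config):
    """Return (feasible, size, violated_edges) for a 0/1 vertex selection.

    A configuration is feasible when no edge has both endpoints selected.
    """
    violated = [(u, v) for u, v in edges if config[u] and config[v]]
    return (not violated, sum(config), violated)


path_edges = [(0, 1), (1, 2), (2, 3)]

print(evaluate_independent_set(path_edges, [1, 0, 1, 0]))
print(evaluate_independent_set(path_edges, [1, 1, 0, 0]))
```

The first configuration selects vertices 0 and 2, which share no edge, so it is feasible with size 2. The second selects both endpoints of edge 0-1, so it is infeasible.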

Test 4: Reductions end-to-end

Create a small MIS problem, reduce it to QUBO, then solve the QUBO and map the solution back
Walk me through reducing a 5-vertex MIS instance to SpinGlass step by step
Reduce a KColoring problem (triangle graph, 3 colors) to Satisfiability
  • Reductions complete without errors
  • Solutions mapped back to the source are valid and optimal
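
For the MIS to QUBO case, "valid and optimal" can be cross-checked against the textbook penalty formulation: minimize -sum_i x_i + P * sum_{(u,v) in E} x_u x_v with P > 1. The sketch below uses that formulation with P = 2 as an assumption. The reduction implemented in pred may use different coefficients or a different sign convention, so compare solution quality, not matrix entries.

```python
from itertools import product


def mis_to_qubo(n, edges, penalty=2):
    """Upper-triangular QUBO matrix for the textbook MIS penalty formulation."""
    Q = [[0] * n for _ in range(n)]
    for i in range(n):
        Q[i][i] = -1  # reward for selecting a vertex
    for u, v in edges:
        a, b = min(u, v), max(u, v)
        Q[a][b] += penalty  # cost for selecting both endpoints of an edge
    return Q


def minimize_qubo(Q):
    """Exhaustively minimize x^T Q x over binary x; returns (value, x)."""
    n = len(Q)
    return min(
        (sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n)), x)
        for x in product((0, 1), repeat=n)
    )


def is_independent(edges, x):
    return all(not (x[u] and x[v]) for u, v in edges)


edges = [(0, 1), (1, 2), (2, 3)]  # small path graph as the source instance
value, x = minimize_qubo(mis_to_qubo(4, edges))
print(value, x, is_independent(edges, x))
```

With a penalty larger than 1, any QUBO minimizer is an independent set and its objective equals minus the set size, so the solution mapped back to the source should have the same size as a direct MIS solve.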

Test 5: Prompt templates

The MCP server provides 7 task-oriented prompt templates. Verify each one works:

Use the "what_is" prompt to explain MaximumIndependentSet
Use the "model_my_problem" prompt: "I need to assign frequencies to cell towers so that neighboring towers don't interfere"
Use the "compare" prompt to compare MIS and QUBO
Use the "reduce" prompt to walk through MIS to QUBO reduction
Use the "solve" prompt for a MIS instance with edges 0-1, 1-2, 2-0
Use the "find_reduction" prompt from SAT to QUBO
Use the "overview" prompt to explore the full landscape
  • All 7 prompts are listed and invocable
  • Each prompt triggers the correct sequence of tool calls
  • Responses are well-structured and informative
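
Whether all 7 prompts are listed can also be checked mechanically from the result of an MCP `prompts/list` request. The sketch below assumes the standard MCP result shape (a `prompts` array of objects with a `name` field) and takes the expected names from the prompts quoted above. How the result is obtained (an MCP inspector, a client log) is left open.

```python
EXPECTED_PROMPTS = {
    "what_is",
    "model_my_problem",
    "compare",
    "reduce",
    "solve",
    "find_reduction",
    "overview",
}


def compare_prompt_names(prompts_list_result):
    """Return (missing, unexpected) prompt names, each as a sorted list."""
    names = {p["name"] for p in prompts_list_result.get("prompts", [])}
    return sorted(EXPECTED_PROMPTS - names), sorted(names - EXPECTED_PROMPTS)


# Hypothetical, hand-written result used only to show the call.
sample = {"prompts": [{"name": "what_is"}, {"name": "solve"}, {"name": "extra"}]}
missing, unexpected = compare_prompt_names(sample)
print(missing)
print(unexpected)
```

Both lists should be empty for a server that exposes exactly the 7 templates.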

Test 6: Error handling

Show me details about FooBarProblem
Create an MIS problem but don't provide any edges
  • Error messages are clear and suggest what to do

Expected behavior

  • All responses should contain meaningful, correct data
  • The AI should call the right MCP tools automatically
  • The default solver should be ILP (not brute-force)
  • Solutions should be valid (e.g., independent sets are actually independent)
  • Prompt templates should guide structured workflows
  • Errors should be clear and actionable
