feat(ai): PR 7 — E2E Tests + Fixtures + Config #421
Open

ianwhitedeveloper wants to merge 5 commits into ai-testing-framework-implementation-consolidation from
Conversation
Base automatically changed from pr/ai-test-output-cli to ai-testing-framework-implementation-consolidation on March 4, 2026 at 16:58.
- Add source/e2e.test.js: two Vitest describe blocks covering the full workflow (runAITests + recordTestOutput) and the --agent-config JSON file-loading flow; uses describe.skipIf, onTestFinished cleanup, and extracted timeout constants
- Add sum-function-test.sudo + sum-function-spec.mdc: self-contained fixture that exercises the import/promptUnderTest pipeline without depending on project-specific ai/ rules
- Add claude-agent-config.json: fixture for the --agent-config file flow
- Add vitest.config.e2e.js: dedicated config for npm run test:e2e
- Update vitest.config.js: correct stale comment on e2e exclusion
- Add test:e2e script to package.json
- Update fixtures/README.md: accurate descriptions + rationale for omitting a deterministic failure fixture

The E2E failure path is proven at the unit level (ai-runner.test.js mock agents); deterministic failure fixtures are not viable with capable LLMs as both result and judge agents.

Made-with: Cursor
- Add no-prompt-under-test.sudo, missing-user-prompt.sudo, and no-assertions.sudo fixtures to trigger extraction validation errors (MISSING_PROMPT_UNDER_TEST, MISSING_USER_PROMPT, NO_ASSERTIONS_FOUND) through the full E2E pipeline
- Add sudolang-prompt-test.sudo fixture to verify the framework handles SudoLang syntax in the userPrompt field
- Add describe.skipIf blocks for each validation error case and for the SudoLang happy-path scenario
- Replace try/catch error capture with the Try helper, consistent with ai-runner.test.js and test-extractor.test.js conventions
- Update fixtures/README.md with new fixture descriptions

Made-with: Cursor
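The `Try` convention referenced here replaces try/catch blocks in tests by returning the thrown error as a value. A hedged sketch of the pattern; the project's actual helper may differ (for example, it may be async-aware), and this synchronous version is only illustrative:

```javascript
// Illustrative Try helper: invoke fn and return whatever it throws,
// so tests can assert on the error object directly.
const Try = (fn, ...args) => {
  try {
    return fn(...args);
  } catch (error) {
    return error;
  }
};

// Example: a function that fails the way the extraction validators do.
const failing = () => {
  throw new Error('Extraction agent returned no userPrompt.', {
    cause: { code: 'MISSING_USER_PROMPT' },
  });
};

const error = Try(failing);
console.log(error.cause.code); // prints "MISSING_USER_PROMPT"
```

Returning the error instead of catching it keeps each test a single expression: capture, then assert.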
- Replace { name, code } partial assertion shapes with full
error?.cause comparisons for all three validation error tests
(MISSING_PROMPT_UNDER_TEST, MISSING_USER_PROMPT, NO_ASSERTIONS_FOUND)
- Expected objects now include name, message, code, and testFile —
the complete deterministic cause shape from test-extractor.js
- Follows Jan's review principle from PR #409: deterministic
functions should assert the complete expected value, not
individual properties
Made-with: Cursor
- A1: validate userPrompt/assertions before resolveImportPaths so structural errors surface before any filesystem IO
- A2: guard the CONTEXT section in buildJudgePrompt, matching the existing buildResultPrompt pattern; add a test for the empty case
- A3: use JSON.stringify for TAP YAML in the mock helper, eliminating backtick-in-template fragility

Made-with: Cursor
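Fixup A3 can be illustrated in isolation: `JSON.stringify` produces a valid, fully escaped string literal even when the payload contains backticks or newlines, which a raw template literal cannot guarantee when generating source text. A small self-contained demonstration, not the PR's actual mock helper:

```javascript
// TAP-style payload containing both newlines and backticks.
const tapYaml = 'ok 1 - adds numbers\n  ---\n  note: result uses `sum` here';

// JSON.stringify yields a safely escaped JS string literal...
const literal = JSON.stringify(tapYaml);

// ...which round-trips to the exact original text.
const roundTripped = JSON.parse(literal);
console.log(roundTripped === tapYaml); // prints true
```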
- Rewrite buildExtractionPrompt with an explicit EXTRACTION RULES block: the agent must return "" / [] for missing fields rather than inferring userPrompt or assertions from import/context
- Update MISSING_USER_PROMPT and NO_ASSERTIONS_FOUND messages to correctly attribute failures to the agent, not the test file
- Add an explicit "return []" fallback to the importPaths rule for consistency with rules 1 and 3
- Remove !assertions dead code (parseExtractionResult guarantees an array; only the .length === 0 check is needed)
- Update e2e expected messages and the test-extractor snapshot test to match the revised prompt and error messages
- buildJudgePrompt now conditionally omits the CONTEXT section when promptUnderTest is empty, matching the buildResultPrompt pattern
- Reorder extractTests validation: structural checks before IO

Made-with: Cursor
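The conditional CONTEXT omission can be sketched as follows; the function body and section labels are assumptions modeled on the description above, not the project's exact source:

```javascript
// Hypothetical shape of the guard: emit the CONTEXT section only when
// promptUnderTest is non-empty, mirroring the buildResultPrompt pattern.
const buildJudgePrompt = ({ promptUnderTest = '', result = '' }) => {
  const contextSection = promptUnderTest
    ? `CONTEXT (Prompt Under Test):\n${promptUnderTest}\n\n`
    : '';
  return `${contextSection}RESULT:\n${result}`;
};

// Empty promptUnderTest: no dangling CONTEXT header in the prompt.
console.log(buildJudgePrompt({ result: 'ok' }).startsWith('RESULT:')); // prints true
```

Guarding the whole section avoids sending the judge agent an empty heading it might try to interpret.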
Force-pushed 9506069 to 7da79f3.
Context
Part of the PR #394 consolidation effort. Targets `ai-testing-framework-implementation-consolidation`.

Dependency order: Foundation → Utilities → Parsers → Config + Validation → Core Runner → CLI + Output (#420, merged) → E2E (this PR) → outputFormat + `riteway ai init` (#423)

What's in this PR
End-to-end test suite + fixture files + dedicated Vitest config for the AI testing framework, plus post-consolidation fixups to `test-extractor.js`.

`source/e2e.test.js`

Full E2E test coverage using `describe.skipIf(!isClaudeAuthenticated)` to gate all tests on real Claude CLI auth:

- `runAITests` + `recordTestOutput` against `sum-function-test.sudo`; asserts assertion count, pass/fail, run counts, TAP file content, and filename format
- `--agent-config` file flow — loads `claude-agent-config.json` via `loadAgentConfig`, verifies config shape, runs the same fixture
- `MISSING_PROMPT_UNDER_TEST` — fixture with no `import` statement
- `MISSING_USER_PROMPT` — fixture with no `userPrompt` field
- `NO_ASSERTIONS_FOUND` — fixture with no assertion lines
- SudoLang happy path — fixture with SudoLang syntax in `userPrompt`

All error-path tests use `Try` (not try/catch) and assert the full `error.cause` object — `name`, `message`, `code`, and `testFile`.

Fixtures (`source/fixtures/`)

- `sum-function-spec.mdc` + `sum-function-test.sudo` — happy-path fixture pair
- `sudolang-prompt-test.sudo` — SudoLang syntax in `userPrompt`
- `claude-agent-config.json` — `--agent-config` file-loading flow
- `no-prompt-under-test.sudo` — `MISSING_PROMPT_UNDER_TEST`
- `missing-user-prompt.sudo` — `MISSING_USER_PROMPT`
- `no-assertions.sudo` — `NO_ASSERTIONS_FOUND`

Config + scripts

- `vitest.config.e2e.js` — dedicated config (`include: ['source/e2e.test.js']`, `testTimeout: 300000`)
- `vitest.config.js` — updated comment on `e2e.test.js` exclusion
- `package.json` — `"test:e2e": "vitest run --config vitest.config.e2e.js"`

Post-consolidation fixups (`source/test-extractor.js`)

- `buildExtractionPrompt` now includes explicit EXTRACTION RULES instructing the agent to return `""` / `[]` for missing fields rather than synthesizing a `userPrompt` or extracting assertions from imported file contents. This makes `MISSING_USER_PROMPT` and `NO_ASSERTIONS_FOUND` reliably testable end-to-end.
- `MISSING_USER_PROMPT` and `NO_ASSERTIONS_FOUND` messages now correctly say "Extraction agent returned…" rather than "Test file does not…", accurately pointing at the source of truth.
- `userPrompt` and `assertions` checks now fire before `resolveImportPaths()` IO, so structural errors surface before any filesystem reads.
- `buildJudgePrompt` guard (A2) — the CONTEXT (Prompt Under Test) section is now conditionally omitted when `promptUnderTest` is empty, matching the `buildResultPrompt` pattern.
- `ai-runner.test.js` mock uses `JSON.stringify()` for consistency with `extractionResult` and `resultText`.

`npm run test:e2e` requires an authenticated Claude CLI and must be run locally. These tests do not run in CI by default (no `claude` binary available in the CI environment). A team decision is needed on CI strategy.
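Based on the `include` pattern and timeout quoted above, `vitest.config.e2e.js` plausibly looks like the following; this is a reconstruction from the values in this description, not the file's verified contents:

```javascript
import { defineConfig } from 'vitest/config';

export default defineConfig({
  test: {
    // Only the E2E suite; unit tests stay in the default config.
    include: ['source/e2e.test.js'],
    // Real Claude CLI calls are slow; allow up to 5 minutes per test.
    testTimeout: 300000,
  },
});
```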
Until a decision is made, contributors should run `npm run test:e2e` locally before merging PRs that touch the extraction prompt, fixtures, or agent config.

Why no deterministic failure fixture?

Deterministic E2E failure tests are not viable with capable LLMs as both result and judge agents — the result agent satisfies requirements from first principles regardless of bad prompt context, and the judge scores the actual output rather than the prompt quality. The failure-detection path is fully covered by unit tests with mock agents in `ai-runner.test.js`. See `source/fixtures/README.md` for the full rationale.

Test results