
feat(ai): PR 7 — E2E Tests + Fixtures + Config #421

Open
ianwhitedeveloper wants to merge 5 commits into ai-testing-framework-implementation-consolidation from pr/ai-e2e-fixtures

ianwhitedeveloper commented Feb 27, 2026

Context

Part of the PR #394 consolidation effort. Targets ai-testing-framework-implementation-consolidation.

Dependency order: Foundation → Utilities → Parsers → Config + Validation → Core Runner → CLI + Output (#420, merged) → E2E (this PR) → outputFormat + riteway ai init (#423)

Next: Draft PR #423 (pr/agent-output-format) — outputFormat serialization strategy + riteway ai init eject command — is queued behind this PR.


What's in this PR

End-to-end test suite + fixture files + dedicated Vitest config for the AI testing framework, plus post-consolidation fixups to test-extractor.js.

source/e2e.test.js

Full E2E test coverage using describe.skipIf(!isClaudeAuthenticated) to gate all tests on real Claude CLI auth:

  • Full workflow: runAITests + recordTestOutput against sum-function-test.sudo; asserts the assertion count, pass/fail status, run counts, TAP file content, and TAP filename format
  • --agent-config file flow — loads claude-agent-config.json via loadAgentConfig, verifies config shape, runs the same fixture
  • Validation error tests (extraction-only, faster than full workflow):
    • MISSING_PROMPT_UNDER_TEST — fixture with no import statement
    • MISSING_USER_PROMPT — fixture with no userPrompt field
    • NO_ASSERTIONS_FOUND — fixture with no assertion lines
  • SudoLang userPrompt — happy-path test proving the framework handles SudoLang syntax in userPrompt

All error-path tests use Try (not try/catch) and assert the full error.cause object — name, message, code, and testFile.
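The Try-based error pattern can be sketched in plain JavaScript. This is an illustration under assumptions, not the code in this PR: `tryCatch` is a hypothetical stand-in for the `Try` helper (it returns the thrown error instead of throwing), `validateExtraction` and `createValidationError` are hypothetical simplified helpers, and the `name` value and message strings are placeholders. Only the cause fields (`name`, `message`, `code`, `testFile`) and the error codes come from this PR.

```javascript
// Hypothetical Try-style helper: call fn and return the thrown error
// (or, for async fns, a promise of the rejection reason) instead of throwing.
const tryCatch = (fn, ...args) => {
  try {
    const result = fn(...args);
    if (result && typeof result.then === "function") {
      return result.then(() => undefined, (reason) => reason);
    }
    return undefined;
  } catch (error) {
    return error;
  }
};

// Hypothetical error factory with a structured, deterministic cause.
// The name value and the messages used below are placeholders.
const createValidationError = (code, message, testFile) =>
  new Error(message, {
    cause: { name: "ExtractionValidationError", message, code, testFile },
  });

// Hypothetical validator covering two of the three codes from this PR.
const validateExtraction = ({ userPrompt, assertions }, testFile) => {
  if (!userPrompt) {
    throw createValidationError(
      "MISSING_USER_PROMPT",
      "Extraction agent returned an empty userPrompt",
      testFile,
    );
  }
  if (assertions.length === 0) {
    throw createValidationError(
      "NO_ASSERTIONS_FOUND",
      "Extraction agent returned no assertions",
      testFile,
    );
  }
};

// Capture the error without try/catch in the test body, then compare the
// whole cause object against a single expected value.
const error = tryCatch(
  validateExtraction,
  { userPrompt: "", assertions: ["returns the sum"] },
  "missing-user-prompt.sudo",
);

const expected = {
  name: "ExtractionValidationError",
  message: "Extraction agent returned an empty userPrompt",
  code: "MISSING_USER_PROMPT",
  testFile: "missing-user-prompt.sudo",
};
```

In a real test the final comparison would be a deep-equality assertion such as Vitest's `expect(error?.cause).toEqual(expected)`, which checks every field at once instead of spot-checking `name` or `code`.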

Fixtures (source/fixtures/)

| File | Purpose |
|---|---|
| sum-function-spec.mdc | Self-contained prompt-under-test spec (no project-specific ai/ dependency) |
| sum-function-test.sudo | Primary happy-path fixture (3 assertions) |
| sudolang-prompt-test.sudo | Happy-path fixture with SudoLang userPrompt |
| claude-agent-config.json | Valid agent config for the --agent-config file-loading flow |
| no-prompt-under-test.sudo | Triggers MISSING_PROMPT_UNDER_TEST |
| missing-user-prompt.sudo | Triggers MISSING_USER_PROMPT |
| no-assertions.sudo | Triggers NO_ASSERTIONS_FOUND |

Config + scripts

  • vitest.config.e2e.js — dedicated config (include: ['source/e2e.test.js'], testTimeout: 300000)
  • vitest.config.js — updated comment on e2e.test.js exclusion
  • package.json: adds "test:e2e": "vitest run --config vitest.config.e2e.js"

Post-consolidation fixups (source/test-extractor.js)

  • Non-inferring extraction prompt: buildExtractionPrompt now includes explicit EXTRACTION RULES instructing the agent to return "" / [] for missing fields rather than synthesizing a userPrompt or extracting assertions from imported file contents. This makes MISSING_USER_PROMPT and NO_ASSERTIONS_FOUND reliably testable end-to-end.
  • Agent-attributed error messages: the MISSING_USER_PROMPT and NO_ASSERTIONS_FOUND messages now say "Extraction agent returned…" instead of "Test file does not…", attributing the failure to the extraction agent that returned the empty field.
  • Validation reorder (A1): the userPrompt and assertions checks now run before resolveImportPaths() IO, so structural errors surface before any filesystem reads.
  • buildJudgePrompt guard (A2): the CONTEXT (Prompt Under Test) section is now omitted when promptUnderTest is empty, matching the buildResultPrompt pattern.
  • Mock consistency (A3): the TAP YAML embed in the ai-runner.test.js mock now uses JSON.stringify(), consistent with extractionResult and resultText and removing the backtick-in-template fragility.
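A rough sketch of the three fixups A1, A2, and A3, assuming simplified hypothetical signatures: `extractTests`, `resolveImportPaths`, `buildJudgePrompt`, and `tapYamlBlock` below are illustrative stand-ins rather than the functions in test-extractor.js or ai-runner.test.js, and every section label except CONTEXT (Prompt Under Test) is invented. `resolveImportPaths` is injected so a test can observe that no IO happens when a structural check fails.

```javascript
// A1: structural checks run before any filesystem IO.
// resolveImportPaths is injected (hypothetical) so callers can observe IO.
const extractTests = async ({ extracted, testFile, resolveImportPaths }) => {
  const { userPrompt, assertions, importPaths } = extracted;

  // Assumes assertions was already normalized to an array by the parser,
  // so only the length check is needed.
  if (!userPrompt) {
    throw new Error(`Extraction agent returned no userPrompt (${testFile})`);
  }
  if (assertions.length === 0) {
    throw new Error(`Extraction agent returned no assertions (${testFile})`);
  }

  // Filesystem reads happen only after the cheap checks pass.
  const promptUnderTest = await resolveImportPaths(importPaths);
  return { userPrompt, assertions, promptUnderTest };
};

// A2: omit the CONTEXT section entirely when promptUnderTest is empty,
// rather than emitting an empty heading. Other labels are illustrative.
const buildJudgePrompt = ({ userPrompt, actualResult, requirement, promptUnderTest }) =>
  [
    promptUnderTest ? `CONTEXT (Prompt Under Test):\n${promptUnderTest}` : null,
    `USER PROMPT:\n${userPrompt}`,
    `ACTUAL RESULT:\n${actualResult}`,
    `REQUIREMENT:\n${requirement}`,
  ]
    .filter((section) => section !== null)
    .join("\n\n");

// A3: serialize mock values with JSON.stringify before embedding them in a
// TAP YAML block, so embedded quotes and newlines are escaped and each value
// stays a single double-quoted scalar (valid under YAML 1.2).
const tapYamlBlock = (fields) =>
  [
    "  ---",
    ...Object.entries(fields).map(([key, value]) => `  ${key}: ${JSON.stringify(value)}`),
    "  ...",
  ].join("\n");
```

Injecting the IO function is one way to make the A1 ordering testable without touching the filesystem; the real code may structure this differently.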

⚠️ E2E tests: local-only, team decision needed

npm run test:e2e requires an authenticated Claude CLI and must be run locally. These tests do not run in CI by default (no claude binary available in the CI environment).

A team decision is needed on CI strategy, for example:

  • Skip e2e in CI permanently (unit coverage is sufficient for most paths)
  • Run e2e in CI with a secrets-injected Claude API key on a scheduled basis
  • Gate e2e on a manual workflow trigger

Until a decision is made, contributors should run npm run test:e2e locally before merging PRs that touch the extraction prompt, fixtures, or agent config.


Why no deterministic failure fixture?

Deterministic E2E failure tests are not viable with capable LLMs as both result and judge agents — the result agent satisfies requirements from first principles regardless of bad prompt context, and the judge scores the actual output rather than the prompt quality. The failure detection path is fully covered by unit tests with mock agents in ai-runner.test.js. See source/fixtures/README.md for the full rationale.
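A minimal sketch of why mock agents make the failure path deterministic, under stated assumptions: `runAssertion` is a hypothetical, heavily simplified runner (its signature and its every-run-must-pass rule are assumptions, not `runAITests`), and the agents are plain async functions returning canned responses. Because nothing is sampled from a model, the failing judgment is the same on every run.

```javascript
// Hypothetical simplified runner: the result agent answers the user prompt,
// the judge agent scores that answer against one assertion, repeated `runs`
// times. Passing here requires every run to pass (an assumption).
const runAssertion = async ({ resultAgent, judgeAgent, userPrompt, assertion, runs }) => {
  let passCount = 0;
  for (let run = 0; run < runs; run += 1) {
    const output = await resultAgent(userPrompt);
    const { passed } = await judgeAgent({ output, assertion });
    if (passed) passCount += 1;
  }
  return { assertion, runs, passCount, passed: passCount === runs };
};

// Deterministic mocks: a buggy result plus judges with fixed verdicts.
const mockResultAgent = async () => "const sum = (a, b) => a - b;";
const mockFailingJudge = async () => ({ passed: false });
const mockPassingJudge = async () => ({ passed: true });

runAssertion({
  resultAgent: mockResultAgent,
  judgeAgent: mockFailingJudge,
  userPrompt: "Implement sum(a, b)",
  assertion: "returns a + b",
  runs: 3,
}).then((outcome) => console.log(outcome.passed)); // false
```

With a live model as the judge, the same buggy output might be scored inconsistently; the mock removes that variance, which is what lets the unit suite pin down the failure path.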


Test results

npm test         → 190 tests passing
npm run lint     → Lint complete.
npm run ts       → TypeScript check complete.
npm run test:e2e → 6/6 passing (requires Claude auth)

Base automatically changed from pr/ai-test-output-cli to ai-testing-framework-implementation-consolidation March 4, 2026 16:58

- Add source/e2e.test.js: two Vitest describe blocks covering
  the full workflow (runAITests + recordTestOutput) and the
  --agent-config JSON file-loading flow; uses describe.skipIf,
  onTestFinished cleanup, and extracted timeout constants
- Add sum-function-test.sudo + sum-function-spec.mdc: self-contained
  fixture that exercises the import/promptUnderTest pipeline without
  depending on project-specific ai/ rules
- Add claude-agent-config.json: fixture for --agent-config file flow
- Add vitest.config.e2e.js: dedicated config for npm run test:e2e
- Update vitest.config.js: correct stale comment on e2e exclusion
- Add test:e2e script to package.json
- Update fixtures/README.md: accurate descriptions + rationale for
  omitting a deterministic failure fixture

E2E failure path is proven at unit level (ai-runner.test.js mock
agents); deterministic failure fixtures are not viable with capable
LLMs as both result and judge agents.

Made-with: Cursor

- Add no-prompt-under-test.sudo, missing-user-prompt.sudo,
  no-assertions.sudo fixtures to trigger extraction validation
  errors (MISSING_PROMPT_UNDER_TEST, MISSING_USER_PROMPT,
  NO_ASSERTIONS_FOUND) through the full E2E pipeline
- Add sudolang-prompt-test.sudo fixture to verify the framework
  handles SudoLang syntax in the userPrompt field
- Add describe.skipIf blocks for each validation error case and
  for the SudoLang happy-path scenario
- Replace try/catch error capture with Try helper, consistent
  with ai-runner.test.js and test-extractor.test.js conventions
- Update fixtures/README.md with new fixture descriptions

Made-with: Cursor

- Replace { name, code } partial assertion shapes with full
  error?.cause comparisons for all three validation error tests
  (MISSING_PROMPT_UNDER_TEST, MISSING_USER_PROMPT, NO_ASSERTIONS_FOUND)
- Expected objects now include name, message, code, and testFile —
  the complete deterministic cause shape from test-extractor.js
- Follows Jan's review principle from PR #409: deterministic
  functions should assert the complete expected value, not
  individual properties

Made-with: Cursor

- A1: validate userPrompt/assertions before resolveImportPaths
  so structural errors surface before any filesystem IO
- A2: guard CONTEXT section in buildJudgePrompt matching
  existing buildResultPrompt pattern; add test for empty case
- A3: use JSON.stringify for TAP YAML in mock helper,
  eliminating backtick-in-template fragility

Made-with: Cursor

- Rewrite buildExtractionPrompt with explicit EXTRACTION RULES
  block: agent must return "" / [] for missing fields rather than
  inferring userPrompt or assertions from import/context
- Update MISSING_USER_PROMPT and NO_ASSERTIONS_FOUND messages to
  correctly attribute failures to the agent, not the test file
- Add explicit "return []" fallback to importPaths rule for
  consistency with rules 1 and 3
- Remove !assertions dead code (parseExtractionResult guarantees
  an array; only .length === 0 check is needed)
- Update e2e expected messages and test-extractor snapshot test
  to match revised prompt and error messages
- buildJudgePrompt now conditionally omits CONTEXT section when
  promptUnderTest is empty, matching buildResultPrompt pattern
- Reorder extractTests validation: structural checks before IO

Made-with: Cursor