V0.3.3/housekeeping#7

Merged
gimlichael merged 49 commits into main from
v0.3.3/housekeeping
Mar 25, 2026

@gimlichael
Member

Expand the classification system to include release-adjacent work, strengthen validator assertions, and update documentation to reflect new guidance. Improve the execution flow for identity-sensitive commits and ensure that skills maintain alignment with eval contracts. This release also includes visual enhancements for skill documentation and clarifications on mandatory checkpoints.

Expands the classification system to explicitly account for release-adjacent work that spans multiple intents (dependency baselines, package metadata, doc publishing, community health, CI automation). Adds corresponding eval case to verify the grouping behavior respects semantic intent over same-round timing.
Updates the skill template validator to verify that the release-adjacent splitting rule and temporal proximity guidance are present in the git-visual-commits SKILL.md file.
Documents the new release-adjacent work splitting behavior in the Available Skills section for git-visual-commits, highlighting that same-round timing does not override semantic intent when grouping changes.
Strengthen git-visual-commits so identity-sensitive commits prefer direct git execution, fail fast when a wrapper cannot honor aliases, and recover conservatively after a wrong-author attempt. Adds matching eval coverage for wrapper failure and repair guidance.
Extend the repo validator so git-visual-commits must keep the new direct execution, fail-fast pivoting, and non-destructive recovery guidance in sync with its eval contract.
Document the stronger git-visual-commits safeguards so the README reflects direct git execution for bot identity, immediate tool-path pivots after wrong-author commits, and conservative recovery guidance.
Add eval contract definitions and expand skill README coverage to reflect new per-skill evaluation discipline. Clarify the three-part skill anatomy: SKILL.md instructions, FORMS.md parameter forms, and evals/evals.json test suites.
Reorganize SKILL.md with clarified form handling, parameter collection flow, and prose wrapping discipline. Add FORMS.md for structured pending-changes intake during release drafts. Define core eval suite with five test cases covering branch-hint inference, commit body parsing, SemVer classification, prose wrapping, and pending-change handling.
Reorganize SKILL.md workflow with explicit step numbering, clarified semantic intent classification, and documentation separation rules. Define six-eval suite covering plan review, bot identity, auto-approval, grouping validation, body discipline, and post-commit verification. Add enforcement of emoji/prefix reference contract and post-commit author identity validation.
Reorganize SKILL.md with clarified squash scope, commit language preservation, and noise filtering strategy. Define five-eval suite covering default full-branch squashing, emoji/prefix consistency, technical identifier preservation, overlap detection, and compatibility with git-visual-commits wording style. Add enforcement of emoji/prefix reference sync contract.
Reorganize SKILL.md with explicit step numbering, clarified semantic intent classification, and documentation separation rules. Define six-eval suite covering plan review, bot identity, auto-approval, grouping validation, body discipline, and post-commit verification. Add enforcement of emoji/prefix reference contract and post-commit author identity validation.
Extend the repo validator so git-visual-commits and git-visual-squash-summary must keep the new direct execution, fail-fast pivoting, non-destructive recovery guidance, and emoji/prefix reference sync in sync with their eval contracts.
Document the three-part skill anatomy: SKILL.md instructions, FORMS.md parameter forms, and evals/evals.json test suites. Expand skill README coverage to reflect new per-skill evaluation discipline and stronger semantic grouping, direct bot-path execution, and conservative commit-repair workflow.
This is a patch release focused on strengthening git-visual-commits and git-visual-squash-summary with clarified prefix behavior, safer bot-identity paths, concrete eval contracts, and aligned validator enforcement. Prefix handling now defaults to emoji-first subjects, with conventional-commit prefixes allowed only when explicitly requested. Includes six eval cases for git-visual-commits and five for git-visual-squash-summary covering identity modes, emoji/prefix consistency, and reference synchronization.
Refine emoji meanings for 💬 (community health, changelog, release communication), 📝 (documentation that is not primarily repo-health focused), 📚 (high-level docs, deferring to 💬 when appropriate), and 📦 (package metadata and release notes). Keeps the reference byte-for-byte aligned across git-visual-commits and git-visual-squash-summary.
Add clarifications to README about required disciplines in both git-visual-commits and git-visual-squash-summary workflows: enforce 💬 for community-health and changelog communication, require approval before committing when yolo is absent, document package release-notes as 📦 work separate from general docs, lock scope assumption to full worktree default, and clarify bare invocation expectations for squash summaries.
Tighten critical rules: enforce identity-lock (never silently downgrade bot to human), add fail-fast tool validation before first commit, guard auto-approval so it skips confirmation only (never semantic grouping or verification), enforce default scope as full worktree rather than guessed subset, require user approval unless yolo or auto-mode is active, lock commit-language reference reading before choosing emoji, validate post-commit author and body prose format, reject umbrella commits spanning multiple skill/scaffold/tooling/docs categories.
Add two new eval entries (ids 17 and 18) covering: review-gated commit flow that requires approval before committing when yolo or auto are absent, and umbrella commit rejection for multi-intent diffs spanning build-system, CI automation, package metadata, and release communication work.
Tighten critical rules: enforce bare invocation to auto-resolve current branch scope without asking 'what do you want me to summarize', simplify the algorithm to whole-branch by default rather than commit selection questions, preserve meaningful change groups instead of forcing umbrella lines, and clarify that the non-mutating design keeps git state untouched.
Add test entry covering bare invocation expectation: calling git-visual-squash-summary directly should auto-resolve current branch scope and return grouped lines without asking 'what do you want me to summarize'.
Add validation logic to ensure both git-visual-commits and git-visual-squash-summary keep their commit-language.md references byte-for-byte synchronized as required by the skill contract. Enforces that emoji meanings, prefix definitions, and rollback syntax remain identical across the two skills.
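A byte-for-byte check of this kind can be sketched in a few lines. The repository's validator is a PowerShell script whose contents are not shown here, so the Python below is only an illustration: the function names are hypothetical, and the two paths are taken from the file list in this PR.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of the file's raw bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def references_in_sync(paths: list[Path]) -> bool:
    """True when every listed file has identical bytes (same digest)."""
    return len({file_digest(p) for p in paths}) <= 1


if __name__ == "__main__":
    # Paths as listed in this PR; run from the repository root.
    refs = [
        Path("skills/git-visual-commits/references/commit-language.md"),
        Path("skills/git-visual-squash-summary/references/commit-language.md"),
    ]
    if all(p.exists() for p in refs):
        print("in sync" if references_in_sync(refs) else "references differ")
```

Comparing raw bytes rather than decoded text means a line-ending or BOM difference also counts as drift, which is what "byte-for-byte" implies.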
Dependency and version baseline changes were being absorbed into generic build-system or refactor lines. Added explicit rules to treat them as their own semantic intent, prefer dependency-oriented emoji such as ⬆️ for retained dependency lines, and flag collapsing dependency updates into build-system work as a bad output characteristic.
New eval (id 11) validates that when a branch commit collapses dependency updates and build-system changes into one subject, the skill correctly retains them as separate grouped lines and uses a dependency-oriented emoji for the dependency line.
Four gaps surfaced from real benchmarking sessions: eval directories without the eval-* prefix silently produced zero runs from aggregate_benchmark.py; Codex CLI on Windows required a smoke-run validation and safe prompt passing to avoid argument-parsing failures; convenience output files like last-message.txt were sometimes missing while raw event output was still available; and parity results were being mislabeled as simulated. All four are now addressed in SKILL.md and the two reference files.
New eval (id 7) covers the combined scenario where Codex CLI on Windows PowerShell produces argument errors, missing last-message.txt, and aggregator failures due to a bare directory name. Validates that the skill keeps the run measured, advises safe prompt passing, requires the eval-* prefix, uses raw event output as fallback, and treats parity as an honest measured result.
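The directory-naming failure is easy to catch before a long benchmark run. The sketch below does not reproduce aggregate_benchmark.py; it assumes only what the description states, namely that directories without the eval- prefix are not discovered. The function name is hypothetical.

```python
from pathlib import Path


def split_eval_dirs(root: Path) -> tuple[list[str], list[str]]:
    """Return (discoverable, skipped) subdirectory names under root.

    Assumption: only directories whose name starts with "eval-" are
    picked up by the aggregation tooling.
    """
    discoverable: list[str] = []
    skipped: list[str] = []
    for child in sorted(root.iterdir()):
        if not child.is_dir():
            continue
        if child.name.startswith("eval-"):
            discoverable.append(child.name)
        else:
            skipped.append(child.name)
    return discoverable, skipped
```

Reporting the skipped names up front turns a silent "zero runs" result into an explicit warning.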
Added two new capability bullets: dependency updates keeping their own line in the squash-summary entry, and PowerShell-safe plus Codex-friendly benchmarking in the skill-creator-agnostic entry.
Finalize release 0.3.3 with expanded changelog entry that covers all git workflow skills, eval contracts, validator enhancements, and documentation updates. Update README skill summaries to reflect new eval discipline and documentation structure.
…stic

Add comprehensive test coverage for markdown-illustrator (5 test cases for visualization-first document analysis and visual brief generation) and skill-creator-agnostic (7 test cases for skill evaluation infrastructure, Codex CLI integration, and cross-runner parity validation).
…tor-agnostic

Improve skill documentation with clearer workflow guidance, comprehensive parameter documentation, and expanded references. Add Windows PowerShell benchmarking guidance and benchmark contract specification to support cross-runner compatibility and deterministic skill validation per AGENTS.md requirements.
Refactor the git-keep-a-changelog skill to explicitly establish the Step 3 pending-worktree confirmation as a mandatory checkpoint that cannot be bypassed by user request. Add new 'Mandatory Checkpoints' and 'User Intent vs. Mandatory Gates' sections to SKILL.md that clarify the distinction between optional refinements and required safety gates. Update FORMS.md to emphasize mandatory gating, and extend evals/evals.json with a new test case that validates the gate is never skipped when the user says 'include all changes'.
Extend the skill-template validation script with new assertions that enforce the mandatory pending-worktree gate documentation in git-keep-a-changelog. Validate that SKILL.md includes 'Mandatory Checkpoints' and 'User Intent vs. Mandatory Gates' sections, that FORMS.md documents the gate as mandatory, and that evals/evals.json includes test expectations for gate enforcement under user shortcuts.
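Assertions like these usually reduce to "the document must contain this phrase". A minimal Python sketch of that idea follows; the real validator is PowerShell and its assertion helpers are not shown in this PR, so the function name is hypothetical and only the two section titles come from the description.

```python
def missing_phrases(document: str, required: list[str]) -> list[str]:
    """Return the required phrases that do not appear in the document."""
    return [phrase for phrase in required if phrase not in document]


# Section titles named in the PR description for git-keep-a-changelog.
REQUIRED_SECTIONS = [
    "Mandatory Checkpoints",
    "User Intent vs. Mandatory Gates",
]
```

An empty result means the skill file still documents the gate; a non-empty result names exactly which section went missing.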
Update the Available Skills table in README.md to reflect that the git-keep-a-changelog skill's pending-worktree confirmation is now mandatory for concrete releases. Change wording from 'can ask a concise Yes / No / Custom question' to 'must ask a mandatory Yes / No / Custom confirmation question' to accurately describe the skill's contract in repo-health communication.
Refactor git-visual-commits skill to explicitly state that references/commit-language.md should be resolved from the bundled skill resource path first, not as a repo-root path. This clarifies behavior for agents that run the skill outside the target repository. Add new eval test case (id 19) validating that missing repo-root references/ folder does not block the skill.
Extend validate-skill-templates.ps1 with new assertions to enforce that git-visual-commits skill documents commit-language reference resolution from bundled skill resources. Validates SKILL.md includes bundled path guidance and evals/evals.json covers behavior when repo lacks top-level references/ folder.
Update README.md and CHANGELOG.md to reflect the clarified git-visual-commits behavior: commit-language reference is resolved from bundled skill resources first, not repo-root paths. This aligns documentation with the skill's actual deployment behavior across agents and environments.
Incorporate all committed changes plus housekeeping updates to README, validator, skill documentation, and eval coverage. Finalize release heading with today's date (2026-03-25) and document compare-link footer strengthening across the changelog release highlight, Changed section, and Fixed section.
Update the skill description to explicitly document that the skill maintains or inserts the Keep a Changelog compare-link footer on both create and update paths, reflecting the strengthened footer rules now enforced in SKILL.md and validator.
Strengthen SKILL.md documentation of compare-link footer as a non-negotiable rule and Step 7 requirement, ensuring it is maintained, inserted, repaired, or verified on every edit path. Add eval test case to validate footer insertion behavior when missing from existing changelog.
Extend the skill-template validation script with new assertions to enforce compare-link footer documentation in git-keep-a-changelog. Validate that SKILL.md documents footer maintenance as both a non-negotiable rule and Step 7 requirement, and that evals include test coverage for missing-footer insertion behavior.
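To illustrate what "missing-footer" detection could look like, the sketch below assumes the usual Keep a Changelog layout: `## [version]` headings with matching `[version]: <url>` link definitions at the bottom of the file. It is not the skill's own logic, and the function name is hypothetical.

```python
import re

# "## [0.3.3] - 2026-03-25" or "## [Unreleased]"
HEADING = re.compile(r"^## \[([^\]]+)\]", re.MULTILINE)
# "[0.3.3]: https://..." link reference definitions
LINK_DEF = re.compile(r"^\[([^\]]+)\]:\s*\S+", re.MULTILINE)


def versions_missing_footer_link(changelog: str) -> list[str]:
    """Return release headings that have no matching footer link."""
    linked = set(LINK_DEF.findall(changelog))
    return [v for v in HEADING.findall(changelog) if v not in linked]
```

This only checks that a link exists for each heading; verifying that each compare range points at the right pair of tags would need the repository's tag history.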
Updated SKILL.md with hero image reference and added assets directory with skill hero image. Enhances visual appeal and aligns with skill documentation standards.
Added assets directory with hero image for git-keep-a-changelog skill. Updated SKILL.md with asset reference. Improves visual documentation and consistency across skills.
Added assets directory with hero image for git-nuget-readme skill. Completes visual asset set across all git workflow skills.
Added assets directory with hero image for git-nuget-release-notes skill. Updated SKILL.md with asset reference. Strengthens documentation consistency.
Added assets directory with hero image for git-visual-commits skill. Updated SKILL.md with restructured workflow documentation and asset reference for enhanced visual presentation.
Added assets directory with hero image for git-visual-squash-summary skill. Updated SKILL.md with asset reference. Completes visual asset set for git workflow skills.
Copilot AI review requested due to automatic review settings March 25, 2026 21:48
@qodo-code-review

Review Summary by Qodo

Enhance skill documentation, validation, and eval contracts for emoji-first defaults and release-adjacent workflows

✨ Enhancement 🧪 Tests 📝 Documentation

Walkthroughs

Description
• **Emoji-first commit language defaults**: Shifted git-visual-commits and
  git-visual-squash-summary to use emoji-only subjects by default, with conventional-commit prefixes
  available only on explicit user request
• **Release-adjacent work classification**: Expanded commit classification to include
  dependency/version baselines, package/publish metadata, documentation publishing, and community
  health/release communication with concrete splitting rules and examples
• **Mandatory pending-worktree confirmation gate**: Added Step 3 confirmation gate to
  git-keep-a-changelog with FORMS.md structured input support, preventing silent inclusion of
  uncommitted changes
• **Strengthened validator assertions**: Added 35+ new assertions across three git skills to enforce
  release-adjacent classification, direct git execution rules, fail-fast tool validation, recovery
  safety, and reference handling
• **Enhanced skill documentation**: Added hero images, expanded critical rules sections, and
  clarified identity-sensitive execution flows for bot commits and approval gating
• **Comprehensive eval contracts**: Added 23 new eval test cases covering emoji-first defaults,
  whole-branch scope resolution, release-adjacent splitting, tool-path failures, mandatory gates, and
  footer maintenance
• **Improved benchmark guidance**: Enhanced skill-creator-agnostic with parallel execution
  patterns, Windows UTF-8 handling, eval directory naming requirements, and Codex CLI troubleshooting
• **Visual enhancements**: Added selective accent color guidance for whiteboard/blackboard styles
  and hero images to multiple skills
Diagram
flowchart LR
  A["Emoji-First<br/>Defaults"] --> B["git-visual-commits<br/>& squash-summary"]
  C["Release-Adjacent<br/>Classification"] --> B
  D["Mandatory<br/>Confirmation Gate"] --> E["git-keep-a-changelog"]
  F["Validator<br/>Assertions"] --> G["Skill Enforcement"]
  B --> H["Enhanced<br/>Documentation"]
  E --> H
  I["Eval Test<br/>Cases"] --> J["Comprehensive<br/>Coverage"]
  K["Parallel<br/>Execution"] --> L["skill-creator-agnostic"]
  H --> M["v0.3.3 Release"]
  J --> M
  L --> M

File Changes

1. scripts/validate-skill-templates.ps1 🧪 Tests +66/-2

Strengthen skill validators for release-adjacent and identity-aware workflows

• Added 21 new validator assertions for git-visual-commits skill to enforce release-adjacent work
 classification, direct git execution rules, fail-fast tool validation, recovery safety, and
 commit-language reference handling
• Added 8 new validator assertions for git-visual-squash-summary skill to enforce emoji-first
 defaults, bare invocation handling, and changelog/release-status communication preferences
• Added 4 new validator assertions for git-keep-a-changelog skill to enforce mandatory Step 3
 confirmation gate and compare-link footer maintenance
• Added 2 new validator assertions for git-keep-a-changelog/FORMS.md to validate the mandatory
 pending-worktree confirmation form structure

scripts/validate-skill-templates.ps1


2. skills/git-visual-commits/SKILL.md ✨ Enhancement +164/-47

Enhance git-visual-commits with emoji-first defaults and release-adjacent splitting

• Rewrote skill description to emphasize emoji-first subjects, optional conventional prefixes only
 on explicit request, and bundled commit-language.md validation from skill resource paths
• Added hero image and expanded critical rules with new sections: Direct Git Execution Rule,
 Fail-Fast Tool Validation, Recovery Safety Rule, Approval and Clarification Lock, and Commit
 Language Lock
• Restructured commit message format to show default <emoji> <description> form first, then
 optional conventional-commit combo only when explicitly requested
• Expanded Step 2 classification to include release-adjacent categories: Dependency/version
 baselines, Package/publish metadata, Documentation publishing, and Community health/release
 communication
• Added release-adjacent splitting rule with concrete examples showing how to separate build-system,
 CI automation, package metadata, and release-communication commits
• Enhanced Step 4 planning guidance to validate emoji/prefix against inspected reference before
 presenting plan and clarify that without yolo/auto, workflow must stop for approval
• Added Source Discipline section requiring explanations to anchor to actually-inspected sources
 rather than inference

skills/git-visual-commits/SKILL.md


3. skills/git-visual-commits/references/commit-language.md 📝 Documentation +24/-22

Shift commit-language reference to emoji-first with opt-in prefixes

• Changed prefix guidance from "optional" to "off by default" and clarified they are only added when
 user explicitly requests conventional-commit combo
• Updated all emoji table examples to use default prefixless form (e.g., 🎉 begin api project
 instead of 🎉 init: begin api project)
• Expanded 💬 emoji definition to explicitly cover community health, changelog, release-status
 communication, and human-facing repo messaging
• Expanded 📦 emoji definition to include package release-note metadata
• Updated 📚 emoji guidance to recommend 💬 instead when change is mainly changelog or
 community-health communication
• Added note that examples use default no-prefix form and only switch to prefixed form when user
 explicitly requests combo
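The emoji-first rule can be expressed as a small formatting function. This is a sketch, not the skill's implementation: the function name is hypothetical, and the two example subjects are the ones quoted in the bullets above.

```python
from typing import Optional


def format_subject(emoji: str, description: str, prefix: Optional[str] = None) -> str:
    """Build a commit subject: emoji-first by default, prefixed only on request."""
    if prefix:
        # Opt-in conventional-commit combo, e.g. "🎉 init: begin api project".
        return f"{emoji} {prefix}: {description}"
    # Default prefixless form, e.g. "🎉 begin api project".
    return f"{emoji} {description}"
```

Making the prefix an explicit optional argument mirrors the "off by default" wording: the caller has to ask for it.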

skills/git-visual-commits/references/commit-language.md


4. skills/git-visual-squash-summary/SKILL.md ✨ Enhancement +35/-12

Refactor squash-summary skill for whole-branch defaults and emoji-first output

• Rewrote skill description to emphasize whole-branch-by-default scope, bare invocation handling,
 and avoiding needless commit-selection UI for ordinary squash requests
• Added hero image and clarified that skill has one job: produce ready-to-paste squash summary for
 full current feature branch
• Updated non-negotiable rules to specify full feature branch from merge-base to HEAD as default
 and bare invocation as complete request without follow-up questions
• Restructured Step 1 to resolve commit set automatically without user-facing choice, treating
 dependency updates and release-finalization commits as branch-in-scope by default
• Enhanced Step 2 to treat dependency/version changes as separate semantic intent and include late
 changelog/release commits in scope before semantic collapsing
• Updated output format guidance to default to emoji-only lines and only use prefixes when user
 explicitly requested conventional-commit combo
• Added guidance to prefer 💬 for changelog/release-status lines and dependency emojis for
 version-baseline work
• Expanded bad output characteristics to call out commit-selection widgets and collapsing dependency
 updates into generic build/refactor lines
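The "merge-base to HEAD" default corresponds to two standard git commands. The sketch below assumes the base branch is named `main` (as in this PR) and uses hypothetical function names; the skill itself may resolve scope differently.

```python
import subprocess


def merge_base_cmd(base_branch: str) -> list[str]:
    """git command that prints the common ancestor of base_branch and HEAD."""
    return ["git", "merge-base", base_branch, "HEAD"]


def branch_log_cmd(merge_base: str) -> list[str]:
    """git command that lists commit subjects from merge_base to HEAD, oldest first."""
    return ["git", "log", "--reverse", "--format=%s", f"{merge_base}..HEAD"]


def branch_subjects(base_branch: str = "main") -> list[str]:
    """Resolve the whole-branch commit set without asking the user anything."""
    base = subprocess.run(
        merge_base_cmd(base_branch), capture_output=True, text=True, check=True
    ).stdout.strip()
    log = subprocess.run(
        branch_log_cmd(base), capture_output=True, text=True, check=True
    ).stdout
    return log.splitlines()
```

Both commands are read-only, which fits the non-mutating design described for the skill.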

skills/git-visual-squash-summary/SKILL.md


5. skills/git-visual-squash-summary/references/commit-language.md 📝 Documentation +24/-22

Align squash-summary commit-language reference with emoji-first defaults

• Changed prefix guidance from "optional" to "off by default" with same messaging as
 git-visual-commits reference
• Updated all emoji table examples to use default prefixless form
• Expanded 💬 and 📦 emoji definitions to match git-visual-commits reference
• Updated 📚 emoji guidance to recommend 💬 for changelog/community-health communication
• Added note that examples use default no-prefix form

skills/git-visual-squash-summary/references/commit-language.md


6. skills/git-keep-a-changelog/SKILL.md ✨ Enhancement +84/-7

Add mandatory pending-worktree confirmation gate to changelog skill

• Added hero image and reference to FORMS.md for native structured input support with plain-text
 fallback
• Added Mandatory Checkpoints section explicitly stating Step 3 confirmation gate cannot be skipped
 and release highlight/bullet punctuation are required
• Added User Intent vs. Mandatory Gates section explaining that user intent can refine scope after
 gate but cannot bypass it, with rationale for safety
• Restructured workflow steps to insert new Step 3 as mandatory confirmation gate for pending
 worktree changes before proceeding to history reading
• Added detailed Step 3 implementation with Yes / No / Custom prompt, helper commands, and
 guidance on widget vs. plain-text parity
• Updated Step 4+ numbering and added guidance to inspect pending worktree deltas when user approved
 them
• Enhanced compare-link footer maintenance to verify footer exists on every edit and repair
 missing/incomplete links
• Expanded eval expectations to include mandatory gate treatment and compare-link footer insertion

skills/git-keep-a-changelog/SKILL.md


7. skills/git-keep-a-changelog/FORMS.md ✨ Enhancement +58/-0

Add FORMS.md with pending-worktree confirmation gate definition

• Added new form definition for mandatory Step 3 confirmation gate with Yes / No / Custom choices
• Form presents pending-worktree change counts (staged, unstaged, untracked) and asks for explicit
 confirmation before including in changelog draft
• Includes guidance that this gate cannot be skipped for concrete releases and must use identical
 semantics in both widget and plain-text paths
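The staged, unstaged, and untracked counts shown in the prompt could be derived from `git status --porcelain` output. The helper commands the skill actually uses are not shown in this summary, so the following is an assumption-laden sketch with a hypothetical function name; it parses the two-character status code of the version 1 porcelain format.

```python
def count_pending(porcelain: str) -> dict[str, int]:
    """Count staged, unstaged, and untracked entries in porcelain v1 output."""
    counts = {"staged": 0, "unstaged": 0, "untracked": 0}
    for line in porcelain.splitlines():
        if len(line) < 3:
            continue
        index_state, worktree_state = line[0], line[1]
        if index_state == "?" and worktree_state == "?":
            counts["untracked"] += 1
            continue
        if index_state == "!":
            continue  # ignored file, only listed with --ignored
        if index_state != " ":
            counts["staged"] += 1
        if worktree_state != " ":
            counts["unstaged"] += 1
    return counts
```

A file with both staged and unstaged edits (status `MM`) is counted in both buckets, and unmerged entries are counted the same way; a stricter version might report conflicts separately.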

skills/git-keep-a-changelog/FORMS.md


8. README.md 📝 Documentation +30/-9

Update README with emoji-first, release-adjacent, and mandatory-gate enhancements

• Updated benchmark workflow guidance to mention parallel executor and grader fan-out when runner
 supports sub-agents
• Enhanced git-visual-commits description to emphasize emoji-first default, opt-in conventional
 prefixes, direct git execution for bot identity, clarification-before-correction safety, and
 reference-validated emoji choices
• Enhanced git-keep-a-changelog description to highlight mandatory pending-worktree confirmation
 gate, FORMS.md support, and compare-link footer maintenance
• Enhanced git-visual-squash-summary description to emphasize whole-branch-by-default scope, bare
 invocation handling, and avoiding commit-selection UI
• Added new feature bullets for git-visual-commits covering same-round edits, release-adjacent
 splitting, package release notes handling, tool-path failures, and recovery conservatism
• Added new feature bullets for git-visual-squash-summary covering whole-branch default, bare
 invocation, dependency-update separation, and late release-prep inclusion
• Updated markdown-illustrator guidance to mention selective colored accents for
 whiteboard/blackboard styles
• Updated skill-creator-agnostic description to mention Codex CLI support and parallel benchmark
 runs

README.md


9. skills/git-visual-commits/evals/evals.json 🧪 Tests +112/-0

Add comprehensive evals for release-adjacent and identity-aware commit workflows

• Added 10 new eval cases (IDs 10-19) covering release-adjacent splitting, tool-path failures,
 clarification-before-correction, dependency emoji selection, source-discipline explanations, skill
 refactor classification, conventional-prefix opt-in, approval-gating without yolo, complex
 release-adjacent diffs, and bundled reference resolution
• Each eval specifies expected output and detailed expectations for proper handling of semantic
 intent splitting, tool validation, user feedback, reference inspection, and mandatory gates

skills/git-visual-commits/evals/evals.json


10. skills/git-visual-squash-summary/evals/evals.json 🧪 Tests +68/-1

Add evals for whole-branch squash-summary and emoji-first output defaults

• Added 6 new eval cases (IDs 6-11) covering whole-branch-by-default scope, bare invocation
 handling, emoji-first defaults, and dependency-update separation
• Each eval specifies expected output and detailed expectations for automatic scope resolution,
 avoiding commit-selection UI, prefixless output, and semantic intent preservation

skills/git-visual-squash-summary/evals/evals.json


11. skills/skill-creator-agnostic/references/benchmark-contract.md 📝 Documentation +8/-0

Enhance benchmark contract with parallel execution and discovery guidance

• Added requirement that eval directories must begin with eval- prefix for Anthropic aggregation
 tooling discovery
• Added guidance to prefer parallel sub-agent or background-task execution for paired executor and
 grader runs when runner supports it
• Clarified that measured runs with zero delta are not simulated and that real event
 streams/transcripts can recover final messages when convenience files are missing

skills/skill-creator-agnostic/references/benchmark-contract.md


12. skills/skill-creator-agnostic/SKILL.md 📝 Documentation +16/-1

Parallel execution, eval naming, and parity validation guidance

• Added guidance for using sub-agents and parallel execution for MEASURED benchmarks when runner
 supports it
• Clarified eval directory naming requirement with eval- prefix for aggregation tooling discovery
• Enhanced documentation on prompt passing safety, raw event output fallback, and parallel grading
 patterns
• Added clarification that measured parity results are valid outcomes, not reasons to relabel as
 simulated
• Strengthened bad output characteristics section with warnings about directory naming, serial
 execution habits, and parity relabeling

skills/skill-creator-agnostic/SKILL.md


13. skills/git-keep-a-changelog/evals/evals.json 🧪 Tests +59/-0

Pending worktree gate and footer maintenance eval contracts

• Added five new eval test cases (IDs 6-10) covering pending worktree change detection and
 confirmation gates
• Test cases validate mandatory checkpoint enforcement, custom scope narrowing, FORMS-backed widget
 integration, and gate bypass prevention
• Added eval for compare-link footer repair on update paths with existing changelog history
• Tests ensure pending changes cannot be silently included and footer maintenance is verified on all
 edit paths

skills/git-keep-a-changelog/evals/evals.json


14. CHANGELOG.md 📝 Documentation +26/-1

Release 0.3.3 with skill validation and eval contracts

• Released version 0.3.3 with comprehensive skill enhancements and validator strengthening
• Documented tightened classification rules, hardened bot-commit execution, and refined emoji/prefix
 behavior across git skills
• Added concrete eval contracts for all git workflow skills with test coverage for semantic intent,
 identity handling, and safety gates
• Enhanced skill-creator-agnostic with Windows benchmarking, Codex CLI guidance, and parity
 validation documentation
• Updated version links to reflect 0.3.3 release and new unreleased baseline

CHANGELOG.md


15. skills/skill-creator-agnostic/evals/evals.json 🧪 Tests +23/-0

Parallel execution and Windows CLI troubleshooting evals

• Added eval test case (ID 8) for parallel execution preference when runner supports sub-agents
• Added eval test case (ID 7) for Windows Codex CLI troubleshooting covering prompt passing, eval
 directory naming, and event output fallback
• Tests validate that parallel paired runs are preferred over serial execution and that measured
 parity remains valid
• Ensures runner-agnostic workflow handles Windows-specific CLI quirks and artifact discovery issues

skills/skill-creator-agnostic/evals/evals.json


16. skills/git-keep-a-changelog/FORMS.md 📝 Documentation +58/-0

Pending worktree confirmation form with widget and fallback modes

• New file defining structured input form for pending worktree change confirmation gate
• Specifies single-choice field for include_pending_changes with Yes/No/Custom options and dynamic
 prompt values
• Defines conditional custom_pending_scope text field for narrowing pending change inclusion
• Includes presentation rules for native widget controls with plain-text fallback, field ordering,
 and confirmation summary
• Emphasizes mandatory gate enforcement that cannot be bypassed by user intent

skills/git-keep-a-changelog/FORMS.md


17. skills/markdown-illustrator/evals/evals.json 🧪 Tests +30/-0

Whiteboard and blackboard accent color eval contracts

• Added two new eval test cases (IDs 4-5) for blackboard and whiteboard medium preservation with
 accent colors
• Tests validate direct chat responses with Visual Brief and final prompt containing selective
 accent color usage
• Ensures colored accents are used for emphasis marks while keeping base medium monochrome (chalk or
 marker)
• Verifies skill does not ask follow-up questions before producing results

skills/markdown-illustrator/evals/evals.json


18. skills/skill-creator-agnostic/references/windows-powershell-benchmarking.md 📝 Documentation +29/-0

Windows UTF-8, prompt passing, and event output guidance

• Added guidance for forcing UTF-8 mode in Python child processes via PYTHONUTF8 environment
 variable
• Added section on safe CLI prompt passing covering argument arrays, stdin, and smoke-test
 validation before long benchmarks
• Added section on preserving raw JSONL/event-stream output as fallback when convenience files are
 missing
• Enhanced common failure symptoms with eval directory prefix requirement, emoji UTF-8 handling, and
 prompt parsing issues
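Two of these points, forcing UTF-8 through the PYTHONUTF8 environment variable and passing the prompt as one element of an argument array, can be combined in a short smoke test. The sketch below launches a Python child process rather than any specific CLI; the function name is hypothetical and no Codex CLI flags are assumed.

```python
import os
import subprocess
import sys


def run_child_utf8(args: list[str]) -> str:
    """Run a child process with PYTHONUTF8=1 and return its decoded stdout."""
    env = dict(os.environ, PYTHONUTF8="1")
    result = subprocess.run(
        args,  # a list, so no shell re-parses quotes or ampersands
        env=env,
        capture_output=True,
        text=True,
        encoding="utf-8",
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Smoke test: the prompt must arrive in the child unchanged.
    prompt = 'say "hi" & exit'
    echoed = run_child_utf8(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", prompt]
    )
    print(echoed.strip() == prompt)
```

Running a cheap round trip like this before a long benchmark surfaces argument-parsing problems early, which is the purpose of the smoke-test guidance.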

skills/skill-creator-agnostic/references/windows-powershell-benchmarking.md


19. skills/markdown-illustrator/SKILL.md 📝 Documentation +2/-0

Whiteboard and blackboard accent color treatment clarification

• Clarified visual treatment guidance for whiteboard and blackboard styles to use selective accent
 colors instead of monochrome
• Added compiler rule for whiteboard/blackboard treatments to describe accent colors explicitly
 while maintaining medium authenticity
• Specifies restrained accent palette (red, blue, yellow, green) for emphasis marks like arrows,
 checkmarks, and callouts

skills/markdown-illustrator/SKILL.md


20. skills/dotnet-strong-name-signing/SKILL.md 📝 Documentation +7/-5

Hero image addition and formatting cleanup

• Added hero image reference `![Strong Name Signing](assets/hero.jpg)` after main heading
• Fixed whitespace formatting in "Where:" section for consistency and readability

skills/dotnet-strong-name-signing/SKILL.md


21. skills/git-nuget-release-notes/SKILL.md 📝 Documentation +2/-0

Hero image addition for visual branding

• Added hero image reference `![Git NuGet Release Notes](assets/hero.jpg)` after main heading

skills/git-nuget-release-notes/SKILL.md



@qodo-code-review

qodo-code-review bot commented Mar 25, 2026

Code Review by Qodo

🐞 Bugs (0) 📘 Rule violations (1) 📎 Requirement gaps (0) 📐 Spec deviations (0)



Action required

1. FORMS.md fields missing default 📘 Rule violation ✓ Correctness
Description
skills/git-keep-a-changelog/FORMS.md documents fields without the required `default` key, and the
text field omits the required `choices` key. This breaks the required per-field metadata contract
for FORMS definitions and can make the form ambiguous or unparseable across hosts.
Code

skills/git-keep-a-changelog/FORMS.md[R7-22]

+### include_pending_changes
+- **type:** single-choice
+- **prompt:** "I found pending changes not yet committed for release `{release_label}`: `{staged_count}` staged, `{unstaged_count}` unstaged, `{untracked_count}` untracked. Include them in the changelog draft? Yes / No / Custom"
+- **choices:**
+  - Yes
+  - No
+  - Custom
+- **required:** true
+- **description:** `Yes` includes the pending staged, unstaged, and untracked changes in addition to the committed range. `No` uses committed history only. `Custom` lets the user narrow the pending scope.
+
+### custom_pending_scope
+- **type:** text
+- **prompt:** "Which pending changes should I include?"
+- **placeholder:** "e.g. staged only, exclude untracked"
+- **required:** true
+- **description:** Ask this field only when `include_pending_changes` is `Custom`.
Evidence
PR Compliance ID 111419 requires every field definition in FORMS.md to include `type`, `prompt`,
`choices`, `default`, and `required`. In the added form, `include_pending_changes` has no `default`,
and `custom_pending_scope` has no `choices` or `default`.

Rule 111419: Require complete metadata for every field in FORMS.md
skills/git-keep-a-changelog/FORMS.md[7-22]
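
A minimal sketch of how this rule could be checked mechanically, assuming the layout seen in the quoted diff (`### field_name` headings followed by `- **key:** value` bullets). The function name `missing_field_keys` and the shortened sample are hypothetical; the repo's own validator is a PowerShell script whose logic is not shown here.

```python
import re

REQUIRED_KEYS = ("type", "prompt", "choices", "default", "required")


def missing_field_keys(forms_md: str) -> dict[str, list[str]]:
    """Map each `### field` heading to the required keys it does not define."""
    fields: dict[str, set[str]] = {}
    current = None
    for line in forms_md.splitlines():
        heading = re.match(r"^###\s+(\S+)\s*$", line)
        if heading:
            current = heading.group(1)
            fields[current] = set()
            continue
        # Only top-level bullets such as "- **type:** text" count as keys.
        key = re.match(r"^- \*\*(\w+):\*\*", line)
        if key and current is not None:
            fields[current].add(key.group(1))
    return {
        name: [k for k in REQUIRED_KEYS if k not in keys]
        for name, keys in fields.items()
        if any(k not in keys for k in REQUIRED_KEYS)
    }


# Shortened stand-in for the two fields in the diff above.
SAMPLE = """\
### include_pending_changes
- **type:** single-choice
- **prompt:** "Include them?"
- **choices:**
  - Yes
  - No
  - Custom
- **required:** true

### custom_pending_scope
- **type:** text
- **prompt:** "Which pending changes should I include?"
- **required:** true
"""

print(missing_field_keys(SAMPLE))
# {'include_pending_changes': ['default'], 'custom_pending_scope': ['choices', 'default']}
```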

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`skills/git-keep-a-changelog/FORMS.md` field definitions are missing required metadata keys.
## Issue Context
Per compliance, every field must define `type`, `prompt`, `choices`, `default`, and `required`. For `text` fields, `choices` must still be present (can be an explicit empty value).
## Fix Focus Areas
- skills/git-keep-a-changelog/FORMS.md[7-22]



2. Frontmatter contains > character 📘 Rule violation ✓ Correctness
Description
skills/dotnet-strong-name-signing/SKILL.md frontmatter uses the folded-scalar indicator `>`
(`description: >`), which introduces a `>` character inside the YAML frontmatter block. This
violates the rule disallowing any `<` or `>` characters anywhere within SKILL frontmatter.
Code

skills/dotnet-strong-name-signing/SKILL.md[R1-5]

---
name: dotnet-strong-name-signing
-description: >
-  Generate a strong name key (.snk) file for signing .NET assemblies using pure .NET cryptography — no Visual Studio Developer PowerShell or sn.exe required. Works in any terminal. Use this skill when the user wants to create a strong name key, generate an .snk file, sign .NET assemblies, or mentions "strong-name", "snk", "AssemblyOriginatorKeyFile", "SignAssembly", or asks how to sign a .NET library. Also use when scaffolding .NET libraries or NuGet packages that need assembly signing. ALWAYS use this skill when asked to generate or create a strong name key file.
+description: >
+  Generate a strong name key (.snk) file for signing .NET assemblies using pure .NET cryptography — no Visual Studio Developer PowerShell or sn.exe required. Works in any terminal. Use this skill when the user wants to create a strong name key, generate an .snk file, sign .NET assemblies, or mentions "strong-name", "snk", "AssemblyOriginatorKeyFile", "SignAssembly", or asks how to sign a .NET library. Also use when scaffolding .NET libraries or NuGet packages that need assembly signing. ALWAYS use this skill when asked to generate or create a strong name key file.
---
Evidence
PR Compliance ID 111410 prohibits any `<` or `>` characters anywhere in the YAML frontmatter block. The
updated frontmatter includes `description: >`, which contains a `>` character inside the
frontmatter.

Rule 111410: Disallow angle brackets in SKILL.md YAML frontmatter
skills/dotnet-strong-name-signing/SKILL.md[1-5]
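
The rule can be pictured with a small scanner like the one below. It is a sketch under the assumption that the frontmatter is the block between the first two `---` lines; the function name `frontmatter_angle_brackets` and the shortened sample text are hypothetical.

```python
def frontmatter_angle_brackets(skill_md: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs inside the frontmatter containing < or >."""
    lines = skill_md.splitlines()
    if not lines or lines[0].strip() != "---":
        return []  # no frontmatter block at the top of the file
    hits = []
    for number, line in enumerate(lines[1:], start=2):
        if line.strip() == "---":
            break  # closing delimiter: stop scanning
        if "<" in line or ">" in line:
            hits.append((number, line))
    return hits


# Shortened stand-in for the SKILL.md header quoted above.
SAMPLE = """\
---
name: dotnet-strong-name-signing
description: >
  Generate a strong name key (.snk) file.
---
# Heading with > outside the frontmatter
"""

print(frontmatter_angle_brackets(SAMPLE))
# [(3, 'description: >')]
```

Only the scalar indicator on line 3 is reported; the `>` after the closing `---` is outside the block and is ignored.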

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The YAML frontmatter includes a `>` character via `description: >`, which is disallowed.
## Issue Context
This rule applies to any `<`/`>` characters anywhere in the SKILL frontmatter block, including YAML scalar indicators.
## Fix Focus Areas
- skills/dotnet-strong-name-signing/SKILL.md[1-5]




@gimlichael changed the title from "Enhance skill documentation and validation for release-adjacent work" to "V0.3.3/housekeeping" on Mar 25, 2026

@chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 195d80b41e


Comment on lines +69 to +71
"prompt": "Use systems.md and keep it blackboard style. I want the final prompt to feel like a real chalkboard drawing, but use colored accents for emphasis instead of making every arrow and icon white chalk. Answer directly in chat.",
"expected_output": "A direct chat response with one Visual Brief and one final prompt that preserves the blackboard medium while using a restrained accent color palette for emphasis marks.",
"files": ["evals/files/microservices-architecture.md"],

P2: Match eval filename to the staged fixture in test 4

The new eval prompt asks the model to "Use systems.md", but the only staged fixture for this case is evals/files/microservices-architecture.md. In the benchmark flow, runs read files that are copied from files[]; when the prompt names a different file, the run can fail with a file-not-found error or ignore the intended fixture, causing noisy benchmark failures unrelated to the skill logic.


Comment on lines +84 to +86
"prompt": "Read onboarding-notes.md and make it whiteboard style. Keep the whiteboard / marker medium, but use colored accents for emphasis instead of drawing every mark in black marker only. Just answer in chat.",
"expected_output": "A direct chat response with one Visual Brief and one final prompt that preserves the whiteboard medium while using selective accent colors for emphasis.",
"files": ["evals/files/transformers-explained.md"],

P2: Match eval filename to the staged fixture in test 5

This prompt requests onboarding-notes.md, but files[] stages evals/files/transformers-explained.md instead. Because the eval harness supplies the listed fixture paths, this mismatch can make measured runs fail or drift from the intended input document, which undermines the validity of with-skill vs without-skill comparisons.
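
Both mismatches could be caught before a benchmark run with a check along these lines. This is only a sketch: it assumes each eval case is a dict with `prompt` and `files` keys as in the quoted JSON, it only looks for `.md` names in the prompt, and the function name `unstaged_prompt_files` is hypothetical.

```python
import re
from pathlib import PurePosixPath


def unstaged_prompt_files(case: dict) -> list[str]:
    """Return .md filenames named in the prompt that are not staged via files[]."""
    staged = {PurePosixPath(path).name for path in case.get("files", [])}
    mentioned = re.findall(r"\b[\w.-]+\.md\b", case["prompt"])
    return [name for name in mentioned if name not in staged]


# Prompts shortened from the two eval cases discussed above.
case_4 = {
    "prompt": "Use systems.md and keep it blackboard style.",
    "files": ["evals/files/microservices-architecture.md"],
}
case_5 = {
    "prompt": "Read onboarding-notes.md and make it whiteboard style.",
    "files": ["evals/files/transformers-explained.md"],
}

print(unstaged_prompt_files(case_4))  # ['systems.md']
print(unstaged_prompt_files(case_5))  # ['onboarding-notes.md']
```

An empty list means every file the prompt names is present among the staged fixtures.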


Comment on lines 1 to 5
---
name: dotnet-strong-name-signing
-description: >
-  Generate a strong name key (.snk) file for signing .NET assemblies using pure .NET cryptography — no Visual Studio Developer PowerShell or sn.exe required. Works in any terminal. Use this skill when the user wants to create a strong name key, generate an .snk file, sign .NET assemblies, or mentions "strong-name", "snk", "AssemblyOriginatorKeyFile", "SignAssembly", or asks how to sign a .NET library. Also use when scaffolding .NET libraries or NuGet packages that need assembly signing. ALWAYS use this skill when asked to generate or create a strong name key file.
+description: >
+  Generate a strong name key (.snk) file for signing .NET assemblies using pure .NET cryptography — no Visual Studio Developer PowerShell or sn.exe required. Works in any terminal. Use this skill when the user wants to create a strong name key, generate an .snk file, sign .NET assemblies, or mentions "strong-name", "snk", "AssemblyOriginatorKeyFile", "SignAssembly", or asks how to sign a .NET library. Also use when scaffolding .NET libraries or NuGet packages that need assembly signing. ALWAYS use this skill when asked to generate or create a strong name key file.
---

Action required

2. Frontmatter contains > character 📘 Rule violation ✓ Correctness

skills/dotnet-strong-name-signing/SKILL.md frontmatter uses the folded-scalar indicator >
(description: >), which introduces a > character inside the YAML frontmatter block. This
violates the rule disallowing any < or > characters anywhere within SKILL frontmatter.
Agent Prompt
## Issue description
The YAML frontmatter includes a `>` character via `description: >`, which is disallowed.

## Issue Context
This rule applies to any `<`/`>` characters anywhere in the SKILL frontmatter block, including YAML scalar indicators.

## Fix Focus Areas
- skills/dotnet-strong-name-signing/SKILL.md[1-5]


Copilot AI left a comment


Pull request overview

Expands the repo’s skill guidance and validation around release-adjacent work, benchmark execution/contract expectations, and stricter git-workflow behaviors (emoji-first subjects, identity-safe commit execution, and mandatory changelog gates), with accompanying documentation and eval updates.

Changes:

  • Extend git workflow skills (commits + squash summary + keep-a-changelog) with stronger default behaviors, new eval contracts, and updated shared commit-language guidance.
  • Strengthen skill validation (validate-skill-templates.ps1) to assert the new required rules/sections and enforce alignment across skills.
  • Update skill-creator-agnostic benchmark documentation to cover eval directory discovery, parallel execution when supported, Windows/Python UTF-8 pitfalls, and CLI prompt passing.

Reviewed changes

Copilot reviewed 20 out of 26 changed files in this pull request and generated 2 comments.

Show a summary per file

| File | Description |
| --- | --- |
| skills/skill-creator-agnostic/references/windows-powershell-benchmarking.md | Adds Windows/Python UTF-8 and CLI prompt-handling benchmarking guidance plus new failure symptoms. |
| skills/skill-creator-agnostic/references/benchmark-contract.md | Clarifies benchmark layout/discovery rules and adds parallelism guidance when runners support sub-agents. |
| skills/skill-creator-agnostic/evals/evals.json | Adds eval coverage for parallel fan-out and Windows/Codex benchmarking pitfalls. |
| skills/skill-creator-agnostic/SKILL.md | Updates the runner-agnostic workflow to prefer parallel measured runs, enforce eval-* naming, and document parity outcomes. |
| skills/markdown-illustrator/evals/evals.json | Adds eval cases for whiteboard/blackboard styles using selective color accents. |
| skills/markdown-illustrator/SKILL.md | Updates visual-treatment rules to bias toward restrained accent colors for board styles. |
| skills/git-visual-squash-summary/references/commit-language.md | Updates shared commit-language guidance to make prefixes opt-in and refine emoji intent guidance. |
| skills/git-visual-squash-summary/evals/evals.json | Adds evals for branch-scope defaults, no commit-picking UX, and release-communication emoji choice. |
| skills/git-visual-squash-summary/SKILL.md | Clarifies whole-branch-by-default behavior and prefixless output shape for squash summaries. |
| skills/git-visual-commits/references/commit-language.md | Mirrors the shared commit-language update (prefixes opt-in; refined emoji intent guidance). |
| skills/git-visual-commits/evals/evals.json | Adds evals for release-adjacent intent splitting, identity/tool-path failure handling, and prefix opt-in behavior. |
| skills/git-visual-commits/SKILL.md | Updates commit workflow rules for emoji-first default subjects, direct git execution for identity-sensitive commits, and conservative recovery/clarification gates. |
| skills/git-nuget-release-notes/SKILL.md | Adds hero image reference to the skill docs. |
| skills/git-keep-a-changelog/evals/evals.json | Adds evals enforcing the pending-worktree confirmation gate and compare-link footer repair behavior. |
| skills/git-keep-a-changelog/SKILL.md | Makes pending-worktree confirmation a mandatory workflow gate and requires compare-link footer maintenance on edits. |
| skills/git-keep-a-changelog/FORMS.md | Introduces structured-input + deterministic text fallback spec for the pending-worktree gate. |
| skills/dotnet-strong-name-signing/SKILL.md | Adds hero image reference and normalizes some formatting in the doc. |
| scripts/validate-skill-templates.ps1 | Strengthens validator assertions for the new git-skill rules and keep-a-changelog FORMS/gates. |
| README.md | Updates repo-level skill summaries and guidance to match new defaults and mandatory checkpoints. |
| CHANGELOG.md | Adds release notes for 0.3.3 and updates compare links. |



- Count staged, unstaged, and untracked changes separately.
- If there are no pending changes, continue normally.
- If there are pending changes and the target is a concrete release heading such as `## [1.2.3]`, You must ask a direct confirmation question before drafting the changelog entry.

Copilot AI Mar 25, 2026


In the Step 3 bullet list, "..., You must ask..." uses a capital "You" mid-sentence. This reads like a typo; it should be lowercase to keep the instructions grammatically correct.

Suggested change
- If there are pending changes and the target is a concrete release heading such as `## [1.2.3]`, You must ask a direct confirmation question before drafting the changelog entry.
- If there are pending changes and the target is a concrete release heading such as `## [1.2.3]`, you must ask a direct confirmation question before drafting the changelog entry.

Comment on lines +51 to +53
"id": 8,
"prompt": "The benchmark runner supports sub-agents and background jobs. Should the measured skill benchmark still run evals one-by-one, or should it fan out? Tell me the expected execution pattern.",
"expected_output": "The response prefers parallel paired executor runs and parallel grading when the runner supports sub-agents, while keeping the existing benchmark artifact contract intact.",

Copilot AI Mar 25, 2026


Eval IDs are no longer in ascending order (id 8 appears before ids 5–7). Consider keeping the eval list ordered by id to make diffs/reviews easier and avoid accidental id reuse during future edits.
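
A small ordering check of the kind suggested here might look like the sketch below. It assumes the eval cases are available as a list of dicts with an integer `id`; the function name `id_order_problems` and the sample id sequence are hypothetical, chosen only to match the "id 8 before ids 5–7" description.

```python
def id_order_problems(evals: list[dict]) -> list[str]:
    """Report duplicate ids and ids that appear after a larger id."""
    problems = []
    seen = set()
    highest = None  # largest id encountered so far
    for case in evals:
        case_id = case["id"]
        if case_id in seen:
            problems.append(f"duplicate id {case_id}")
        elif highest is not None and case_id < highest:
            problems.append(f"id {case_id} appears after id {highest}")
        seen.add(case_id)
        highest = case_id if highest is None else max(highest, case_id)
    return problems


# Hypothetical ordering in which id 8 was inserted ahead of ids 5-7.
sample = [{"id": n} for n in (1, 2, 3, 4, 8, 5, 6, 7)]
for problem in id_order_problems(sample):
    print(problem)
# id 5 appears after id 8
# id 6 appears after id 8
# id 7 appears after id 8
```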

@gimlichael merged commit afba034 into main on Mar 25, 2026
1 check passed
@gimlichael deleted the v0.3.3/housekeeping branch on March 25, 2026 22:07