ci: gate docstring quality and coverage in CI (#616) by planetf1 · Pull Request #689 · generative-computing/mellea

planetf1 · 2026-03-18T14:27:36Z

Misc PR

Type of PR

Bug Fix
New Feature
Documentation
Other

Description

Link to Issue: Fixes ci: gate doc quality in CI and pre-commit to prevent future degradation #616

Adds a hard-fail docstring quality gate to the docs-publish workflow (--quality --fail-on-quality --threshold 100). Both checks currently pass in CI (100% coverage, 0 quality issues).

Also adds a typeddict_mismatch scanner to audit_coverage.py — flags Attributes: sections on TypedDict classes that document phantom fields or omit declared ones (mirrors the existing param_mismatch logic for functions).

Pre-commit hook updated to use --fail-on-quality; stays stages: [manual] since it requires pre-built docs. CI is the hard gate.

Contribution docs updated with TypedDict docstring requirements and the two new check kinds.

Testing

Tests added to the respective file if code was changed
New code has 100% coverage if code as added
Ensure existing tests and github automation passes (a maintainer will kick off the github automation when the rest of the PR is populated)

github-actions · 2026-03-18T14:27:58Z

The PR description has been updated. Please fill out the template for your PR to be reviewed.

planetf1 · 2026-03-18T14:33:29Z

@psschwei @HendrikStrobelt @jakelorocco We are finally at a state where the docs build is clean, and reporting no issues. I recommend we now make this a hard-fail.

We're had at least one PR merged which regressed some docstrings (albeit minor), and another pr in the queue with quite a few changes needed. By ensuring we validate as part of CI we'll maintain the docs in a clean state

Note this does also add a slightly looser pre-commit. That is optional if we think too annoying

psschwei

I'm in favor of adding, but one question: is the output of our docstring checker the kind of thing that could be easily fed into an agent to produce the appropriate docstrings / make the check pass? (thinking of (a) keeping contribution process easy for new folks, and (b) how we can use agents at some point to do this for us)

mergify · 2026-03-18T14:42:34Z

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert|release)(?:\(.+\))?:

planetf1 · 2026-03-18T15:26:58Z

Yes, much of the docs have been written and restructured by AI - long term the ability to generate too is on the cards. I think there will be some incremental improvements to make scan results easier...

psschwei

I am in favor, but would be good for @jakelorocco to weigh in too

planetf1 · 2026-03-18T16:25:53Z

A good time to consider forcing

Add a hard-fail docstring quality gate to the docs-publish workflow: - New 'Docstring quality gate' step runs --quality --fail-on-quality --threshold 100; fails if any quality issue is found or coverage drops below 100% (both currently pass in CI) - Existing audit_coverage step (soft-fail, threshold 80) retained for the summary coverage metric Add typeddict_mismatch checks to audit_coverage.py: - typeddict_phantom: Attributes: documents a field not declared in the TypedDict - typeddict_undocumented: declared field absent from Attributes: section - Mirrors the existing param_mismatch logic for functions Pre-commit: enable --fail-on-quality on the manual-stage hook (CI is the hard gate; hook remains stages: [manual] as docs must be pre-built). Update CONTRIBUTING.md and docs/docs/guide/CONTRIBUTING.md with TypedDict docstring requirements and the two new audit check kinds.

planetf1 · 2026-03-18T22:43:03Z

An intrinsics updated has introduced some documentation errors - not fatal: missing args, and missing returns (10 omissions) -- but having seen 2 other PRs potentially doing this too, this is why we MUST start enforcement, not just the checks, which were the facilitator of improving the docs.

@jakelorocco @HendrikStrobelt What do you think? Are you ok with this in principle?

If so I will add the fix-ups into this PR tomorrow and get it approved/merged ASAP.

Any concerns?

planetf1 · 2026-03-18T22:44:09Z

vv This was the regression

… and fix hints audit_coverage.py: - Add file/line fields to every issue dict (repo-relative path + def line) - _print_quality_report now shows [file:line] per issue, per-kind Fix:/Ref: hints linking to CONTRIBUTING.md anchors, and emits ::error file=...,line=... GHA annotations so issues appear inline in PR diffs - Cap GHA annotations at 10 per check kind with "N more in job log" notice - Add _KIND_FIX_HINTS and _gha_file_annotation helpers; _CONTRIB_DOCS_URL constant validate.py: - Convert all check functions from list[str] to list[dict] errors (file/line/message) - Add line-number tracking to validate_source_links, validate_internal_links, validate_anchor_collisions, and validate_doc_imports - Emit per-error GHA annotations with file/line; shared 20-annotation budget across all checks so every category gets representation in PR diff - Fix icon bug: summary rows now use correct pass/fail icon - Group detailed errors by check type with section headers docs-publish.yml: - Add --orphans and --output /tmp/quality_report.json to quality gate step - Upload quality_report.json as docstring-quality-report artifact (30-day retention) pyproject.toml: - cli/**/*.py: suppress only D2/D3/D4xx style rules; enable D1xx (missing docstrings) as a ruff-level complement to the audit_coverage quality gate docs/docs/guide/CONTRIBUTING.md: - Add CI docstring checks reference section with per-kind tables (fix instructions + anchors) for all 11 check kinds across 4 categories - Add callout explaining GHA annotation cap (10 per kind) and where to find the full list (job log + JSON artifact)

…y links audit_coverage.py (audit_nav_orphans): - Probe docs/docs/docs.json before docs/mint.json so both Mintlify v1 and v2 nav configs are supported - Extend _extract to handle plain string page entries used by docs.json (v2 uses "pages": ["api/..."] strings; v1 used {"page": "api/..."} dicts) - Previously mint.json was never found, nav_refs stayed empty, and every MDX file was reported as an orphan docs-publish.yml (Write job summary): - When the quality gate fails, render a prominent markdown callout with a direct link to the CI docstring checks reference section in CONTRIBUTING.md - Add a per-kind fix reference table with clickable anchor links to each category section (missing/short, args/returns, class Option C, TypedDict) - Per-kind Ref: URLs in the raw log are inside a text block and do not render as links in the step summary; this table surfaces them rendered

…ped notice docs-publish.yml: - Parse per-kind counts from _print_quality_report section headers in the quality gate log (e.g. "Missing docstrings (12)") and show them as a comma-separated breakdown in the Docstring Quality table cell instead of just the total — gives developers an immediate view of which categories are failing without expanding the log audit_coverage.py: - Remove the "skipped (pass --quality to enable)" GHA notice emitted by the coverage-only step; there is always a dedicated quality gate step immediately after so the notice was misleading and redundant

audit_coverage.py: - Coverage miss section now shows a structured Fix:/Ref: block with the exact generate-ast.py command and a link to CONTRIBUTING.md#validating-docstrings - Missing symbols listed one per line (symbol indented under module) for scannability instead of comma-joined on one long line - Emit a ::error or ::warning GHA annotation with symbol/module counts when coverage symbols are undocumented validate.py: - Add _CHECK_FIX_HINTS dict mapping each check label to a (fix text, ref URL) pair, covering all 8 check types with specific fix instructions and links into CONTRIBUTING.md (root or guide as appropriate) - _print_check_errors now prints Fix:/Ref: under each section header, matching the pattern established by _print_quality_report

audit_coverage.py: - Add missing_param_type check: fires when Args: section exists but one or more concrete params lack Python type annotations; naturally non-overlapping with no_args (which fires when section is absent) - Add missing_return_type check: fires when Returns: section is documented but the function has no return annotation; naturally non-overlapping with no_returns (annotation exists but section absent) - Add fix hints and CONTRIBUTING.md anchors for both new check kinds - Update kind_labels and iteration order in _print_quality_report generate-ast.py: - Add remove_internal_modules() post-generation filter step - Uses AST-based import analysis: a submodule is internal when the parent __init__.py imports from at least one sibling submodule but not from this one (import-based visibility, not __all__ name-matching) - Conservative: keeps module when parent imports nothing (indeterminate) or __init__.py cannot be parsed - _CONFIRMED_INTERNAL_MODULES hardcoded set for known internals where parent imports nothing (json_util, backend_instrumentation); these should eventually be renamed with _ prefix per Python convention - Package index files (stem == parent dir) are never filtered docs.json: nav regenerated by build; internal modules removed from nav CONTRIBUTING.md: add missing_param_type / missing_return_type to CI docstring checks reference table docs-publish.yml: add both new kinds to summary kind_short and kind_anchors fix-reference table

audit_coverage.py was walking all non-_-prefixed source modules via Griffe, including internal modules (json_util, backend_instrumentation, etc.) whose MDX files were removed by remove_internal_modules() in generate-ast.py. This caused coverage to drop because those symbols were no longer 'documented' but were still counted in the denominator. Apply the same import-based public-API filter in discover_public_symbols(): skip submodules that the parent __init__.py does not import from, mirroring the generate-ast.py logic. _CONFIRMED_INTERNAL_MODULES kept in sync. Also drop the per-kind anchor table from the GHA job summary. Anchors in GitHub Actions summaries only navigate to the top of the referenced document, so the table added noise without working links.

jakelorocco

I think this is my bad for not understanding this earlier, but it looks like we are checking the docstring quality only in terms of our generated docs. I wonder if this is over-complicating things and we should instead be checking docstring quality by looking at our code. Is there an advantage to only checking the quality of things in our generated docs? If we switched to ensuring quality at the code level, could we rely on third party tools / packages to do the checks?

jakelorocco · 2026-03-19T12:49:46Z

docs/docs/guide/CONTRIBUTING.md

+| `missing_param_type` | `Args:` section exists but one or more parameters have no Python type annotation — the type column is absent from the generated API docs | Add a type annotation to each listed parameter in the function signature (e.g. `def f(x: int)`). Only fires when `no_args` is already satisfied; `*args`/`**kwargs` are excluded. |
+| `missing_return_type` | `Returns:` section is documented but the function has no return type annotation — the return type is absent from the generated API docs | Add a return annotation to the function signature (e.g. `-> str`). Only fires when `no_returns` is already satisfied. |
+
+#### Class docstrings (Option C)


I think Option C should be removed here? It looks like this was originally a list of fix options but we now accept all of them?

that section has gone - just a single reference to the documentation

jakelorocco · 2026-03-19T13:00:28Z

.pre-commit-config.yaml

+      # Docstring quality gate — manual only (CI is the hard gate via docs-publish.yml).
+      # Run locally with: pre-commit run docs-docstring-quality --hook-stage manual
+      # Requires generated API docs (run `uv run python tooling/docs-autogen/build.py` first).
      - id: docs-docstring-quality
-        name: Audit docstring quality (informational)
-        entry: bash -c 'test -d docs/docs/api && uv run --no-sync python tooling/docs-autogen/audit_coverage.py --quality --docs-dir docs/docs/api || true'
+        name: Audit docstring quality
+        entry: uv run --no-sync python tooling/docs-autogen/audit_coverage.py --quality --fail-on-quality --threshold 0 --docs-dir docs/docs/api
        language: system
        pass_filenames: false
        files: (mellea/.*\.py$|cli/.*\.py$)


I think this is counter-intuitive. If my local pre-commit checks pass, I would expect the remote ones with the same name to pass. Is it possible to always run these locally but not gate committing on them locally? Something like:

Validate generated MDX docs..............................................Ignore Issues: <Issues Here>

Ok - fair. pre-commit checks can be a concern. We could remove them.. but then it pushes finding issues to later in the process.

I can change to only warn (and still return success) - but a user could do this with precommit in any case? I'll do that for now..

jakelorocco · 2026-03-19T13:02:55Z

tooling/docs-autogen/validate.py

+            f"  (GHA annotations exhausted — {capped} more error(s) in job log "
+            f"and JSON artifact; total cap is {_GHA_ANNOTATION_CAP})"


Is it possible to include a link to the json artifact or an explanation of where to find it?

Adds two new quality check kinds that fire when the type explicitly stated in an Args:/Returns: docstring entry disagrees with the Python type annotation in the function signature: param_type_mismatch — 'param (OldType): ...' vs annotation 'NewType' return_type_mismatch — 'Returns: OldType: ...' vs annotation '-> NewType' Both checks fire only when BOTH sides have an explicit type; one-sided absence is already handled by missing_param_type / missing_return_type. Type comparison uses _types_match() / _normalize_type() which handles: - typing aliases: List→list, Dict→dict, Optional→X|None, Union→A|B - typing. prefix stripping - pipe-union component ordering (str|None == None|str) - incidental whitespace Known conservative suppressions (prefer false negatives over false positives, since there is no per-site suppression mechanism): - Nested generics not fully expandable by regex (e.g. Optional[list[str]]) are silently skipped — both sides must fully normalise to be compared - Union with bracket-containing members - Callable argument ordering

…ubmodule For a module file mellea/pkg/submodule.py, Griffe gives filepath ending in .py (not __init__.py). The parent __init__.py is fp.parent/__init__.py. The previous code used fp.parent.parent which is correct for packages (whose filepath IS the __init__.py) but goes one level too far for plain module files — it was checking the grandparent init instead of the parent. Effect: genslot, react, unit_test_eval and similar non-exported modules in stdlib/components were incorrectly counted as public symbols, inflating the denominator and lowering the reported coverage percentage.

planetf1 · 2026-03-19T14:15:58Z

Improved checks - clarification

Looking at other comments:

…#705)

planetf1 · 2026-03-19T14:33:40Z

I think this is my bad for not understanding this earlier, but it looks like we are checking the docstring quality only in terms of our generated docs. I wonder if this is over-complicating things and we should instead be checking docstring quality by looking at our code. Is there an advantage to only checking the quality of things in our generated docs? If we switched to ensuring quality at the code level, could we rely on third party tools / packages to do the checks?

It's a journey.. the deliverable was docs so my original plan was to minimize code changes -- at least for the first pass - hence the validation focusses on code

But.. as a strategy, yes I agree, it is more of a code-based check (the current tool builds the AST). There may be options to use other tools/plugins to achieve similar checks in a more structured way -- but this first version was a pragmatic way to get started and improve quality. Ideal to then reflect/review on how we can do it better in future.

Also related - #456 : our current mypy configuration isn't checking for missing types -- yet that is confusing for external classes. ruff checks some things, but not everything.

Ruff can do the following -

Our check	Ruff equivalent	Notes
`missing` / `short`	D1xx	stable
`no_returns`	DOC201	preview
`no_yields`	DOC402	preview
`no_raises`	DOC501	preview
`no_args`	—	not in ruff DOC
`param_mismatch`	—	not in ruff DOC
`param_type_mismatch`	—	not in ruff DOC
`return_type_mismatch`	—	not in ruff DOC
TypedDict checks	—	not in any tool
Coverage % / public API filter	—	not in any tool

but what it doesn't easily do is address Public vs Private (the current parser looks at what is exported) - so we'd need to move to cleanup all code including internal, or we'd have to restructure and/or be very careful in creating an exclusion list. I think this would best be tracked in a new issue? What we have focusses on external

planetf1 requested review from HendrikStrobelt, jakelorocco and psschwei March 18, 2026 14:31

planetf1 marked this pull request as ready for review March 18, 2026 14:33

planetf1 requested a review from a team as a code owner March 18, 2026 14:33

psschwei reviewed Mar 18, 2026

View reviewed changes

planetf1 force-pushed the ci/doc-quality-gate-616 branch from 364fb81 to 71b4bde Compare March 18, 2026 15:38

psschwei approved these changes Mar 18, 2026

View reviewed changes

planetf1 force-pushed the ci/doc-quality-gate-616 branch from 19261e3 to 58a7f44 Compare March 18, 2026 16:26

planetf1 added 3 commits March 18, 2026 22:36

style: fix ruff formatting in audit_coverage.py

f1aa435

ci: remove soft-fail mode labels from docs workflow summary

dfb95c4

planetf1 force-pushed the ci/doc-quality-gate-616 branch from 58a7f44 to dfb95c4 Compare March 18, 2026 22:37

planetf1 added 6 commits March 19, 2026 10:04

jakelorocco reviewed Mar 19, 2026

View reviewed changes

planetf1 added 3 commits March 19, 2026 13:14

docs: document param_type_mismatch and return_type_mismatch check kinds

83aed3b

style: fix ruff formatting and EN dash in validate.py

d739dd3

planetf1 mentioned this pull request Mar 19, 2026

docs: remove CLI internals from generated API docs and add proper CLI reference #705

Closed

planetf1 added 3 commits March 19, 2026 14:27

revert: restore full D suppression for cli/ (see generative-computing…

3f8d923

…#705)

docs: add artifact download link to quality gate failure summary

5123f49

feat: add --warn-only to validate.py for pre-commit informational mode

48b32b4

		f" (GHA annotations exhausted — {capped} more error(s) in job log "
		f"and JSON artifact; total cap is {_GHA_ANNOTATION_CAP})"

Conversation

planetf1 commented Mar 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Misc PR

Type of PR

Description

Testing

Uh oh!

github-actions bot commented Mar 18, 2026

Uh oh!

planetf1 commented Mar 18, 2026

Uh oh!

psschwei left a comment

Choose a reason for hiding this comment

Uh oh!

mergify bot commented Mar 18, 2026

Merge Protections

🟢 Enforce conventional commit

Uh oh!

planetf1 commented Mar 18, 2026

Uh oh!

psschwei left a comment

Choose a reason for hiding this comment

Uh oh!

planetf1 commented Mar 18, 2026

Uh oh!

planetf1 commented Mar 18, 2026

Uh oh!

planetf1 commented Mar 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jakelorocco left a comment

Choose a reason for hiding this comment

Uh oh!

jakelorocco Mar 19, 2026

Choose a reason for hiding this comment

Uh oh!

planetf1 Mar 19, 2026

Choose a reason for hiding this comment

Uh oh!

jakelorocco Mar 19, 2026

Choose a reason for hiding this comment

Uh oh!

planetf1 Mar 19, 2026

Choose a reason for hiding this comment

Uh oh!

jakelorocco Mar 19, 2026

Choose a reason for hiding this comment

Uh oh!

planetf1 Mar 19, 2026

Choose a reason for hiding this comment

Uh oh!

planetf1 commented Mar 19, 2026

Uh oh!

planetf1 commented Mar 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

planetf1 commented Mar 18, 2026 •

edited

Loading

planetf1 commented Mar 18, 2026 •

edited

Loading

planetf1 commented Mar 19, 2026 •

edited

Loading