feat: update suggestion data schemas for some of the opportunities#1466
feat: update suggestion data schemas for some of the opportunities#1466Kanishkavijay39 wants to merge 1 commit intomainfrom
Conversation
…a11y, and internal-links - Add hasAltAttribute boolean to ALT_TEXT recommendation items - Make CWV url and issues fields optional (groups lack url, issues added via update flow) - Add patchContent and isCodeChangeAvailable to COLOR_CONTRAST and A11Y_ASSISTIVE schemas - Relax BROKEN_INTERNAL_LINKS URL validation to accept malformed URLs as-is Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
iuliag
left a comment
There was a problem hiding this comment.
Hi @Kanishkavijay39,
Strengths
- Exactly two files changed, tests updated in lockstep. Well-scoped.
- BROKEN_INTERNAL_LINKS URL relaxation includes a comment explaining the trade-off - good practice for schema evolution.
- All changes are backward-compatible: existing valid data still passes all schemas.
.unknown(true)on all schemas allows forward-compatible additions across rolling deployments.- CWV
urlfield correctly retains.uri()format validation when present - only optionality changed. - Updated test descriptions capture the rationale ("issues is optional for update flow"), which helps future readers.
Issues
Important (Should Fix)
BROKEN_INTERNAL_LINKS: bare Joi.string() accepts dangerous URI schemes (suggestion.data-schemas.js:382-388)
url_from, urlFrom, url_to, urlTo, and urlsSuggested now accept any string, including javascript:, data:, blob:, and empty strings. These values originate from external crawled content. urlsSuggested is rendered to users as suggested replacement URLs - React auto-escapes text nodes but does NOT sanitize javascript: URIs in href attributes. If the UI renders these as clickable links (the natural UX), javascript:alert(document.cookie) executes in the user's session. Additionally, url_from values are passed to HTTP fetch calls downstream; empty strings and non-URL values will produce unhandled fetch errors.
The crawl pipeline legitimately produces malformed HTTP/HTTPS URLs - the relaxation is valid. But the fix is too broad. Replace Joi.string() with a custom validator that accepts malformed http/https/relative URLs while rejecting dangerous schemes:
const relaxedUrl = Joi.string().min(1).custom((value, helpers) =>
/^(https?:\/\/|\/)/i.test(value) ? value : helpers.error('string.uriScheme'), 'relaxed URL'
);CWV schema is silently polymorphic with no documented contract (suggestion.data-schemas.js:133, 151)
Making both url and issues optional creates two implicit data shapes - page-level (url and issues present) and group-type (url absent, issues populated later via update) - with no discriminator. The minimal projection still lists both fields, so consumers calling getProjection('cwv', 'minimal') receive records with url: undefined. Downstream, the suggestionsByUrl map built from these values collapses multiple group suggestions to the undefined key.
Either add a Joi.when('type', { is: 'url', then: Joi.required() }) conditional to preserve the page-level invariant, or add a code comment documenting the two CWV shapes and confirming that all consumers handle missing url and issues gracefully.
Missing test coverage for all new fields
patchContent,isCodeChangeAvailableon COLOR_CONTRAST (suggestion.data-schemas.js:88) and A11Y_ASSISTIVE (suggestion.data-schemas.js:118): no tests.hasAltAttributeon ALT_TEXT (suggestion.data-schemas.js:170): no test.- BROKEN_INTERNAL_LINKS URL relaxation: no test proving previously-invalid malformed URLs now pass. This is the highest-risk change in the PR and the most likely to be silently re-tightened by a future commit.
Add at least one positive and one negative test per new field, plus a test confirming a malformed URL passes for BROKEN_INTERNAL_LINKS.
CWV issues optional without default or consumer audit (suggestion.data-schemas.js:151)
issues was previously required; consumers built against that guarantee don't guard against undefined. Code doing suggestion.data.issues.length will throw at runtime. Add .default([]) so consumers always receive an array, or audit all consumers and document that they handle undefined.
Minor (Nice to Have)
suggestion.model.test.js:262, 272: test names say "url is optional for groups" and "issues is optional for update flow," but the schema change is unconditional. Rename to "passes when url is missing (url is optional)" and "passes when issues is missing (issues is optional)" to avoid implying a conditional constraint.patchContentandisCodeChangeAvailableare duplicated identically in COLOR_CONTRAST and A11Y_ASSISTIVE. As more audit types gain fix capabilities, consider extracting a sharedfixMetadataJoi fragment to avoid copy-paste drift.- BROKEN_INTERNAL_LINKS and BROKEN_BACKLINKS are structurally near-identical schemas but now diverge on URL validation. Add a comment on BROKEN_INTERNAL_LINKS explaining the divergence from BROKEN_BACKLINKS so readers don't assume they behave identically.
- No changelog or comment on the CWV required-to-optional contract change. Worth a note for downstream teams.
Recommendations
- Deployment order: this PR must be published to npm and the dependency version bumped in audit-worker before audit-worker#2190 deploys. Document the deployment sequence explicitly.
- Consider a shared
relaxedUrlJoi validator in the schema utilities. Multiple opportunity types may need "accepts malformed HTTP/HTTPS URLs but rejects dangerous schemes" - centralizing it prevents future divergence. - Add a Coralogix query or panel tracking validation warnings by opportunity type after both PRs deploy. That signal determines when Phase 1 is complete.
Assessment
Ready to merge? No
Reasoning: The BROKEN_INTERNAL_LINKS change removes all format validation from fields carrying untrusted external input that is displayed to users and used in downstream fetches. The current rendering path happens to be safe due to React text-node auto-escaping, but there is no defense-in-depth. The CWV contract change also needs either conditional validation or explicit consumer confirmation. Both are straightforward fixes.
Next Steps
- Replace
Joi.string()with a scheme-restricted custom validator on all BROKEN_INTERNAL_LINKS URL fields. - Add
Joi.when()conditional on CWVurl, or document and confirm consumer handling explicitly. - Add
.default([])to CWVissues. - Add tests for all new fields and the BROKEN_INTERNAL_LINKS relaxation.
- Rename the two CWV test descriptions.
This PR updates Joi validation schemas for 5 opportunity types in suggestion.data-schemas.js. All changes loosen or extend existing schemas no breaking changes for existing valid data.
Please ensure your pull request adheres to the following guidelines:
Related Issues
Thanks for contributing!