Nous processes user-authored documentation through a pipeline of parsers, transformers, and renderers. Each stage crosses a trust boundary where untrusted input must be validated before it can influence output.
User-authored source files (untrusted)
│
├── Parsers (parser-md, parser-mdx, parser-adoc, parser-rst, parser-kd, parser-kdx)
│   └── Validate: frontmatter schemas, annotation keys, container names
│   └── Output: NDM nodes (structured, typed)
│
├── Transformers (plugin-shiki, core pipeline)
│   └── highlightedHtml: produced by Shiki, sanitized before emission
│   └── Output: enriched NDM nodes
│
├── Renderer (renderer-html)
│   └── Entity escaping for all inline/attribute content
│   └── DOMPurify sanitization for htmlBlock and highlightedHtml
│   └── SafeHtml branded type prevents raw string emission
│   └── Output: sanitized HTML string
│
├── Metadata Emitters (agent-metadata)
│   └── MarkdownBuilder for all user-content interpolation
│   └── Context-specific escaping (link labels, URLs, code fence langs)
│   └── Output: Markdown, JSON-LD, agents.json, OpenAPI specs
│
└── Search Client (search)
    └── href validation rejects protocol-relative URLs
    └── escapeHtml for all rendered search results
    └── Output: client-side search UI
| Boundary | Location | Mechanism |
|---|---|---|
| Source → NDM | Parser packages | Schema validation, annotation key allowlist, container name validation |
| NDM → HTML | renderer-html/render-nodes.ts | escapeHtml() for text/attributes, DOMPurify.sanitize() for htmlBlock and highlightedHtml |
| NDM → Markdown | agent-metadata/emitters/ | MarkdownBuilder with context-specific escaping |
| Search index → DOM | search/client.ts | escapeHtml(), href validation, escapeRegExp() |
| Annotation keys → Object properties | parser-kd/preprocessor.ts | Object.create(null) + VALID_ANNOTATION_KEY allowlist |
| Container names → Node types | parser-kd/preprocessor.ts | VALID_CONTAINER_NAME regex (/^[a-zA-Z][a-zA-Z0-9-]{0,63}$/) |
Annotation keys are validated against an allowlist pattern (/^@?[a-zA-Z][a-zA-Z0-9_.-]{0,63}$/). Keys that don't match, such as __proto__ (which starts with an underscore), are silently dropped. The pattern on its own still accepts names like constructor and toString, so annotation objects are also created with Object.create(null): with no prototype chain, such keys are stored as ordinary own properties and cannot reach inherited ones.
Container names are validated against /^[a-zA-Z][a-zA-Z0-9-]{0,63}$/. Invalid names are treated as plain text.
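The two checks above can be sketched as follows. The regular expressions are the ones quoted in this document; the function names collectAnnotations and isValidContainerName are hypothetical and are not taken from parser-kd.

```typescript
// Patterns as quoted in the text above.
const VALID_ANNOTATION_KEY = /^@?[a-zA-Z][a-zA-Z0-9_.-]{0,63}$/;
const VALID_CONTAINER_NAME = /^[a-zA-Z][a-zA-Z0-9-]{0,63}$/;

type Annotations = Record<string, string>;

// Hypothetical helper: copies key/value pairs into a prototype-less object,
// silently dropping any key that fails the allowlist pattern.
function collectAnnotations(pairs: Array<[string, string]>): Annotations {
  // Object.create(null) yields an object with no prototype chain, so a lookup
  // can only ever find keys that were explicitly stored on it.
  const out: Annotations = Object.create(null);
  for (const [key, value] of pairs) {
    if (VALID_ANNOTATION_KEY.test(key)) {
      out[key] = value;
    }
  }
  return out;
}

// Hypothetical helper: a caller would fall back to plain text when this
// returns false.
function isValidContainerName(name: string): boolean {
  return VALID_CONTAINER_NAME.test(name);
}

const annotations = collectAnnotations([
  ["@since", "1.2"],
  ["__proto__", "polluted"], // rejected: starts with an underscore
  ["bad key", "x"],          // rejected: contains a space
]);
```

In this sketch only the @since entry survives, and the resulting object has a null prototype.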
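HTML output uses DOMPurify (isomorphic-dompurify), which sanitizes a parsed DOM tree rather than pattern-matching on strings. On the server the DOM is provided by jsdom, whose parser implements the same HTML parsing algorithm that browsers follow. This narrows the class of mutation XSS (mXSS) attacks that exploit parser differentials between sanitizer and browser, although jsdom and a given browser can still differ in edge cases.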
Two DOMPurify configurations are maintained:
- DOMPURIFY_CONFIG: permissive allowlist for user-authored htmlBlock content (structural/semantic HTML elements)
- HIGHLIGHTED_HTML_CONFIG: restrictive allowlist for syntax-highlighted code (only pre, code, span, div with class/style)
Pipeline-internal HTML (highlightedHtml from Shiki) also passes through the restrictive sanitizer, so no stage's output is trusted merely by convention.
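As a rough illustration, the two configurations might look like the fragment below. ALLOWED_TAGS and ALLOWED_ATTR are standard DOMPurify options. The restrictive lists follow the description above, while the contents of the permissive lists are assumptions made for this sketch and are not the project's actual allowlist.

```typescript
// Assumed shape: the permissive lists here are illustrative only.
const DOMPURIFY_CONFIG = {
  ALLOWED_TAGS: ["p", "div", "span", "section", "ul", "ol", "li", "table", "a"],
  ALLOWED_ATTR: ["class", "id", "href", "title"],
};

// Matches the description in the text: only pre, code, span, and div,
// carrying nothing but class and style attributes.
const HIGHLIGHTED_HTML_CONFIG = {
  ALLOWED_TAGS: ["pre", "code", "span", "div"],
  ALLOWED_ATTR: ["class", "style"],
};
```

Either object would be passed as the second argument to DOMPurify.sanitize(), for example DOMPurify.sanitize(html, HIGHLIGHTED_HTML_CONFIG).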
Markdown output uses MarkdownBuilder, which separates structure from content at the type level. User-controlled strings pass through context-specific escaping functions (escapeLinkLabel, escapeLinkUrl, sanitizeLang, singleLine) that also strip Unicode bidirectional override characters.
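A minimal sketch of what such context-specific escaping could look like is shown below. The function names come from the paragraph above, but the bodies, the stripBidi helper, and the link helper are assumptions written for illustration. They are not the MarkdownBuilder implementation.

```typescript
// Bidirectional embedding/override characters (U+202A to U+202E) and
// isolates (U+2066 to U+2069).
const BIDI_CONTROLS = /[\u202A-\u202E\u2066-\u2069]/g;

// Hypothetical helper shared by the escaping functions.
function stripBidi(s: string): string {
  return s.replace(BIDI_CONTROLS, "");
}

// Collapses all whitespace runs so the value cannot span multiple lines.
function singleLine(s: string): string {
  return stripBidi(s).replace(/\s+/g, " ").trim();
}

// Backslash-escapes characters that could close or nest a [label].
function escapeLinkLabel(s: string): string {
  return singleLine(s).replace(/[\\\[\]]/g, "\\$&");
}

// Percent-encodes ASCII characters that could end a (destination) early.
function escapeLinkUrl(s: string): string {
  return stripBidi(s).replace(/[ \t\r\n()<>\\]/g, (c) => {
    const hex = c.charCodeAt(0).toString(16).toUpperCase();
    return "%" + ("0" + hex).slice(-2);
  });
}

// Keeps only characters that are safe in a code fence info string.
function sanitizeLang(s: string): string {
  return stripBidi(s).replace(/[^A-Za-z0-9_+#.-]/g, "");
}

// Structure lives in the template; content only enters through an escaper.
function link(label: string, url: string): string {
  return `[${escapeLinkLabel(label)}](${escapeLinkUrl(url)})`;
}
```

The point of the design is that each syntactic position has its own escaper, so a value that is harmless as a link label is not assumed to be harmless as a URL or a fence language.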
The SafeHtml branded type (nominal typing via unique symbol) makes raw string injection a compile-time error. SafeHtml values can only be created through:
- createSafeHtml(): wraps DOMPurify output (caller must guarantee sanitization)
- escapeToSafeHtml(): entity-escapes plain text
- concatSafeHtml(): joins existing SafeHtml values
The unwrapSafeHtml() function extracts the raw string for final emission. Call sites using this function are trust extraction points and should be reviewed with the same scrutiny as SQL query construction.
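The branded-type pattern can be sketched as follows. The function names match those listed above, but the bodies and the brand symbol name are assumptions written for this illustration. The actual renderer-html code may differ.

```typescript
// The brand exists only at the type level; nothing is emitted at runtime.
declare const safeHtmlBrand: unique symbol;
type SafeHtml = string & { readonly [safeHtmlBrand]: true };

// Caller must guarantee the input is already sanitized (e.g. DOMPurify output).
function createSafeHtml(sanitized: string): SafeHtml {
  return sanitized as SafeHtml;
}

// Entity-escapes plain text so it is safe in element and attribute content.
function escapeToSafeHtml(text: string): SafeHtml {
  const escaped = text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
  return escaped as SafeHtml;
}

// Joining values that are each safe yields a safe value.
function concatSafeHtml(...parts: SafeHtml[]): SafeHtml {
  return parts.join("") as SafeHtml;
}

// Trust extraction point: the brand is dropped and a plain string leaves.
function unwrapSafeHtml(html: SafeHtml): string {
  return html;
}

// const bad: SafeHtml = "<b>raw</b>"; // rejected by the compiler: string is not SafeHtml

const page = concatSafeHtml(
  createSafeHtml("<p>"), // static markup standing in for sanitized output
  escapeToSafeHtml('<img src=x onerror="alert(1)">'),
  createSafeHtml("</p>"),
);
```

Because a plain string is not assignable to SafeHtml, any function that accepts only SafeHtml forces callers through one of the constructors, which is what makes the unwrap call sites the natural place to audit.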
The renderer emits a CSP meta tag via getCSPMetaTag():
default-src 'self';
script-src 'self';
style-src 'self' 'unsafe-inline';
img-src 'self' data: https:;
object-src 'none';
base-uri 'self';
form-action 'self';
frame-ancestors 'none'
Key properties:
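- script-src 'self' blocks inline scripts even if sanitization is bypassed
- object-src 'none' blocks plugin-based attacks entirely
- frame-ancestors 'none' prevents clickjacking, but only when the policy is sent as an HTTP response header; browsers ignore frame-ancestors in a meta tag
- base-uri 'self' prevents base tag hijacking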
The CSP can be customized via getCSPDirectives(overrides) for sites that require additional script sources (analytics, etc.).
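One way an override mechanism of this kind could work is sketched below. The getCSPDirectives name comes from the text, but its signature, the CSPDirectives type, the serializeCSP helper, and the replace-whole-directive semantics are all assumptions made for illustration.

```typescript
// Assumed representation: directive name mapped to its list of sources.
type CSPDirectives = Record<string, string[]>;

// Defaults mirror the policy listed above.
const DEFAULT_CSP: CSPDirectives = {
  "default-src": ["'self'"],
  "script-src": ["'self'"],
  "style-src": ["'self'", "'unsafe-inline'"],
  "img-src": ["'self'", "data:", "https:"],
  "object-src": ["'none'"],
  "base-uri": ["'self'"],
  "form-action": ["'self'"],
  "frame-ancestors": ["'none'"],
};

// Assumption: an override replaces the whole source list for that directive,
// so callers must repeat 'self' if they still want it.
function getCSPDirectives(overrides: CSPDirectives = {}): CSPDirectives {
  return { ...DEFAULT_CSP, ...overrides };
}

// Hypothetical helper: serializes directives into a policy string.
function serializeCSP(directives: CSPDirectives): string {
  return Object.keys(directives)
    .map((name) => `${name} ${directives[name].join(" ")}`)
    .join("; ");
}

// Example: allow scripts from a placeholder analytics origin.
const policy = serializeCSP(
  getCSPDirectives({ "script-src": ["'self'", "https://analytics.example"] }),
);
```

Under these assumptions, the resulting policy keeps every default directive and widens only script-src. Replacing rather than merging source lists keeps an override from silently inheriting sources the site owner did not ask for.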
packages/renderer-html/src/security.test.ts contains adversarial tests covering:
- OWASP top XSS payloads (script, iframe, object, embed, form, meta)
- Event handler injection (onclick, onmouseover, onerror, onload, onfocus)
- javascript:/vbscript:/data: URI schemes with case obfuscation and whitespace padding
- Encoding attacks (mixed case, null bytes, HTML entity encoding, double encoding)
- Mutation XSS vectors (nested tag confusion, SVG foreignObject, math/mtext namespace, noscript, template, style exfiltration)
- highlightedHtml injection (script, event handlers, iframe, disallowed tags)
- Inline content entity escaping (text nodes, link URLs, image alt, inline code)
- CSP meta tag emission and override mechanics
- Component props serialization and attribute breakout prevention
Security issues should be reported via GitHub Security Advisories, not through public issues.