perf: Optimize RST parsing with pattern and instance caching #1288

Open
CybotTM wants to merge 2 commits into phpDocumentor:main from CybotTM:perf/rst-parsing-optimizations

CybotTM (Contributor) commented Jan 21, 2026

Summary

Optimizes RST parsing with instance reuse and O(1) hash set lookups for hyperlink validation.

Changes

  • ExternalReferenceResolver: Add SUPPORTED_SCHEMAS_LIST and isSupportedScheme() for O(1) hash set lookup
  • InlineParser: Reuse InlineLexer instance instead of creating new one per parse
  • InlineLexer: Use ExternalReferenceResolver::isSupportedScheme() for URI scheme validation (~6x faster)
  • LineChecker: Cache compiled regex patterns
  • Buffer: Cache unindent calculations
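
The LineChecker item can be illustrated with a small sketch of per-line result caching. It is written in Python for brevity and is not the project's PHP code; the class layout, the `is_directive` name, and the simplified directive pattern are all assumptions made for illustration.

```python
import re

# Simplified, hypothetical pattern for an RST directive line such as ".. note::".
# The project's real pattern is more involved.
_DIRECTIVE_RE = re.compile(r"^\.\.\s+[A-Za-z][\w:-]*::(\s|$)")


class LineChecker:
    """Illustrative checker that memoizes regex results per input line."""

    # Class-level dict shared by all instances, mirroring a static cache.
    _directive_cache = {}

    def is_directive(self, line: str) -> bool:
        cached = self._directive_cache.get(line)
        if cached is not None:
            return cached  # cache hit: no regex work
        result = _DIRECTIVE_RE.match(line) is not None
        # The full line is the cache key, so distinct lines never collide.
        self._directive_cache[line] = result
        return result


checker = LineChecker()
print(checker.is_directive(".. note::"))        # first call runs the regex
print(checker.is_directive(".. note::"))        # second call is a dict lookup
print(checker.is_directive("Plain paragraph"))
```

A cache like this pays off only when the same lines are checked repeatedly, and it grows with the number of distinct lines seen, so a real implementation may want a size bound.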

Performance Impact

See Performance Analysis Report for detailed benchmarks.

The hash set optimization for URI schemes gives roughly a 6x speedup over the previous regex-based approach, which matched against all 371 IANA-registered schemes.
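
To make the comparison concrete, here is a minimal Python sketch of the two strategies. It is not the PR's PHP implementation: the six-entry scheme set stands in for the 371-entry list, the function names are hypothetical, and lowercasing the input is an assumption about how case is handled. The timing loop only shows how one might measure the difference and makes no claim about the size of the speedup.

```python
import re
import timeit

# A tiny illustrative subset; the PR refers to 371 IANA-registered schemes.
SUPPORTED_SCHEMES = frozenset({"http", "https", "ftp", "mailto", "ssh", "tel"})

# Regex alternative: a single alternation over every scheme.
SCHEME_PATTERN = re.compile("^(?:" + "|".join(sorted(SUPPORTED_SCHEMES)) + ")$")


def is_supported_scheme(scheme: str) -> bool:
    """Average-case O(1) membership test against the hash set."""
    return scheme.lower() in SUPPORTED_SCHEMES


def is_supported_scheme_regex(scheme: str) -> bool:
    """Same answer via the alternation regex, kept for comparison."""
    return SCHEME_PATTERN.match(scheme.lower()) is not None


if __name__ == "__main__":
    for name, fn in (("set", is_supported_scheme), ("regex", is_supported_scheme_regex)):
        seconds = timeit.timeit(lambda: fn("mailto"), number=100_000)
        print(f"{name}: {seconds:.4f}s")
```

The set lookup cost stays flat as the list grows, while an alternation regex may have to try many branches before it matches or fails.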

Merge Note

Both this PR and #1287 add the same isSupportedScheme() method to ExternalReferenceResolver. Whichever PR merges second will hit a conflict in that file, which is trivially resolved by keeping the code that is already merged.


Related PRs

| PR | Description | Status |
|----|-------------|--------|
| #1287 | Rendering caching layer | Independent (trivial merge conflict on ExternalReferenceResolver) |
| #1288 | This PR - RST parsing optimizations | |
| #1289 | CLI container caching | Independent |
| #1291 | Symfony 8 compatibility | ✅ Merged |
| #1293 | ProjectNode O(1) document lookup | Independent |

All PRs can be merged independently in any order.

CybotTM force-pushed the perf/rst-parsing-optimizations branch from edb847a to 6d2e211 on January 22, 2026 00:26
CybotTM changed the title from "perf: Optimize RST parsing with regex and instance caching" to "perf: Optimize RST parsing with pattern and instance caching" on Jan 22, 2026
CybotTM force-pushed the perf/rst-parsing-optimizations branch 2 times, most recently from bcc53c1 to b642af7 on January 22, 2026 01:41
CybotTM force-pushed the perf/rst-parsing-optimizations branch from b642af7 to 6a14fda on January 23, 2026 13:00
CybotTM marked this pull request as ready for review on January 23, 2026 13:06
CybotTM added 2 commits March 9, 2026 01:21
Add caching optimizations for hot paths in RST parsing:

- InlineParser: reuse single InlineLexer instance instead of creating
  new one per parse call (lexer state fully reset via setInput())
- InlineLexer: cache expensive hyperlink pattern built from
  SUPPORTED_SCHEMAS (5600+ chars) as static variable
- LineChecker: add static caches for isDirective(), isLink(), and
  isAnnotation() regex results with proper cache key handling
- Buffer: ensure unindented flag is reset in all mutators (set, pop,
  clear) for consistent cache invalidation
- CachableInlineRule: simplify type annotations
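
The Buffer invalidation rule can be sketched as follows. This is Python rather than the project's PHP, and it caches the unindented result instead of keeping a boolean flag; the method names and the `compute_count` counter are hypothetical and exist only to show the behavior. The point is that every mutator must drop the cache, otherwise a later read returns stale lines.

```python
class Buffer:
    """Illustrative line buffer that caches its unindented view."""

    def __init__(self, lines=None):
        self._lines = list(lines or [])
        self._unindented = None   # cached result, None means "stale"
        self.compute_count = 0    # illustration only: counts recomputations

    def _invalidate(self):
        self._unindented = None

    # Every mutator invalidates the cache.
    def push(self, line):
        self._lines.append(line)
        self._invalidate()

    def set(self, index, line):
        self._lines[index] = line
        self._invalidate()

    def pop(self):
        line = self._lines.pop() if self._lines else None
        self._invalidate()
        return line

    def clear(self):
        self._lines = []
        self._invalidate()

    def unindented(self):
        """Return the lines with their common leading spaces removed."""
        if self._unindented is None:
            self.compute_count += 1
            # Blank lines do not contribute to the common indentation.
            indents = [len(l) - len(l.lstrip(" ")) for l in self._lines if l.strip()]
            cut = min(indents, default=0)
            self._unindented = [l[cut:] for l in self._lines]
        return list(self._unindented)  # copy so callers cannot corrupt the cache
```

Routing all mutators through one `_invalidate()` helper makes it harder to forget the reset when a new mutator is added later.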

Note: Lexer reuse assumes single-threaded parsing. Concurrent parsing
would require separate instances.
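
A minimal sketch of the reuse pattern, in Python rather than the project's PHP. Only setInput() is named in the commit message; the toy token pattern, `next_token`, and the class bodies are assumptions for illustration. Reuse is safe here because `set_input` replaces every piece of per-parse state.

```python
import re

# Hypothetical toy grammar: a whitespace run, a word, or one punctuation character.
_TOKEN_RE = re.compile(r"\s+|\w+|[^\w\s]")


class InlineLexer:
    def __init__(self):
        self._tokens = []
        self._position = 0

    def set_input(self, text):
        """Discard all previous state and tokenize the new input."""
        self._tokens = _TOKEN_RE.findall(text)
        self._position = 0

    def next_token(self):
        if self._position >= len(self._tokens):
            return None
        token = self._tokens[self._position]
        self._position += 1
        return token


class InlineParser:
    def __init__(self):
        # Created once and reused by every parse() call.
        self._lexer = InlineLexer()

    def parse(self, text):
        self._lexer.set_input(text)  # full reset before each parse
        tokens = []
        while (token := self._lexer.next_token()) is not None:
            tokens.append(token)
        return tokens


parser = InlineParser()
print(parser.parse("a *b*"))
print(parser.parse("x"))
```

Because the parser holds one mutable lexer, two parse() calls that overlap in time (threads, or re-entrant parsing from inside a rule) would overwrite each other's token stream, which is why the note above limits this to single-threaded use.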

See https://cybottm.github.io/render-guides/ for benchmark data.

Add SUPPORTED_SCHEMAS_LIST and isSupportedScheme() to ExternalReferenceResolver
for O(1) hash set lookup instead of regex matching against 371 IANA schemes.
This is ~6x faster than the 5600+ character regex pattern.

InlineLexer now uses ExternalReferenceResolver::isSupportedScheme() to
validate URI schemes during tokenization.

Note: This change is also in PR phpDocumentor#1287 - when both PRs merge, the conflict
is trivially resolved by keeping one version.
CybotTM force-pushed the perf/rst-parsing-optimizations branch from 6a14fda to a7f8348 on March 9, 2026 00:21