Faster codebase indexing for huge repos (>500k files) #11398

Open
In-line wants to merge 5 commits into RooCodeInc:main from In-line:faster-codebase-indexing

In-line (Contributor) commented Feb 11, 2026

  1. Moves the codebase cache implementation to a sqlite3 WASM database.
    a) This improves performance and avoids loading all of the paths into the JS heap (the WASM heap is separate).
    b) We no longer write all of the paths every time, which avoids unnecessary wear on the user's SSD.
  2. The codebase indexing progress popover now displays the total number of files remaining, which is especially useful for repos with a huge number of files.
  3. File processing uses a queue- and generator-based implementation to avoid loading all of the objects into the JS heap at the same time.
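
To illustrate point 3, here is a minimal sketch of generator-based batching. It is not the code from this PR: `inBatches`, `fakeListFiles`, and `collectBatchSizes` are hypothetical names, and the file lister is a stand-in that only fabricates paths. The idea is that only one batch of paths is held in memory at a time, instead of the full list.

```typescript
// Hypothetical helper: groups items from any async iterable into fixed-size batches.
async function* inBatches<T>(source: AsyncIterable<T>, batchSize: number): AsyncGenerator<T[]> {
  if (batchSize < 1) throw new Error("batchSize must be >= 1")
  let batch: T[] = []
  for await (const item of source) {
    batch.push(item)
    if (batch.length === batchSize) {
      yield batch
      batch = [] // start a fresh array so the yielded batch can be garbage collected
    }
  }
  if (batch.length > 0) yield batch // flush the final, possibly smaller, batch
}

// Stand-in for a lazy file lister (assumption); a real one would stream paths from disk.
async function* fakeListFiles(total: number): AsyncGenerator<string> {
  for (let i = 0; i < total; i++) yield `src/file-${i}.ts`
}

// Consumes the batches one at a time and records only their sizes.
async function collectBatchSizes(total: number, batchSize: number): Promise<number[]> {
  const sizes: number[] = []
  for await (const batch of inBatches(fakeListFiles(total), batchSize)) {
    sizes.push(batch.length)
  }
  return sizes
}
```

With 7 files and a batch size of 3, the consumer sees batches of 3, 3, and 1 paths.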

A good test candidate is a built checkout of https://github.com/project-chip/connectedhomeip

Discord: .inline

dosubot bot added the labels size:XXL (This PR changes 1000+ lines, ignoring generated files) and Enhancement (New feature or request) on Feb 11, 2026
@@ -30,7 +30,10 @@ class WorkspaceTracker {
 return
 }
 const tempCwd = this.cwd
-const [files, _] = await listFiles(tempCwd, true, MAX_INITIAL_FILES)
+const files: string[] = []
+for await (const file of listFiles(tempCwd, true, MAX_INITIAL_FILES)) {
Contributor Author

The workspace tracker has not been optimized yet.

roomote bot (Contributor) commented Feb 11, 2026

The latest commit addressed 5 of 6 previous issues. 1 prior issue remains unresolved; no new issues found.

  • Deadlock on error in scanner: processTask does not call progressQueue.complete() on failure, causing the caller to hang indefinitely on for await (scanner.ts)
  • Missing ripgrep kill in execRipgrep: ripgrep process is not killed when the queue limit is reached, so it runs to completion (or until the 10s timeout) even after the consumer stops reading (list-files.ts)
  • Invalid PRAGMA journal_mode = WAL2: WAL2 does not exist in mainline SQLite/WASM; should be WAL (cache-manager.ts)
  • Missing await on fileQueue.enqueue(): backpressure is ineffective in the scanner discovery loop (scanner.ts)
  • NOT IN query exceeds SQLite variable limit on large repos: deleteHashesNotIn builds a single query with one placeholder per processed file; exceeds SQLITE_MAX_VARIABLE_NUMBER (32,766) well before the 50k file limit (cache-manager.ts)
  • pendingBatchCount decrement uses .then() instead of .finally(): if a batch promise rejects, the counter never decrements and the while (pendingBatchCount >= MAX_PENDING_BATCHES) loop hangs (scanner.ts)
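
The scanner issues above (the missing `complete()` on failure, the un-awaited `enqueue()`, and `.then()` versus `.finally()`) share one pattern: cleanup must run on the failure path too, and the producer must wait for the queue. Below is a minimal sketch of that pattern. It assumes a hypothetical `BoundedQueue` class written for illustration; it is not the queue used in scanner.ts, and `produce` and `run` are likewise made-up names.

```typescript
// Hypothetical bounded queue: enqueue() resolves only once there is room,
// so a producer that awaits it is slowed down to the consumer's pace.
class BoundedQueue<T> {
  private items: T[] = []
  private done = false
  private waiters: Array<() => void> = []
  private capacity: number

  constructor(capacity: number) {
    this.capacity = capacity
  }

  // Wake everything that is parked; each waiter re-checks its own condition.
  private wake(): void {
    const parked = this.waiters
    this.waiters = []
    for (const resume of parked) resume()
  }

  private park(): Promise<void> {
    return new Promise<void>((resolve) => {
      this.waiters.push(() => resolve())
    })
  }

  async enqueue(item: T): Promise<void> {
    while (!this.done && this.items.length >= this.capacity) await this.park()
    if (this.done) throw new Error("queue already completed")
    this.items.push(item)
    this.wake()
  }

  complete(): void {
    this.done = true
    this.wake()
  }

  async *[Symbol.asyncIterator](): AsyncGenerator<T> {
    while (true) {
      while (this.items.length === 0 && !this.done) await this.park()
      if (this.items.length === 0) return // completed and fully drained
      const item = this.items.shift() as T
      this.wake() // room was freed, let a parked producer continue
      yield item
    }
  }
}

// Producer that fails at index failAt (pass -1 to never fail).
async function produce(queue: BoundedQueue<string>, paths: string[], failAt: number): Promise<void> {
  try {
    let index = 0
    for (const path of paths) {
      if (index++ === failAt) throw new Error(`simulated failure at ${path}`)
      await queue.enqueue(path) // awaited, so backpressure actually applies
    }
  } finally {
    queue.complete() // runs on success and on failure, so the consumer never hangs
  }
}

async function run(paths: string[], failAt: number): Promise<{ seen: string[]; error?: string }> {
  const queue = new BoundedQueue<string>(2)
  const seen: string[] = []
  let error: string | undefined
  const producer = produce(queue, paths, failAt).catch((e: unknown) => {
    error = e instanceof Error ? e.message : String(e)
  })
  for await (const path of queue) seen.push(path)
  await producer
  return { seen, error }
}
```

If the `finally` block were dropped, the `for await` loop in `run` would wait forever after the simulated failure, which is the hang described in the first bullet.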

Mention @roomote in a comment to request specific changes to this pull request or fix all unresolved issues.
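
For the `NOT IN` variable-limit issue in the review above, one possible approach (a sketch, not necessarily the fix adopted in cache-manager.ts) is to load the kept paths into a temporary table in small chunks and delete against a subquery. Simply splitting the `NOT IN` list across several `DELETE` statements would be wrong, because each chunk would delete the rows kept by the other chunks. The table and column names (`file_hashes`, `keep_paths`, `path`) and the function `planDeleteHashesNotIn` are assumptions for illustration; the function only builds the statements and leaves execution to whichever SQLite binding is in use.

```typescript
interface Statement {
  sql: string
  params: string[]
}

// Builds an ordered list of statements; run them inside a single transaction.
// Hypothetical schema: file_hashes(path, ...) plus a temp table keep_paths(path).
function planDeleteHashesNotIn(keepPaths: string[], maxParamsPerStatement = 500): Statement[] {
  if (maxParamsPerStatement < 1) throw new Error("maxParamsPerStatement must be >= 1")
  const plan: Statement[] = [
    { sql: "CREATE TEMP TABLE IF NOT EXISTS keep_paths (path TEXT PRIMARY KEY)", params: [] },
    { sql: "DELETE FROM keep_paths", params: [] },
  ]
  // Insert the kept paths in chunks so no statement exceeds the bound-variable limit.
  for (let start = 0; start < keepPaths.length; start += maxParamsPerStatement) {
    const chunk = keepPaths.slice(start, start + maxParamsPerStatement)
    const placeholders = chunk.map(() => "(?)").join(", ")
    plan.push({ sql: `INSERT OR IGNORE INTO keep_paths (path) VALUES ${placeholders}`, params: chunk })
  }
  // A single delete with no bound variables, regardless of how many paths are kept.
  plan.push({
    sql: "DELETE FROM file_hashes WHERE path NOT IN (SELECT path FROM keep_paths)",
    params: [],
  })
  return plan
}
```

Note that an empty `keepPaths` yields a plan that deletes every row in `file_hashes`, so a caller may want to guard that case explicitly.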

In-line force-pushed the faster-codebase-indexing branch from 51b3cf3 to 522eced on February 11, 2026 09:10
In-line (Contributor Author) commented Feb 11, 2026

[Image: codepopover]

In-line force-pushed the faster-codebase-indexing branch from 1e73dc7 to 615c3e5 on February 11, 2026 10:37
In-line force-pushed the faster-codebase-indexing branch from 615c3e5 to 11190c7 on February 11, 2026 10:40