
Add @nitpicker/query and @nitpicker/mcp-server packages#56

Merged
YusukeHirao merged 12 commits into main from
claude/implement-archive-mcp-server-8XLGZ
Mar 13, 2026

Conversation

@YusukeHirao

Summary

This PR introduces two new packages to enable querying .nitpicker archive files:

  1. @nitpicker/query - A query API library providing SQL-level filtering, aggregation, and pagination for archive data
  2. @nitpicker/mcp-server - A Model Context Protocol (MCP) server exposing archive queries to AI assistants like Claude

Key Changes

@nitpicker/query Package

  • ArchiveManager: Manages lifecycle of opened archives with reference counting to prevent resource exhaustion (max 20 concurrent archives)
  • Query Functions: 12 specialized query functions for archive analysis:
    • getSummary() - Site-wide statistics (page counts, status distribution, metadata fulfillment rates)
    • listPages() - Pages with rich filtering (status codes, missing metadata, URL patterns, directory paths)
    • getPageDetail() - Detailed page information including links and redirects
    • listLinks() - Link analysis (broken, external, orphaned pages)
    • listImages() - Image inventory with quality issue detection (missing alt, dimensions, lazy-loading)
    • listResources() - Sub-resource tracking (CSS, JS, fonts, etc.)
    • checkHeaders() - Security header validation (CSP, X-Frame-Options, HSTS, etc.)
    • findDuplicates() - Metadata duplication detection
    • findMismatches() - Canonical/OG tag mismatches
    • getViolations() - Accessibility and validation violations
    • getPageHtml() - HTML snapshot retrieval
    • getResourceReferrers() - Resource usage tracking
  • Type Definitions: Comprehensive TypeScript interfaces for all query options and results
  • SQL Optimization: All queries use Knex.js with database-level filtering for performance on large datasets (10,000+ pages)

@nitpicker/mcp-server Package

  • MCP Server Implementation: Stdio-based MCP server exposing all query functions as tools
  • Tool Definitions: 14 MCP tools with JSON Schema input validation and LLM-friendly descriptions
  • Argument Validation: Type-safe extraction and validation of tool arguments with helpful error messages
  • Enum Validation: Validates link types, mismatch types, and duplicate fields
  • Comprehensive Tests: Full test suite covering all tools and edge cases
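To make the tool-definition bullet concrete, here is a minimal sketch of what one entry could look like. The `name`/`description`/`inputSchema` shape follows the MCP tool listing format, and `open_archive` is named in the commits. The `filePath` argument, the description wording, and the `missingRequired` helper are assumptions for illustration, not the package's actual code.

```typescript
// Hypothetical tool definition; only the tool name comes from this PR.
interface ToolDefinition {
  name: string;
  description: string;
  inputSchema: {
    type: "object";
    properties: Record<string, { type: string; description: string }>;
    required: string[];
  };
}

const openArchiveTool: ToolDefinition = {
  name: "open_archive",
  description: "Open a .nitpicker archive and return an ID for use with the query tools.",
  inputSchema: {
    type: "object",
    properties: {
      // Assumed argument name.
      filePath: { type: "string", description: "Path to a .nitpicker file" },
    },
    required: ["filePath"],
  },
};

// Minimal required-field check, standing in for full JSON Schema validation.
function missingRequired(tool: ToolDefinition, args: Record<string, unknown>): string[] {
  return tool.inputSchema.required.filter((key) => args[key] === undefined);
}
```

A description written for an LLM reader tends to say what the tool returns and when to call it, which is what the wording above tries to imitate.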

Supporting Changes

  • Updated ARCHITECTURE.md to document new packages
  • Updated README.md with MCP Server setup instructions
  • Added CLAUDE.md documentation for AI context
  • Extended @nitpicker/crawler types with DB_Image interface for image tracking
  • Added security header keyword to cspell.json

Implementation Details

  • Reference Counting: Archives opened multiple times reuse the same extraction, preventing redundant untarring
  • Resource Limits: Maximum 20 concurrent open archives to prevent file descriptor exhaustion
  • SQL-Level Filtering: All filtering, sorting, and pagination happens at the database level for efficiency
  • Temporary Directory Management: Automatic cleanup of extracted archives when all references are closed
  • MCP Protocol: Uses @modelcontextprotocol/sdk for standards-compliant AI assistant integration

https://claude.ai/code/session_01XmSXeM4Jx8rzxwzu6GSvGc

claude added 12 commits March 11, 2026 08:59
Add two new packages for querying .nitpicker archive files via MCP:

- @nitpicker/query: Archive lifecycle management and 12 query functions
  (getSummary, listPages, getPageDetail, getPageHtml, listLinks,
  listResources, listImages, getViolations, findDuplicates,
  findMismatches, getResourceReferrers, checkHeaders)
- @nitpicker/mcp-server: MCP server exposing 14 tools via stdio transport
  (open_archive, close_archive + 12 query tools)

Crawler changes:
- Add getKnex() to ArchiveAccessor and Database for SQL-level queries
- Add DB_Image type definition to archive types

https://claude.ai/code/session_01XmSXeM4Jx8rzxwzu6GSvGc
Add explicit return type to CallToolRequestSchema handler to avoid
deep type instantiation, and fix count query type access.

https://claude.ai/code/session_01XmSXeM4Jx8rzxwzu6GSvGc
…, update docs

- Add unit tests for 7 query functions (get-page-detail, get-page-html, list-links,
  list-resources, list-images, find-mismatches, get-resource-referrers)
- Add 16 integration tests for mcp-server covering all 14 tools, error handling,
  and lifecycle management
- Replace unsafe type casts with requireString/optionalNumber validation helpers
- Replace destructuring patterns causing unused variable lint errors with omit helper
- Update ARCHITECTURE.md, CLAUDE.md, and README.md with query and mcp-server packages

https://claude.ai/code/session_01XmSXeM4Jx8rzxwzu6GSvGc
Source code fixes:
- getViolations: rewrite to read analysis/violations file instead of N+1
  per-page queries; replace empty catch blocks with proper error handling
- ArchiveManager: fix resource leak in close() by calling archive.close()
  to destroy DB connection and clean up tmpDir
- mcp-server: add NaN check to optionalNumber(); add runtime enum
  validation for link type, mismatch type, and duplicate field; replace
  raw Knex query in open_archive with getSummary()
- check-headers/get-page-detail: replace silent catch blocks with
  console.warn for JSON parse errors
- All query functions: replace non-null assertion [0]! with optional
  chaining [0]?.total ?? 0 with explanatory comments
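The validation and count-fallback fixes above could be sketched roughly as follows. The helper names `requireString` and `optionalNumber` come from the commit messages; their signatures, the `optionalEnum` helper, the `LINK_TYPES` values, and `totalFromCountRows` are assumptions made for illustration.

```typescript
type ToolArgs = Record<string, unknown>;

// Assumed signature: throws when the argument is missing or not a string.
function requireString(args: ToolArgs, key: string): string {
  const value = args[key];
  if (typeof value !== "string" || value.length === 0) {
    throw new Error(`"${key}" is required and must be a non-empty string`);
  }
  return value;
}

// Assumed signature: absent is fine, but a present value must be a real number.
function optionalNumber(args: ToolArgs, key: string): number | undefined {
  const value = args[key];
  if (value === undefined || value === null) return undefined;
  // typeof NaN is "number", so NaN needs its own check.
  if (typeof value !== "number" || Number.isNaN(value)) {
    throw new Error(`"${key}" must be a number`);
  }
  return value;
}

// Hypothetical enum values, loosely based on the listLinks() description.
const LINK_TYPES = ["broken", "external", "orphaned"] as const;

// Runtime enum check: TypeScript types alone do not constrain tool input.
function optionalEnum<T extends string>(
  args: ToolArgs,
  key: string,
  allowed: readonly T[],
): T | undefined {
  const value = args[key];
  if (value === undefined) return undefined;
  if (typeof value !== "string" || !allowed.includes(value as T)) {
    throw new Error(`"${key}" must be one of: ${allowed.join(", ")}`);
  }
  return value as T;
}

// A count query may return no rows, so fall back to 0 instead of asserting [0]!.
function totalFromCountRows(rows: ReadonlyArray<{ total?: number }>): number {
  return rows[0]?.total ?? 0;
}
```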

Test improvements:
- Add get-violations.spec.ts (9 tests): filtering, pagination, ENOENT
- Add archive-manager.spec.ts (9 tests): lifecycle, cleanup, error cases
- mcp-server.spec.ts: strengthen assertions (toBe instead of
  toBeGreaterThanOrEqual), add SDK internal API comment, add enum
  validation error tests
- list-pages.spec.ts: add 6 missing filter tests (statusMin, statusMax,
  missingDescription, urlPattern, sortBy/sortOrder, directory)
- list-links.spec.ts: replace weak assertions with precise toMatchObject
  checks, add pagination test

https://claude.ai/code/session_01XmSXeM4Jx8rzxwzu6GSvGc
- list-images.ts: extract oversizedThreshold to local variable to avoid non-null assertion
- list-links.spec.ts: use direct items[0] with toMatchObject instead of .find()
- archive-manager.spec.ts: use hardcoded expected IDs instead of computed values

https://claude.ai/code/session_01XmSXeM4Jx8rzxwzu6GSvGc
- ArchiveManager: add .nitpicker extension validation to reject
  arbitrary file types
- ArchiveManager: add MAX_OPEN_ARCHIVES (20) limit to prevent
  resource exhaustion via unlimited archive opens
- ArchiveManager: log warning on close failure instead of silently
  swallowing errors
- getViolations: use error.code === 'ENOENT' instead of fragile
  string matching on error.message
- mcp-server: sanitize error messages to avoid leaking internal
  file paths (/tmp, /home, /root, /usr)
- Add tests for extension validation and concurrent archive limit
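As a rough illustration of the error-message sanitization in this commit, the sketch below masks paths under the four prefixes the message names. The regular expression, the `<path>` placeholder, and the function name are assumptions; the actual implementation is not shown in this PR description.

```typescript
// Matches absolute paths under /tmp, /home, /root, or /usr.
// The lookbehind keeps URL paths such as example.com/usr/docs untouched.
const INTERNAL_PATH = /(?<![\w.\-\/])\/(?:tmp|home|root|usr)(?:\/[^\s'":,)]+)+/g;

// Hypothetical helper: replaces each internal path with a fixed placeholder.
function sanitizeErrorMessage(message: string): string {
  return message.replace(INTERNAL_PATH, "<path>");
}
```

For example, `sanitizeErrorMessage("open '/tmp/nitpicker-abc/db.sqlite'")` yields `open '<path>'`, so the client still sees what failed without learning where the extraction directory lives.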

https://claude.ai/code/session_01XmSXeM4Jx8rzxwzu6GSvGc
- Add file existence check (accessSync) before opening archives
- Resolve symlinks (realpathSync) and re-validate extension to prevent
  symlink-based path traversal attacks
- Simplify error message sanitization to strip all multi-segment absolute paths
- Add tests for missing file and symlink traversal scenarios
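The check order described in this commit (extension, existence, symlink resolution, extension again) might look like the sketch below. `accessSync` and `realpathSync` are the Node.js calls named above. The function name, the error messages, and the injectable `fs` parameter are assumptions, the latter added only so the sketch can be exercised without touching disk.

```typescript
import { accessSync, realpathSync } from "node:fs";

const ARCHIVE_EXT = ".nitpicker";

// The two filesystem calls the check relies on.
interface FsLike {
  accessSync(path: string): void;
  realpathSync(path: string): string;
}

// Hypothetical helper: returns the resolved real path or throws.
function resolveArchivePath(
  filePath: string,
  fs: FsLike = { accessSync, realpathSync },
): string {
  if (!filePath.endsWith(ARCHIVE_EXT)) {
    throw new Error(`Not a ${ARCHIVE_EXT} file`);
  }
  try {
    fs.accessSync(filePath); // throws if the file does not exist
  } catch {
    throw new Error("Archive file not found");
  }
  // A symlink named x.nitpicker may point anywhere, so validate the target too.
  const realPath = fs.realpathSync(filePath);
  if (!realPath.endsWith(ARCHIVE_EXT)) {
    throw new Error(`Symlink target is not a ${ARCHIVE_EXT} file`);
  }
  return realPath;
}
```

Returning the resolved path rather than the input also gives callers a stable key for recognizing the same file opened under different names.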

https://claude.ai/code/session_01XmSXeM4Jx8rzxwzu6GSvGc
ArchiveManager now deduplicates open calls by resolved real path.
When the same .nitpicker file is opened again, the existing extraction
and DB connection are shared via reference counting — no redundant
untar is performed. Resources are released only when all references
to the same file are closed.
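The deduplication described here can be sketched as a small reference-counted pool keyed by real path. This is a minimal illustration under stated assumptions, not the ArchiveManager source: the class name, the `open`/`close` callbacks, and the method names are all hypothetical, and the default limit of 20 mirrors MAX_OPEN_ARCHIVES.

```typescript
interface Entry<T> {
  resource: T;
  refCount: number;
}

// Hypothetical pool: one extraction per real path, shared by all openers.
class RefCountedPool<T> {
  private readonly entries = new Map<string, Entry<T>>();
  private readonly open: (realPath: string) => Promise<T>;
  private readonly close: (resource: T) => Promise<void>;
  private readonly maxOpen: number;

  constructor(
    open: (realPath: string) => Promise<T>,
    close: (resource: T) => Promise<void>,
    maxOpen = 20,
  ) {
    this.open = open;
    this.close = close;
    this.maxOpen = maxOpen;
  }

  get size(): number {
    return this.entries.size;
  }

  // Reuses an existing entry; only a new unique path counts toward the limit.
  // Note: two concurrent first opens of one path are not guarded in this sketch.
  async acquire(realPath: string): Promise<T> {
    const existing = this.entries.get(realPath);
    if (existing) {
      existing.refCount += 1;
      return existing.resource;
    }
    if (this.entries.size >= this.maxOpen) {
      throw new Error(`Too many open archives (max ${this.maxOpen})`);
    }
    const resource = await this.open(realPath);
    this.entries.set(realPath, { resource, refCount: 1 });
    return resource;
  }

  // Releases the underlying resource only when the last reference goes away.
  async release(realPath: string): Promise<void> {
    const entry = this.entries.get(realPath);
    if (!entry) return;
    entry.refCount -= 1;
    if (entry.refCount > 0) return;
    this.entries.delete(realPath);
    await this.close(entry.resource);
  }
}
```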

https://claude.ai/code/session_01XmSXeM4Jx8rzxwzu6GSvGc
- Add explicit expect(archive).toBeDefined() before non-null assertions
- Fix symlink test: use rmSync before symlinkSync instead of try/catch,
  wrap assertion in try/finally for reliable cleanup
- Rename test to match actual verification: "同じファイルの再オープンは
  ユニークファイル数の上限にカウントされない" ("reopening the same file
  does not count toward the unique-file limit")
- Fix closeAll race condition: use sequential loop instead of
  Promise.all to prevent concurrent close on same shared entry
- Add tmpDir existence check in ref-count partial-close test
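To illustrate why a sequential loop is safer than Promise.all when several handles share one entry, here is a toy model. It is not the actual closeAll code: `SharedEntry`, `closeOne`, and both `closeAll*` functions are hypothetical, and the read-await-write gap in `closeOne` is a deliberately simplified stand-in for real asynchronous cleanup.

```typescript
interface SharedEntry {
  refCount: number;
  destroyCalls: number;
}

// Toy close: reads the count, yields to the event loop, then writes it back.
async function closeOne(entry: SharedEntry): Promise<void> {
  const remaining = entry.refCount - 1;
  await Promise.resolve(); // stands in for async work such as closing a DB
  entry.refCount = remaining;
  if (remaining <= 0) entry.destroyCalls += 1;
}

// Sequential: each close sees the result of the previous one.
async function closeAllSequential(handles: SharedEntry[]): Promise<void> {
  for (const entry of handles) {
    await closeOne(entry);
  }
}

// Concurrent: every close reads the same stale count before any write lands.
async function closeAllConcurrent(handles: SharedEntry[]): Promise<void> {
  await Promise.all(handles.map((entry) => closeOne(entry)));
}
```

With two handles pointing at one entry whose count is 2, the sequential version brings the count to 0 and destroys the entry once, while the concurrent version leaves the count at 1 and never destroys it.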

https://claude.ai/code/session_01XmSXeM4Jx8rzxwzu6GSvGc
- ARCHITECTURE.md: add reference counting / dedup description for ArchiveManager
- archive-manager.ts: add missing @throws for file-not-found, clarify
  @returns includes archive on first open only
- check-headers.ts: fix @param options description to include filter

https://claude.ai/code/session_01XmSXeM4Jx8rzxwzu6GSvGc
zod was incorrectly listed as a direct dependency of @nitpicker/mcp-server
in the lockfile but is not in package.json. This caused yarn install --immutable
to fail in CI.

https://claude.ai/code/session_01XmSXeM4Jx8rzxwzu6GSvGc
@YusukeHirao YusukeHirao merged commit 59e2282 into main Mar 13, 2026
3 checks passed
@YusukeHirao YusukeHirao deleted the claude/implement-archive-mcp-server-8XLGZ branch March 13, 2026 02:53

Development

Successfully merging this pull request may close these issues.

feat: New implementation of an MCP server for querying .nitpicker archives

2 participants