Skip to content

Add URL validation and improve scope flag parsing#50

Merged
YusukeHirao merged 5 commits intomainfrom
claude/improve-crawl-validation-7YxEU
Mar 9, 2026
Merged

Add URL validation and improve scope flag parsing#50
YusukeHirao merged 5 commits intomainfrom
claude/improve-crawl-validation-7YxEU

Conversation

@YusukeHirao
Copy link
Copy Markdown
Member

@YusukeHirao YusukeHirao commented Mar 7, 2026

Summary

This PR adds comprehensive URL validation to the crawl command and improves the handling of the --scope flag by filtering out empty values.

closes: #17

Key Changes

  • URL Validation: Added a new validateUrls() function that validates all URLs using the URL constructor before processing. This ensures:

    • Empty list files are rejected with a clear error message
    • Invalid URLs in --list-file, --list, and positional arguments are caught early with helpful error messages
    • Validation occurs at the entry point of the crawl command
  • Scope Flag Parsing: Enhanced mapFlagsToCrawlConfig() to filter out empty strings and whitespace-only values when parsing comma-separated scope values, preventing invalid empty scope entries

  • Documentation Updates:

    • Added URL format requirements to README.md explaining that full URLs with protocol (e.g., https://example.com) are required
    • Updated ARCHITECTURE.md to clarify the snapshot compression process and correct terminology from titleOnly to metadataOnly
    • Added JSDoc comment to the debug logger export
  • Test Coverage: Added 6 new test cases covering:

    • Empty list file error handling
    • Invalid URLs in list files, list flags, and arguments
    • Scope flag filtering of empty and whitespace-only values

Implementation Details

The validateUrls() function attempts to construct a URL object for each provided URL string, catching any parsing errors and throwing a descriptive error message. This validation is called at three key points in the crawl command flow: when using --list-file, when combining --list with arguments, and when using a single positional argument.

https://claude.ai/code/session_01JZboa1PjwHA9voNa2Gx8KZ

claude added 5 commits March 6, 2026 09:07
- --scope フラグのカンマ区切りで空文字列をフィルタリング
- --list-file で空リストの場合にエラーメッセージを表示
- URL 引数を new URL() で検証し、無効な場合にエラーを表示

Closes #17

https://claude.ai/code/session_01JZboa1PjwHA9voNa2Gx8KZ
- validateUrls ヘルパーを抽出し、--list / --list-file / 単一URL引数の全パスで適用
- --list-file 内の無効URLを検出するテストを追加
- --list 内の無効URLを検出するテストを追加
- scope 全空文字列のエッジケーステストを追加

https://claude.ai/code/session_01JZboa1PjwHA9voNa2Gx8KZ
crawl コマンドの URL 引数にプロトコル付き完全形式が必要であることを明記。

https://claude.ai/code/session_01JZboa1PjwHA9voNa2Gx8KZ
エクスポートされたトップレベル定数に JSDoc が欠落していたため追記。

https://claude.ai/code/session_01JZboa1PjwHA9voNa2Gx8KZ
- titleOnly → metadataOnly に用語を修正(実装に合わせる)
- Archive.write() の説明に snapshot の zip 圧縮ステップを追記

https://claude.ai/code/session_01JZboa1PjwHA9voNa2Gx8KZ
@YusukeHirao YusukeHirao merged commit 1607981 into main Mar 9, 2026
3 checks passed
@YusukeHirao YusukeHirao deleted the claude/improve-crawl-validation-7YxEU branch March 9, 2026 08:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

crawl コマンドの入力バリデーション強化

2 participants