11 changes: 7 additions & 4 deletions ARCHITECTURE.md
@@ -32,6 +32,8 @@ packages/
```

> **Note**: The CLI depends directly on the analyze plugins (for module resolution when running via `npx`). When adding a new analyze plugin, also add it to `dependencies` in `@nitpicker/cli/package.json`.
+>
+> **Note**: The diagram above only shows `@d-zero/dealer` connecting to crawler and report-google-sheets, but cli and core also depend on it to import the `Lanes` type.
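
That hidden edge can be as small as a type-only import. A minimal sketch (assuming `Lanes` is exported from the package root, as the note implies; `renderProgress` is a hypothetical consumer):

```ts
// Type-only dependency of core/cli on @d-zero/dealer.
// Only the `Lanes` import comes from this doc; renderProgress is invented.
import type { Lanes } from '@d-zero/dealer';

declare function renderProgress(lanes: Lanes): void;
```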

---

@@ -142,6 +144,7 @@ crawler/src/
│ ├── destination-cache.ts # request cache
│ ├── fetch-robots-txt.ts # robots.txt fetching and parsing
│ ├── robots-checker.ts # robots.txt compliance checker (cached per origin)
+│ ├── format-crawl-progress.ts # formatter for deal() progress display
│ └── ... # link-to-page-data, protocol-agnostic-key, net-timeout-error
├── crawler.ts # barrel export (public package API)
├── crawler-orchestrator.ts # CrawlerOrchestrator
@@ -326,7 +329,7 @@ scrapeStart(url, page, options)
### Other tables

- **images**: pageId, src, currentSrc, alt, width/height, naturalWidth/naturalHeight, isLazy, viewportWidth, sourceCode
-- **resources**: url, isExternal, status, contentType, contentLength, compress, cdn, responseHeaders
+- **resources**: url, isExternal, status, statusText, contentType, contentLength, compress, cdn, responseHeaders (row shape sketched after this list)
- **resources-referrers**: resourceId → resources.id, pageId → pages.id
- **info**: configuration data (single record; fields of the `Config` type stored as JSON)
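
As a reading aid, here is the `resources` row sketched as a TypeScript interface. The field names come from the list above; the concrete types are assumptions:

```ts
// Sketch only: field names are from the list above; the types are guesses.
interface ResourceRow {
	url: string;
	isExternal: boolean;
	status: number;
	statusText: string; // newly stored alongside `status` in this change
	contentType: string | null;
	contentLength: number | null;
	compress: string | null; // e.g. result of detectCompress
	cdn: string | null; // e.g. result of detectCDN
	responseHeaders: Record<string, string>;
}
```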

@@ -647,7 +650,7 @@ Nitpicker depends on the following external packages published by D-ZERO
| Package               | Purpose                                                                                 | Search keywords                    |
| --------------------- | --------------------------------------------------------------------------------------- | ---------------------------------- |
| `@d-zero/beholder`    | Puppeteer-based scraper engine. Returns `ScrapeResult`                                   | `"@d-zero/beholder" changelog`     |
-| `@d-zero/dealer`      | Parallel processing and scheduling. Provides the `deal()` function                      | `"@d-zero/dealer" deal concurrent` |
+| `@d-zero/dealer`      | Parallel processing and scheduling. Provides `deal()` and the `Lanes` progress display  | `"@d-zero/dealer" deal concurrent` |
| `@d-zero/shared`      | Shared utilities (subpath-export form: `@d-zero/shared/parse-url` etc.)                  | `"@d-zero/shared" subpath exports` |
| `@d-zero/roar`        | CLI framework                                                                            | `"@d-zero/roar" command`           |
| `@d-zero/google-auth` | OAuth2 authentication (`credentials.json` → `token.json`)                                | `"@d-zero/google-auth" oauth2`     |
@@ -659,7 +662,7 @@ Nitpicker depends on the following external packages published by D-ZERO

```
@d-zero/beholder → crawler (Scraper, ScrapeResult)
-@d-zero/dealer → crawler, core, cli, report-google-sheets (deal() parallel control)
+@d-zero/dealer → crawler (deal() parallel control), core / cli / report-google-sheets (Lanes progress display)
@d-zero/shared → all packages (parseUrl, delay, isError, detectCompress, detectCDN)
@d-zero/roar → cli (CLI command definitions)
@d-zero/google-auth → report-google-sheets (OAuth2 authentication)
@@ -671,5 +674,5 @@ Nitpicker depends on the following external packages published by D-ZERO
### Notes on version updates

- **`@d-zero/beholder`**: changes to the `ScrapeResult` type affect the entire crawler
-- **`@d-zero/dealer`**: changes to the `deal()` API affect parallel processing in crawler and core
+- **`@d-zero/dealer`**: changes to the `deal()` API affect parallel processing in crawler; changes to the `Lanes` type affect progress display in core, cli, and report-google-sheets
- **`@d-zero/shared`**: watch for added or removed subpath exports; import using the `@d-zero/shared/parse-url` form (see the example below)
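
The subpath form in practice, as used by the new spec in this PR (the milliseconds unit for `delay` is an assumption):

```ts
// Import from the specific subpath entry, not the package root.
import { tryParseUrl } from '@d-zero/shared/parse-url';
import { delay } from '@d-zero/shared/delay';

const url = tryParseUrl('https://example.com/');
await delay(100); // wait before the next request
```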
12 changes: 6 additions & 6 deletions CLAUDE.md
@@ -13,7 +13,7 @@ packages/
├── @nitpicker/
│ ├── cli/ # unified CLI (bin: nitpicker)
│ ├── crawler/ # crawler engine (orchestrator + archive + utilities)
-│ ├── core/ # audit engine (Nitpicker class + parallel processing via deal())
+│ ├── core/ # audit engine (Nitpicker class + parallel processing via a bounded Promise pool)
│ ├── types/ # audit type definitions (Report, ConfigJSON)
│ ├── analyze-axe/ # accessibility audit
│ ├── analyze-lighthouse/ # Lighthouse audit
@@ -71,9 +71,9 @@ CrawlerOrchestrator.crawling(urls, options)
Nitpicker.analyze(archivePath, plugins)
→ Archive.connect() → ArchiveAccessor
→ fetch all pages with getPagesWithRefs()
-→ analyze in parallel with deal() (@d-zero/dealer, limit: 50)
+→ analyze in parallel with a bounded Promise pool (limit: 50)
→ each Page: run plugins in a Worker thread via runInWorker()
-→ deal() handles progress display (no console.log needed inside plugins)
+→ Lanes (@d-zero/dealer) handles progress display (no console.log needed inside plugins)
→ write out report files
```
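
For orientation, a minimal sketch of a bounded Promise pool like the one described above. This is illustrative, not core's actual implementation, and `boundedPool` is an invented name:

```ts
// Illustrative bounded Promise pool (not core's actual code).
// The limit mirrors the documented analyze concurrency of 50.
async function boundedPool<T, R>(
	items: readonly T[],
	limit: number,
	worker: (item: T) => Promise<R>,
): Promise<R[]> {
	const results = new Array<R>(items.length);
	let next = 0;
	// Each lane repeatedly claims the next index until the queue drains.
	const lane = async () => {
		while (next < items.length) {
			const index = next++;
			results[index] = await worker(items[index]!);
		}
	};
	await Promise.all(Array.from({ length: Math.min(limit, items.length) }, lane));
	return results;
}
```

Note that `Promise.all` here fails fast on the first rejection; per-item error handling would need a catch inside `worker`.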

@@ -86,10 +86,10 @@ Nitpicker.analyze(archivePath, plugins)
- `delay` — `@d-zero/shared/delay`
- `isError` — consolidated in beholder/is-error.ts; crawler re-exports it

-### Where deal() is used
+### Where deal() / parallel processing is used

-- **crawler**: parallel control of URL scraping
-- **core (analyze)**: parallel page analysis (limit: 50)
+- **crawler**: parallel control of URL scraping with `deal()` (@d-zero/dealer); see the sketch after this list
+- **core (analyze)**: parallel processing with its own bounded Promise pool (limit: 50); progress display with `Lanes` (@d-zero/dealer)
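
A hedged sketch of the crawler side: `deal(items, factory)` returns a promise that can reject with an `AggregateError` of worker failures, which `Crawler#emitDealErrors` flattens into individual `error` events (the behavior verified by the new spec in this PR). The factory signature is simplified here, and `scrape`/`handleError` are hypothetical names:

```ts
import { deal } from '@d-zero/dealer';

declare const urls: string[]; // hypothetical input
declare function scrape(url: string): Promise<void>; // hypothetical worker body
declare function handleError(error: Error): void; // hypothetical event emitter

// The factory receives an item and returns the work function for it
// (the real factory takes additional parameters omitted in this sketch).
await deal(urls, (url: string) => async () => {
	await scrape(url);
}).catch((error) => {
	// Flatten an AggregateError into individual errors, converting
	// non-Error values, as the spec expects.
	const flattened = error instanceof AggregateError ? error.errors : [error];
	for (const e of flattened) {
		handleError(e instanceof Error ? e : new Error(String(e)));
	}
});
```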

## Tests

2 changes: 1 addition & 1 deletion packages/@nitpicker/analyze-search/package.json
@@ -27,7 +27,7 @@
"clean": "tsc --build --clean"
},
"dependencies": {
"@d-zero/shared": "0.20.0",
"@d-zero/shared": "0.20.1",
"@nitpicker/core": "0.4.4",
"@nitpicker/crawler": "0.4.4",
"@nitpicker/types": "0.4.4",
4 changes: 2 additions & 2 deletions packages/@nitpicker/cli/package.json
@@ -31,10 +31,10 @@
"clean": "tsc --build --clean"
},
"dependencies": {
"@d-zero/dealer": "1.6.3",
"@d-zero/dealer": "1.7.0",
"@d-zero/readtext": "1.1.19",
"@d-zero/roar": "2.0.0",
"@d-zero/shared": "0.20.0",
"@d-zero/shared": "0.20.1",
"@nitpicker/analyze-axe": "0.4.4",
"@nitpicker/analyze-lighthouse": "0.4.4",
"@nitpicker/analyze-main-contents": "0.4.4",
4 changes: 2 additions & 2 deletions packages/@nitpicker/core/package.json
@@ -27,8 +27,8 @@
"clean": "tsc --build --clean"
},
"dependencies": {
"@d-zero/dealer": "1.6.3",
"@d-zero/shared": "0.20.0",
"@d-zero/dealer": "1.7.0",
"@d-zero/shared": "0.20.1",
"@nitpicker/crawler": "0.4.4",
"@nitpicker/types": "0.4.4",
"ansi-colors": "4.1.3",
4 changes: 2 additions & 2 deletions packages/@nitpicker/crawler/package.json
@@ -28,9 +28,9 @@
},
"dependencies": {
"@d-zero/beholder": "2.0.0",
"@d-zero/dealer": "1.6.3",
"@d-zero/dealer": "1.7.0",
"@d-zero/fs": "0.2.2",
"@d-zero/shared": "0.20.0",
"@d-zero/shared": "0.20.1",
"ansi-colors": "4.1.3",
"debug": "4.4.3",
"follow-redirects": "1.15.11",
238 changes: 238 additions & 0 deletions packages/@nitpicker/crawler/src/crawler/crawler.spec.ts
@@ -0,0 +1,238 @@
import type { CrawlerEventTypes } from './types.js';

import { tryParseUrl as parseUrl } from '@d-zero/shared/parse-url';
import { describe, it, expect, vi, beforeEach } from 'vitest';

vi.mock('@d-zero/dealer', () => ({
	deal: vi.fn(),
}));

vi.mock('@d-zero/shared/retry', () => ({
	/**
	 * Stub retryCall that calls the function once without retries.
	 * @param fn - The function to call.
	 * @returns The result of calling fn.
	 */
	retryCall: (fn: () => unknown) => fn(),
}));

vi.mock('./robots-checker.js', () => {
	/**
	 * Stub RobotsChecker that always allows crawling.
	 */
	class RobotsCheckerStub {
		/**
		 * Always returns true.
		 * @returns Resolved with true.
		 */
		isAllowed() {
			return Promise.resolve(true);
		}
	}
	return { RobotsChecker: RobotsCheckerStub };
});

/**
 * Default crawler options for testing.
 */
const defaultOptions = {
	interval: 0,
	parallels: 1,
	recursive: true,
	scope: ['https://example.com/'],
	excludes: [],
	excludeKeywords: [],
	excludeUrls: [],
	ignoreRobots: true,
};

describe('Crawler', () => {
	beforeEach(() => {
		vi.resetAllMocks();
	});

	describe('#emitDealErrors via start()', () => {
		it('emits each error in an AggregateError as a separate error event', async () => {
			const { deal } = await import('@d-zero/dealer');
			const { default: Crawler } = await import('./crawler.js');

			vi.mocked(deal).mockRejectedValue(
				new AggregateError(
					[new Error('worker-1 failed'), new Error('worker-2 failed')],
					'deal failed',
				),
			);

			const crawler = new Crawler(defaultOptions);
			const errors: CrawlerEventTypes['error'][] = [];
			crawler.on('error', (e) => {
				errors.push(e);
			});

			const url = parseUrl('https://example.com/')!;
			crawler.start(url);

			// deal() rejection triggers async .catch — wait for microtask queue
			await vi.waitFor(() => {
				expect(errors).toHaveLength(2);
			});

			expect(errors[0]!.error.message).toBe('worker-1 failed');
			expect(errors[1]!.error.message).toBe('worker-2 failed');
			expect(errors[0]!.url).toBe('https://example.com');
			expect(errors[0]!.isExternal).toBe(false);
			expect(errors[0]!.isMainProcess).toBe(true);
		});

		it('converts non-Error values inside an AggregateError to Errors', async () => {
			const { deal } = await import('@d-zero/dealer');
			const { default: Crawler } = await import('./crawler.js');

			vi.mocked(deal).mockRejectedValue(
				new AggregateError(['string error', 42], 'mixed errors'),
			);

			const crawler = new Crawler(defaultOptions);
			const errors: CrawlerEventTypes['error'][] = [];
			crawler.on('error', (e) => {
				errors.push(e);
			});

			crawler.start(parseUrl('https://example.com/')!);

			await vi.waitFor(() => {
				expect(errors).toHaveLength(2);
			});

			expect(errors[0]!.error).toBeInstanceOf(Error);
			expect(errors[0]!.error.message).toBe('string error');
			expect(errors[1]!.error).toBeInstanceOf(Error);
			expect(errors[1]!.error.message).toBe('42');
		});

		it('emits a plain Error as a single error event', async () => {
			const { deal } = await import('@d-zero/dealer');
			const { default: Crawler } = await import('./crawler.js');

			vi.mocked(deal).mockRejectedValue(new Error('deal failed'));

			const crawler = new Crawler(defaultOptions);
			const errors: CrawlerEventTypes['error'][] = [];
			crawler.on('error', (e) => {
				errors.push(e);
			});

			crawler.start(parseUrl('https://example.com/')!);

			await vi.waitFor(() => {
				expect(errors).toHaveLength(1);
			});

			expect(errors[0]!.error.message).toBe('deal failed');
		});

		it('emits a crawlEnd event after deal fails', async () => {
			const { deal } = await import('@d-zero/dealer');
			const { default: Crawler } = await import('./crawler.js');

			vi.mocked(deal).mockRejectedValue(new Error('fatal'));

			const crawler = new Crawler(defaultOptions);
			let crawlEndEmitted = false;
			crawler.on('crawlEnd', () => {
				crawlEndEmitted = true;
			});

			crawler.start(parseUrl('https://example.com/')!);

			await vi.waitFor(() => {
				expect(crawlEndEmitted).toBe(true);
			});
		});
	});

	describe('#emitDealErrors via startMultiple()', () => {
		it('emits each error in an AggregateError individually', async () => {
			const { deal } = await import('@d-zero/dealer');
			const { default: Crawler } = await import('./crawler.js');

			vi.mocked(deal).mockRejectedValue(
				new AggregateError(
					[new Error('err-a'), new Error('err-b'), new Error('err-c')],
					'deal failed',
				),
			);

			const crawler = new Crawler(defaultOptions);
			const errors: CrawlerEventTypes['error'][] = [];
			crawler.on('error', (e) => {
				errors.push(e);
			});

			const urls = [
				parseUrl('https://example.com/page1')!,
				parseUrl('https://example.com/page2')!,
			];
			crawler.startMultiple(urls);

			await vi.waitFor(() => {
				expect(errors).toHaveLength(3);
			});

			expect(errors[0]!.url).toBe('https://example.com/page1');
			expect(errors[0]!.error.message).toBe('err-a');
			expect(errors[1]!.error.message).toBe('err-b');
			expect(errors[2]!.error.message).toBe('err-c');
		});
	});

	describe('worker-level error handling', () => {
		it('emits an exception thrown inside a worker as an error event and keeps processing', async () => {
			const { deal } = await import('@d-zero/dealer');
			const { default: Crawler } = await import('./crawler.js');

			const workerError = new Error('unexpected crash');

			// Simulate deal: call setup function, then invoke the returned work function
			vi.mocked(deal).mockImplementation(async (items, factory) => {
				for (const [index, item] of (items as unknown[]).entries()) {
					const noop = () => {};
					const noopAsync = async () => {};
					// eslint-disable-next-line @typescript-eslint/no-unsafe-function-type -- deal factory signature is complex; cast is intentional in test
					const workFn = (factory as Function)(item, noop, index, noop, noopAsync) as
						| (() => Promise<void>)
						| undefined;
					if (workFn) {
						await workFn();
					}
				}
			});

			// Mock fetchDestination to throw — triggers the worker catch block
			const fetchDestMod = await import('./fetch-destination.js');
			vi.spyOn(fetchDestMod, 'fetchDestination').mockRejectedValue(workerError);

			const crawler = new Crawler(defaultOptions);

			const errors: CrawlerEventTypes['error'][] = [];
			crawler.on('error', (e) => {
				errors.push(e);
			});

			let crawlEndEmitted = false;
			crawler.on('crawlEnd', () => {
				crawlEndEmitted = true;
			});

			crawler.start(parseUrl('https://example.com/')!);

			await vi.waitFor(() => {
				expect(crawlEndEmitted).toBe(true);
			});

			expect(errors).toHaveLength(1);
			expect(errors[0]!.error.message).toBe('unexpected crash');
			expect(errors[0]!.url).toBe('https://example.com');
		});
	});
});