fix(runtime): align Node warmup protocol and fail fast#196
Conversation
|
Warning Rate limit exceeded
⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Organization UI Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (5)
📝 WalkthroughWalkthroughWarmup command handling was formalized into a strict WarmupCommand shape with normalization and numeric request IDs; warmup messaging now uses PROTOCOL_ID and robust error wrapping. WorkerPool initialization was hardened to dispose partially-initialized workers on warmup failure. Tests updated to cover new formats and failure cases. Changes
Sequence DiagramsequenceDiagram
participant Client
participant NodeBridge
participant Normalizer
participant Worker
participant Transport
Client->>NodeBridge: initialize({ warmupCommands })
NodeBridge->>Normalizer: normalize(warmupCommands)
Normalizer-->>NodeBridge: WarmupCommand[]
NodeBridge->>Worker: create + onWorkerReady()
Worker->>Transport: for each WarmupCommand -> send PROTOCOL_ID (numeric reqId)
Transport-->>Worker: response / error
alt all commands succeed
Worker-->>NodeBridge: ready
NodeBridge->>NodeBridge: push worker to pool
NodeBridge-->>Client: initialized OK
else any command fails
Worker-->>NodeBridge: BridgeExecutionError / protocol error
NodeBridge->>Transport: dispose()
NodeBridge-->>Client: initialization rejected
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a06067b2b2
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
src/runtime/node.ts
Outdated
| const warmups = commands ?? []; | ||
| return warmups.map((command, index) => { |
There was a problem hiding this comment.
Validate warmupCommands is an array before mapping
normalizeWarmupCommands assumes commands has .map(), so a JS/config caller that passes a non-array value (for example a single object) now gets a raw TypeError (warmups.map is not a function) before any of the new protocol validation/migration messaging runs. This turns a user-input validation case into an opaque runtime crash and bypasses the explicit BridgeProtocolError paths added in this commit.
Useful? React with 👍 / 👎.
src/runtime/node.ts
Outdated
| const message = JSON.stringify({ | ||
| id: generateWarmupId(), |
There was a problem hiding this comment.
Wrap warmup serialization failures with command context
The warmup payload is serialized before entering the guarded send/parse error handling, so unsupported argument shapes (e.g. BigInt or circular values in args) cause JSON.stringify to throw a bare TypeError without the command index/module/function context this change is trying to provide. This makes warmup failures harder to diagnose and inconsistent with the new fail-fast error reporting.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/runtime/node.ts`:
- Around line 550-599: The warmup loop around warmupCommands currently builds
the request with JSON.stringify and treats any JSON response without an "error"
key as success; wrap the message construction in a try/catch and throw a
BridgeExecutionError with the same "Warmup command `#N` (module.function)" context
on JSON encoding failures (the message variable created from
generateWarmupId()/PROTOCOL_ID/worker.transport.send), and after parsing the
response validate that a proper success envelope exists (e.g. the parsed object
either contains an expected "result" or explicit success marker) — if neither
"error" nor the expected success field is present, throw a BridgeExecutionError
for a malformed success payload referencing warmupCommands[index], commandLabel
and the request id; keep using BridgeExecutionError for all wrapped failures so
send, encode, decode, and malformed-success cases include the command context.
ℹ️ Review info
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (5)
src/runtime/node.tssrc/runtime/worker-pool.tstest/optimized-node.test.tstest/runtime_node.test.tstest/worker-pool.test.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: os (windows-latest)
🧰 Additional context used
🧠 Learnings (13)
📓 Common learnings
Learnt from: bbopen
Repo: bbopen/tywrap PR: 127
File: src/runtime/bridge-core.ts:260-263
Timestamp: 2026-01-19T21:14:40.872Z
Learning: In `src/runtime/bridge-core.ts` and similar hot request/response loop implementations in the tywrap repository, avoid adding extra defensive validation (e.g., runtime shape checks on error payloads) in tight loops unless the protocol boundary is untrusted or there's a concrete bug report. The Python bridge protocol is controlled and validated via tests, so defensive checks would add unnecessary branching overhead without meaningful benefit.
Learnt from: bbopen
Repo: bbopen/tywrap PR: 136
File: src/runtime/node.ts:238-249
Timestamp: 2026-01-20T01:33:12.841Z
Learning: In the tywrap repository, when reviewing init/startup patterns, prioritize cases that break core functionality over edge cases with diagnostic-only impact (e.g., missing bridgeInfo metadata). If workers are healthy and operational, defer low-impact defensive checks to follow-up PRs.
Learnt from: bbopen
Repo: bbopen/tywrap PR: 152
File: docs/adr/002-bridge-protocol.md:599-602
Timestamp: 2026-01-20T16:01:14.323Z
Learning: In `src/runtime/node-bridge.ts` (NodeBridge), a test message is sent to the Python subprocess to confirm the bridge is responsive before marking it as ready.
Learnt from: bbopen
Repo: bbopen/tywrap PR: 127
File: src/runtime/bridge-core.ts:260-298
Timestamp: 2026-01-19T21:48:45.693Z
Learning: In `src/runtime/bridge-core.ts`, keep `normalizeErrorPayload` to validate error payloads from the Python subprocess. The subprocess boundary is effectively untrusted, and normalizing error responses prevents `undefined: undefined` errors on malformed payloads. Error responses are not the hot path, so the small validation overhead is acceptable for the added resilience.
📚 Learning: 2026-01-20T01:34:07.064Z
Learnt from: bbopen
Repo: bbopen/tywrap PR: 136
File: src/runtime/node.ts:444-458
Timestamp: 2026-01-20T01:34:07.064Z
Learning: When reviewing promise-based polling patterns (e.g., recursive setTimeout in waitForAvailableWorker functions), ensure that any recursive timer is tracked and cleared on timeout or promise resolution to prevent timer leaks. Do not rely on no-op checks after settlement; verify clearTimeout is invoked in all paths and consider using an explicit cancellation (e.g., AbortController) or a guard to cancel pending timers when the promise settles.
Applied to files:
test/worker-pool.test.tssrc/runtime/worker-pool.tstest/optimized-node.test.tstest/runtime_node.test.tssrc/runtime/node.ts
📚 Learning: 2026-01-19T21:00:30.825Z
Learnt from: bbopen
Repo: bbopen/tywrap PR: 127
File: test/adversarial_playground.test.ts:339-360
Timestamp: 2026-01-19T21:00:30.825Z
Learning: In adversarial tests for the bridge (test/adversarial_playground.test.ts), recovery assertions should be kept in the finally block adjacent to cleanup to verify resilience under stress, even if that means post-timeout checks might mask the original error. The goal is to surface recovery failures as legitimate test failures.
Applied to files:
test/worker-pool.test.tstest/optimized-node.test.tstest/runtime_node.test.ts
📚 Learning: 2026-01-19T21:48:27.823Z
Learnt from: bbopen
Repo: bbopen/tywrap PR: 127
File: test/runtime_bridge_fixtures.test.ts:59-66
Timestamp: 2026-01-19T21:48:27.823Z
Learning: In fixture-based tests (e.g., test/runtime_bridge_fixtures.test.ts) and similar tests in the tywrap repository, prefer early returns when Python or fixture files are unavailable. Do not rely on Vitest dynamic skip APIs; a silent pass is intentional for environments lacking Python/fixtures. Treat missing fixtures as optional soft-guards and ensure the test remains non-disruptive in non-availability scenarios.
Applied to files:
test/worker-pool.test.tstest/optimized-node.test.tstest/runtime_node.test.ts
📚 Learning: 2026-01-20T18:37:05.670Z
Learnt from: bbopen
Repo: bbopen/tywrap PR: 152
File: src/runtime/process-io.ts:37-40
Timestamp: 2026-01-20T18:37:05.670Z
Learning: In the tywrap repo, ESLint is used for linting (not Biome). Do not flag regex literals that contain control characters (e.g., \u001b, \u0000-\u001F) as lint errors, since the current ESLint configuration allows them. When reviewing TypeScript files (e.g., src/**/*.ts), rely on ESLint results and avoid raising issues about these specific control-character regex literals unless there is a competing config change or a policy requiring explicit escaping.
Applied to files:
test/worker-pool.test.tssrc/runtime/worker-pool.tstest/optimized-node.test.tstest/runtime_node.test.tssrc/runtime/node.ts
📚 Learning: 2026-01-20T16:00:49.829Z
Learnt from: bbopen
Repo: bbopen/tywrap PR: 152
File: docs/adr/002-bridge-protocol.md:203-211
Timestamp: 2026-01-20T16:00:49.829Z
Learning: In the BridgeProtocol implementation (tywrap), reject Map and Set explicitly before serialization (e.g., in safeStringify or the serialization entrypoint) with a clear error like "Bridge protocol does not support Map/Set values". Do not rely on post-hoc checks of non-string keys at the point of JSON.stringify, since Maps/Sets cannot round-trip. This should be enforced at the boundary where data is prepared for serialization to ensure deterministic errors and prevent invalid data from propagating.
Applied to files:
test/worker-pool.test.tssrc/runtime/worker-pool.tstest/optimized-node.test.tstest/runtime_node.test.tssrc/runtime/node.ts
📚 Learning: 2026-01-19T21:14:29.869Z
Learnt from: bbopen
Repo: bbopen/tywrap PR: 127
File: src/runtime/bridge-core.ts:375-385
Timestamp: 2026-01-19T21:14:29.869Z
Learning: In the runtime env-var parsing (e.g., in src/runtime/bridge-core.ts and similar modules), adopt a tolerant, best-effort policy: numeric env values should parse by taking the leading numeric portion (e.g., TYWRAP_CODEC_MAX_BYTES=1024abc -> 1024). Only reject clearly invalid values (non-numeric start or <= 0). This reduces surprising failures from minor typos. Add tests to cover partial numeric prefixes, and ensure downstream logic documents the trusted range; consider a small helper to extract a positive integer from a string with a safe fallback.
Applied to files:
src/runtime/worker-pool.tssrc/runtime/node.ts
📚 Learning: 2026-01-19T21:14:35.390Z
Learnt from: bbopen
Repo: bbopen/tywrap PR: 127
File: src/runtime/bridge-core.ts:260-263
Timestamp: 2026-01-19T21:14:35.390Z
Learning: In src/runtime/bridge-core.ts and similar hot request/response loop implementations in the tywrap repository, avoid adding extra defensive validation (such as runtime, shape, or error-payload checks) inside tight loops unless the protocol boundary is untrusted or there is a concrete bug report. The Python bridge protocol is controlled via tests, so these checks add unnecessary branching overhead without meaningful benefit. Apply this guidance to other hot-path runtime loops in src/runtime/**/*.ts, and re-enable additional validations only when a documented risk or failure scenario is identified. Ensure tests cover protocol validation where applicable.
Applied to files:
src/runtime/worker-pool.tssrc/runtime/node.ts
📚 Learning: 2026-01-19T21:48:45.693Z
Learnt from: bbopen
Repo: bbopen/tywrap PR: 127
File: src/runtime/bridge-core.ts:260-298
Timestamp: 2026-01-19T21:48:45.693Z
Learning: In `src/runtime/bridge-core.ts`, keep `normalizeErrorPayload` to validate error payloads from the Python subprocess. The subprocess boundary is effectively untrusted, and normalizing error responses prevents `undefined: undefined` errors on malformed payloads. Error responses are not the hot path, so the small validation overhead is acceptable for the added resilience.
Applied to files:
test/optimized-node.test.tstest/runtime_node.test.tssrc/runtime/node.ts
📚 Learning: 2026-01-19T21:14:40.872Z
Learnt from: bbopen
Repo: bbopen/tywrap PR: 127
File: src/runtime/bridge-core.ts:260-263
Timestamp: 2026-01-19T21:14:40.872Z
Learning: In `src/runtime/bridge-core.ts` and similar hot request/response loop implementations in the tywrap repository, avoid adding extra defensive validation (e.g., runtime shape checks on error payloads) in tight loops unless the protocol boundary is untrusted or there's a concrete bug report. The Python bridge protocol is controlled and validated via tests, so defensive checks would add unnecessary branching overhead without meaningful benefit.
Applied to files:
test/optimized-node.test.tstest/runtime_node.test.ts
📚 Learning: 2026-01-20T16:01:14.323Z
Learnt from: bbopen
Repo: bbopen/tywrap PR: 152
File: docs/adr/002-bridge-protocol.md:599-602
Timestamp: 2026-01-20T16:01:14.323Z
Learning: In `src/runtime/node-bridge.ts` (NodeBridge), a test message is sent to the Python subprocess to confirm the bridge is responsive before marking it as ready.
Applied to files:
test/optimized-node.test.tstest/runtime_node.test.tssrc/runtime/node.ts
📚 Learning: 2026-01-19T21:14:37.032Z
Learnt from: bbopen
Repo: bbopen/tywrap PR: 127
File: src/runtime/bridge-core.ts:375-385
Timestamp: 2026-01-19T21:14:37.032Z
Learning: In tywrap (src/runtime/bridge-core.ts and similar), environment variable parsing follows a tolerant/best-effort policy. For example, `TYWRAP_CODEC_MAX_BYTES=1024abc` should be accepted as 1024. Only reject clearly invalid values (non-numeric start or <=0). This avoids surprising failures from minor typos.
Applied to files:
test/optimized-node.test.ts
📚 Learning: 2026-01-20T16:01:14.323Z
Learnt from: bbopen
Repo: bbopen/tywrap PR: 152
File: docs/adr/002-bridge-protocol.md:599-602
Timestamp: 2026-01-20T16:01:14.323Z
Learning: In the tywrap repository, bridges currently use implicit ready detection: the first successful response from the Python subprocess proves the bridge is ready to handle requests.
Applied to files:
test/runtime_node.test.ts
🧬 Code graph analysis (3)
test/optimized-node.test.ts (1)
src/runtime/errors.ts (1)
BridgeProtocolError(13-13)
test/runtime_node.test.ts (1)
src/runtime/node.ts (1)
NodeBridge(279-525)
src/runtime/node.ts (3)
src/runtime/errors.ts (2)
BridgeProtocolError(13-13)BridgeExecutionError(31-33)src/runtime/worker-pool.ts (1)
PooledWorker(47-53)src/runtime/transport.ts (1)
PROTOCOL_ID(20-20)
🔇 Additional comments (5)
src/runtime/worker-pool.ts (1)
412-426: Good warmup-failure cleanup in worker creation.Transport disposal on
onWorkerReadyfailure plus deferringthis.workers.push(worker)until success correctly prevents retaining partially initialized workers.src/runtime/node.ts (1)
21-25: Warmup normalization and constructor wiring look solid.The new normalization path is fail-fast, gives explicit migration guidance for legacy warmup shape, and cleanly feeds a strict internal warmup command model into initialization.
Also applies to: 123-130, 198-236, 298-315, 535-546
test/runtime_node.test.ts (1)
9-9: Warmup integration tests are strong and targeted.The new cases validate both successful per-worker warmup and failure surfacing/readiness behavior, with solid cleanup in
finally.Also applies to: 706-795
test/worker-pool.test.ts (1)
289-307: Great failure-path regression test for worker warmup.This test captures the key contract: reject acquire, dispose transport, and keep
workerCountat zero whenonWorkerReadyfails.test/optimized-node.test.ts (1)
7-7: Constructor tests correctly enforce the warmup shape transition.Accepting
{ module, functionName, args }while rejecting legacy{ method, params }withBridgeProtocolErroris exactly the right compatibility posture for this PR.Also applies to: 74-87
Superseded by commit 987c7e8; all review findings are addressed.
987c7e8 to
05331c7
Compare
Summary\n- send warmup commands with the current protocol envelope (protocol/method/params + numeric id)\n- reject legacy warmup shape
{ method, params }with explicit migration guidance\n- surface warmup send/response failures with command index + module/function context\n- ensure WorkerPool disposes transports and does not retain workers when warmup fails\n\n## Validation\n- npx vitest --run test/runtime_node.test.ts test/optimized-node.test.ts test/worker-pool.test.ts