feat(core): add RetryManager state machine for TAPI backoff/retry#1158
Merged
feat(core): add RetryManager state machine for TAPI backoff/retry#1158
Conversation
2 tasks
bb983fd to
7a0b957
Compare
3371651 to
0c85b0f
Compare
7a0b957 to
f631f9f
Compare
0c85b0f to
8f21c88
Compare
f631f9f to
024a1a2
Compare
8f21c88 to
225e64a
Compare
024a1a2 to
6d30565
Compare
225e64a to
f51911b
Compare
6d30565 to
50cad97
Compare
f51911b to
fcdc491
Compare
50cad97 to
04bb992
Compare
a8c9435 to
ad417e4
Compare
658e649 to
6695f01
Compare
ad417e4 to
06a5962
Compare
6695f01 to
3a16620
Compare
2539c34 to
fa31a9c
Compare
f3e71c3 to
55221f4
Compare
2d313d8 to
89ce849
Compare
09d48d7 to
a6127cc
Compare
89ce849 to
548872e
Compare
a6127cc to
f76fd78
Compare
548872e to
4f8ea2f
Compare
f76fd78 to
e1fb9cd
Compare
dbe8e59 to
fd41e95
Compare
e1fb9cd to
d0f997b
Compare
4a96660 to
31f8677
Compare
Add RetryManager with three states (READY, RATE_LIMITED, BACKING_OFF) that handles 429 rate limiting with Retry-After parsing and transient error exponential backoff with jitter. Includes sovran-based state persistence and configurable retry strategies. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove early return in handle429 when in BACKING_OFF state. 429 is a server-explicit signal that should always take precedence over transient backoff. - Change jitter from ±jitterPercent to additive-only (0 to jitterPercent) per SDD specification. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Validate persisted state in canRetry() to handle clock changes/corruption per SDD §Metadata Lifecycle - Move backoff calculation inside dispatch to avoid stale retryCount from concurrent batch failures (handleErrorWithBackoff) - Ensure RATE_LIMITED state is never downgraded to BACKING_OFF - Update reset() docstring to clarify when it should be called Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Simplify class docstring to describe current architecture without referencing SDD deviations. Remove redundant inline comments, compact JSDoc to single-line where appropriate, and ensure all comments use present tense. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When a 429 arrives while in BACKING_OFF state, use the server's Retry-After directly instead of applying the lazy/eager strategy. The server's timing signal is authoritative over calculated backoff. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…gnaling
- Use getState(true) for queue-safe reads to prevent race conditions
between concurrent canRetry/handle429/handleTransientError calls
- Consolidate handleError and handleErrorWithBackoff into a single
method that accepts a computeWaitUntilTime function
- Extract side effects (logging, Math.random) from dispatch reducers
- Return RetryResult ('rate_limited'|'backed_off'|'limit_exceeded')
from handle429/handleTransientError so callers can drop events on
limit exceeded
- Clear auto-flush timer in transitionToReady
- Validate state string in isPersistedStateValid
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
TimerFlushPolicy already drives periodic flushes; the auto-flush timer in RetryManager was redundant. Removes setAutoFlushCallback, scheduleAutoFlush, clearAutoFlushTimer, and the destroy() method. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
31f8677 to
72f014c
Compare
The retryStrategy parameter was previously removed but accidentally re-added. Eager behavior (take shorter wait when consolidating concurrent errors) is now hardcoded as the only strategy. - Remove retryStrategy field and constructor parameter - Replace applyRetryStrategy() with Math.min (eager behavior) - Update class documentation to reflect hardcoded strategy Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
| const waitUntilTime = computeWaitUntilTime(state); | ||
|
|
||
| const resolvedState = | ||
| state.state === 'RATE_LIMITED' && newState === 'BACKING_OFF' |
There was a problem hiding this comment.
Should we use a const to define these? we seem to use these a bunch?
Extract store creation logic into createStore() helper method and error message extraction into getErrorMessage() utility. Benefits: - Constructor now just assigns fields and delegates to helper - No nested try-catch blocks (linear flow) - Error message formatting is DRY - Easier to understand and maintain Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Replace string literals with TypeScript enums: - RetryState enum (READY, RATE_LIMITED, BACKING_OFF) - RetryResult enum (RATE_LIMITED, BACKED_OFF, LIMIT_EXCEEDED) Extract helper methods for clarity: - resolveStatePrecedence(): handles 429 taking priority over backoff - consolidateWaitTime(): uses switch statement for clear wait time logic - getStateDisplayName(): maps state to display names Benefits: - Type-safe state handling (no magic strings) - Switch statements make control flow explicit - Each helper method has a single, named responsibility - Easier to test and maintain Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Extract retry-after clamping into clampRetryAfter method - Extract state computation into computeNewState method - Add stateToResult helper to map state to result enum - Break up isPersistedStateValid into focused validation methods - Eliminate nested conditionals and improve single responsibility
Use Object.values(RetryState).includes() instead of maintaining a duplicate Set of valid states. More idiomatic TypeScript and eliminates maintenance burden of keeping Set in sync with enum.
didiergarcia
approved these changes
Mar 23, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
RetryManagerclass with three states:READY,RATE_LIMITED,BACKING_OFFRetry-Afterheader parsing and configurable max intervalsgetState(true)(queue-safe) for all state reads to prevent race conditions between concurrentcanRetry/handle429/handleTransientErrorcallshandleError/handleErrorWithBackoffinto a single unified method withcomputeWaitUntilTimefunction parameter — eliminates duplicated logicMath.random) extracted from dispatch reducers for purityRetryResulttype ('rate_limited' | 'backed_off' | 'limit_exceeded') fromhandle429/handleTransientErrorso callers can detect when retry limits are exceededtransitionToReadyclears state when wait period expiresisPersistedStateValidvalidates state string against known valuesPR 3 of 5 in the TAPI backoff/retry stack. Depends on #1157. Tests in #1159.
Test plan
🤖 Generated with Claude Code