
feat(core): add RetryManager state machine for TAPI backoff/retry#1158

Merged
abueide merged 14 commits into master from tapi/retry-manager on Mar 24, 2026


abueide (Contributor) commented Mar 10, 2026

Summary

  • Add RetryManager class with three states: READY, RATE_LIMITED, BACKING_OFF
  • Handles 429 rate limiting with Retry-After header parsing and configurable max intervals
  • Implements exponential backoff with jitter for transient errors (5xx)
  • Uses a sovran store to persist state across app restarts, validating it on restore
  • Uses an eager retry strategy (takes the shorter wait) when consolidating concurrent batch failures
  • Uses getState(true) (queue-safe) for all state reads to prevent race conditions between concurrent canRetry/handle429/handleTransientError calls
  • Consolidates handleError/handleErrorWithBackoff into a single method that takes a computeWaitUntilTime function parameter, eliminating duplicated logic
  • Extracts side effects (logging, Math.random) from dispatch reducers so the reducers stay pure
  • Returns a RetryResult type ('rate_limited' | 'backed_off' | 'limit_exceeded') from handle429/handleTransientError so callers can detect when retry limits are exceeded
  • transitionToReady clears state when the wait period expires
  • isPersistedStateValid validates state string against known values
  • Add barrel export and test helper utilities
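A minimal in-memory sketch of the three-state machine summarized above. It is not the PR's implementation: sovran persistence, queue-safe reads, and the 429-over-backoff precedence are left out, and the `RetryRecord` shape, `maxRetryCount`, keeping `retryCount` across `transitionToReady`, and resetting on `limit_exceeded` are all assumptions made for illustration.

```typescript
// Illustrative sketch only: names, fields, and policies are assumptions, not the PR's API.
type RetryStateName = 'READY' | 'RATE_LIMITED' | 'BACKING_OFF';
type RetryResult = 'rate_limited' | 'backed_off' | 'limit_exceeded';

interface RetryRecord {
  state: RetryStateName;
  waitUntilTime: number; // epoch ms; 0 while READY
  retryCount: number;
}

class SimpleRetryMachine {
  private record: RetryRecord = { state: 'READY', waitUntilTime: 0, retryCount: 0 };

  constructor(private readonly maxRetryCount: number) {}

  /** True when uploads may proceed; returns to READY once the wait has expired. */
  canRetry(nowMs: number): boolean {
    if (this.record.state === 'READY') return true;
    if (nowMs >= this.record.waitUntilTime) {
      this.transitionToReady();
      return true;
    }
    return false;
  }

  /** Enters a waiting state until waitUntilTime, or reports that the retry limit is exceeded. */
  handleError(state: Exclude<RetryStateName, 'READY'>, waitUntilTime: number): RetryResult {
    const retryCount = this.record.retryCount + 1;
    if (retryCount > this.maxRetryCount) {
      // Assumption: start fresh so the caller can drop the failing events.
      this.reset();
      return 'limit_exceeded';
    }
    this.record = { state, waitUntilTime, retryCount };
    return state === 'RATE_LIMITED' ? 'rate_limited' : 'backed_off';
  }

  /** Clears everything, e.g. after a successful upload. */
  reset(): void {
    this.record = { state: 'READY', waitUntilTime: 0, retryCount: 0 };
  }

  get current(): RetryRecord {
    return { ...this.record };
  }

  // Assumption: retryCount is kept so later backoffs can keep growing until reset().
  private transitionToReady(): void {
    this.record = { ...this.record, state: 'READY', waitUntilTime: 0 };
  }
}
```

Keeping the clock as a `nowMs` argument rather than calling `Date.now()` inside makes the transitions easy to exercise deterministically.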

PR 3 of 5 in the TAPI backoff/retry stack. Depends on #1157. Tests in #1159.

Test plan

🤖 Generated with Claude Code

abueide force-pushed the tapi/config-and-settings branch 2 times, most recently from bb983fd to 7a0b957 on March 12, 2026 14:57
abueide force-pushed the tapi/retry-manager branch 2 times, most recently from 3371651 to 0c85b0f on March 12, 2026 14:57
abueide force-pushed the tapi/config-and-settings branch from 7a0b957 to f631f9f on March 12, 2026 15:24
abueide force-pushed the tapi/retry-manager branch from 0c85b0f to 8f21c88 on March 12, 2026 15:30
abueide force-pushed the tapi/config-and-settings branch from f631f9f to 024a1a2 on March 12, 2026 16:11
abueide force-pushed the tapi/retry-manager branch from 8f21c88 to 225e64a on March 12, 2026 16:11
abueide force-pushed the tapi/config-and-settings branch from 024a1a2 to 6d30565 on March 12, 2026 16:40
abueide force-pushed the tapi/retry-manager branch from 225e64a to f51911b on March 12, 2026 16:40
abueide force-pushed the tapi/config-and-settings branch from 6d30565 to 50cad97 on March 12, 2026 16:48
abueide force-pushed the tapi/retry-manager branch from f51911b to fcdc491 on March 12, 2026 16:48
abueide force-pushed the tapi/config-and-settings branch from 50cad97 to 04bb992 on March 12, 2026 17:38
abueide force-pushed the tapi/retry-manager branch 2 times, most recently from a8c9435 to ad417e4 on March 18, 2026 21:14
abueide force-pushed the tapi/config-and-settings branch 2 times, most recently from 658e649 to 6695f01 on March 18, 2026 22:12
abueide force-pushed the tapi/retry-manager branch from ad417e4 to 06a5962 on March 18, 2026 22:12
abueide force-pushed the tapi/config-and-settings branch from 6695f01 to 3a16620 on March 18, 2026 22:32
abueide force-pushed the tapi/retry-manager branch 2 times, most recently from 2539c34 to fa31a9c on March 19, 2026 16:02
abueide force-pushed the tapi/config-and-settings branch 2 times, most recently from f3e71c3 to 55221f4 on March 19, 2026 16:15
abueide force-pushed the tapi/retry-manager branch 2 times, most recently from 2d313d8 to 89ce849 on March 19, 2026 17:03
abueide force-pushed the tapi/config-and-settings branch 2 times, most recently from 09d48d7 to a6127cc on March 19, 2026 17:56
abueide force-pushed the tapi/retry-manager branch from 89ce849 to 548872e on March 19, 2026 17:56
abueide force-pushed the tapi/config-and-settings branch from a6127cc to f76fd78 on March 19, 2026 18:21
abueide force-pushed the tapi/retry-manager branch from 548872e to 4f8ea2f on March 19, 2026 18:22
abueide force-pushed the tapi/config-and-settings branch from f76fd78 to e1fb9cd on March 19, 2026 18:29
abueide force-pushed the tapi/retry-manager branch 2 times, most recently from dbe8e59 to fd41e95 on March 23, 2026 15:59
abueide force-pushed the tapi/config-and-settings branch from e1fb9cd to d0f997b on March 23, 2026 15:59
abueide force-pushed the tapi/retry-manager branch 2 times, most recently from 4a96660 to 31f8677 on March 23, 2026 18:24
Base automatically changed from tapi/config-and-settings to master March 23, 2026 19:00
abueide and others added 9 commits March 23, 2026 14:02
Add RetryManager with three states (READY, RATE_LIMITED, BACKING_OFF)
that handles 429 rate limiting with Retry-After parsing and transient
error exponential backoff with jitter. Includes sovran-based state
persistence and configurable retry strategies.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
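For reference, a Retry-After value may be either a number of seconds or an HTTP-date (RFC 9110). The sketch below shows one way to turn either form into a clamped wait in seconds; the function name `parseRetryAfterSeconds`, the `fallbackSeconds` default, and the clamping policy are assumptions for illustration, not the PR's code.

```typescript
// Hypothetical helper: converts a Retry-After header into seconds to wait,
// clamped to [0, maxRetryIntervalSeconds]. Not the PR's actual implementation.
function parseRetryAfterSeconds(
  header: string | null | undefined,
  nowMs: number,
  maxRetryIntervalSeconds: number,
  fallbackSeconds: number,
): number {
  const value = header?.trim() ?? '';
  // Form 1: delay-seconds, a non-negative integer such as "120".
  if (/^\d+$/.test(value)) {
    return clampSeconds(Number(value), maxRetryIntervalSeconds);
  }
  // Form 2: an HTTP-date such as "Wed, 21 Oct 2015 07:28:00 GMT".
  const dateMs = value === '' ? Number.NaN : Date.parse(value);
  if (!Number.isNaN(dateMs)) {
    return clampSeconds(Math.ceil((dateMs - nowMs) / 1000), maxRetryIntervalSeconds);
  }
  // Missing or unparseable header: fall back to a configured default.
  return clampSeconds(fallbackSeconds, maxRetryIntervalSeconds);
}

function clampSeconds(seconds: number, maxSeconds: number): number {
  return Math.min(Math.max(seconds, 0), maxSeconds);
}
```

A date in the past clamps to 0, so the client may retry immediately.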
- Remove early return in handle429 when in BACKING_OFF state. 429 is a
  server-explicit signal that should always take precedence over transient
  backoff.
- Change jitter from ±jitterPercent to additive-only (0 to jitterPercent)
  per SDD specification.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
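The backoff described in these commits (exponential growth, capped, with jitter that only ever adds time) can be sketched as follows. `baseBackoffMs`, `maxBackoffMs`, and `jitterPercent` are assumed config names, and the random source is injected so the jitter draw can happen outside any reducer.

```typescript
// Sketch of exponential backoff with additive-only jitter (0 to jitterPercent).
// Config field names are illustrative assumptions.
interface BackoffConfig {
  baseBackoffMs: number;
  maxBackoffMs: number;
  jitterPercent: number; // e.g. 10 means the wait grows by up to +10%
}

function computeBackoffMs(
  retryCount: number,
  config: BackoffConfig,
  random: () => number = Math.random, // injectable for deterministic tests
): number {
  // base * 2^retryCount, capped at the configured maximum
  const exponential = Math.min(config.baseBackoffMs * 2 ** retryCount, config.maxBackoffMs);
  // Additive jitter: never shortens the wait, only lengthens it.
  const jitter = ((exponential * config.jitterPercent) / 100) * random();
  return exponential + jitter;
}
```

In this sketch the jitter is applied after the cap, so a wait can exceed `maxBackoffMs` by up to `jitterPercent`; capping after jitter would be an equally reasonable choice.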
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Validate persisted state in canRetry() to handle clock changes/corruption
  per SDD §Metadata Lifecycle
- Move backoff calculation inside dispatch to avoid stale retryCount from
  concurrent batch failures (handleErrorWithBackoff)
- Ensure RATE_LIMITED state is never downgraded to BACKING_OFF
- Update reset() docstring to clarify when it should be called

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
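A sketch of validating a restored record before trusting it, covering the corruption and clock-change cases this commit mentions. The record shape and the `maxWaitMs` bound are assumptions; the PR's actual checks are not shown here.

```typescript
// Hypothetical validation of a persisted retry record; shape and bounds are assumed.
const KNOWN_STATES = ['READY', 'RATE_LIMITED', 'BACKING_OFF'] as const;
type KnownState = (typeof KNOWN_STATES)[number];

interface PersistedRetryState {
  state: KnownState;
  waitUntilTime: number; // epoch ms
  retryCount: number;
}

function isPersistedStateValid(
  value: unknown,
  nowMs: number,
  maxWaitMs: number,
): value is PersistedRetryState {
  if (typeof value !== 'object' || value === null) return false;
  const { state, waitUntilTime, retryCount } = value as Record<string, unknown>;
  // An unknown state string means corrupt data or a different schema version.
  if (typeof state !== 'string' || !(KNOWN_STATES as readonly string[]).includes(state)) {
    return false;
  }
  if (typeof waitUntilTime !== 'number' || !Number.isFinite(waitUntilTime)) return false;
  if (typeof retryCount !== 'number' || !Number.isInteger(retryCount) || retryCount < 0) {
    return false;
  }
  // A wait ending further out than the longest allowed interval suggests the
  // device clock moved backwards (or the value is corrupt), so reject it.
  return waitUntilTime - nowMs <= maxWaitMs;
}
```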
Simplify class docstring to describe current architecture without
referencing SDD deviations. Remove redundant inline comments, compact
JSDoc to single-line where appropriate, and ensure all comments use
present tense.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When a 429 arrives while in BACKING_OFF state, use the server's
Retry-After directly instead of applying the lazy/eager strategy.
The server's timing signal is authoritative over calculated backoff.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…gnaling

- Use getState(true) for queue-safe reads to prevent race conditions
  between concurrent canRetry/handle429/handleTransientError calls
- Consolidate handleError and handleErrorWithBackoff into a single
  method that accepts a computeWaitUntilTime function
- Extract side effects (logging, Math.random) from dispatch reducers
- Return RetryResult ('rate_limited'|'backed_off'|'limit_exceeded')
  from handle429/handleTransientError so callers can drop events on
  limit exceeded
- Clear auto-flush timer in transitionToReady
- Validate state string in isPersistedStateValid

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
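The consolidated handler can be sketched as a single function that takes a `computeWaitUntilTime` callback and runs it inside the reducer, so concurrent failures never compute a wait from a stale `retryCount`. `MiniStore` is a simplified synchronous stand-in, not sovran's API, and the record shape and `maxRetryCount` parameter are assumptions.

```typescript
// Sketch of one error handler parameterized by a wait-time function.
// MiniStore is a hypothetical stand-in whose dispatch applies a pure reducer.
type StateName = 'READY' | 'RATE_LIMITED' | 'BACKING_OFF';
type RetryResult = 'rate_limited' | 'backed_off' | 'limit_exceeded';

interface RetryRecord {
  state: StateName;
  waitUntilTime: number; // epoch ms
  retryCount: number;
}

class MiniStore {
  constructor(private current: RetryRecord) {}
  get state(): RetryRecord {
    return this.current;
  }
  dispatch(reducer: (s: RetryRecord) => RetryRecord): RetryRecord {
    this.current = reducer(this.current);
    return this.current;
  }
}

function handleError(
  store: MiniStore,
  newState: Exclude<StateName, 'READY'>,
  computeWaitUntilTime: (s: RetryRecord) => number,
  maxRetryCount: number,
): RetryResult {
  let limitExceeded = false;
  const next = store.dispatch((s) => {
    if (s.retryCount + 1 > maxRetryCount) {
      limitExceeded = true;
      return s; // leave state as-is; the caller decides whether to drop events
    }
    // A 429 takes precedence: an existing RATE_LIMITED state is never downgraded.
    const state = s.state === 'RATE_LIMITED' && newState === 'BACKING_OFF' ? s.state : newState;
    // The wait is computed from the state seen inside the reducer.
    return { state, waitUntilTime: computeWaitUntilTime(s), retryCount: s.retryCount + 1 };
  });
  if (limitExceeded) return 'limit_exceeded';
  return next.state === 'RATE_LIMITED' ? 'rate_limited' : 'backed_off';
}
```

A transient-error caller would draw its jitter with `Math.random()` before calling `handleError` and close over it in the callback, keeping the reducer itself pure.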
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
TimerFlushPolicy already drives periodic flushes; the auto-flush
timer in RetryManager was redundant. Removes setAutoFlushCallback,
scheduleAutoFlush, clearAutoFlushTimer, and the destroy() method.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
abueide force-pushed the tapi/retry-manager branch from 31f8677 to 72f014c on March 23, 2026 19:02
The retryStrategy parameter was previously removed but accidentally
re-added. Eager behavior (take shorter wait when consolidating
concurrent errors) is now hardcoded as the only strategy.

- Remove retryStrategy field and constructor parameter
- Replace applyRetryStrategy() with Math.min (eager behavior)
- Update class documentation to reflect hardcoded strategy

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
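The wait-time rules from the commits above (a 429 during BACKING_OFF uses Retry-After directly; otherwise the eager strategy takes the shorter wait) can be sketched as a single switch. The function signature is an assumption, and applying `Math.min` when a transient error arrives during RATE_LIMITED is this sketch's reading of "eager", not confirmed PR behavior.

```typescript
// Sketch of consolidating wait times when a new error arrives while already waiting.
enum RetryState {
  READY = 'READY',
  RATE_LIMITED = 'RATE_LIMITED',
  BACKING_OFF = 'BACKING_OFF',
}

function consolidateWaitTime(
  current: RetryState,
  currentWaitUntil: number,
  incoming: RetryState.RATE_LIMITED | RetryState.BACKING_OFF,
  incomingWaitUntil: number,
): number {
  switch (current) {
    case RetryState.READY:
      // Nothing pending: use the new wait as-is.
      return incomingWaitUntil;
    case RetryState.BACKING_OFF:
      // The server's Retry-After is authoritative over calculated backoff.
      return incoming === RetryState.RATE_LIMITED
        ? incomingWaitUntil
        : Math.min(currentWaitUntil, incomingWaitUntil);
    case RetryState.RATE_LIMITED:
      // Eager strategy: take the shorter wait when consolidating concurrent failures.
      return Math.min(currentWaitUntil, incomingWaitUntil);
  }
}
```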
```typescript
const waitUntilTime = computeWaitUntilTime(state);

const resolvedState =
  state.state === 'RATE_LIMITED' && newState === 'BACKING_OFF'
```

Should we use constants to define these? We seem to use them a lot.

abueide and others added 4 commits March 23, 2026 14:47
Extract store creation logic into createStore() helper method and
error message extraction into getErrorMessage() utility.

Benefits:
- Constructor now just assigns fields and delegates to helper
- No nested try-catch blocks (linear flow)
- Error message formatting is DRY
- Easier to understand and maintain

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
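A helper like the `getErrorMessage()` utility named in this commit might look like the sketch below. Its real signature is not shown in the PR, so accepting `unknown` (the type of a `catch` binding under strict settings) is an assumption.

```typescript
// Hypothetical shape for getErrorMessage(): turns any thrown value into a string.
function getErrorMessage(error: unknown): string {
  if (error instanceof Error) return error.message;
  if (typeof error === 'string') return error;
  try {
    // JSON.stringify returns undefined for values like undefined; fall back to String().
    return JSON.stringify(error) ?? String(error);
  } catch {
    // Circular structures make JSON.stringify throw.
    return String(error);
  }
}
```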
Replace string literals with TypeScript enums:
- RetryState enum (READY, RATE_LIMITED, BACKING_OFF)
- RetryResult enum (RATE_LIMITED, BACKED_OFF, LIMIT_EXCEEDED)

Extract helper methods for clarity:
- resolveStatePrecedence(): handles 429 taking priority over backoff
- consolidateWaitTime(): uses switch statement for clear wait time logic
- getStateDisplayName(): maps state to display names

Benefits:
- Type-safe state handling (no magic strings)
- Switch statements make control flow explicit
- Each helper method has a single, named responsibility
- Easier to test and maintain

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
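The enum-based state type and the precedence helper named in this commit could be sketched as follows. The member names come from the commit; the string values and the display names returned by `getStateDisplayName` are assumptions.

```typescript
// Sketch: RetryState enum plus two helpers named in the commit (bodies assumed).
enum RetryState {
  READY = 'READY',
  RATE_LIMITED = 'RATE_LIMITED',
  BACKING_OFF = 'BACKING_OFF',
}

// A 429 takes priority over backoff: an existing RATE_LIMITED state is never downgraded.
function resolveStatePrecedence(current: RetryState, incoming: RetryState): RetryState {
  return current === RetryState.RATE_LIMITED && incoming === RetryState.BACKING_OFF
    ? RetryState.RATE_LIMITED
    : incoming;
}

// Display strings are illustrative assumptions.
function getStateDisplayName(state: RetryState): string {
  switch (state) {
    case RetryState.READY:
      return 'Ready';
    case RetryState.RATE_LIMITED:
      return 'Rate limited';
    case RetryState.BACKING_OFF:
      return 'Backing off';
  }
}
```

Because the switch covers every enum member, TypeScript treats it as exhaustive and will flag the function if a new state is added without a matching case.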
- Extract retry-after clamping into clampRetryAfter method
- Extract state computation into computeNewState method
- Add stateToResult helper to map state to result enum
- Break up isPersistedStateValid into focused validation methods
- Eliminate nested conditionals and improve single responsibility
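Two of the helpers named in this commit, `clampRetryAfter` and `stateToResult`, might look like the sketch below. Their signatures are not shown in the PR; working in seconds and treating NaN as the maximum wait are assumptions.

```typescript
// Sketch of clampRetryAfter and stateToResult; signatures and NaN policy are assumed.
enum RetryState {
  READY = 'READY',
  RATE_LIMITED = 'RATE_LIMITED',
  BACKING_OFF = 'BACKING_OFF',
}

enum RetryResult {
  RATE_LIMITED = 'rate_limited',
  BACKED_OFF = 'backed_off',
  LIMIT_EXCEEDED = 'limit_exceeded',
}

// Keeps a server-provided Retry-After within [0, maxRetryIntervalSeconds].
function clampRetryAfter(retryAfterSeconds: number, maxRetryIntervalSeconds: number): number {
  // Assumption: an unparseable value (NaN) is treated conservatively as the maximum wait.
  if (Number.isNaN(retryAfterSeconds)) return maxRetryIntervalSeconds;
  return Math.min(Math.max(retryAfterSeconds, 0), maxRetryIntervalSeconds);
}

// Maps the resolved (post-precedence) waiting state to the result reported to callers.
function stateToResult(state: RetryState.RATE_LIMITED | RetryState.BACKING_OFF): RetryResult {
  return state === RetryState.RATE_LIMITED ? RetryResult.RATE_LIMITED : RetryResult.BACKED_OFF;
}
```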
Use Object.values(RetryState).includes() instead of maintaining a
duplicate Set of valid states. More idiomatic TypeScript and eliminates
maintenance burden of keeping Set in sync with enum.
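The `Object.values(RetryState).includes()` check from this commit can be written as a type guard. The enum's string values are assumed to match the member names.

```typescript
// Sketch of a state-string type guard built on Object.values, as the commit suggests.
enum RetryState {
  READY = 'READY',
  RATE_LIMITED = 'RATE_LIMITED',
  BACKING_OFF = 'BACKING_OFF',
}

function isRetryState(value: unknown): value is RetryState {
  // For a string enum, Object.values yields exactly the member values,
  // so this check stays in sync with the enum automatically.
  return typeof value === 'string' && (Object.values(RetryState) as string[]).includes(value);
}
```

This works cleanly because `RetryState` is a string enum; for a numeric enum, `Object.values` would also return the reverse-mapped member names, so the guard would accept them too.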
abueide merged commit 1f56a4d into master on Mar 24, 2026
7 checks passed
abueide deleted the tapi/retry-manager branch on March 24, 2026 00:45