Fix: proactive auth refresh to reload guarded disk state first#15357
Fix: proactive auth refresh to reload guarded disk state first#15357
Conversation
c010668 to
41a653f
Compare
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 41a653fd81
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if Self::is_stale_for_proactive_refresh(&auth) | ||
| && let Err(err) = self.refresh_token().await | ||
| { |
There was a problem hiding this comment.
Keep proactive refresh working when account ID is missing
auth() now routes stale ChatGPT auth through refresh_token(), but that path hard-fails when expected_account_id is None. Auth files can legitimately lack account_id (it is optional), so proactive refresh now returns stale cached auth instead of refreshing/reloading. This can leave expired tokens in use and cause repeated auth failures/log spam for those users.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
@pakrym you added this logic to skip reloading when there's no account_id: https://github.com/openai/codex/pull/8880/changes#diff-c618f196cd6214e554ed7c8845bdfefb3cd013c58274b10f3afe5d792e7d0af3R725-R727
is this expected behavior?
Summary
Fix a managed ChatGPT auth bug where a stale Codex process could proactively refresh using an old in-memory refresh token even after another process had already rotated auth on disk.
This changes the proactive
AuthManager::auth()path to reuse the existing guardedrefresh_token()flow instead of calling the refresh endpoint directly from cached auth state.Original Issue
Users reported repeated
codexdlog lines like:In practice this showed up most often when multiple
codexdprocesses were left running. Killing the extra processes stopped the noise, which suggested the issue was caused by stale auth state across processes rather than invalid user credentials.Diagnosis
The bug was in the proactive refresh path used by
AuthManager::auth():R0toR1, and persist the updated auth state pluslast_refreshto disk.R0and the oldlast_refresh.auth(), it checked staleness from its cached in-memory auth instead of first reloading from disk.last_refreshwas stale, Process B would proactively call/oauth/tokenwith stale refresh tokenR0.auth()logged the refresh error but kept returning the same stale cached auth, so repeatedauth()calls could keep retrying with dead state.This differed from the existing unauthorized-recovery flow, which already did the safer thing: guarded reload from disk first, then refresh only if the on-disk auth was unchanged.
What Changed
AuthManager::auth()to:refresh_token()when staleWhy This Fix
refresh_token()already contains the right cross-process safety behavior:Reusing that path makes proactive refresh consistent with unauthorized recovery and prevents stale processes from trying to refresh already-rotated tokens.
Testing
Test shape:
CODEX_HOMEfrom~/.codex/auth.jsonlast_refreshto an old timestamp so proactive refresh is requiredBfirst so it caches stale auth and sleepsAsecond so it refreshes first/oauth/tokenserverBmakes a second refresh request with the stale in-memory token, or reloads the rotated token from diskBefore the fix
The repro showed the bug clearly: the mock server saw two refreshes with the same stale token,
Arotated to a new token, andBstill returned the stale token instead of reloading from disk.After the fix
After the fix, the mock server saw only one refresh request.
Arefreshed once, andBstarted with the stale token but reloaded and returned the rotated token.This shows the new behavior:
Arefreshes once, thenBreuses the updated auth from disk instead of making a second refresh request with the stale token.