feat(inference): allow setting custom inference timeout#672

Open
pentschev wants to merge 4 commits into NVIDIA:main from pentschev:inference-timeout

Conversation

@pentschev

@pentschev pentschev commented Mar 30, 2026

Summary

Makes the inference routing timeout configurable via openshell inference set --timeout <secs> and openshell inference update --timeout <secs>, replacing the hardcoded 60-second default. Timeout changes propagate dynamically to running sandboxes within the route refresh interval (~5 seconds) without requiring sandbox recreation.
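
The hot-reload behavior described above can be illustrated with a minimal sketch. It assumes the sandbox compares a cached revision value against one computed from the freshly fetched bundle on each refresh tick. The `RouteFields` struct, its `model` and `endpoint` fields, and the `bundle_revision` and `needs_refresh` helpers are hypothetical names chosen for illustration, and `DefaultHasher` stands in for whatever hash the project actually uses.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical, simplified stand-in for the route fields carried in a bundle.
#[derive(Hash)]
struct RouteFields {
    model: String,
    endpoint: String,
    /// Timeout in seconds; the u64 type is an assumption.
    timeout_secs: u64,
}

/// Hash every route field, including `timeout_secs`, into one revision value.
fn bundle_revision(routes: &[RouteFields]) -> u64 {
    let mut hasher = DefaultHasher::new();
    routes.hash(&mut hasher);
    hasher.finish()
}

/// True when the freshly fetched routes no longer match the cached revision,
/// which is the signal to rebuild the route cache.
fn needs_refresh(cached_revision: u64, fetched: &[RouteFields]) -> bool {
    cached_revision != bundle_revision(fetched)
}
```

Because `timeout_secs` takes part in the hash, changing only the timeout yields a different revision, so a running sandbox would pick up the new value on its next refresh without being recreated.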

The timeout was observed while running OpenCode on a complex build task on a DGX Spark serving nemotron-3-super:120b via Ollama. This feature allows longer-running tasks to succeed.

Related Issue

Closes #641

Changes

  • Add timeout_secs field to ClusterInferenceConfig, SetClusterInferenceRequest, SetClusterInferenceResponse, GetClusterInferenceResponse, and ResolvedRoute proto messages
  • Add timeout field (Duration) to the router's ResolvedRoute struct with a DEFAULT_ROUTE_TIMEOUT of 60 seconds
  • Remove the global reqwest::Client timeout; apply per-request .timeout(route.timeout) in backend.rs
  • Thread timeout_secs through server persistence (upsert_cluster_inference_route, build_cluster_inference_config, bundle resolution)
  • Map proto timeout_secs to router ResolvedRoute.timeout in the sandbox's bundle_to_resolved_routes()
  • Include timeout_secs in the bundle revision hash so timeout changes trigger route cache refreshes in running sandboxes
  • Add --timeout CLI flag to inference set (defaults to 0, which selects the 60-second default) and inference update (optional)
  • Update docs/inference/configure.md with timeout usage and hot-reload behavior
  • Update architecture/inference-routing.md with per-request timeout semantics, proto field additions, and CLI surface
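
As a rough illustration of the mapping from the proto `timeout_secs` field to the router's `Duration`, the sketch below treats 0 as "use the default", matching the CLI description in the list above. The helper name `route_timeout` and the `u64` field type are assumptions, not the code in this PR.

```rust
use std::time::Duration;

/// Default route timeout named in the PR description: 60 seconds.
const DEFAULT_ROUTE_TIMEOUT: Duration = Duration::from_secs(60);

/// Hypothetical helper: convert a proto `timeout_secs` value into a Duration.
/// A value of 0 means "unset" and falls back to the default.
fn route_timeout(timeout_secs: u64) -> Duration {
    if timeout_secs == 0 {
        DEFAULT_ROUTE_TIMEOUT
    } else {
        Duration::from_secs(timeout_secs)
    }
}
```

The resulting value is what a per-request call such as `.timeout(route.timeout)` would receive, so each route can carry its own limit instead of sharing one client-wide timeout.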

Testing

  • mise run pre-commit passes
  • Unit tests added/updated
  • E2E tests added/updated (if applicable)

Checklist

  • Follows Conventional Commits
  • Commits are signed off (DCO)
  • Architecture docs updated (if applicable)

@pentschev pentschev requested a review from a team as a code owner March 30, 2026 07:13
@github-actions

github-actions bot commented Mar 30, 2026

All contributors have signed the DCO ✍️ ✅
Posted by the DCO Assistant Lite bot.

@pentschev
Author

I have read the DCO document and I hereby sign the DCO.

@pentschev
Author

recheck

@johntmyers johntmyers self-assigned this Mar 30, 2026
@johntmyers
Collaborator

Hi, thank you. Please address the failing branch checks. AGENTS.md describes this:

## Pre-commit

- Run `mise run pre-commit` before committing.
- Install the git hook when working locally: `mise generate git-pre-commit --write --task=pre-commit`

@pentschev
Author

Sorry, that was my mistake. It should be fixed now. Could you check again, @johntmyers?

Labels

test:e2e Requires end-to-end coverage

Development

Successfully merging this pull request may close these issues.

bug(proxy): 60s reqwest total timeout kills streaming inference responses mid-generation

2 participants