feat(gemini): add service_tier support for Flex Inference#1002

Open
smittie2000 wants to merge 1 commit into prism-php:main from smittie2000:feature/gemini-flex-inference

Conversation


@smittie2000 smittie2000 commented Apr 3, 2026

Description

Add serviceTier provider option to the Gemini provider's Text, Stream, and Structured handlers, enabling Google's Flex Inference API.

Flex Inference provides 50% cost reduction for latency-tolerant workloads (batch processing, background summarization, etc.) by opting into Google's flex service tier.

Usage

Prism::text()
    ->using(Provider::Gemini, 'gemini-2.5-flash')
    ->withPrompt('Summarize this document...')
    ->withProviderOptions(['serviceTier' => 'flex'])
    ->asText();

Note: The Gemini API requires lowercase 'flex' — uppercase 'FLEX' is rejected despite appearing in some Google documentation examples. Also, Flex Inference has a latency target of 1-15 minutes, so users should increase their timeout accordingly.
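Since the latency target can reach 15 minutes, a Flex call will typically need a longer HTTP timeout than the default. A minimal sketch (the `withClientOptions` call and the 900-second value are illustrative; check Prism's client-options documentation for the exact mechanism):

```php
use Prism\Prism\Prism;
use Prism\Prism\Enums\Provider;

// Sketch: raise the HTTP client timeout for a latency-tolerant Flex request.
// 900 seconds covers the upper end of the 1-15 minute latency target.
$response = Prism::text()
    ->using(Provider::Gemini, 'gemini-2.5-flash')
    ->withClientOptions(['timeout' => 900]) // seconds; illustrative value
    ->withProviderOptions(['serviceTier' => 'flex']) // must be lowercase
    ->withPrompt('Summarize this document...')
    ->asText();
```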

Implementation

  • Follows the same pattern as the existing OpenAI service_tier implementation
  • Passes service_tier at the top level of the Gemini REST request body (not inside generationConfig)
  • Uses Arr::whereNotNull — when serviceTier is not set, the key is omitted entirely (zero impact on existing requests)
  • Error handling already covered: Gemini.php handles 429 (PrismRateLimitedException) and 503 (PrismProviderOverloadedException)
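The Arr::whereNotNull behavior described above can be sketched roughly as follows (variable names and payload shape are illustrative, not the handler's actual code):

```php
use Illuminate\Support\Arr;

// Sketch: merge service_tier into the top level of the request body.
// When 'serviceTier' was not set, providerOptions() yields null and
// Arr::whereNotNull drops the key, so existing requests are unchanged.
$payload = Arr::whereNotNull([
    'contents' => $contents,
    'generationConfig' => $generationConfig,
    'service_tier' => $request->providerOptions('serviceTier'),
]);
```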

Provider option key naming

Uses serviceTier (camelCase) to follow the Gemini provider's existing convention (thinkingConfig, thinkingBudget, cachedContentName, safetySettings).
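For example, the new key sits alongside the existing camelCase options (the thinkingBudget value below is illustrative):

```php
->withProviderOptions([
    'serviceTier' => 'flex',     // new option, camelCase like its siblings
    'thinkingBudget' => 1024,    // existing Gemini provider option
])
```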

Related

Breaking Changes

None

Add serviceTier provider option to Gemini Text, Stream, and Structured
handlers, enabling Google's Flex Inference API. This provides 50% cost
reduction for latency-tolerant workloads.

Follows the same pattern as the existing OpenAI service_tier implementation.
Passes service_tier at the top level of the Gemini REST request body.

Ref: https://ai.google.dev/gemini-api/docs/flex-inference
smittie2000 force-pushed the feature/gemini-flex-inference branch from eaae6b9 to 026bf90 on April 3, 2026 at 18:45