
BadRequestError: 400 - max_tokens or model output limit reached when using beta.chat.completions.parse() with Azure OpenAI GPT-5, even without setting max_tokens #2886

@Kevv-J

Description

Confirm this is an issue with the Python library and not an underlying OpenAI API

  • This is an issue with the Python library

Describe the bug

When calling client.beta.chat.completions.parse() for structured output on an Azure-hosted GPT-5 deployment, the API returns a 400 BadRequestError:

openai.BadRequestError: Error code: 400 - {
  'error': {
    'message': 'Could not finish the message because max_tokens or model output limit was reached. Please try again with higher max_tokens.',
    'type': 'invalid_request_error',
    'param': None,
    'code': None
  }
}

This occurs even though neither max_tokens nor max_completion_tokens is set anywhere in the request.

This is related to #2046, which was closed with the suggestion to use max_completion_tokens instead of max_tokens. That resolution does not apply here: this issue occurs when no token limit parameter is passed at all. The error message is therefore misleading, because it implies the caller set a limit that was too low, when in fact no limit was set.

Expected Behavior

No token limit should be enforced when neither max_tokens nor max_completion_tokens is provided, consistent with how chat.completions.create() behaves.

Actual Behavior

A 400 BadRequestError is raised claiming the model output limit was reached, despite no limit being set by the caller.

To Reproduce

  1. Create an AsyncAzureOpenAI client pointing to a GPT-5 Azure deployment
  2. Call client.beta.chat.completions.parse() with a Pydantic model as response_format
  3. Pass reasoning_effort but do not pass max_tokens or max_completion_tokens
  4. Use a moderately complex Pydantic schema (e.g. nested models)
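For step 4, any nested Pydantic schema reproduces the problem; the models below are hypothetical stand-ins (assuming pydantic v2, which the openai SDK already depends on), not the reporter's actual classes:

```python
from pydantic import BaseModel


class LineItem(BaseModel):
    name: str
    quantity: int
    unit_price: float


class Invoice(BaseModel):
    # Nested model: one field is a list of another BaseModel,
    # which is what makes the schema "moderately complex".
    invoice_id: str
    customer: str
    items: list[LineItem]
    total: float


# parse() derives a JSON schema from the model for structured output.
schema = Invoice.model_json_schema()
print(sorted(schema["properties"]))  # ['customer', 'invoice_id', 'items', 'total']
```

A model like this is passed directly as response_format=Invoice in the snippet below.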

Code snippets

import asyncio

from openai import AsyncAzureOpenAI

client = AsyncAzureOpenAI(
    azure_endpoint="<AZURE_GPT5_ENDPOINT>",
    azure_deployment="<AZURE_GPT5_DEPLOYMENT>",
    api_version="<API_VERSION>",
    api_key="<API_KEY>",
)

system_prompt = "<SYSTEM_PROMPT>"
user_prompt = "<USER_PROMPT>"


async def main() -> None:
    completion = await client.beta.chat.completions.parse(
        model="<model_name>",
        messages=[
            {"role": "developer", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        response_format=MyPydanticOutputClass,  # Pydantic model for structured output
        reasoning_effort="minimal",
        # max_tokens and max_completion_tokens are intentionally NOT set
    )
    result = completion.choices[0].message.parsed


asyncio.run(main())

OS

Ubuntu 24.04.2 LTS

Python version

3.11.13

Library version

openai 1.75.0
