
BadRequestError: 400 - max_tokens or model output limit reached when using beta.chat.completions.parse() with Azure OpenAI GPT-5, even without setting max_tokens #2886

@Kevv-J

Description

Confirm this is an issue with the Python library and not an underlying OpenAI API

  • This is an issue with the Python library

Describe the bug

When calling client.beta.chat.completions.parse() for structured output on an Azure-hosted GPT-5 deployment, the API returns a 400 BadRequestError:

openai.BadRequestError: Error code: 400 - {
  'error': {
    'message': 'Could not finish the message because max_tokens or model output limit was reached. Please try again with higher max_tokens.',
    'type': 'invalid_request_error',
    'param': None,
    'code': None
  }
}

This occurs even though neither max_tokens nor max_completion_tokens is set anywhere in the request.

This is related to #2046, which was closed with the suggestion to use max_completion_tokens instead of max_tokens. That resolution does not apply here: this issue occurs when no token limit parameter is passed at all. The error message is therefore misleading, because it implies the caller set a limit that was too low, when in fact no limit was set.

Expected Behavior

No token limit should be enforced when neither max_tokens nor max_completion_tokens is provided, consistent with how chat.completions.create() behaves.

Actual Behavior

A 400 BadRequestError is raised claiming the model output limit was reached, despite no limit being set by the caller.

To Reproduce

  1. Create an AsyncAzureOpenAI client pointing to a GPT-5 Azure deployment
  2. Call client.beta.chat.completions.parse() with a Pydantic model as response_format
  3. Pass reasoning_effort but do not pass max_tokens or max_completion_tokens
  4. Use a moderately complex Pydantic schema (e.g. nested models)
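For step 4, any nested Pydantic schema reproduces the problem; the models below are hypothetical stand-ins (assuming pydantic v2, which the openai SDK already depends on), not the reporter's actual classes:

```python
from pydantic import BaseModel


class LineItem(BaseModel):
    name: str
    quantity: int
    unit_price: float


class Invoice(BaseModel):
    # Nested model: one field is a list of another BaseModel,
    # which is what makes the schema "moderately complex".
    invoice_id: str
    customer: str
    items: list[LineItem]
    total: float


# parse() derives a JSON schema from the model for structured output.
schema = Invoice.model_json_schema()
print(sorted(schema["properties"]))  # ['customer', 'invoice_id', 'items', 'total']
```

A model like this is passed directly as response_format=Invoice in the snippet below.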

Code snippets

import asyncio

from openai import AsyncAzureOpenAI

client = AsyncAzureOpenAI(
    azure_endpoint="<AZURE_GPT5_ENDPOINT>",
    azure_deployment="<AZURE_GPT5_DEPLOYMENT>",
    api_version="<API_VERSION>",
    api_key="<API_KEY>",
)

system_prompt = "<SYSTEM_PROMPT>"
user_prompt = "<USER_PROMPT>"


async def main() -> None:
    completion = await client.beta.chat.completions.parse(
        model="<model_name>",
        messages=[
            {"role": "developer", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        response_format=MyPydanticOutputClass,  # Pydantic model for structured output
        reasoning_effort="minimal",
        # max_tokens and max_completion_tokens are intentionally NOT set
    )
    result = completion.choices[0].message.parsed


asyncio.run(main())

OS

Ubuntu 24.04.2 LTS

Python version

3.11.13

Library version

openai 1.75.0
