OpenAI Introduces Flex Processing for Cheaper API Requests


OpenAI launches Flex processing as a cheaper alternative for API requests. The option is intended for less time-critical applications.

With Flex processing, the AI company aims to offer a competitive response to the pricing strategies of competitors such as Google.

Flex processing targets developers who want to limit their costs for non-production tasks such as model testing, data enrichment, or asynchronous processing. In exchange for lower rates, the user accepts slower response times and the possibility that the requested resources may be temporarily unavailable.

Flex processing is billed at the same rates as OpenAI’s Batch API. To activate the mode, users set the service_tier parameter to flex in their API request. This applies to both the Chat Completions and Responses endpoints of the API.
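As a sketch, a Flex request differs from a standard one only in that extra parameter. The request body below is illustrative; the model name and prompt are examples, not prescribed values:

```python
import json

# Illustrative Chat Completions request body with Flex processing enabled.
# Only the service_tier field is Flex-specific; model and prompt are examples.
payload = {
    "model": "o3",
    "messages": [{"role": "user", "content": "Classify this support ticket."}],
    "service_tier": "flex",  # opt in to the cheaper, slower Flex tier
}
print(json.dumps(payload, indent=2))
```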

When using Flex processing, longer processing times are likely. The default timeout is ten minutes. For complex or lengthy tasks, OpenAI recommends increasing this timeout. In the Python and JavaScript SDKs, this is done via the timeout parameter. If a request takes longer than allowed, the SDKs automatically retry it up to two more times before returning an error.

Delay or Unavailability

A request via Flex processing may be refused when insufficient processing capacity is available. In that case, the user receives a 429 error code, but no costs are charged.

To handle this, OpenAI recommends two strategies. The first is to retry the request with an increasing wait time between attempts (exponential backoff). This strategy is suitable for applications that can tolerate small delays.
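A minimal backoff loop might look as follows. RateLimitError is a hypothetical stand-in for the SDK's 429 error, and request_fn for the actual Flex API call; the delays double on each failed attempt, with a little jitter:

```python
import random
import time

class RateLimitError(Exception):
    """Hypothetical stand-in for the SDK's 429 (capacity) error."""

def retry_with_backoff(request_fn, max_attempts=5, base_delay=1.0):
    """Retry with exponential backoff: wait 1s, 2s, 4s, ... (plus
    jitter) between attempts, re-raising after the final failure."""
    for attempt in range(max_attempts):
        try:
            return request_fn()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            time.sleep(delay)

# Example: a fake request rejected twice for lack of capacity.
calls = {"n": 0}

def fake_request():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimitError
    return "done"

print(retry_with_backoff(fake_request, base_delay=0.01))
```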

As a second option, the request can fall back to the standard processing tier. For this, developers set the service_tier parameter to auto or omit it from the request entirely. Requests handled this way are then billed at the standard, higher rate.
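The fallback amounts to changing or dropping a single field in the request. In this sketch, build_request is an illustrative helper, not part of the SDK:

```python
def build_request(tier):
    """Illustrative helper: build a request body for a given service
    tier. Passing None omits service_tier, which also yields the
    standard processing behavior."""
    req = {
        "model": "o3",
        "messages": [{"role": "user", "content": "Tag this document."}],
    }
    if tier is not None:
        req["service_tier"] = tier  # "flex", or "auto" for standard routing
    return req

flex_request = build_request("flex")
fallback_request = build_request("auto")  # or build_request(None) to omit it
```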

The Flex option is currently only available in beta and can only be used with OpenAI’s o3 and o4-mini models.