OpenAI

Scale Tier for API Customers

This offering is available to Enterprise customers. Please contact our sales team to learn more. To access the same premium latency and reliability benefits on a flexible, pay-as-you-go basis, see Priority processing.

Scale Tier lets you purchase a set number of API input and output tokens per minute (known as “token units”) upfront for access to one dedicated model snapshot. Each token unit is purchased for a minimum of 30 days. Additional models may be added based on customer interest.

By choosing Scale Tier, you can unlock:

  • Predictable latency: Scale Tier is designed to generate tokens faster and at a more consistent speed than the pay-as-you-go (PAYG) service, even during peak demand.
  • Uncapped scale: Any quota purchased with Scale Tier is automatically added to your rate limits, so you can confidently scale further.
  • Higher reliability: Scale Tier traffic offers a 99.9% uptime SLA and prioritized compute.

| Model | Input bundle | Output bundle | Uptime SLA | Latency SLA |
| --- | --- | --- | --- | --- |
| GPT-5 | 25,000 TPM, $75.00 per unit/day | 2,500 TPM, $60.00 per unit/day | 99.9% | 99% > 50 tokens per second [2] |
| GPT-5 mini | 500,000 TPM, $275.00 per unit/day | 50,000 TPM, $220.00 per unit/day | 99.9% | 99% > 80 tokens per second [2] |
| GPT-4.1 (excludes long-context [1]) | 30,000 TPM, $110.00 per unit/day | 2,500 TPM, $36.00 per unit/day | 99.9% | 99% > 80 tokens per second [2] |
| GPT-4.1 mini (excludes long-context [1]) | 500,000 TPM, $450.00 per unit/day | 50,000 TPM, $175.00 per unit/day | 99.9% | 99% > 90 tokens per second [2] |
| GPT-4.1 nano (excludes long-context [1]) | 500,000 TPM, $110.00 per unit/day | 50,000 TPM, $40.00 per unit/day | 99.9% | 99% > 100 tokens per second [2] |
| o3 | 25,000 TPM, $75.00 per unit/day | 5,000 TPM, $60.00 per unit/day | 99.9% | 99% > 80 tokens per second [2] |
| o4-mini | 30,000 TPM, $50.00 per unit/day | 5,000 TPM, $32.50 per unit/day | 99.9% | 99% > 90 tokens per second [2] |
| GPT-4o | 30,000 TPM, $124.59 per unit/day | 2,500 TPM, $39.34 per unit/day | 99.9% | 99% > 80 tokens per second [2] |
| GPT-4o mini | 500,000 TPM, $114.75 per unit/day | 50,000 TPM, $49.18 per unit/day | 99.9% | 99% > 90 tokens per second [2] |
| GPT-4o mini fine tuning | 500,000 TPM, $229.50 per unit/day | 50,000 TPM, $98.36 per unit/day | 99.9% | 99% > 90 tokens per second [2] |
| o1 | 5,000 TPM, $163.93 per unit/day | 1,000 TPM, $131.15 per unit/day | 99.9% | 99% > 80 tokens per second [2] |
| o3-mini | 30,000 TPM, $78.69 per unit/day | 5,000 TPM, $52.46 per unit/day | 99.9% | 99% > 90 tokens per second [2] |

[1] Requests estimated at >128K prompt tokens.
[2] Calculated as p50 request latency on a per 5 minute basis. For customers with existing enterprise agreements that have latency SLAs calculated as p50 request latency on a per minute basis, the prior SLAs are also still applicable.
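
As a rough illustration of the measurement described in footnote 2, the sketch below groups requests into 5-minute windows, takes the median (p50) generation speed in each window, and reports the share of windows above a threshold. This is one possible reading, not the contractual calculation: the input shape (a timestamp plus a tokens-per-second figure per request), the function names, and the interpretation of "99%" as a share of windows are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import median

WINDOW_SECONDS = 5 * 60  # "per 5 minute basis" from footnote 2


def window_medians(requests):
    """Return {window_index: p50 tokens/sec} for each 5-minute window.

    `requests` is an iterable of (unix_timestamp_seconds, tokens_per_second)
    pairs. This input shape is hypothetical, chosen for the example.
    """
    buckets = defaultdict(list)
    for timestamp, tokens_per_second in requests:
        buckets[int(timestamp // WINDOW_SECONDS)].append(tokens_per_second)
    return {window: median(speeds) for window, speeds in sorted(buckets.items())}


def share_of_windows_above(requests, threshold_tps):
    """Fraction of windows whose p50 speed exceeds threshold_tps (assumed reading)."""
    medians = window_medians(requests)
    if not medians:
        return None  # no traffic, nothing to measure
    passing = sum(1 for speed in medians.values() if speed > threshold_tps)
    return passing / len(medians)


# Made-up sample data: two windows of three requests each.
sample = [
    (0, 60.0), (30, 55.0), (200, 52.0),     # window 0, median 55.0
    (300, 40.0), (400, 45.0), (500, 70.0),  # window 1, median 45.0
]
share = share_of_windows_above(sample, threshold_tps=50.0)
print(share)  # 0.5: one of the two windows has a p50 above 50 tokens/sec
```

Under this reading, a model with a "99% > 50 tokens per second" target would need `share` to be at least 0.99 over the measurement period.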

How it works

With Scale Tier, you can purchase input and output token units. For example, with GPT‑4.1 each input unit costs $110/day and entitles you to 30k input tokens/min. Each output unit costs $36/day and entitles you to 2.5k output tokens/min. Each token unit is purchased for a minimum of 30 days.
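
The arithmetic above can be sketched as a small calculator. The GPT-4.1 prices and per-unit throughput come from the example in this section; the workload figures, the names, and the assumption that units are bought in whole numbers are illustrative only.

```python
import math
from dataclasses import dataclass

MIN_COMMIT_DAYS = 30  # each token unit is purchased for a minimum of 30 days


@dataclass(frozen=True)
class UnitSpec:
    tpm_per_unit: int        # tokens per minute granted by one unit
    usd_per_unit_day: float  # price of one unit per day


# GPT-4.1 figures quoted in this section.
GPT41_INPUT = UnitSpec(tpm_per_unit=30_000, usd_per_unit_day=110.00)
GPT41_OUTPUT = UnitSpec(tpm_per_unit=2_500, usd_per_unit_day=36.00)


def units_needed(target_tpm: int, spec: UnitSpec) -> int:
    """Whole units required to cover target_tpm (assumes no fractional units)."""
    return math.ceil(target_tpm / spec.tpm_per_unit)


def commitment_cost(units: int, spec: UnitSpec, days: int = MIN_COMMIT_DAYS) -> float:
    """Cost in USD of holding `units` for `days` days."""
    return units * spec.usd_per_unit_day * days


# Hypothetical workload: 80k input tokens/min and 4k output tokens/min.
input_units = units_needed(80_000, GPT41_INPUT)    # ceil(80000 / 30000) = 3
output_units = units_needed(4_000, GPT41_OUTPUT)   # ceil(4000 / 2500) = 2
total = (commitment_cost(input_units, GPT41_INPUT)
         + commitment_cost(output_units, GPT41_OUTPUT))
print(input_units, output_units, total)  # 3 2 12060.0
```

For this made-up workload, the minimum 30-day commitment works out to 3 × $110 × 30 + 2 × $36 × 30 = $12,060.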

More information about how Scale Tier interacts with Prompt Caching can be found in the FAQ section below.

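[Figure: before-and-after comparison of tokens per minute (TPM). Before: pay-as-you-go only, paying for tokens used, with average latency of 19 tok/s and 99.5% uptime. After: Scale Tier with 3 input units and 2 output units, paid upfront monthly, with average latency of 25 tok/s and 99.9% uptime, compared with pay-as-you-go at 19 tok/s and 99.5%.]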

Pricing

Once you’ve signed an order form, you can add and remove token units through your developer console.

Token units and rate limits

Once Scale Tier is enabled for your account, you can manually adjust your token units.

Models

Scale Tier supports the same multimodal capabilities available on Standard processing. In particular, images can be used as inputs to Scale Tier and are processed with the same fast latency.

Reliability

You will be credited with the greater of the two SLA amounts for the calendar month of that Scale Tier token unit purchase.

Policies

If a customer's use case qualifies for Zero Data Retention (ZDR), their Scale Tier usage adheres to that same retention policy.