Scale Tier for API Customers

This offering is available to Enterprise customers. Please contact our sales team to learn more. To access the same premium latency and reliability benefits on a flexible, pay-as-you-go basis, see Priority processing.

Scale Tier lets you purchase a set number of API input and output tokens per minute (known as “token units”) upfront for access to one dedicated model snapshot. Each token unit is purchased for a minimum of 30 days. Additional models may be added based on customer interest.

By choosing Scale Tier, you can unlock:

Predictable latency: Scale Tier is designed to generate tokens faster and at a more consistent speed than the pay-as-you-go (PAYG) service, even during peak demand.
Uncapped scale: Any quota purchased with Scale Tier is automatically added to your rate limits, so you can confidently scale further.
Higher reliability: Scale Tier traffic offers a 99.9% uptime SLA and prioritized compute.

Model	Input bundle	Output bundle	Uptime SLA	Latency SLA
GPT-5.2	25,000 TPM $105.00 per unit/day	2,500 TPM $84.00 per unit/day	99.9%	99% > 50 tokens per second²
GPT-5.1	25,000 TPM $75.00 per unit/day	2,500 TPM $60.00 per unit/day	99.9%	99% > 50 tokens per second²
GPT-5	25,000 TPM $75.00 per unit/day	2,500 TPM $60.00 per unit/day	99.9%	99% > 50 tokens per second²
GPT-5 mini	500,000 TPM $275.00 per unit/day	50,000 TPM $220.00 per unit/day	99.9%	99% > 80 tokens per second²
GPT-4.1 (excludes long-context¹)	30,000 TPM $110.00 per unit/day	2,500 TPM $36.00 per unit/day	99.9%	99% > 80 tokens per second²
GPT-4.1 mini (excludes long-context¹)	500,000 TPM $450.00 per unit/day	50,000 TPM $175.00 per unit/day	99.9%	99% > 90 tokens per second²
GPT-4.1 nano (excludes long-context¹)	500,000 TPM $110.00 per unit/day	50,000 TPM $40.00 per unit/day	99.9%	99% > 100 tokens per second²
GPT-4.1 fine tuning	30,000 TPM $165.00 per unit/day	2,500 TPM $36.00 per unit/day	99.9%	99% > 80 tokens per second²
GPT-4.1 mini fine tuning	500,000 TPM $900.00 per unit/day	50,000 TPM $175.00 per unit/day	99.9%	99% > 90 tokens per second²
o3	25,000 TPM $75.00 per unit/day	5,000 TPM $60.00 per unit/day	99.9%	99% > 80 tokens per second²
o4-mini	30,000 TPM $50.00 per unit/day	5,000 TPM $32.50 per unit/day	99.9%	99% > 90 tokens per second²
GPT-4o	30,000 TPM $124.59 per unit/day	2,500 TPM $39.34 per unit/day	99.9%	99% > 80 tokens per second²
GPT-4o mini	500,000 TPM $114.75 per unit/day	50,000 TPM $49.18 per unit/day	99.9%	99% > 90 tokens per second²
GPT-4o mini fine tuning	500,000 TPM $229.50 per unit/day	50,000 TPM $98.36 per unit/day	99.9%	99% > 90 tokens per second²
o1	5,000 TPM $163.93 per unit/day	1,000 TPM $131.15 per unit/day	99.9%	99% > 80 tokens per second²
o3-mini	30,000 TPM $78.69 per unit/day	5,000 TPM $52.46 per unit/day	99.9%	99% > 90 tokens per second²

¹ Requests estimated at >128K prompt tokens.

² Calculated as p50 request latency over 5-minute intervals. For customers with existing enterprise agreements whose latency SLAs are calculated as p50 request latency on a per-minute basis, the prior SLAs also still apply.

How it works

With Scale Tier, you can purchase input and output token units. For example, with GPT‑4.1 each input unit costs $110/day and entitles you to 30k input tokens/min. Each output unit costs $36/day and entitles you to 2.5k output tokens/min. Each token unit is purchased for a minimum of 30 days.
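The arithmetic behind a purchase can be sketched as follows. This is a minimal illustration using the GPT-4.1 figures quoted above; the function names are hypothetical, and it assumes that units are bought in whole numbers and that input and output units are sized independently, neither of which is stated explicitly here.

```python
from math import ceil

# GPT-4.1 figures from the example above.
INPUT_TPM_PER_UNIT = 30_000
OUTPUT_TPM_PER_UNIT = 2_500
INPUT_PRICE_PER_UNIT_DAY = 110.00   # USD
OUTPUT_PRICE_PER_UNIT_DAY = 36.00   # USD
MIN_COMMIT_DAYS = 30                # each unit is purchased for at least 30 days


def units_needed(target_tpm: int, tpm_per_unit: int) -> int:
    """Smallest whole number of units covering target_tpm (assumes whole units)."""
    return ceil(target_tpm / tpm_per_unit)


def minimum_commit_cost(input_tpm: int, output_tpm: int):
    """Return (input units, output units, cost per day, cost over the 30-day minimum)."""
    in_units = units_needed(input_tpm, INPUT_TPM_PER_UNIT)
    out_units = units_needed(output_tpm, OUTPUT_TPM_PER_UNIT)
    daily = in_units * INPUT_PRICE_PER_UNIT_DAY + out_units * OUTPUT_PRICE_PER_UNIT_DAY
    return in_units, out_units, daily, daily * MIN_COMMIT_DAYS


# Covering 100k input TPM and 10k output TPM on GPT-4.1:
print(minimum_commit_cost(100_000, 10_000))  # (4, 4, 584.0, 17520.0)
```

Under these assumptions, 100,000 input TPM and 10,000 output TPM would need 4 input units and 4 output units, or $584/day.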

More information about how Scale Tier interacts with Prompt Caching can be found in the FAQ section below.

Pricing

Once you’ve signed an order form, you can add and remove token units through your developer console.

For billing purposes, usage is measured in 15-minute intervals aligned to the top of the hour (e.g. 3:00 to < 3:15, 3:15 to < 3:30, etc.), and tokens per minute (TPM) is the average over each interval. If the total tokens used within an interval do not exceed your Scale Tier entitlement for that interval, you incur no additional charges. For example, if you purchase Scale Tier for GPT‑4o with an entitlement of 30,000 input tokens per minute, you can use up to 450,000 input tokens (30,000 × 15) in any such 15-minute interval without incurring additional charges. Any tokens used beyond this limit are billed at pay-as-you-go (PAYG) rates.
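The interval arithmetic can be sketched as below. This is an illustrative model only, with hypothetical function names; it reproduces the 450,000-token example and does not describe how the billing system is actually implemented.

```python
WINDOW_MINUTES = 15  # billing intervals are 15 minutes, aligned to the top of the hour


def window_start_minute(minute_of_hour: int) -> int:
    """Minute (0, 15, 30 or 45) at which the interval containing this minute starts."""
    return (minute_of_hour // WINDOW_MINUTES) * WINDOW_MINUTES


def window_allowance(entitled_tpm: int) -> int:
    """Tokens covered by the entitlement within one 15-minute interval."""
    return entitled_tpm * WINDOW_MINUTES


def overage_tokens(tokens_used_in_window: int, entitled_tpm: int) -> int:
    """Tokens beyond the allowance, which would be billed at PAYG rates."""
    return max(0, tokens_used_in_window - window_allowance(entitled_tpm))


print(window_allowance(30_000))         # 450000
print(overage_tokens(500_000, 30_000))  # 50000
print(overage_tokens(400_000, 30_000))  # 0
```

Because the allowance applies to the interval total, a short burst above 30,000 TPM incurs no overage as long as the whole 15-minute interval stays at or under 450,000 tokens.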

Token units and rate limits

Once Scale Tier is enabled for your account, you can manually adjust your token units.

You can see your current rate limits on your settings page. When you purchase token units for Scale Tier, your rate limits for that model automatically increase by the amount of your purchase. When you use the model, requests are first processed using your faster Scale Tier quota. If you exceed your quota, additional requests are processed using the regular Standard processing service. If you exceed your total rate limit in a minute across Scale Tier and regular Standard processing combined, further requests are rejected as usual with a 429 error code.
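The order of precedence described above can be modeled roughly as follows. This is a deliberately simplified sketch: the `MinuteBudget` class and its labels are hypothetical, it tracks a single token counter for one minute, and it assumes a request is attributed wholly to one tier. The real accounting is not specified here.

```python
from dataclasses import dataclass


@dataclass
class MinuteBudget:
    """Toy model of one minute of traffic for one model."""
    scale_tier_tpm: int   # purchased Scale Tier quota
    standard_tpm: int     # regular Standard processing rate limit
    used: int = 0         # tokens accepted so far in this minute

    def route(self, tokens: int) -> str:
        total_after = self.used + tokens
        if total_after <= self.scale_tier_tpm:
            self.used = total_after
            return "scale_tier"      # served from the faster Scale Tier quota
        if total_after <= self.scale_tier_tpm + self.standard_tpm:
            self.used = total_after
            return "standard"        # spill-over to Standard processing
        return "rejected_429"        # total rate limit exceeded for this minute


budget = MinuteBudget(scale_tier_tpm=30_000, standard_tpm=20_000)
print(budget.route(25_000))  # scale_tier
print(budget.route(10_000))  # standard
print(budget.route(20_000))  # rejected_429
```

In this model the total rate limit is simply the sum of the two limits, which matches the statement that purchased units are added to your existing rate limits.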

We provide different discounts on cached input tokens (50%, 75%, or 90%) depending on the model. If you send 50,000 TPM in cached input tokens on a model where cached tokens are discounted 50%, those tokens only count for 25,000 TPM against your quota. If you send 50,000 TPM in cached input tokens on a model where cached tokens are discounted 75%, those tokens only count for 12,500 TPM against your quota. Learn more about Prompt Caching.
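The quota arithmetic for cached tokens can be sketched as below. The function name is hypothetical, and the sketch assumes uncached input tokens count at their full value; it says nothing about how rounding is handled in practice.

```python
ALLOWED_DISCOUNTS = (50, 75, 90)  # percent, depending on the model


def quota_tpm_consumed(uncached_tpm: int, cached_tpm: int, discount_percent: int) -> float:
    """Input TPM counted against the Scale Tier quota after the cache discount."""
    if discount_percent not in ALLOWED_DISCOUNTS:
        raise ValueError(f"unexpected cache discount: {discount_percent}%")
    # Cached tokens count at (100 - discount)% of their raw volume.
    cached_counted = cached_tpm * (100 - discount_percent) / 100
    return uncached_tpm + cached_counted


print(quota_tpm_consumed(0, 50_000, 50))  # 25000.0
print(quota_tpm_consumed(0, 50_000, 75))  # 12500.0
```

By the same arithmetic, a 90% discount would make 50,000 TPM of cached input count as 5,000 TPM.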

Models

Scale Tier supports the same multimodal capabilities available on Standard processing. In particular, images can be used as inputs to Scale Tier and are processed with the same fast latency.

Reliability

If both the latency SLA and the uptime SLA are violated, you will be credited with the greater of the two SLA amounts for the calendar month of that Scale Tier token unit purchase.

Policies

If your use case qualifies for Zero Data Retention (ZDR), your Scale Tier usage adheres to that same retention policy.

Pricing

How is Scale Tier ordered and provisioned?

When does billing start?

How are pay-as-you-go overages calculated while I’m using Scale Tier?

If I make an annual commitment, does my spend have to be applied to Scale Tier?

Is my annual commitment tied to a specific offering?

If I’m already using Reserved Capacity, how can I use Scale Tier for GPT-4o?

Token units and rate limits

How can I purchase token units on Scale Tier?

How can I tell my TPM?

How do I figure out my total rate limits?

Can I choose which requests are covered with Scale Tier?

How does Scale Tier work with Prompt Caching?

Models

How do other modalities work with Scale Tier?

Does Scale Tier support fine-tuning?

Can I automatically send my Scale Tier spill-over traffic to Priority processing?

Reliability

What happens if the latency and uptime SLA are both violated?

Policies

How does Zero Data Retention (ZDR) work for Scale Tier?