Cost Per Token
The unit price an AI provider charges for processing a single token, quoted per million tokens. Rates range from $0.075/1M for budget models to $75.00/1M for frontier models — a 1,000x spread.
Understanding Cost Per Token
AI providers price their APIs using a per-token model, where tokens are the sub-word units that language models process internally. One token is roughly 4 characters or 0.75 words in English. Providers quote prices per 1 million tokens because individual token prices would be impractically small numbers (fractions of a cent).
Every API call has two token costs: input (the tokens you send) and output (the tokens the model generates). These are always priced separately. The total cost of a single call is:
cost = (input_tokens × input_rate / 1,000,000) + (output_tokens × output_rate / 1,000,000)

For example, a GPT-4o call with 2,000 input tokens and 500 output tokens costs:

cost = (2,000 × $2.50 / 1M) + (500 × $10.00 / 1M)
     = $0.005 + $0.005
     = $0.01

Understanding this formula is essential because every optimization technique you apply ultimately reduces one or more of these four variables: input token count, input rate, output token count, or output rate.
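The formula translates directly into code. A minimal sketch (the function name is illustrative; the GPT-4o figures are the list prices quoted in this article):

```python
# Per-call cost from the formula above.
# Rates are quoted in USD per 1 million tokens.
def call_cost(input_tokens: int, output_tokens: int,
              input_rate: float, output_rate: float) -> float:
    """Return the USD cost of one API call."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# Worked example from the text: 2,000 input + 500 output tokens on GPT-4o.
print(call_cost(2_000, 500, input_rate=2.50, output_rate=10.00))  # → 0.01
```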
Complete Provider Pricing Table
The following table provides comprehensive pricing for all major models as of March 2026. All prices are per 1 million tokens. Cached and batch rates shown where available.
| Provider | Model | Input | Output | Cached Input | Batch Input | Batch Output |
|---|---|---|---|---|---|---|
| OpenAI | GPT-4o | $2.50 | $10.00 | $1.25 | $1.25 | $5.00 |
| OpenAI | GPT-4o-mini | $0.15 | $0.60 | $0.075 | $0.075 | $0.30 |
| OpenAI | o1 | $15.00 | $60.00 | $7.50 | $7.50 | $30.00 |
| OpenAI | o3-mini | $1.10 | $4.40 | $0.55 | $0.55 | $2.20 |
| OpenAI | GPT-4.1 | $2.00 | $8.00 | $0.50 | $1.00 | $4.00 |
| OpenAI | GPT-4.1-mini | $0.40 | $1.60 | $0.10 | $0.20 | $0.80 |
| OpenAI | GPT-4.1-nano | $0.10 | $0.40 | $0.025 | $0.05 | $0.20 |
| Anthropic | Claude 3.5 Sonnet | $3.00 | $15.00 | $0.30 | $1.50 | $7.50 |
| Anthropic | Claude 3.5 Haiku | $0.80 | $4.00 | $0.08 | $0.40 | $2.00 |
| Anthropic | Claude 3 Opus | $15.00 | $75.00 | $1.50 | $7.50 | $37.50 |
| Anthropic | Claude Opus 4 | $15.00 | $75.00 | $1.50 | $7.50 | $37.50 |
| Anthropic | Claude Sonnet 4 | $3.00 | $15.00 | $0.30 | $1.50 | $7.50 |
| Google | Gemini 1.5 Pro | $1.25 | $5.00 | $0.3125 | — | — |
| Google | Gemini 1.5 Flash | $0.075 | $0.30 | $0.01875 | — | — |
| Google | Gemini 2.0 Flash | $0.10 | $0.40 | $0.025 | — | — |
| Google | Gemini 2.5 Pro | $1.25 | $10.00 | $0.3125 | — | — |
This table reveals several important patterns: (1) within each provider, model tiers span 10-100x in price; (2) cached input rates offer 50-90% discounts; (3) batch processing offers a consistent 50% discount across the board; (4) the cheapest model (Gemini 1.5 Flash at $0.075 input) is 1,000x cheaper than the most expensive (Claude 3 Opus at $75.00 output).
The 1,000x Price Range
The AI model market has an extraordinary price range. The cheapest input rate available (Gemini 1.5 Flash at $0.075/MTok) and the most expensive output rate (Claude 3 Opus at $75.00/MTok) differ by a factor of 1,000. Even within a single provider, the range is significant: OpenAI's GPT-4o-mini input at $0.15/MTok versus o1 output at $60.00/MTok is a 400x spread.
This range exists because models differ dramatically in size, capability, and computational cost:
- Nano/Flash models (Gemini Flash, GPT-4.1-nano): Small, fast models optimized for throughput. They run on fewer GPUs with high batch efficiency. Cost per token: $0.075-$0.40.
- Standard models (GPT-4o, Claude Sonnet): Full-capability models balanced for quality and cost. Cost per token: $2.00-$15.00.
- Reasoning models (o1, o3): Models that perform extended internal reasoning, generating thousands of internal tokens per response. Cost per token: $15.00-$60.00.
- Frontier models (Claude Opus, GPT-4o with high-capability mode): The most capable models available, running on the most expensive hardware configurations. Cost per token: $15.00-$75.00.
The practical implication is that model selection is your highest-leverage cost optimization. Switching from Claude 3 Opus to Claude 3.5 Haiku for a simple task reduces cost nearly 19x on both input ($15.00 → $0.80/MTok) and output ($75.00 → $4.00/MTok) — far more than any prompt optimization could achieve.
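To make the leverage concrete, here is a quick sketch comparing the two models on a hypothetical workload (the workload numbers are invented for illustration; the rates are the list prices from the table above):

```python
# Illustrative comparison of two Anthropic models at the list rates
# from the pricing table (USD per 1M tokens).
RATES = {
    "claude-3-opus":    {"input": 15.00, "output": 75.00},
    "claude-3.5-haiku": {"input": 0.80,  "output": 4.00},
}

def workload_cost(model: str, calls: int, in_tok: int, out_tok: int) -> float:
    """Total USD cost of `calls` requests at fixed token counts per call."""
    r = RATES[model]
    return calls * (in_tok * r["input"] + out_tok * r["output"]) / 1_000_000

# Hypothetical workload: 1M calls, 1,000 input / 200 output tokens each.
opus = workload_cost("claude-3-opus", 1_000_000, 1_000, 200)
haiku = workload_cost("claude-3.5-haiku", 1_000_000, 1_000, 200)
print(f"Opus ${opus:,.0f} vs Haiku ${haiku:,.0f} ({opus / haiku:.0f}x)")
# → Opus $30,000 vs Haiku $1,600 (19x)
```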
How Caching and Batching Affect Cost Per Token
Your effective cost per token can be significantly lower than the list price through two mechanisms: prompt caching and batch processing.
Prompt Caching discounts the input tokens that match a previously cached prefix. Anthropic offers a 90% discount: cached input tokens for Claude Sonnet cost $0.30/MTok instead of $3.00/MTok. OpenAI offers a 50% discount: cached GPT-4o input costs $1.25/MTok instead of $2.50/MTok. The discount applies automatically (OpenAI) or when you structure prompts with a stable prefix (Anthropic). For applications with large, repetitive system prompts, caching can reduce the input portion of your bill by 50-90%.
Batch Processing offers a flat 50% discount on both input and output tokens in exchange for higher latency (up to 24 hours). OpenAI's Batch API and Anthropic's Message Batches both offer this rate. GPT-4o batch pricing is $1.25/$5.00 instead of $2.50/$10.00. Claude Sonnet batch pricing is $1.50/$7.50 instead of $3.00/$15.00.
These discounts can be combined in some cases. A batch request with cached input on Anthropic could achieve an effective input rate of $0.15/MTok (90% cache discount on the $1.50 batch input rate) — a 95% discount from the list price of $3.00/MTok. This makes it critical to track your effective cost per token, not just the list price. CostHawk calculates your effective rate by dividing actual spend by actual tokens consumed, accounting for all discounts automatically.
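The stacking arithmetic can be sketched as follows (a simplified model using the Claude Sonnet list rate and the discount percentages cited above; real bills blend cached and uncached tokens, which the `cached_fraction` parameter approximates):

```python
# Effective input rate under stacked batch + cache discounts,
# using the Claude Sonnet figures from the text.
LIST_INPUT = 3.00       # USD per 1M input tokens (list price)
BATCH_DISCOUNT = 0.50   # flat 50% off for batch requests
CACHE_DISCOUNT = 0.90   # 90% off tokens that hit the prompt cache

def effective_input_rate(batched: bool, cached_fraction: float) -> float:
    """Blended rate when `cached_fraction` of input tokens are cache hits."""
    base = LIST_INPUT * (1 - BATCH_DISCOUNT) if batched else LIST_INPUT
    cached_rate = base * (1 - CACHE_DISCOUNT)
    return cached_fraction * cached_rate + (1 - cached_fraction) * base

print(round(effective_input_rate(batched=True, cached_fraction=1.0), 4))   # → 0.15
print(round(effective_input_rate(batched=False, cached_fraction=0.0), 4))  # → 3.0
```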
Cost Per Token Trends Over Time
AI model pricing has dropped dramatically and consistently since the launch of GPT-4 in March 2023. The trend line shows roughly a 2-3x annual reduction in cost per capability-equivalent token. Key milestones:
- March 2023: GPT-4 launched at $30.00 input / $60.00 output per million tokens. This was the state of the art.
- November 2023: GPT-4 Turbo dropped to $10.00 / $30.00 — a 3x input and 2x output reduction in 8 months.
- May 2024: GPT-4o launched at $5.00 / $15.00 — matching GPT-4 quality at 6x lower input and 4x lower output cost.
- October 2024: GPT-4o pricing reduced to $2.50 / $10.00 — another 2x drop.
- July 2025: GPT-4.1 launched at $2.00 / $8.00 with improved quality — continuing the deflation trend.
This means the same capability that cost $60/MTok in March 2023 costs $8/MTok in 2026 — a 7.5x reduction in under three years. Meanwhile, budget models have gotten even cheaper: GPT-4o-mini offers near-GPT-4-level quality at $0.15/$0.60, which is 200x cheaper on input and 100x cheaper on output than the original GPT-4 rates.
The implication for cost planning: do not lock in long-term commitments at current prices. Build your architecture to easily swap models, and re-evaluate pricing quarterly. CostHawk's pricing tracker monitors rate changes across all providers and alerts you when a cheaper model becomes available for your workloads.
Choosing Models by Cost Per Token
Selecting the right model for each task based on cost per token is the highest-impact optimization available. Use this decision framework:
- Classification, extraction, simple Q&A: Use the cheapest available model. GPT-4o-mini ($0.15/$0.60) or Gemini 2.0 Flash ($0.10/$0.40) handle these tasks with 95%+ accuracy. Cost per query: $0.0002-$0.003.
- RAG, summarization, general chat: Use a mid-tier model. GPT-4o ($2.50/$10.00) or Claude 3.5 Sonnet ($3.00/$15.00) provide the best quality-to-cost ratio for these workloads. Cost per query: $0.005-$0.03.
- Complex reasoning, math, code review: Use reasoning models when accuracy is critical. o3-mini ($1.10/$4.40) is often sufficient. Reserve o1 ($15.00/$60.00) for the hardest problems. Cost per query: $0.05-$2.00.
- Batch and offline processing: Always use batch pricing for non-real-time workloads. A nightly summarization job on GPT-4o batch ($1.25/$5.00) costs half of real-time, with no quality difference.
The key insight: most production applications should use 2-3 models, not one. Route traffic dynamically based on task complexity and latency requirements. CostHawk's model comparison dashboard shows you exactly how much each model costs for your specific workload patterns, making it easy to identify routing opportunities.
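A minimal sketch of such a router (the tiers and rates mirror this article's framework; the complexity score is a stand-in for whatever classifier or heuristic a real system would use):

```python
# Route each request to the cheapest adequate tier. Thresholds and the
# 0..1 complexity score are placeholders for a real classifier.
ROUTES = [
    # (max_complexity, model, input_rate, output_rate)  rates: USD / 1M tokens
    (0.3, "gpt-4o-mini", 0.15, 0.60),   # classification, extraction, simple Q&A
    (0.7, "gpt-4o",      2.50, 10.00),  # RAG, summarization, general chat
    (1.0, "o3-mini",     1.10, 4.40),   # complex reasoning, code review
]

def route(complexity: float) -> str:
    """Pick the first tier whose complexity ceiling covers the request."""
    for max_c, model, _in_rate, _out_rate in ROUTES:
        if complexity <= max_c:
            return model
    return ROUTES[-1][1]  # fall back to the most capable tier

print(route(0.1), route(0.5), route(0.9))  # → gpt-4o-mini gpt-4o o3-mini
```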
FAQ
Frequently Asked Questions
Why are prices quoted per million tokens instead of per token?
What is the cheapest AI model available right now?
How much does it cost to process 1 million tokens?
How do cached input tokens affect my effective cost per token?
Is cost per token the same across all languages?
How does the Batch API reduce cost per token?
How fast are AI model prices dropping?
What is the difference between list price and effective price per token?
Should I always use the cheapest model?
How do I forecast my AI costs using cost per token?
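A minimal sketch of this forecast, assuming a hypothetical workload of 10,000 daily requests at 2,000 input / 500 output tokens on GPT-4o list rates, growing 20% month-over-month:

```python
# Hypothetical forecast: GPT-4o list rates ($2.50 in / $10.00 out per 1M tokens).
def monthly_cost(daily_requests: int, in_tok: int, out_tok: int,
                 in_rate: float, out_rate: float, days: int = 30) -> float:
    """USD per month at a constant daily request volume."""
    return daily_requests * (in_tok * in_rate + out_tok * out_rate) / 1_000_000 * days

base = monthly_cost(10_000, 2_000, 500, 2.50, 10.00)   # month-1 spend
month6 = base * 1.2 ** 5                               # 20% compounding growth
print(f"month 1: ${base:,.0f}  month 6: ${month6:,.0f}")
# → month 1: $3,000  month 6: $7,465
```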
Monthly cost = (daily_requests × avg_input_tokens × input_rate + daily_requests × avg_output_tokens × output_rate) × 30. For example, 10,000 daily requests averaging 2,000 input tokens and 500 output tokens on GPT-4o: (10,000 × 2,000 × $2.50/1M + 10,000 × 500 × $10.00/1M) × 30 = ($50 + $50) × 30 = $3,000/month. But account for growth: if request volume grows 20% month-over-month, your month-6 cost is $3,000 × 1.2^5 = $7,465. Also account for context growth in conversational apps, where input tokens per request increase over time as conversation histories lengthen. CostHawk provides forecasting tools that project costs based on your actual usage trends, not just static estimates.

Related Terms
Token
The fundamental billing unit for large language models. Every API call is metered in tokens, which are sub-word text fragments produced by BPE tokenization. One token averages roughly four characters in English, and providers bill input and output tokens at separate rates.
Token Pricing
The per-token cost model used by AI API providers, with separate rates for input tokens, output tokens, and cached tokens. Token pricing is the fundamental billing mechanism for LLM APIs, typically quoted per million tokens, and varies by model, provider, and usage tier.
Input vs. Output Tokens
The two token directions in every LLM API call, each priced differently. Output tokens cost 3-5x more than input tokens across all major providers.
Prompt Caching
A provider-side optimization that caches repeated prompt prefixes to reduce input token costs by 50-90% on subsequent requests.
Batch API
Asynchronous API endpoints that process large volumes of LLM requests at a 50% discount in exchange for longer turnaround times.
Model Routing
Dynamically directing AI requests to different models based on task complexity, cost constraints, and quality requirements to achieve optimal cost efficiency.
Put this knowledge to work. Track your AI spend in one place.
CostHawk gives engineering teams real-time visibility into every token, every model, and every dollar across your AI stack.
