Glossary · Billing & Pricing · Updated 2026-03-16

Cost Per Token

The unit price an AI provider charges for processing a single token, quoted per million tokens. Ranges from $0.075/1M for budget models to $75.00/1M for frontier reasoning models — a 1,000x spread.

Definition

What is Cost Per Token?

Cost per token is the fundamental unit price in AI API billing. Every provider quotes prices as a dollar amount per 1 million tokens (abbreviated $/1M or $/MTok). Input and output tokens carry separate rates, with output rates typically 4-5x higher. The full pricing landscape spans from $0.075 per million input tokens (Gemini 1.5 Flash) to $75.00 per million output tokens (Claude 3 Opus) — a range exceeding 1,000x. This enormous spread means model selection alone can change your costs by orders of magnitude, even before any other optimization is applied.

Impact

Why It Matters for AI Costs

Cost per token is the atomic unit of your AI bill. Every optimization strategy — prompt compression, caching, model routing, batch processing — ultimately works by either reducing the number of tokens consumed or reducing the effective price per token. A team processing 100 million tokens per month sees a $100 monthly impact for every $1/MTok change in their effective rate. CostHawk normalizes all provider pricing to a consistent per-million-token basis and tracks your effective cost per token after caching discounts, batch savings, and model routing are applied.

Understanding Cost Per Token

AI providers price their APIs using a per-token model, where tokens are the sub-word units that language models process internally. One token is roughly 4 characters or 0.75 words in English. Providers quote prices per 1 million tokens because individual token prices would be impractically small numbers (fractions of a cent).
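The ~4-characters-per-token rule of thumb gives a quick back-of-envelope estimate before you reach for a real tokenizer. A minimal sketch (the function name is our own; billing-grade counts come from the provider's tokenizer, such as OpenAI's tiktoken):

```python
def rough_token_count(text: str) -> int:
    """Back-of-envelope token estimate for English text (~4 characters per token)."""
    return max(1, len(text) // 4)

print(rough_token_count("Cost per token is the fundamental unit price."))  # 11
```

Expect the estimate to drift for code, non-English text, or unusual formatting; use the provider's tokenizer when the count matters for billing.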

Every API call has two token costs: input (the tokens you send) and output (the tokens the model generates). These are always priced separately. The total cost of a single call is:

cost = (input_tokens × input_rate / 1,000,000) + (output_tokens × output_rate / 1,000,000)

For example, a GPT-4o call with 2,000 input tokens and 500 output tokens costs:

cost = (2,000 × $2.50 / 1M) + (500 × $10.00 / 1M)
     = $0.005 + $0.005
     = $0.01

Understanding this formula is essential because every optimization technique you apply ultimately reduces one or more of these four variables: input token count, input rate, output token count, or output rate.
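The formula translates directly into code. A minimal sketch (the function name and signature are our own, not part of any provider SDK):

```python
def call_cost(input_tokens: int, output_tokens: int,
              input_rate: float, output_rate: float) -> float:
    """Dollar cost of one API call; rates are quoted in $ per 1M tokens."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# The GPT-4o example above: 2,000 input tokens + 500 output tokens
print(f"${call_cost(2_000, 500, 2.50, 10.00):.2f}")  # $0.01
```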

Complete Provider Pricing Table

The following table provides comprehensive pricing for all major models as of March 2026. All prices are per 1 million tokens. Cached and batch rates shown where available.

| Provider  | Model             | Input  | Output | Cached Input | Batch Input | Batch Output |
|-----------|-------------------|--------|--------|--------------|-------------|--------------|
| OpenAI    | GPT-4o            | $2.50  | $10.00 | $1.25        | $1.25       | $5.00        |
| OpenAI    | GPT-4o-mini       | $0.15  | $0.60  | $0.075       | $0.075      | $0.30        |
| OpenAI    | o1                | $15.00 | $60.00 | $7.50        | $7.50       | $30.00       |
| OpenAI    | o3-mini           | $1.10  | $4.40  | $0.55        | $0.55       | $2.20        |
| OpenAI    | GPT-4.1           | $2.00  | $8.00  | $0.50        | $1.00       | $4.00        |
| OpenAI    | GPT-4.1-mini      | $0.40  | $1.60  | $0.10        | $0.20       | $0.80        |
| OpenAI    | GPT-4.1-nano      | $0.10  | $0.40  | $0.025       | $0.05       | $0.20        |
| Anthropic | Claude 3.5 Sonnet | $3.00  | $15.00 | $0.30        | $1.50       | $7.50        |
| Anthropic | Claude 3.5 Haiku  | $0.80  | $4.00  | $0.08        | $0.40       | $2.00        |
| Anthropic | Claude 3 Opus     | $15.00 | $75.00 | $1.50        | $7.50       | $37.50       |
| Anthropic | Claude Opus 4     | $15.00 | $75.00 | $1.50        | $7.50       | $37.50       |
| Anthropic | Claude Sonnet 4   | $3.00  | $15.00 | $0.30        | $1.50       | $7.50        |
| Google    | Gemini 1.5 Pro    | $1.25  | $5.00  | $0.3125      | —           | —            |
| Google    | Gemini 1.5 Flash  | $0.075 | $0.30  | $0.01875     | —           | —            |
| Google    | Gemini 2.0 Flash  | $0.10  | $0.40  | $0.025       | —           | —            |
| Google    | Gemini 2.5 Pro    | $1.25  | $10.00 | $0.3125      | —           | —            |

This table reveals several important patterns: (1) within each provider, model tiers span 10-100x in price; (2) cached input rates carry 50-90% discounts; (3) batch processing, where offered, is a flat 50% discount on both input and output; (4) the cheapest input rate (Gemini 1.5 Flash at $0.075) is 1,000x below the most expensive output rate (Claude 3 Opus at $75.00).
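To make the spread concrete, here are a few of the table's rows reduced to a dictionary, with the max/min ratio computed over every rate (a sketch; the keys are informal model labels, not API identifiers):

```python
# A few rows from the pricing table, as (input, output) in $ per 1M tokens
PRICES = {
    "gpt-4o":           (2.50, 10.00),
    "gpt-4o-mini":      (0.15, 0.60),
    "claude-3-opus":    (15.00, 75.00),
    "gemini-1.5-flash": (0.075, 0.30),
}

# Ratio of the most expensive rate (Opus output) to the cheapest (Flash input)
all_rates = [rate for pair in PRICES.values() for rate in pair]
print(f"{max(all_rates) / min(all_rates):.0f}x")  # 1000x
```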

The 1,000x Price Range

The AI model market has an extraordinary price range. The cheapest input rate available (Gemini 1.5 Flash at $0.075/MTok) and the most expensive output rate (Claude 3 Opus at $75.00/MTok) differ by a factor of 1,000. Even within a single provider, the range is significant: OpenAI's GPT-4o-mini input at $0.15/MTok versus o1 output at $60.00/MTok is a 400x spread.

This range exists because models differ dramatically in size, capability, and computational cost:

  • Nano/Flash models (Gemini Flash, GPT-4.1-nano): Small, fast models optimized for throughput. They run on fewer GPUs with high batch efficiency. Cost per token: $0.075-$0.40.
  • Standard models (GPT-4o, Claude Sonnet): Full-capability models balanced for quality and cost. Cost per token: $2.00-$15.00.
  • Reasoning models (o1, o3): Models that perform extended internal reasoning, generating thousands of internal tokens per response. Cost per token: $15.00-$60.00.
  • Frontier models (Claude Opus, GPT-4o with high-capability mode): The most capable models available, running on the most expensive hardware configurations. Cost per token: $15.00-$75.00.

The practical implication is that model selection is your highest-leverage cost optimization. Switching from Claude 3 Opus to Claude 3.5 Haiku for a simple task cuts cost nearly 19x on both input ($15.00 → $0.80) and output ($75.00 → $4.00) — far more than any prompt optimization could achieve.

How Caching and Batching Affect Cost Per Token

Your effective cost per token can be significantly lower than the list price through two mechanisms: prompt caching and batch processing.

Prompt Caching discounts the input tokens that match a previously cached prefix. Anthropic offers a 90% discount: cached input tokens for Claude Sonnet cost $0.30/MTok instead of $3.00/MTok. OpenAI offers a 50% discount: cached GPT-4o input costs $1.25/MTok instead of $2.50/MTok. The discount applies automatically (OpenAI) or when you explicitly mark a stable prompt prefix for caching (Anthropic). For applications with large, repetitive system prompts, caching can reduce the input portion of your bill by 50-90%.

Batch Processing offers a flat 50% discount on both input and output tokens in exchange for higher latency (up to 24 hours). OpenAI's Batch API and Anthropic's Message Batches both offer this rate. GPT-4o batch pricing is $1.25/$5.00 instead of $2.50/$10.00. Claude Sonnet batch pricing is $1.50/$7.50 instead of $3.00/$15.00.

These discounts can be combined in some cases. A batch request with cached input on Anthropic could achieve an effective input rate of $0.15/MTok (90% cache discount on the $1.50 batch input rate) — a 95% discount from the list price of $3.00/MTok. This makes it critical to track your effective cost per token, not just the list price. CostHawk calculates your effective rate by dividing actual spend by actual tokens consumed, accounting for all discounts automatically.
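The blended arithmetic is easy to get wrong by hand, so it helps to encode it once. A minimal sketch (names are our own; the discount structure follows the rates described above — batch halves the base rate, and the cache discount applies to the cached share of input):

```python
def effective_input_rate(list_rate: float, cache_hit_ratio: float,
                         cache_discount: float, batch: bool = False) -> float:
    """Blended $/1M input rate after batch and prompt-cache discounts."""
    base = list_rate * 0.5 if batch else list_rate   # batch = flat 50% off
    cached = base * (1 - cache_discount)             # discounted rate for cache hits
    return cache_hit_ratio * cached + (1 - cache_hit_ratio) * base

# Claude Sonnet, batch request with a fully cached prompt prefix
print(f"${effective_input_rate(3.00, 1.0, 0.90, batch=True):.2f}/MTok")  # $0.15/MTok
```

The helper reproduces the 95% combined discount: $0.15 is 5% of the $3.00 list rate.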

Cost Per Token Trends Over Time

AI model pricing has dropped dramatically and consistently since the launch of GPT-4 in March 2023. The trend line shows approximately a 10x reduction in cost per capability-equivalent token per year. Key milestones:

  • March 2023: GPT-4 launched at $30.00 input / $60.00 output per million tokens. This was the state of the art.
  • November 2023: GPT-4 Turbo dropped to $10.00 / $30.00 — a 3x reduction in 8 months.
  • May 2024: GPT-4o launched at $5.00 / $15.00 — matching GPT-4 quality at 6x lower cost.
  • October 2024: GPT-4o pricing reduced to $2.50 / $10.00 — another 2x drop.
  • July 2025: GPT-4.1 launched at $2.00 / $8.00 with improved quality — continuing the deflation trend.

This means the same capability that cost $60/MTok in March 2023 costs $8/MTok in 2026 — a 7.5x reduction in under three years. Meanwhile, budget models have gotten even cheaper: GPT-4o-mini offers near-GPT-4-level quality at $0.15/$0.60, which is 100x cheaper than the original GPT-4 output rate.

The implication for cost planning: do not lock in long-term commitments at current prices. Build your architecture to easily swap models, and re-evaluate pricing quarterly. CostHawk's pricing tracker monitors rate changes across all providers and alerts you when a cheaper model becomes available for your workloads.

Choosing Models by Cost Per Token

Selecting the right model for each task based on cost per token is the highest-impact optimization available. Use this decision framework:

  • Classification, extraction, simple Q&A: Use the cheapest available model. GPT-4o-mini ($0.15/$0.60) or Gemini 2.0 Flash ($0.10/$0.40) handle these tasks with 95%+ accuracy. Cost per query: $0.0002-$0.003.
  • RAG, summarization, general chat: Use a mid-tier model. GPT-4o ($2.50/$10.00) or Claude 3.5 Sonnet ($3.00/$15.00) provide the best quality-to-cost ratio for these workloads. Cost per query: $0.005-$0.03.
  • Complex reasoning, math, code review: Use reasoning models when accuracy is critical. o3-mini ($1.10/$4.40) is often sufficient. Reserve o1 ($15.00/$60.00) for the hardest problems. Cost per query: $0.05-$2.00.
  • Batch and offline processing: Always use batch pricing for non-real-time workloads. A nightly summarization job on GPT-4o batch ($1.25/$5.00) costs half of real-time, with no quality difference.

The key insight: most production applications should use 2-3 models, not one. Route traffic dynamically based on task complexity and latency requirements. CostHawk's model comparison dashboard shows you exactly how much each model costs for your specific workload patterns, making it easy to identify routing opportunities.
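A complexity-based router can start as nothing more than a lookup table. A hypothetical sketch (task labels, model names, and rates are illustrative of the framework above, not a CostHawk API):

```python
# Map task complexity to a (model, input_rate, output_rate) tier, $ per 1M tokens
ROUTES = {
    "simple":    ("gpt-4o-mini", 0.15, 0.60),   # classification, extraction
    "standard":  ("gpt-4o",      2.50, 10.00),  # RAG, summarization, chat
    "reasoning": ("o3-mini",     1.10, 4.40),   # math, code review
}

def pick_model(task_type: str) -> str:
    """Route to the matching tier; fall back to the mid-tier when unsure."""
    model, _, _ = ROUTES.get(task_type, ROUTES["standard"])
    return model

print(pick_model("simple"))  # gpt-4o-mini
```

In production the lookup key would come from a classifier or heuristic (prompt length, endpoint, user tier), but the cost logic stays this simple.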

FAQ

Frequently Asked Questions

Why are prices quoted per million tokens instead of per token?
Individual token prices are impractically small numbers. GPT-4o input costs $0.0000025 per token — a number that is hard to read, compare, and calculate with. Quoting per million tokens ($2.50/MTok) makes prices human-readable and comparable. It also aligns with how usage is measured at scale: a production application processing 50 million tokens per month thinks in millions, not individual tokens. Some older documentation quotes per 1,000 tokens (1K), which was standard before million-token contexts became common. CostHawk normalizes all pricing to a per-million-token basis across all providers for consistent comparison.
What is the cheapest AI model available right now?
As of March 2026, the cheapest major-provider model is Google's Gemini 1.5 Flash at $0.075 per million input tokens and $0.30 per million output tokens. For OpenAI, GPT-4o-mini at $0.15/$0.60 is the cheapest option. For Anthropic, Claude 3.5 Haiku at $0.80/$4.00 is the budget tier. If you include batch pricing, the cheapest effective rate is GPT-4o-mini batch at $0.075/$0.30. And if you include open-source models hosted on providers like Together AI or Fireworks, Llama-based models can be even cheaper at $0.05-$0.10 per million tokens. The key question is not which model is cheapest in absolute terms, but which is cheapest for your specific quality requirements.
How much does it cost to process 1 million tokens?
It depends entirely on the model and whether the tokens are input or output. For GPT-4o: $2.50 for 1M input tokens, $10.00 for 1M output tokens. For Claude 3.5 Sonnet: $3.00 input, $15.00 output. For GPT-4o-mini: $0.15 input, $0.60 output. For o1: $15.00 input, $60.00 output. A typical production workload processing 1M tokens is roughly 60-70% input and 30-40% output. Using GPT-4o with a 65/35 input/output split, 1M total tokens costs approximately (650,000 × $2.50/1M) + (350,000 × $10.00/1M) = $1.625 + $3.50 = $5.125. CostHawk shows your actual blended rate based on your real input/output distribution.
How do cached input tokens affect my effective cost per token?
Cached input tokens receive a significant discount: 90% off with Anthropic (e.g., $0.30/MTok instead of $3.00/MTok for Sonnet) and 50% off with OpenAI (e.g., $1.25/MTok instead of $2.50/MTok for GPT-4o). If 70% of your input tokens are cached (common when you have a large, stable system prompt), your effective input rate on Anthropic drops from $3.00 to (0.70 × $0.30) + (0.30 × $3.00) = $0.21 + $0.90 = $1.11/MTok — a 63% reduction from the list price. On OpenAI with 70% cache hits: (0.70 × $1.25) + (0.30 × $2.50) = $0.875 + $0.75 = $1.625/MTok — a 35% reduction. CostHawk calculates your effective rate automatically by dividing actual billed amount by total tokens consumed.
Is cost per token the same across all languages?
The dollar rate per token is the same regardless of language, but different languages require different numbers of tokens to express the same content. English is the most efficiently tokenized language — roughly 4 characters per token. Languages like Chinese, Japanese, and Korean may use 1.5-2x more tokens per equivalent content because their characters are less represented in the tokenizer's vocabulary. Arabic, Hindi, and Thai can be even less efficient, using 2-3x more tokens. This means the effective cost per word or per concept is higher in non-English languages, even though the cost per token is identical. If you serve a multilingual application, track tokens per request by language to understand the true cost differential. CostHawk's tag-based attribution lets you segment costs by language or locale.
How does the Batch API reduce cost per token?
OpenAI's Batch API and Anthropic's Message Batches both offer a 50% discount on both input and output tokens. The tradeoff is latency: batch requests are processed within 24 hours rather than in real time. GPT-4o batch pricing is $1.25/$5.00 per million tokens instead of $2.50/$10.00. Claude Sonnet batch pricing is $1.50/$7.50 instead of $3.00/$15.00. The discount is available because batch requests can be scheduled during off-peak GPU hours, improving provider utilization. Any workload that does not require real-time responses — nightly reports, bulk classification, content moderation queues, document processing pipelines — should use batch pricing. CostHawk identifies workloads that could benefit from batch processing based on their latency patterns.
How fast are AI model prices dropping?
AI model prices have been dropping at approximately 10x per year for equivalent capability. GPT-4 launched in March 2023 at $30/$60 per million tokens. By October 2024, GPT-4o offered comparable or better quality at $2.50/$10.00 — a 6-12x reduction in 18 months. Budget models have seen even steeper declines: GPT-4o-mini at $0.15/$0.60 offers near-GPT-4 quality at 100x lower cost than the original GPT-4 output price. This deflation is driven by hardware improvements (newer GPU architectures), model efficiency gains (smaller models matching larger ones), and competitive pressure (more providers entering the market). The practical implication: avoid long-term pricing commitments and re-evaluate your model choices quarterly. What costs $1 today may cost $0.10 in a year.
What is the difference between list price and effective price per token?
List price is the rate published on the provider's pricing page (e.g., $2.50/MTok input for GPT-4o). Effective price is what you actually pay after accounting for caching discounts, batch pricing, committed-use discounts, and free-tier credits. Your effective price is almost always lower than the list price. If 60% of your input tokens hit the cache (50% discount on OpenAI), your effective input rate is $1.75/MTok instead of $2.50/MTok. If you additionally use batch processing for 30% of your traffic (50% discount), your blended effective rate drops further. CostHawk calculates your effective cost per token by dividing your actual provider invoice by your total token consumption, giving you the true unit economics that matter for budgeting and pricing decisions.
Should I always use the cheapest model?
No. The cheapest model minimizes cost per token but may increase cost per successful query if it produces lower-quality outputs that require retries, human review, or customer support intervention. The right metric is cost per successful outcome, not cost per token. A classification task where GPT-4o-mini achieves 95% accuracy at $0.0003/query may be more expensive overall than GPT-4o at $0.005/query if the 5% error rate causes downstream costs (support tickets, incorrect actions). Benchmark each model on your specific tasks, measure accuracy and retry rates, and calculate the fully-loaded cost per successful outcome. CostHawk tracks both raw cost per token and per-query success rates so you can make this calculation with real data.
How do I forecast my AI costs using cost per token?
Forecast using this formula: monthly cost = days × daily_requests × (avg_input_tokens × input_rate + avg_output_tokens × output_rate). For example: 10,000 daily requests averaging 2,000 input tokens and 500 output tokens on GPT-4o: 30 × (10,000 × 2,000 × $2.50/1M + 10,000 × 500 × $10.00/1M) = 30 × ($50 + $50) = $3,000/month. But account for growth: if request volume grows 20% month-over-month, your month-6 cost is $3,000 × 1.2^5 = $7,465. Also account for context growth in conversational apps, where input tokens per request increase over time as conversation histories lengthen. CostHawk provides forecasting tools that project costs based on your actual usage trends, not just static estimates.
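The forecast formula, with compounding growth, as a short sketch (function names are our own):

```python
def monthly_cost(daily_requests: int, in_tokens: int, out_tokens: int,
                 in_rate: float, out_rate: float, days: int = 30) -> float:
    """Monthly spend in dollars; rates are $ per 1M tokens."""
    per_day = daily_requests * (in_tokens * in_rate + out_tokens * out_rate) / 1_000_000
    return per_day * days

def with_growth(base_monthly: float, mom_growth: float, month: int) -> float:
    """Projected cost in a future month under compounding month-over-month growth."""
    return base_monthly * (1 + mom_growth) ** (month - 1)

base = monthly_cost(10_000, 2_000, 500, 2.50, 10.00)   # $3,000 on GPT-4o
print(f"${with_growth(base, 0.20, 6):,.0f}")           # $7,465
```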


Put this knowledge to work. Track your AI spend in one place.

CostHawk gives engineering teams real-time visibility into every token, every model, and every dollar across your AI stack.