Unit Economics
The cost and revenue associated with a single unit of your AI-powered product — whether that unit is a query, a user session, a transaction, or an API call. Unit economics tell you whether each interaction your product serves is profitable or loss-making, and by how much. For AI features built on LLM APIs, unit economics are uniquely volatile because inference costs vary by model, prompt length, and output complexity, making per-unit cost tracking essential for sustainable growth.
Why It Matters for AI Costs
AI unit economics are existential for any company building products on top of LLM APIs. Unlike traditional software where marginal costs approach zero, every AI-powered interaction has a real, measurable cost. This fundamentally changes the economics of scaling.
Consider the math for a typical AI-powered SaaS product:
- Monthly subscription revenue per user: $49
- Average queries per user per month: 500
- Average cost per query: $0.12 (blended across models)
- Total AI inference cost per user: $60/month
This product is losing $11 per user per month on inference costs alone, before accounting for infrastructure, support, or any other operating expenses. The more users it acquires, the faster it burns cash. This is not a hypothetical scenario — multiple well-funded AI startups have discovered exactly this dynamic after scaling past their initial user base.
The problem is amplified by three factors unique to AI products:
- Cost opacity. Most engineering teams do not track per-query costs in real time. They see a monthly API bill from OpenAI or Anthropic and divide by total queries for a rough average. This masks the fact that 10% of queries may account for 60% of costs, and that certain user behaviors (long conversations, document analysis, code generation) are dramatically more expensive than others.
- Cost volatility. AI inference costs shift with model updates, pricing changes, prompt modifications, and user behavior patterns. A system prompt change that adds 500 tokens of instruction increases per-query costs across the entire user base. A new feature that requires multi-step reasoning can double the average cost per session overnight.
- Revenue-cost decoupling. Most AI products charge flat subscription fees while incurring variable per-usage costs. A power user who sends 3,000 queries per month costs 6x more to serve than a casual user who sends 500, but both pay the same subscription price. Without unit economics tracking, you cannot identify which user segments are profitable and which are destroying margin.
CostHawk provides the granular per-request cost tracking needed to calculate accurate unit economics in real time. By tagging requests with user IDs, feature names, and product tiers, you can see unit economics broken down by any business dimension — per user, per feature, per customer segment, or per pricing tier. This data is the foundation for pricing decisions, model routing strategies, and sustainable growth planning.
What Are AI Unit Economics?
AI unit economics is the discipline of measuring the cost and revenue of a single atomic interaction in your AI-powered product. The concept borrows from traditional unit economics — a framework used in e-commerce (cost per order), ride-sharing (cost per ride), and SaaS (cost per customer) — but applies it to the unique cost structure of AI inference workloads.
In traditional software, serving an additional request costs nearly nothing. The server is already running, the code is already deployed, and the marginal cost of one more database query is measured in fractions of a cent. AI changes this equation fundamentally. Every LLM API call incurs a real, measurable cost that scales linearly with usage. There is no amortization, no economy of scale on the inference itself — 1,000 queries cost roughly 1,000 times what one query costs.
This makes AI products behave more like services businesses than software businesses from a cost perspective. A consulting firm incurs labor costs for every client engagement. An AI product incurs inference costs for every user interaction. The parallel is direct, and it demands the same rigor in tracking per-unit profitability.
AI unit economics encompass several cost components beyond the obvious API call:
- Direct inference cost: The per-token charges from your LLM provider (OpenAI, Anthropic, Google, etc.). This is typically 60-80% of total per-unit cost.
- Embedding and retrieval cost: If your product uses RAG (retrieval-augmented generation), every query may trigger embedding generation ($0.02-$0.13 per million tokens) and vector database queries ($0.001-$0.01 per query depending on your vector DB provider).
- Orchestration overhead: Multi-step agent workflows that make 3-8 LLM calls per user query multiply the direct inference cost proportionally. A coding assistant that plans, writes, reviews, and revises code may make 5 API calls to serve a single user request.
- Pre/post-processing compute: Document parsing, image processing, audio transcription, and output formatting all consume compute resources that contribute to per-unit cost.
- Infrastructure allocation: Server costs, database queries, logging, and monitoring attributable to each unit. While smaller than inference costs, these add 10-20% to the total per-unit cost at scale.
A complete unit economics model accounts for all of these components, not just the headline API cost. CostHawk tracks the direct inference cost automatically through wrapped keys and MCP telemetry. By combining this with your infrastructure cost data, you can build a comprehensive per-unit cost model that reflects the true economics of your product.
Defining Your Unit
The most critical decision in AI unit economics is choosing the right unit of analysis. The wrong choice produces misleading metrics that drive poor business decisions. The right choice gives you a clear, actionable view of profitability at the level where you can actually influence it.
There are four common unit definitions for AI products, each appropriate for different business models:
1. Per Query / Per Request
The most granular unit. One query equals one user interaction that produces one AI response. This is the right unit for products where each interaction is independent and self-contained: search engines, classification tools, single-turn Q&A systems, and content generation tools where each generation is a separate deliverable.
Per-query economics are straightforward to calculate: take the API cost of that single request (input tokens + output tokens priced at model rates) and add the proportional infrastructure cost. Revenue attribution is the challenge — if users pay a flat subscription, you must allocate a portion of their monthly payment to each query.
Typical per-query costs in production (March 2026):
- Simple classification (GPT-4o mini): $0.0001-$0.001
- Short-form generation (GPT-4o): $0.003-$0.02
- Long-form generation (Claude 3.5 Sonnet): $0.01-$0.15
- Complex reasoning with tools (GPT-4o + function calling): $0.02-$0.30
- Multi-step agent workflow (3-8 calls): $0.05-$1.20
2. Per Session / Per Conversation
A session groups multiple queries into a single user engagement. This is the right unit for chatbots, coding assistants, and any product where users have multi-turn interactions. Session economics capture the full cost of an engagement, including the growing context window and escalating per-turn costs that individual query metrics miss.
A typical chatbot session might span 8-15 turns. Due to conversation history accumulation, later turns are dramatically more expensive than earlier ones. Turn 1 might cost $0.005, while turn 15 (with 14 turns of history in the context) might cost $0.08. Per-session analysis reveals this compounding dynamic that per-query averages obscure.
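The per-turn growth can be modeled directly. A sketch with illustrative token counts and per-million-token rates (not any specific provider's pricing):

```python
def turn_cost(turn, base_tokens=400, history_per_turn=300,
              output_tokens=250, in_price=3.00, out_price=15.00):
    """Cost of one conversation turn, assuming each prior turn leaves
    `history_per_turn` tokens of context behind. Prices are illustrative
    per-million-token rates, not actual provider pricing."""
    input_tokens = base_tokens + (turn - 1) * history_per_turn
    return (input_tokens / 1e6) * in_price + (output_tokens / 1e6) * out_price

# Total cost of a 15-turn session: later turns dominate
session_cost = sum(turn_cost(t) for t in range(1, 16))
```

Under these assumptions turn 15 costs more than three times what turn 1 costs, which is why per-session tracking matters even when per-query averages look benign.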
3. Per User / Per Customer
The classic SaaS unit. Revenue attribution is simple (the subscription price), and costs aggregate all AI usage by that user over a billing period. This unit is ideal for identifying profitable versus unprofitable customer segments and for setting pricing tiers. The danger is over-aggregation: a per-user view may hide the fact that one feature is profitable while another is deeply loss-making.
4. Per Transaction / Per Output
For products that deliver a discrete, valuable output — a generated report, a processed document, a completed code review, an analyzed image batch — the transaction is the natural unit. This maps well to usage-based pricing models where you charge per document processed or per report generated.
CostHawk supports all four unit definitions through its tagging system. Tag requests with user_id, session_id, feature, or transaction_id to aggregate costs at whatever level matches your business model. Most mature AI products track unit economics at multiple levels simultaneously — per query for optimization, per session for product analytics, and per user for pricing strategy.
Calculating AI Unit Economics
Calculating AI unit economics requires combining data from multiple sources into a single per-unit cost figure. Here is the step-by-step methodology used by teams with mature AI cost practices:
Step 1: Measure direct inference cost per unit
This is the sum of all LLM API charges attributable to one unit. For a single-call unit, it is simply:
```
inference_cost = (input_tokens / 1,000,000) * input_price
               + (output_tokens / 1,000,000) * output_price
```

For a multi-call unit (agent workflows, multi-turn sessions), sum across all calls:
```
total_inference_cost = SUM(
    (call.input_tokens / 1M) * call.model_input_price +
    (call.output_tokens / 1M) * call.model_output_price
    for each call in unit
)
```

CostHawk calculates this automatically for every request routed through wrapped keys. The dashboard shows per-request cost with model-level breakdowns, and you can aggregate by any tag to get per-session, per-user, or per-feature totals.
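The summation above can be sketched as a small Python helper. The model names and per-million-token prices below are illustrative placeholders; check your provider's current rate card before relying on them:

```python
# Hypothetical per-million-token prices, for illustration only.
PRICES = {
    "gpt-4o":      {"input": 2.50, "output": 10.00},
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
}

def inference_cost(calls):
    """Sum direct API cost over all LLM calls in one unit
    (a query, session, or agent task)."""
    total = 0.0
    for call in calls:
        p = PRICES[call["model"]]
        total += (call["input_tokens"] / 1e6) * p["input"]
        total += (call["output_tokens"] / 1e6) * p["output"]
    return total

# One user-visible unit that happened to require two LLM calls
unit = [
    {"model": "gpt-4o",      "input_tokens": 1800, "output_tokens": 400},
    {"model": "gpt-4o-mini", "input_tokens": 900,  "output_tokens": 200},
]
cost = inference_cost(unit)
```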
Step 2: Add embedding and retrieval costs
If your product uses RAG, each unit incurs embedding generation costs and vector database query costs:
```
rag_cost = embedding_cost + vector_query_cost

// Example: text-embedding-3-small at $0.02/1M tokens
// Query text: ~200 tokens = $0.000004
// Vector DB query (Pinecone): ~$0.008 per query
// Total RAG overhead: ~$0.008 per unit
```

For products with heavy RAG usage, this can add $0.005-$0.02 per unit, which is significant at scale.
Step 3: Account for orchestration multipliers
Agent frameworks (LangChain, CrewAI, AutoGen) often make multiple LLM calls per user-visible interaction. A "simple" code review might involve: (1) analyze the diff, (2) identify issues, (3) generate suggestions, (4) format output — four LLM calls. Track the average number of LLM calls per user-visible unit:
```
orchestration_multiplier = avg_llm_calls_per_unit
effective_inference_cost = avg_single_call_cost * orchestration_multiplier
```

Many teams are shocked to discover their orchestration multiplier is 4-8x, meaning each user interaction costs 4-8 times what a single API call would suggest.
Step 4: Allocate infrastructure costs
Distribute your monthly infrastructure bill (servers, databases, monitoring, logging) proportionally across units:
```
infra_cost_per_unit = monthly_infra_cost / monthly_unit_volume

// Example: $2,000/month infra, 500,000 queries/month
// Infra per unit: $0.004
```

Step 5: Calculate total cost per unit
```
total_cost_per_unit = inference_cost + rag_cost + infra_cost_per_unit

// Example:
// Inference: $0.045
// RAG: $0.008
// Infra: $0.004
// Total: $0.057 per query
```

Step 6: Calculate revenue per unit
```
revenue_per_unit = monthly_revenue / monthly_unit_volume

// Example: $49 subscription, 500 queries/month
// Revenue per query: $0.098
```

Step 7: Determine unit margin
```
unit_margin = revenue_per_unit - total_cost_per_unit
unit_margin_percent = (unit_margin / revenue_per_unit) * 100

// Example: $0.098 - $0.057 = $0.041 margin (41.8%)
```

A healthy AI product targets 40-60% unit margins after inference costs to leave room for sales, support, and R&D. Below 30% is a warning sign. Below 0% means you are losing money on every interaction and growth accelerates your losses.
Unit Economics by Product Type
Unit economics vary dramatically across AI product categories. The table below shows typical ranges for common AI product types based on aggregated industry data and CostHawk customer benchmarks as of March 2026:
| Product Type | Typical Unit | Avg Cost/Unit | Typical Revenue/Unit | Typical Margin | Key Cost Driver |
|---|---|---|---|---|---|
| AI Chatbot (customer support) | Session (8-12 turns) | $0.15-$0.60 | $0.30-$1.50 (ticket deflection value) | 30-65% | Context window growth across turns |
| AI Writing Assistant | Generation (article/email) | $0.02-$0.25 | $0.08-$0.50 | 40-70% | Output token length |
| Code Generation Tool | Completion/Edit | $0.01-$0.35 | $0.05-$0.80 | 25-55% | Large code context windows |
| Document Analysis (legal, medical) | Document processed | $0.50-$5.00 | $2.00-$15.00 | 50-75% | Document length and multi-pass analysis |
| AI Search / RAG Application | Query | $0.01-$0.08 | $0.03-$0.15 | 35-60% | Embedding + retrieval + generation |
| Image Generation | Image | $0.02-$0.12 | $0.05-$0.25 | 40-60% | Resolution and model choice |
| AI Agent (multi-step workflow) | Task completed | $0.20-$3.00 | $0.50-$10.00 | 20-55% | Number of LLM calls per task (3-15) |
| Real-time Voice AI | Minute of conversation | $0.03-$0.15 | $0.10-$0.50 | 30-65% | Continuous transcription + generation |
Several patterns emerge from this data:
Document analysis has the best unit economics because the output is high-value (legal analysis, medical summaries) and customers expect to pay meaningful per-document fees. The cost is higher than simpler use cases, but the revenue-to-cost ratio is favorable.
AI agents have the most volatile unit economics because the number of LLM calls per task is unpredictable. A simple task might complete in 3 calls ($0.20), while a complex one might take 15 calls with retries ($3.00+). Without per-task cost tracking, average metrics mask dangerous outliers. CostHawk customers who tag requests with task_id typically discover that 5-10% of agent tasks account for 40-50% of total agent costs.
Chatbots have deceptively simple-looking economics that deteriorate over time. Early turns are cheap, but as conversation history accumulates, later turns send progressively more context tokens. In a 12-turn conversation where each turn adds 300 tokens of history, turn 12 carries 11 prior turns — 3,300 tokens of context — that turn 1 did not send. Teams that quote per-query costs based on early-turn averages underestimate true session costs by 30-50%.
Code generation tools face a context window tax. Effective code generation requires sending relevant file contents, project structure, and coding conventions as context. A single completion request with 20,000 tokens of code context costs 10x more than a simple text generation with a 2,000-token prompt, even though both produce similar-length outputs. Tools that pre-fill large context windows pay a premium for quality.
Improving Unit Economics
Improving AI unit economics means either reducing the cost to serve each unit or increasing the revenue per unit. The highest-impact strategies, ordered by typical ROI and ease of implementation:
1. Implement Model Routing (Cost Reduction: 40-70%)
Not every query requires your most expensive model. A model router evaluates each incoming request and directs it to the cheapest model capable of producing an acceptable response. In practice, 50-70% of queries in most AI products can be handled by a smaller, cheaper model without noticeable quality degradation.
Example impact: A product routing 100% of queries to Claude 3.5 Sonnet at $3/$15 per million tokens switches to routing 60% to Claude 3.5 Haiku at $0.80/$4.00 per million tokens. Assuming identical query profiles, the blended inference cost drops by approximately 45%. CostHawk's per-model cost breakdowns make it easy to identify which query types are candidates for routing to cheaper models — look for high-volume, low-complexity queries that are currently being served by frontier models.
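The arithmetic behind that estimate is easy to verify. A sketch, assuming both models see the same token profile per query (the Sonnet and Haiku rates are the ones quoted above; verify current pricing before relying on them):

```python
def blended_reduction(cheap_share, cheap_in, cheap_out, big_in, big_out,
                      in_tokens=1500, out_tokens=400):
    """Fractional cost reduction from routing `cheap_share` of traffic
    to the cheaper model, assuming identical query token profiles.
    Prices are per million tokens; token counts are illustrative."""
    big_cost = (in_tokens / 1e6) * big_in + (out_tokens / 1e6) * big_out
    cheap_cost = (in_tokens / 1e6) * cheap_in + (out_tokens / 1e6) * cheap_out
    blended = cheap_share * cheap_cost + (1 - cheap_share) * big_cost
    return 1 - blended / big_cost

# 60% of traffic to Haiku ($0.80/$4.00) instead of Sonnet ($3.00/$15.00)
reduction = blended_reduction(0.60, 0.80, 4.00, 3.00, 15.00)  # ~0.44
```

Because Haiku's input and output rates are both roughly 27% of Sonnet's, the result is insensitive to the exact input/output token mix: routing 60% of traffic yields about a 44% blended reduction.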
2. Optimize Prompts and Context (Cost Reduction: 20-50%)
System prompt bloat is the silent killer of unit economics. Most system prompts accumulate instructions over time as developers add edge case handling, persona guidance, and formatting rules. A structured audit typically finds that 30-50% of system prompt tokens can be removed without affecting output quality. At 100,000 queries per day, reducing a system prompt from 2,000 tokens to 1,000 tokens saves:
```
1,000 fewer tokens * 100,000 queries/day = 100M fewer input tokens/day
100M tokens * $2.50/1M tokens = $250/day in input savings
Annualized: ~$91,000/year from one prompt optimization
```

For products at scale (1M+ queries/day), the same optimization saves roughly $900,000 per year. Multiply across dozens of prompt templates and the savings become material.
Similarly, conversation history management dramatically affects per-session costs. Implementing a sliding window (keep last 5 turns instead of all turns) or summarizing older context can reduce later-turn costs by 50-70%.
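A sliding window is only a few lines of code. A minimal sketch, assuming OpenAI-style message dicts with a `role` field:

```python
def windowed_history(messages, max_turns=5):
    """Keep the system prompt plus only the last `max_turns`
    user/assistant exchanges (2 messages per turn), dropping
    older history to cap context growth."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-(max_turns * 2):]
```

Summarizing the dropped turns into a short recap message (instead of discarding them outright) trades a small summarization cost for better continuity.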
3. Implement Prompt Caching (Cost Reduction: 15-40%)
Both Anthropic and OpenAI offer prompt caching for repeated prompt prefixes: Anthropic bills cached reads at a 90% discount (cache writes carry a modest premium over normal input tokens), and OpenAI discounts cached input by 50%. If your system prompt is identical across requests — which it typically is — caching eliminates the majority of system prompt input costs. For a 2,000-token system prompt at 100,000 queries/day on Anthropic:

```
Without caching: 2,000 tokens * 100,000 queries * $3.00/1M = $600/day
With caching (90% discount on reads): ~$60/day
Savings: ~$540/day ≈ $197,000/year
```

4. Right-Size Output Length (Cost Reduction: 15-30%)
Output tokens cost 4-5x more than input tokens. Setting appropriate max_tokens limits and instructing models to be concise reduces output costs without necessarily reducing quality. A product that generates 500-token responses when 250 tokens would suffice is paying double the necessary output cost. Add explicit length guidance to your prompts: "Respond in 2-3 sentences" or "Keep your response under 150 words."
5. Introduce Usage-Based Pricing (Revenue Increase: Variable)
If your unit economics are negative on a per-user basis because power users consume disproportionate resources, consider hybrid pricing: a base subscription fee plus usage-based charges above a threshold. This aligns revenue with cost and eliminates the "whale user" problem where a small number of heavy users destroy aggregate margins. Many AI products have moved to models like "$49/month includes 1,000 queries, $0.05 per additional query" to protect unit economics while maintaining accessible pricing for average users.
6. Batch and Deduplicate Requests (Cost Reduction: 10-25%)
Many AI products make redundant API calls: re-analyzing the same document when it has not changed, re-generating embeddings for content that was already embedded, or sending near-identical queries when a cached response would suffice. Implement semantic caching (cache responses for queries similar to previous ones) and content-hash deduplication to eliminate wasted inference spend. OpenAI's Batch API also offers a 50% discount for non-time-sensitive workloads.
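Content-hash deduplication can be as simple as keying responses by a hash of the model and prompt. A minimal sketch (semantic caching, which matches similar rather than identical prompts, requires an embedding index on top of this):

```python
import hashlib

class DedupCache:
    """Return a cached response when the exact same (model, prompt) pair
    has been seen before, skipping a paid API call. In-memory sketch;
    production use would add TTLs and a shared store such as Redis."""

    def __init__(self):
        self._store = {}

    def _key(self, model, prompt):
        return hashlib.sha256(f"{model}\x00{prompt}".encode()).hexdigest()

    def get_or_call(self, model, prompt, call_fn):
        k = self._key(model, prompt)
        if k not in self._store:
            self._store[k] = call_fn(model, prompt)  # only pay on a miss
        return self._store[k]
```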
When Unit Economics Break
Even well-designed AI products can experience unit economics breakdowns. Recognizing the warning signs early prevents small problems from becoming existential threats. Here are the six most common scenarios where AI unit economics deteriorate, along with detection and remediation strategies:
1. The Context Window Spiral
Symptom: Per-session costs increase 3-5x over a 3-month period without corresponding increases in session length or user count. Root cause: Developers gradually add more context to prompts — additional system instructions, larger document chunks in RAG, more conversation history — without measuring the cost impact. Each addition seems small, but they compound.
Detection: Monitor average input tokens per request over time in CostHawk. A steady upward trend without a corresponding product change is a red flag. Alert threshold: input tokens per request increasing more than 15% month-over-month.
Remediation: Audit all prompt templates and context injection logic. Set token budgets for each context component (system prompt: max 1,500 tokens; RAG context: max 3,000 tokens; conversation history: max 2,000 tokens). Enforce these budgets in code.
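Budget enforcement can be a simple guard run before each request. A sketch, assuming you already count tokens per context component with your tokenizer of choice; the budget numbers mirror the examples above:

```python
# Per-component token budgets (illustrative limits from the text)
BUDGETS = {"system": 1500, "rag": 3000, "history": 2000}

def enforce_budgets(components, budgets=BUDGETS):
    """Raise before sending a request if any context component exceeds
    its token budget. `components` maps component name -> token count."""
    over = {name: count for name, count in components.items()
            if count > budgets.get(name, float("inf"))}
    if over:
        raise ValueError(f"Token budget exceeded: {over}")
```

Failing loudly in CI or staging when a prompt change blows a budget catches context-window creep before it reaches production traffic.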
2. The Power User Subsidy
Symptom: Aggregate unit economics look healthy, but user-level analysis reveals that 8-12% of users account for 50-60% of total inference costs while paying the same subscription fee as everyone else. Root cause: flat-rate pricing with no usage caps or tiers.
Detection: In CostHawk, tag requests with user_id and generate a cost distribution report. If the Gini coefficient of per-user costs exceeds 0.6, you have a power user problem. Look for users whose monthly inference cost exceeds 2x their subscription price.
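The Gini coefficient mentioned above can be computed directly from a list of per-user monthly costs:

```python
def gini(costs):
    """Gini coefficient of per-user costs: 0.0 means costs are spread
    perfectly evenly; values above ~0.6 suggest a heavy power-user skew."""
    xs = sorted(costs)
    n = len(xs)
    total = sum(xs)
    if total == 0:
        return 0.0
    # Standard closed form over sorted values:
    # G = 2 * sum(i * x_i) / (n * total) - (n + 1) / n
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return (2 * weighted) / (n * total) - (n + 1) / n
```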
Remediation: Introduce usage tiers, rate limits, or usage-based pricing above a threshold. Alternatively, implement model routing that serves heavier users with cheaper models after they exceed a daily query budget.
3. The Agent Runaway
Symptom: Sudden, unpredictable cost spikes from agent workflows that make far more LLM calls than expected. A single user task might generate 20-50 API calls when the expected range is 3-8. Root cause: agentic loops without iteration limits, retry storms from transient API errors, or recursive tool-calling patterns.
Detection: Monitor the number of LLM calls per task/session. Set hard limits on maximum calls per agent execution (e.g., 15 calls max). Alert on any task that exceeds 2x the 95th percentile call count. CostHawk's anomaly detection can flag individual sessions whose cost exceeds 3x the rolling average.
Remediation: Implement circuit breakers that terminate agent loops after a configurable maximum number of iterations. Add cost circuit breakers that terminate execution if per-task cost exceeds a threshold (e.g., $2.00). Log and review terminated executions to improve agent reliability.
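A combined call-count and cost circuit breaker is a small amount of code. A minimal sketch (the 15-call and $2.00 limits mirror the examples above):

```python
class AgentBudget:
    """Terminate an agent run once it exceeds a call count or dollar cap.
    Call charge() after every LLM call with that call's cost."""

    def __init__(self, max_calls=15, max_cost=2.00):
        self.max_calls, self.max_cost = max_calls, max_cost
        self.calls, self.cost = 0, 0.0

    def charge(self, call_cost):
        self.calls += 1
        self.cost += call_cost
        if self.calls > self.max_calls or self.cost > self.max_cost:
            raise RuntimeError(
                f"Agent budget exceeded: {self.calls} calls, ${self.cost:.2f}")
```

The agent loop catches the exception, returns a partial result or an error to the user, and logs the terminated run for review.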
4. The Model Price Shock
Symptom: Monthly costs jump 20-40% without usage changes. Root cause: a provider changes pricing, or your codebase is updated to use a more expensive model variant (e.g., a library update that defaults to a newer, pricier model).
Detection: CostHawk tracks cost per token over time. A sudden change in effective price per token without a deliberate model change indicates either a pricing update or an unintended model switch. Pin model versions in your codebase and monitor for drift.
Remediation: Pin specific model versions in all API calls (e.g., gpt-4o-2024-11-20 instead of gpt-4o). Subscribe to provider pricing update announcements. Maintain a model pricing table (CostHawk does this automatically) and alert when effective rates diverge from expected rates.
5. The Feature That Kills Margins
Symptom: Unit economics were healthy at 45% margin, then a new feature launches and margins drop to 15% within two weeks. Root cause: the new feature requires expensive inference (large context, long output, frontier model) but was not priced or cost-modeled before launch.
Detection: Tag all requests with feature in CostHawk and monitor per-feature unit economics. Any feature with margins below 20% needs immediate attention. Require cost impact analysis as part of the feature development process — estimate per-query cost before building, then validate after launch.
Remediation: Cost-model every feature before development. For expensive features, consider gating them to higher pricing tiers, using cheaper models with acceptable quality tradeoffs, or implementing usage limits specific to that feature.
6. The Cost of Growth
Symptom: The product is growing rapidly — user count doubles in 3 months — but per-user revenue stays flat while per-user costs increase. New users are attracted by aggressive pricing or a generous free tier, and their usage patterns are heavier than early adopters. Root cause: growth-stage pricing that prioritizes acquisition over unit economics.
Detection: Segment unit economics by user cohort (signup month) in CostHawk. If newer cohorts have worse unit economics than older cohorts, growth is diluting margins. Also monitor the ratio of free-tier to paid-tier usage.
Remediation: Restructure the free tier to limit costly operations (cap queries, restrict model access, limit output length). Ensure paid tier pricing covers at least the 80th percentile of per-user cost. Some margin compression during hypergrowth is acceptable, but the trend line must show a path back to healthy margins as the product matures.
FAQ
Frequently Asked Questions
- What is a good target margin for AI unit economics?
- How do I calculate unit economics when my product uses multiple models?
- How do unit economics differ between subscription and usage-based pricing?
- Should I track unit economics in development and staging environments?
- How do I handle unit economics for free tier users?
- What role does prompt caching play in unit economics?
- How often should I recalculate unit economics?
- Can CostHawk help me track unit economics across multiple providers?
Related Terms
Cost Per Query
The total cost of a single end-user request to your AI-powered application, including all token consumption, tool calls, and retries.

AI ROI (Return on Investment)
The financial return generated by AI investments relative to their total cost. AI ROI is uniquely challenging to measure because the benefits — productivity gains, quality improvements, faster time-to-market — are often indirect, distributed across teams, and difficult to isolate from other variables. Rigorous ROI measurement requires a framework that captures both hard-dollar savings and soft-value gains.

Total Cost of Ownership (TCO) for AI
The complete, all-in cost of running AI in production over its full lifecycle. TCO extends far beyond API fees to include infrastructure, engineering, monitoring, data preparation, quality assurance, and operational overhead. Understanding true TCO is essential for accurate budgeting, build-vs-buy decisions, and meaningful ROI calculations.

Model Routing
Dynamically directing AI requests to different models based on task complexity, cost constraints, and quality requirements to achieve optimal cost efficiency.

Token Pricing
The per-token cost model used by AI API providers, with separate rates for input tokens, output tokens, and cached tokens. Token pricing is the fundamental billing mechanism for LLM APIs, typically quoted per million tokens, and varies by model, provider, and usage tier.

Pay-Per-Token
The dominant usage-based pricing model for AI APIs where you pay only for the tokens you consume, with no upfront commitment or monthly minimum.
Read moreAI Cost Glossary
Put this knowledge to work. Track your AI spend in one place.
CostHawk gives engineering teams real-time visibility into every token, every model, and every dollar across your AI stack.
