Cost Per Query
The total cost of a single end-user request to your AI-powered application, including all token consumption, tool calls, and retries.
Definition
What is Cost Per Query?
At its simplest, CPQ is (input_tokens × input_price) + (output_tokens × output_price), but the true CPQ often includes multiple LLM calls, embedding lookups, tool invocations, and retries. CPQ ranges from roughly $0.0003 for a simple GPT-4o-mini classification to over $1.00 for a complex multi-step agent chain using reasoning models. CPQ is the most important unit economics metric for any AI-powered product because it directly determines your gross margin.
Why It Matters for AI Costs
Calculating Cost Per Query
The simplest form of CPQ is a single LLM call:
CPQ = (input_tokens × input_rate) + (output_tokens × output_rate)

For a GPT-4o request with 1,000 input tokens and 500 output tokens:
CPQ = (1,000 × $2.50/1M) + (500 × $10.00/1M)
= $0.0025 + $0.005
= $0.0075

But most production applications involve multiple steps per user query. A RAG-based Q&A system might include: (1) an embedding call to vectorize the query, (2) a vector database lookup (infrastructure cost), (3) an LLM call with retrieved context, and optionally (4) a follow-up LLM call for answer refinement. The true CPQ is the sum of all these steps:
CPQ_rag = embedding_cost + retrieval_infra_cost + llm_call_cost
= $0.000002 + $0.0001 + $0.0075
= ~$0.0076

For agent-based applications, CPQ becomes even more complex because the number of LLM calls per query is variable — an agent might make 3 tool calls or 15, depending on the task. In these cases, CPQ must be measured empirically using median and P95 values rather than calculated from a fixed formula.
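The arithmetic above can be sketched in a few lines. This is a minimal illustration, not a billing library: the rates are the GPT-4o figures from the worked example, and the `call_cost` helper and the RAG cost constants are assumptions you would replace with your own measured values.

```python
# Sketch of the CPQ arithmetic from the worked examples above.
# Rates are the GPT-4o prices quoted in the text ($ per token),
# and are assumptions to replace with your provider's price sheet.

INPUT_RATE = 2.50 / 1_000_000   # $2.50 per 1M input tokens
OUTPUT_RATE = 10.00 / 1_000_000  # $10.00 per 1M output tokens

def call_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of one LLM call: tokens in each direction times that direction's rate."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Single-call CPQ from the example: 1,000 input / 500 output tokens.
single_cpq = call_cost(1_000, 500)

# Multi-step RAG CPQ: sum every step the user query triggers.
embedding_cost = 0.000002        # assumed embedding call cost
retrieval_infra_cost = 0.0001    # assumed vector DB lookup cost
rag_cpq = embedding_cost + retrieval_infra_cost + call_cost(1_000, 500)

print(f"single-call CPQ: ${single_cpq:.4f}")
print(f"RAG CPQ:         ${rag_cpq:.4f}")
```

For agent chains, the same `call_cost` helper would be applied per step and summed over however many calls the agent actually made, which is why the text recommends measuring the distribution empirically.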
Cost Benchmarks by Use Case
The following table provides typical CPQ ranges for common AI application patterns, based on GPT-4o pricing as of March 2026. Actual costs vary by prompt design, context length, and output requirements.
| Use Case | Typical Input Tokens | Typical Output Tokens | LLM Calls per Query | Median CPQ |
|---|---|---|---|---|
| Text classification | 200-500 | 5-20 | 1 | $0.0003-$0.002 |
| Sentiment analysis | 300-800 | 10-50 | 1 | $0.001-$0.003 |
| Entity extraction | 500-2,000 | 50-200 | 1 | $0.002-$0.007 |
| RAG Q&A | 2,000-8,000 | 200-800 | 1-2 | $0.007-$0.03 |
| Document summarization | 5,000-30,000 | 500-2,000 | 1-3 | $0.02-$0.10 |
| Code generation | 1,000-5,000 | 500-3,000 | 1-2 | $0.01-$0.05 |
| Conversational chatbot | 1,000-10,000 | 200-1,000 | 1 | $0.005-$0.04 |
| Multi-step agent chain | 3,000-20,000 | 1,000-10,000 | 3-15 | $0.05-$1.00+ |
| Reasoning (o1) | 1,000-5,000 | 5,000-30,000 | 1 | $0.30-$2.00 |
The 3,000x range between the cheapest classification query ($0.0003) and the most expensive agent chain ($1.00+) shows why CPQ analysis must be done per use case, not as an organization-wide average. A single agent workflow can cost more than 1,000 classification queries.
Hidden Costs in Multi-Step Queries
The most common source of CPQ underestimation is multi-step queries where a single user interaction triggers multiple LLM calls. These hidden costs include:
- Agent loops: An AI agent that uses tool calling may invoke 5-15 LLM calls per user query. Each call includes the full conversation history plus tool results, so while each call's input grows roughly linearly, the cumulative input tokens across the chain grow quadratically. The 5th call in a chain might send 10,000 input tokens of accumulated context.
- Tool call overhead: Each tool call adds tokens for the tool definition (in the system prompt), the tool invocation (in the output), and the tool result (in the next input). A single tool call can add 200-500 tokens of overhead beyond the tool's actual payload.
- Retries and fallbacks: When a model returns malformed JSON or fails a validation check, the application retries — doubling the cost of that step. If you have a fallback from GPT-4o-mini to GPT-4o on failure, the fallback call costs 16x more than the original attempt.
- Guardrail and moderation calls: Content moderation, PII detection, and output guardrails each add an LLM call. A system that runs input moderation, generation, and output validation makes 3 LLM calls per query minimum.
- Conversation history growth: In chatbot applications, each turn sends the full conversation history as input. The 10th message in a conversation sends 10x more input tokens than the first message. CPQ for turn 10 is dramatically higher than CPQ for turn 1.
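The compounding effect of conversation-history growth can be sketched numerically. The per-message token count and per-turn output size below are illustrative assumptions, not measurements, but the shape of the growth is the point: input cost per turn climbs with every message resent.

```python
# Illustrative sketch of chatbot cost growth when each turn resends
# the full conversation history. Token figures are assumptions.

INPUT_RATE = 2.50 / 1_000_000    # GPT-4o input rate from the text
OUTPUT_RATE = 10.00 / 1_000_000  # GPT-4o output rate from the text
TOKENS_PER_MESSAGE = 300         # assumed average size of one turn

def turn_cost(turn: int, output_tokens: int = 300) -> float:
    """Turn N sends all prior user+assistant messages plus the new user message."""
    history_messages = 2 * (turn - 1) + 1
    input_tokens = history_messages * TOKENS_PER_MESSAGE
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

total = sum(turn_cost(t) for t in range(1, 11))
print(f"turn 1 cost:  ${turn_cost(1):.5f}")
print(f"turn 10 cost: ${turn_cost(10):.5f}")
print(f"10-turn conversation total: ${total:.4f}")
```

Under these assumptions the late turns dominate the conversation's total cost, which is why trimming or summarizing history is one of the token-reduction levers discussed below.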
CostHawk's request tracing groups all LLM calls triggered by a single user interaction into one trace, revealing the true multi-step CPQ that provider dashboards cannot show.
Reducing Cost Per Query
CPQ optimization works across four dimensions: model selection, token reduction, caching, and architecture.
Model selection: Route simple tasks to cheaper models. A classification task on GPT-4o-mini costs $0.0003; the same task on GPT-4o costs $0.005 — a 16x difference with minimal quality impact. Use model routing to automatically select the cheapest model that meets your quality threshold for each query type.
Token reduction: Compress prompts, trim conversation history, set strict max_tokens limits, and use structured output. These strategies reduce both input and output costs. A well-optimized prompt can cut CPQ by 30-50% compared to a naive implementation.
Caching: Prompt caching (90% input discount from Anthropic, 50% from OpenAI) reduces the input portion of CPQ for repetitive system prompts. Semantic caching can eliminate the LLM call entirely for repeated queries, reducing CPQ to near zero for cache hits.
Architecture: Reduce the number of LLM calls per query. Replace agent loops with deterministic pipelines where possible. Pre-compute tool results that do not change between requests. Batch guardrail checks instead of running them individually. Each eliminated LLM call removes an entire cost step from your CPQ.
The order of impact is typically: caching (highest) > model routing > token reduction > architecture. Start by measuring your current CPQ distribution, then apply optimizations to the most expensive query types first.
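Model routing, one of the four levers above, can be sketched as a simple threshold rule. Everything here is an assumption for illustration: the model names and per-1M-token prices mirror the figures quoted in the text, and the complexity heuristic is a deliberately naive stand-in for whatever classifier or scoring you would actually use.

```python
# Minimal sketch of rule-based model routing: send simple queries to a
# cheap model and reserve the expensive model for complex ones.
# Prices are (input, output) $ per 1M tokens, as quoted in the text.

PRICES = {
    "gpt-4o-mini": (0.15, 0.60),
    "gpt-4o": (2.50, 10.00),
}

def estimate_complexity(query: str) -> float:
    """Toy heuristic: longer, multi-question prompts score as more complex."""
    score = len(query) / 500 + query.count("?") * 0.2
    return min(score, 1.0)

def route(query: str, threshold: float = 0.5) -> str:
    """Pick the cheapest model expected to meet quality for this query."""
    return "gpt-4o" if estimate_complexity(query) >= threshold else "gpt-4o-mini"

def query_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    inp, out = PRICES[model]
    return input_tokens * inp / 1e6 + output_tokens * out / 1e6

model = route("Is this review positive or negative?")
print(model, f"${query_cost(model, 300, 10):.6f}")
```

In production the heuristic would be replaced by a quality-calibrated classifier, but even a crude router captures most of the 16x price gap for the short classification-style queries it catches.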
CPQ and Product Pricing Strategy
CPQ is the foundation of pricing strategy for AI-powered products. Your subscription price, per-query price, or usage-based rate must cover CPQ plus infrastructure overhead plus margin. Here is how to model it:
Required revenue per query = CPQ × (1 + infrastructure_overhead) × (1 + target_margin)
Example: CPQ = $0.02, overhead = 20%, margin = 50%
Required = $0.02 × 1.2 × 1.5 = $0.036 per query

For subscription pricing, work backward from usage patterns:
Monthly queries per user = 500 (median)
Median CPQ = $0.02
Monthly AI cost per user = 500 × $0.02 = $10.00
Minimum subscription price = $10.00 × 1.2 × 1.5 = $18.00/month

But median CPQ is not enough — you must account for the distribution. If your P95 CPQ is $0.15 (from agent-heavy queries), your P95 cost per user is $75/month, which destroys your margin at a $20/month price point. You need usage caps, tiered pricing, or model routing to prevent heavy users from making you unprofitable.
CostHawk's CPQ distribution charts show you the median, P75, P90, and P95 CPQ values so you can price your product with confidence and set usage limits that protect your margin.
Monitoring CPQ with CostHawk
CostHawk provides purpose-built CPQ monitoring that goes beyond raw token counting:
- Per-endpoint CPQ: Tag API routes with CostHawk labels and see the average, median, and P95 CPQ for each endpoint. Identify which features are expensive and which are efficient.
- CPQ trends over time: Track how CPQ changes as you modify prompts, switch models, or add new features. A rising CPQ trend is an early warning of context bloat or scope creep in agent behavior.
- CPQ by user segment: Break down CPQ by customer plan tier, geography, or usage pattern. Power users who trigger complex agent chains may have 10-50x higher CPQ than casual users.
- CPQ anomaly alerts: Set alerts for individual queries that exceed a CPQ threshold. A single runaway agent query costing $5.00 is a signal to investigate your agent loop termination logic.
- Multi-step trace grouping: CostHawk groups all LLM calls, embedding calls, and tool invocations triggered by a single user request into one trace. The trace CPQ is the true cost that matters for unit economics, not the per-call cost that provider dashboards show.
Teams using CostHawk's CPQ monitoring typically identify 20-40% cost reduction opportunities within the first week by finding queries where CPQ is far above the median due to missing max_tokens limits, excessive agent steps, or unnecessary model upgrades.
FAQ
Frequently Asked Questions
What is a good cost per query for an AI chatbot?
How do agent chains affect cost per query?
How do I calculate CPQ for a RAG application?
What is the difference between CPQ and cost per token?
How can caching reduce cost per query?
Should I track median or average CPQ?
How does model routing lower CPQ?
What is the CPQ impact of conversation history growth?
How do retries and fallbacks affect CPQ?
Can CPQ be negative? What about free-tier provider credits?
Related Terms
Token Pricing
The per-token cost model used by AI API providers, with separate rates for input tokens, output tokens, and cached tokens. Token pricing is the fundamental billing mechanism for LLM APIs, typically quoted per million tokens, and varies by model, provider, and usage tier.
Input vs. Output Tokens
The two token directions in every LLM API call, each priced differently. Output tokens cost 3-5x more than input tokens across all major providers.
Cost Per Token
The unit price an AI provider charges for processing a single token, quoted per million tokens. Ranges from $0.075/1M for budget models to $75.00/1M for frontier reasoning models — a 1,000x spread.
Model Routing
Dynamically directing AI requests to different models based on task complexity, cost constraints, and quality requirements to achieve optimal cost efficiency.
Batch API
Asynchronous API endpoints that process large volumes of LLM requests at a 50% discount in exchange for longer turnaround times.
Cost Anomaly Detection
Automated detection of unusual AI spending patterns — sudden spikes, gradual drift, and per-key anomalies — before they become budget-breaking surprises.
Put this knowledge to work. Track your AI spend in one place.
CostHawk gives engineering teams real-time visibility into every token, every model, and every dollar across your AI stack.
