Logging
Recording LLM request and response metadata — tokens consumed, model used, latency, cost, and status — for debugging, cost analysis, and compliance. Effective LLM logging captures the operational envelope of every API call without storing sensitive prompt content.
Definition
What is Logging?
LLM logging is the practice of recording structured metadata about every language model API call your application makes. Unlike traditional application logging, which captures stack traces and error messages, LLM logging focuses on the unique attributes of AI API interactions: input and output token counts, the model used, request latency, computed cost, HTTP status code, cache hit/miss status, and request identifiers for correlation. The key distinction is that LLM logging captures the metadata envelope of each request — not the prompt content or model response, which may contain PII, proprietary information, or user data subject to privacy regulations.
A single LLM log entry typically contains 15–25 fields that collectively answer the questions: What model did we call? How many tokens did we send and receive? How long did it take? How much did it cost? Did it succeed? And how does this request relate to the broader user session or feature that triggered it? Aggregated across thousands or millions of requests, these log entries become the raw material for cost analysis, performance monitoring, anomaly detection, capacity planning, and compliance reporting.
For AI cost management, logging is the foundational data layer. You cannot optimize what you cannot see, and you cannot see without logs. Every chart in a cost dashboard, every alert that fires on a budget threshold, every anomaly detection algorithm that spots unusual spending — all of these consume log data as their primary input. Without comprehensive LLM logging, cost management degenerates into checking monthly invoices after the fact, with no ability to attribute, diagnose, or prevent overspend.
Impact
Why It Matters for AI Costs
LLM API costs are invisible without logging. Provider invoices arrive monthly and show aggregate spend — useful for accounting, useless for engineering. Logging transforms a single monthly number into a rich, queryable dataset that reveals exactly where every dollar goes and why.
Consider a team spending $35,000/month on AI APIs. Without logging, they know they spent $35,000. With logging, they know:
- $14,200 (41%) went to their customer support agent, which makes 8 tool calls per session at an average cost of $0.23 per session
- $9,800 (28%) went to document summarization, where 12% of requests are retries triggered by timeout errors on long documents
- $6,100 (17%) went to their development environment, where engineers are testing prompts against GPT-4o instead of GPT-4o mini
- $3,400 (10%) went to a deprecated feature that was supposed to be decommissioned two months ago
- $1,500 (4%) went to a single API key that was accidentally left in a CI/CD pipeline, running integration tests on every commit
Each of those insights is actionable. Fixing the retry logic saves $1,176/month. Switching dev to GPT-4o mini saves $5,700/month. Decommissioning the deprecated feature saves $3,400/month. Removing the CI/CD key saves $1,500/month. Total savings from logging visibility: $11,776/month — a 34% cost reduction from insights that are impossible without per-request log data.
Beyond cost optimization, LLM logging supports regulatory compliance (audit trails for AI decisions), incident response (tracing a bad output back to its model and prompt configuration), capacity planning (predicting future API spend based on traffic trends), and vendor negotiation (showing actual usage patterns when negotiating enterprise pricing or committed-use discounts with providers). CostHawk ingests LLM log data from wrapped API keys and provider sync integrations, converting raw logs into actionable cost intelligence without requiring you to build the analytics pipeline yourself.
What is LLM Logging?
LLM logging is the systematic capture and storage of structured metadata for every language model API call. It extends the concept of traditional application logging — recording what happened, when, and whether it succeeded — into the AI-specific domain where the key metrics are tokens, models, costs, and latencies rather than HTTP methods, endpoints, and response codes.
An LLM log entry is fundamentally different from a traditional API log entry because the economics are different. A traditional REST API call has negligible marginal cost — the server CPU time for one more JSON response is fractions of a cent. An LLM API call has significant and variable marginal cost — a single request can cost anywhere from $0.0001 (a short GPT-4o mini completion) to $2.00+ (a long Claude 3 Opus completion with a full context window). This cost variability means that LLM logs must capture the inputs to the cost calculation (token counts, model identifier) with the same rigor that financial systems capture transaction amounts.
A well-designed LLM log entry captures data at three levels:
Level 1: Request metadata (always logged)
| Field | Type | Example | Purpose |
|---|---|---|---|
| request_id | UUID | 550e8400-e29b-41d4-a716-446655440000 | Unique identifier for correlation |
| timestamp | ISO 8601 | 2026-03-16T14:23:07.412Z | When the request was made |
| model | string | gpt-4o-2024-11-20 | Exact model version (not just family) |
| provider | string | openai | API provider |
| input_tokens | integer | 1847 | Tokens in the prompt |
| output_tokens | integer | 423 | Tokens in the completion |
| total_tokens | integer | 2270 | Sum of input + output |
| cost_usd | decimal | 0.00886 | Computed cost based on model pricing |
| latency_ms | integer | 1243 | End-to-end request duration |
| status_code | integer | 200 | HTTP response status |
| cached | boolean | true | Whether prompt caching was used |
| cache_read_tokens | integer | 1200 | Tokens served from cache (discounted) |
Level 2: Attribution metadata (recommended)
| Field | Type | Example | Purpose |
|---|---|---|---|
| api_key_id | string | key_prod_support_agent | Which key made the request |
| project | string | customer-support | Project or service name |
| environment | string | production | prod / staging / dev |
| feature | string | ticket-summarizer | Specific feature or endpoint |
| user_id | string | usr_abc123 | End user (hashed or anonymized) |
| session_id | string | sess_def456 | User session for multi-turn tracking |
| trace_id | string | trace_ghi789 | Distributed trace correlation ID |
Level 3: Diagnostic metadata (optional, for debugging)
| Field | Type | Example | Purpose |
|---|---|---|---|
| temperature | float | 0.3 | Generation temperature used |
| max_tokens | integer | 1024 | Max output token setting |
| stop_reason | string | end_turn | Why generation stopped |
| tool_calls | integer | 3 | Number of tool/function calls |
| retry_count | integer | 0 | How many retries occurred |
| time_to_first_token_ms | integer | 287 | Streaming TTFT latency |
Notice what is absent from all three levels: prompt content and response content. This is deliberate and critical for security, privacy, and compliance, as discussed in the next section.
What to Log (and What NOT to Log)
The most important decision in LLM logging is what to exclude. Prompts and completions frequently contain personally identifiable information (PII), proprietary business data, user-generated content, health information, financial details, and other sensitive material. Logging this content creates security liabilities, regulatory exposure, and storage costs that far outweigh the debugging value.
What you SHOULD log:
- All fields in Level 1 and Level 2 above. Token counts, model identifiers, costs, latencies, and attribution tags are the core data needed for cost analysis, performance monitoring, and compliance reporting. None of these contain sensitive content.
- Error codes and error types (but not full error messages, which may echo back prompt content). Log "error_type": "rate_limit_exceeded", not the full API response body.
- Request configuration parameters (temperature, max_tokens, top_p, stop sequences). These are non-sensitive and essential for debugging quality issues — if outputs suddenly degrade, knowing that someone changed the temperature from 0.3 to 1.0 is invaluable.
- Hashed or truncated identifiers. If you need to correlate logs with specific users, use one-way hashed user IDs rather than raw identifiers. A SHA-256 hash of the user ID provides correlation capability without exposing the actual identity in logs.
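The hashed-identifier approach is a one-liner in practice. Here is a minimal sketch using Node's built-in crypto module; the 16-character truncation is an arbitrary choice, and note that a plain hash of a guessable ID can be reversed by dictionary attack, so an HMAC with a secret key is safer for low-entropy identifiers:

```typescript
import { createHash } from 'node:crypto'

// One-way hash a user ID so logs can be correlated across requests
// without storing the raw identity.
function hashUserId(rawUserId: string): string {
  return createHash('sha256')
    .update(rawUserId)
    .digest('hex')
    .slice(0, 16) // truncated for readability; keep more characters if collision risk matters
}
```

The same input always produces the same output, so per-user aggregation still works, but the raw ID never reaches the log store.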
What you should NOT log by default:
- Prompt content (system messages, user messages, conversation history). This is the most sensitive data in the LLM pipeline. System prompts may contain proprietary business logic and instructions. User messages may contain PII, health information, financial data, or any other sensitive content your users share with your application. Logging these creates a honeypot — a centralized store of sensitive data that must be encrypted, access-controlled, retention-managed, and potentially subject to GDPR right-to-erasure requests.
- Model response content. The model's output may reflect, summarize, or rephrase the user's sensitive input. It may also contain hallucinated PII (names, addresses, phone numbers that the model generates). Logging responses carries the same risks as logging prompts.
- Full API request/response bodies. These contain prompt and completion content along with metadata. Log the metadata; discard the rest.
- API keys or authentication tokens. Never log credentials, even in redacted form. Log the key identifier or a hashed representation, not the key itself.
When you might choose to log content (with safeguards):
Some use cases genuinely benefit from content logging — debugging prompt regressions, training data collection, compliance audit trails for regulated industries. If you must log content, implement these safeguards:
- Separate storage: Store content logs in a different system from metadata logs, with stricter access controls, encryption at rest and in transit, and shorter retention periods.
- PII detection and redaction: Run content through a PII detection pipeline (Amazon Comprehend, Google DLP, Presidio) before logging. Redact or mask detected PII: "My SSN is [REDACTED]".
- Consent and legal review: Ensure your privacy policy covers LLM content logging and that you have legal basis (consent, legitimate interest, or contractual necessity) under applicable regulations (GDPR, CCPA, HIPAA).
- Retention limits: Set aggressive retention policies — 7–30 days for debugging content, with automatic deletion. Metadata logs can be retained for 12–24 months for trend analysis.
- Access audit: Log who accesses content logs and why. Restrict access to a minimal set of authorized personnel.
CostHawk's logging architecture follows the metadata-only approach by default. When you route API calls through CostHawk wrapped keys, CostHawk captures token counts, model identifiers, latency, cost, and attribution tags — never prompt or response content. This design eliminates PII concerns, reduces storage costs, and simplifies compliance, while providing all the data needed for comprehensive cost analysis.
Logging for Cost Analysis
The primary consumer of LLM log data in most organizations is the cost analytics pipeline. Raw logs are transformed into cost insights through aggregation, attribution, and trend analysis. Here is how each log field contributes to cost visibility:
Token counts → cost computation. The most fundamental cost calculation multiplies token counts by per-token rates:
cost = (input_tokens × input_rate) + (output_tokens × output_rate)
// With prompt caching:
cost = ((input_tokens - cache_read_tokens) × input_rate)
+ (cache_read_tokens × cached_input_rate)
+ (output_tokens × output_rate)

This computation must be performed at log ingestion time, not query time, because pricing rates change over time. If you compute cost at query time using current rates, historical costs will be wrong. Store the computed cost_usd with each log entry, along with the rate table version used for computation. CostHawk maintains a continuously updated pricing database for every model from every provider, ensuring cost computation is accurate even when rates change.
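As a sketch, the computation above might look like the following. The rate table here is illustrative only (not real provider pricing), and a production implementation would resolve rates from a versioned pricing database keyed by model and effective date:

```typescript
interface Usage {
  input_tokens: number
  output_tokens: number
  cache_read_tokens: number
}

// Illustrative per-million-token rates in USD — NOT real provider pricing.
// In production, look these up from a versioned rate table.
const RATES: Record<string, { input: number; cachedInput: number; output: number }> = {
  'example-model': { input: 2.5, cachedInput: 1.25, output: 10.0 },
}

function calculateCost(model: string, usage: Usage): number {
  const r = RATES[model]
  if (!r) throw new Error(`no pricing entry for ${model}`)
  // Cached input tokens are billed at the discounted cached rate;
  // the remainder of the prompt is billed at the full input rate.
  const uncachedInput = usage.input_tokens - usage.cache_read_tokens
  return (
    (uncachedInput * r.input +
      usage.cache_read_tokens * r.cachedInput +
      usage.output_tokens * r.output) /
    1_000_000
  )
}
```

Because the rate table changes over time, call this at ingestion and store the result, rather than recomputing from today's rates later.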
Model field → cost attribution by model. Grouping logs by model reveals which models drive the most spend. A common finding: teams discover that 15–30% of their spend goes to a frontier model for requests that a model 10x cheaper could handle equally well. Without per-model logging, this insight is invisible — the monthly invoice shows total OpenAI spend but not which models were called or how frequently.
API key / project / feature → cost attribution by owner. Attribution fields enable chargeback and accountability. When the data platform team can see that their embedding pipeline costs $4,200/month and the customer support team's agent costs $8,700/month, each team has both the visibility and the incentive to optimize their own usage. Without attribution, cost optimization is everyone's problem and therefore no one's problem.
Environment → dev/staging/prod cost split. One of the most common cost surprises is development environment spending. Engineers testing prompts, running experiments, and debugging issues often use production-tier models without realizing the cost. Logging with environment tags reveals the split. Industry data suggests that 20–40% of AI API spend goes to non-production environments — a number that shocks most engineering leaders when they first see it. Simply switching development environments to cheaper models can save 10–25% of total spend overnight.
Timestamp → temporal analysis. Time-series aggregation reveals spending patterns that point-in-time numbers miss:
- Hourly patterns: Peak spending during business hours, overnight batch processing spikes, weekend dips. Understanding these patterns enables capacity planning and rate limit management.
- Daily trends: Gradually increasing daily spend may indicate growing traffic (expected) or prompt drift/context accumulation (unexpected). A sudden step-function increase usually corresponds to a deployment.
- Weekly cycles: B2B applications often show 5x weekday vs. weekend spend differences. Consumer applications may show the reverse.
- Monthly budgeting: Cumulative daily spend plotted against the monthly budget reveals whether you are on track, trending over, or trending under. CostHawk's burn-rate projection extrapolates current spend patterns to estimate end-of-month totals, giving you 15–20 days of warning before a budget overrun.
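The core of a burn-rate projection is a linear extrapolation from month-to-date spend. A naive sketch (a real projection would weight recent days more heavily and adjust for weekly cycles):

```typescript
// Project end-of-month spend from cumulative spend so far.
// Assumes a constant daily burn rate — deliberately naive.
function projectMonthEnd(
  spendToDate: number, // cumulative spend in USD
  dayOfMonth: number,  // days elapsed so far
  daysInMonth: number
): number {
  const dailyBurn = spendToDate / dayOfMonth
  return dailyBurn * daysInMonth
}

// $12,000 spent by day 10 of a 30-day month projects to $36,000.
```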
Latency → efficiency analysis. Latency logging reveals the correlation between response time and token count — longer responses take longer to generate and cost more. Requests with anomalously high latency relative to their output token count may indicate provider-side issues (queue congestion, rate limiting), retry overhead, or network problems. Requests with anomalously low latency relative to output count are likely cache hits — confirming that your caching strategy is working.
The aggregation pipeline typically runs on a 1-minute to 5-minute cadence for real-time dashboards and on an hourly cadence for detailed analytics. CostHawk performs these aggregations automatically, presenting the results through dashboards, alerts, and API endpoints without requiring you to build or maintain the pipeline.
Log Storage Costs and Retention
LLM metadata logs are compact — a single log entry with 25 fields occupies approximately 500–800 bytes in JSON format, or 200–400 bytes in a columnar format like Parquet. But at scale, storage costs add up and must be managed deliberately.
Sizing your log storage. Here are storage estimates at various request volumes, assuming metadata-only logging (no prompt/response content):
| Daily Requests | Monthly Requests | Monthly Storage (JSON) | Monthly Storage (Parquet) | Annual Storage (Parquet) |
|---|---|---|---|---|
| 1,000 | 30,000 | 18 MB | 9 MB | 108 MB |
| 10,000 | 300,000 | 180 MB | 90 MB | 1.1 GB |
| 100,000 | 3,000,000 | 1.8 GB | 900 MB | 10.8 GB |
| 1,000,000 | 30,000,000 | 18 GB | 9 GB | 108 GB |
At cloud storage rates ($0.023/GB/month for S3 Standard, $0.004/GB/month for S3 Infrequent Access), even 1 million daily requests produce only $0.21/month in hot storage or $0.04/month in cold storage. The storage cost of metadata-only LLM logs is negligible — typically less than 0.01% of the AI API spend they help optimize.
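The figures in the table follow from simple arithmetic. A sketch, using the rough per-entry byte sizes assumed above:

```typescript
// Estimate monthly log storage in MB from daily request volume.
// bytesPerEntry: ~600 for JSON, ~300 for Parquet (rough midpoints
// of the 500–800 and 200–400 byte ranges cited above).
function monthlyStorageMB(dailyRequests: number, bytesPerEntry: number): number {
  const monthlyRequests = dailyRequests * 30
  return (monthlyRequests * bytesPerEntry) / 1_000_000
}

// 1,000 requests/day in JSON → 18 MB/month;
// 1,000,000 requests/day in Parquet → 9,000 MB (9 GB)/month.
```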
Content logging changes the equation dramatically. If you log full prompts and responses, a single entry might be 5,000–50,000 bytes instead of 500. At 100,000 daily requests with an average of 10 KB per entry, you are generating 30 GB/month in content logs — $0.69/month in S3 Standard, still modest. But content logs require encryption at rest ($0.01/10,000 requests for KMS), access controls, PII scanning ($0.50–$2.00 per GB for DLP services), and GDPR-compliant deletion processes. The operational overhead of managing content logs far exceeds the raw storage cost.
Retention policies. Not all log data has equal value over time. Implement a tiered retention strategy:
- Hot tier (0–30 days): Full-resolution log data in a fast queryable store (PostgreSQL, ClickHouse, BigQuery). This is your debugging and real-time analytics window. Keep every field at per-request granularity.
- Warm tier (30–180 days): Aggregated data — hourly or daily rollups by model, key, project, and environment. Individual request records can be moved to cold storage. This tier supports trend analysis and month-over-month comparisons.
- Cold tier (180 days–2 years): Daily or weekly aggregates in cheap object storage (S3 Glacier, GCS Archive). Useful for annual planning, year-over-year comparisons, and compliance requirements.
- Archive/delete (2+ years): Unless regulatory requirements mandate longer retention, aggregate to monthly summaries and delete individual records. The marginal value of 2-year-old per-request log data is effectively zero for cost optimization purposes.
Choosing a log storage backend. The right storage system depends on your query patterns and volume:
- PostgreSQL / Supabase: Excellent for teams processing up to 1 million requests/day. Familiar SQL query interface, easy to integrate with existing infrastructure, adequate performance for time-range queries with proper indexing. CostHawk uses Supabase PostgreSQL as its primary log store.
- ClickHouse: Purpose-built for analytical queries over time-series data. Handles billions of rows with sub-second query times. Ideal for teams processing 1–100 million requests/day. Higher operational complexity than PostgreSQL.
- BigQuery / Athena: Serverless query engines that can scan Parquet files in object storage. No infrastructure to manage, pay-per-query pricing. Good for infrequent analytical queries over large historical datasets but expensive for real-time dashboards.
- Datadog / Grafana Cloud: Managed observability platforms that ingest and query log data. Convenient but expensive at scale — Datadog charges $0.10 per GB ingested, which for 1 million daily requests (18 GB/month in JSON) adds $1.80/month in log ingestion alone, plus per-query and retention charges.
For most teams, the optimal architecture is PostgreSQL for hot-tier (0–30 days) with time-partitioned tables, S3/GCS for warm/cold tier with Parquet files, and CostHawk as the analytics and visualization layer that queries across both tiers seamlessly.
Structured Logging for LLM Requests
Structured logging — emitting log entries as machine-parseable data structures (JSON, protocol buffers, or typed objects) rather than free-text log lines — is a prerequisite for effective LLM cost analysis. Free-text logs like "INFO: Called gpt-4o, 1847 input tokens, 423 output tokens, 1243ms" are human-readable but require regex parsing to extract fields, are fragile to format changes, and cannot be efficiently queried or aggregated. Structured logs like {"model": "gpt-4o", "input_tokens": 1847, "output_tokens": 423, "latency_ms": 1243} can be directly ingested by analytics pipelines, queried with SQL, and aggregated without parsing.
Implementing structured LLM logging. The cleanest approach is to wrap your LLM API calls in a logging middleware that captures metadata automatically, without requiring each call site to manually construct log entries:
// lib/llm-logger.ts — structured logging middleware
import { logger } from './logger' // Your structured logger (Pino, Winston, etc.)
import { calculateCost } from './pricing'
interface LLMLogEntry {
request_id: string
timestamp: string
provider: string
model: string
input_tokens: number
output_tokens: number
total_tokens: number
cost_usd: number
latency_ms: number
status_code: number
cached: boolean
cache_read_tokens: number
api_key_id: string
project: string
environment: string
feature: string
temperature: number
max_tokens: number
stop_reason: string
tool_calls: number
error_type: string | null
}
export async function callLLMWithLogging(
params: LLMRequestParams,
context: RequestContext
): Promise<LLMResponse> {
const startTime = performance.now()
const requestId = crypto.randomUUID()
try {
const response = await provider.chat.completions.create(params) // provider: your configured SDK client (e.g. an OpenAI instance)
const latency = Math.round(performance.now() - startTime)
const logEntry: LLMLogEntry = {
request_id: requestId,
timestamp: new Date().toISOString(),
provider: context.provider,
model: response.model, // Use actual model from response, not requested
input_tokens: response.usage.prompt_tokens,
output_tokens: response.usage.completion_tokens,
total_tokens: response.usage.total_tokens,
cost_usd: calculateCost(response.model, response.usage),
latency_ms: latency,
status_code: 200,
cached: (response.usage.prompt_tokens_details?.cached_tokens ?? 0) > 0,
cache_read_tokens: response.usage.prompt_tokens_details?.cached_tokens ?? 0,
api_key_id: context.apiKeyId,
project: context.project,
environment: context.environment,
feature: context.feature,
temperature: params.temperature ?? 1.0,
max_tokens: params.max_tokens ?? -1,
stop_reason: response.choices[0].finish_reason,
tool_calls: response.choices[0].message.tool_calls?.length ?? 0,
error_type: null
}
logger.info(logEntry, 'llm_request_completed')
return response
} catch (error: any) { // typed as any to read provider-specific fields (status, code, type)
const latency = Math.round(performance.now() - startTime)
logger.error({
request_id: requestId,
timestamp: new Date().toISOString(),
provider: context.provider,
model: params.model,
latency_ms: latency,
status_code: error.status ?? 500,
error_type: error.code ?? error.type ?? 'unknown',
api_key_id: context.apiKeyId,
project: context.project,
environment: context.environment,
feature: context.feature
}, 'llm_request_failed')
throw error
}
}

Key implementation decisions:
- Use the model from the response, not the request. When you request gpt-4o, the API may route to a specific snapshot like gpt-4o-2024-11-20. The response contains the actual model used. Log this — it is essential for cost accuracy when model versions have different pricing.
- Compute cost at log time. Do not defer cost calculation to query time. Store the cost in the log entry using the pricing table that was active at the time of the request. This ensures historical cost data remains accurate even when pricing changes.
- Log on both success and failure paths. Failed requests still consume resources (you may be charged for the input tokens), and tracking error rates by model and provider is essential for reliability monitoring.
- Use a high-performance structured logger. Pino (Node.js), structlog (Python), and Zap (Go) are purpose-built for structured logging with minimal overhead. Avoid console.log in production — it is synchronous and slow.
- Batch log writes. Writing each log entry individually to a database adds latency to every LLM request. Instead, buffer entries in memory and flush in batches every 1–5 seconds, or use an async logging transport (Pino's pino.transport()) that writes to a queue (SQS, Kafka, Redis Stream) for async processing.
Correlation with distributed traces. If your application uses distributed tracing (OpenTelemetry, Datadog APM, Jaeger), include the trace ID and span ID in your LLM log entries. This enables drilling from a slow user request → the API handler span → the specific LLM call that consumed 3 seconds and 4,000 output tokens. Without correlation, you can see that LLM costs are high or latency is bad, but you cannot connect those observations to specific user journeys or code paths.
Logging and CostHawk
CostHawk provides two primary mechanisms for ingesting LLM log data, each designed for different architectural preferences and integration depths.
Mechanism 1: Wrapped API keys (zero-code logging). CostHawk wrapped keys act as a transparent proxy between your application and the LLM provider. You replace your provider API key with a CostHawk wrapped key and point your API base URL to CostHawk's proxy endpoint. Every request flows through CostHawk, which captures metadata in real time and forwards the request to the provider. Your application code does not change — the same SDK, the same parameters, the same error handling. The wrapped key approach provides:
- Automatic metadata capture: Token counts, model, latency, cost, cache status, and error codes are logged without any instrumentation in your code.
- Per-key attribution: Issue separate wrapped keys for different projects, teams, or environments. CostHawk automatically attributes all usage to the correct key.
- Zero prompt content logging: CostHawk's proxy captures metadata only — prompt and response content passes through but is never stored, logged, or inspected.
- Minimal latency overhead: The proxy adds less than 5ms of latency to each request — unnoticeable for LLM calls that typically take 500–5,000ms.
Mechanism 2: Provider sync (non-invasive logging). For teams that cannot or prefer not to route traffic through a proxy, CostHawk syncs usage data directly from provider APIs. Connect your OpenAI, Anthropic, or Google Cloud account, and CostHawk pulls usage and billing data on a regular cadence. This approach provides:
- No infrastructure changes: Your API calls go directly to the provider as they do today. CostHawk reads usage data after the fact.
- Provider-reported accuracy: Usage and cost data comes directly from the provider's billing system, eliminating any discrepancy between logged and billed amounts.
- Coarser granularity: Provider APIs typically report usage at the API key and model level, not at the per-request level. You get cost by model and key but lose per-request attribution, latency data, and real-time alerting.
Mechanism 3: MCP integration (for AI development tools). CostHawk's MCP (Model Context Protocol) server integrates directly with AI development environments like Claude Code and OpenAI Codex. The MCP server syncs session-level usage data — tokens consumed, models used, session duration, and cost — from local development tools into CostHawk's dashboard. This provides visibility into the often-overlooked cost of AI-assisted development, which can represent 15–30% of a team's total AI spend.
What CostHawk does with your log data:
- Real-time dashboards: Log data powers the CostHawk dashboard within seconds of ingestion. See current spend, model distribution, and key-level attribution updating in near real time.
- Cost anomaly detection: CostHawk's anomaly detection engine monitors log data for unusual patterns — spending spikes, model distribution shifts, sudden increases in average tokens per request, or new API keys appearing that were not expected. Anomalies trigger alerts via email, Slack, or webhook.
- Budget tracking and alerts: Log data feeds cumulative spend calculations that are compared against budget thresholds. When spend reaches 80%, 90%, or 100% of a configured budget, CostHawk sends alerts with context about what is driving the spend.
- Historical analytics: All log data is retained for trend analysis, month-over-month comparisons, and annual planning. Query historical data through the dashboard or export it via the CostHawk API for custom analysis.
- Savings recommendations: CostHawk analyzes log patterns to identify optimization opportunities — models that could be downgraded, prompts that could leverage caching, development environments using production-tier models, and requests that could use the Batch API for 50% savings.
The net effect is that CostHawk transforms raw LLM log data into a complete cost intelligence layer without requiring you to build logging infrastructure, aggregation pipelines, dashboards, or alerting systems. You provide the data (via wrapped keys, provider sync, or MCP); CostHawk provides the insights.
FAQ
Frequently Asked Questions
Should I log prompt and response content?
How much does it cost to store LLM logs?
How do I correlate LLM logs with application logs?
What is the difference between LLM logging and LLM tracing?
How long should I retain LLM logs?
Can logging add latency to my LLM API calls?
What logging should I add for LLM error handling and retries?
Log each attempt — the original request and every retry — as its own entry with a shared retry_group_id that links them together. The initial attempt logs the original error (rate limit, timeout, server error), and subsequent retries log their own results. This gives you visibility into retry costs — a request that succeeds on the third try costs 3x the token cost of a first-try success, because you paid for input tokens on all three attempts. Key fields for error and retry logging: error_type (rate_limit, timeout, server_error, invalid_request), retry_count (0 for first attempt, 1 for first retry, etc.), retry_group_id (shared UUID linking all attempts for one logical request), and retry_delay_ms (how long you waited before retrying). Aggregate retry data reveals patterns: if 8% of your Claude API calls are rate-limited and each retry adds 2 seconds of latency plus duplicate input token costs, that is $400–$800/month in wasted spend for a team making 500,000 monthly requests. The fix might be as simple as adding request queuing or reducing concurrency.
How do I get started with LLM logging if I have no logging infrastructure?
Related Terms
Tracing
The practice of recording the full execution path of an LLM request — from prompt construction through model inference to response delivery — with timing and cost attribution at each step. Tracing provides the granular visibility needed to understand where time and money are spent in multi-step AI pipelines.
LLM Observability
The practice of monitoring, tracing, and analyzing LLM-powered applications in production across every dimension that matters: token consumption, cost, latency, error rates, and output quality. LLM observability goes far beyond traditional APM by tracking AI-specific metrics that determine both the reliability and the economics of your AI features.
Dashboards
Visual interfaces for monitoring AI cost, usage, and performance metrics in real-time. The command center for AI cost management — dashboards aggregate token spend, model utilization, latency, and budget health into a single pane of glass.
Cost Anomaly Detection
Automated detection of unusual AI spending patterns — sudden spikes, gradual drift, and per-key anomalies — before they become budget-breaking surprises.
Alerting
Automated notifications triggered by cost thresholds, usage anomalies, or performance degradation in AI systems. The first line of defense against budget overruns — alerting ensures no cost spike goes unnoticed.
AI Cost Allocation
The practice of attributing AI API costs to specific teams, projects, features, or customers — enabling accountability, budgeting, and optimization at the organizational level.
AI Cost Glossary
Put this knowledge to work. Track your AI spend in one place.
CostHawk gives engineering teams real-time visibility into every token, every model, and every dollar across your AI stack.
