Logging
Recording LLM request and response metadata — tokens consumed, model used, latency, cost, and status — for debugging, cost analysis, and compliance. Effective LLM logging captures the operational envelope of every API call without storing sensitive prompt content.
Definition
What is Logging?
LLM logging is the practice of recording structured metadata about every language model API call your application makes. Unlike traditional application logging, which captures stack traces and error messages, LLM logging focuses on the unique attributes of AI API interactions: input and output token counts, the model used, request latency, computed cost, HTTP status code, cache hit/miss status, and request identifiers for correlation. The key distinction is that LLM logging captures the metadata envelope of each request — not the prompt content or model response, which may contain PII, proprietary information, or user data subject to privacy regulations.
A single LLM log entry typically contains 15–25 fields that collectively answer the questions: What model did we call? How many tokens did we send and receive? How long did it take? How much did it cost? Did it succeed? And how does this request relate to the broader user session or feature that triggered it? Aggregated across thousands or millions of requests, these log entries become the raw material for cost analysis, performance monitoring, anomaly detection, capacity planning, and compliance reporting.
For AI cost management, logging is the foundational data layer. You cannot optimize what you cannot see, and you cannot see without logs. Every chart in a cost dashboard, every alert that fires on a budget threshold, every anomaly detection algorithm that spots unusual spending — all of these consume log data as their primary input. Without comprehensive LLM logging, cost management degenerates into checking monthly invoices after the fact, with no ability to attribute, diagnose, or prevent overspend.
Impact
Why It Matters for AI Costs
LLM API costs are invisible without logging. Provider invoices arrive monthly and show aggregate spend — useful for accounting, useless for engineering. Logging transforms a single monthly number into a rich, queryable dataset that reveals exactly where every dollar goes and why.
Consider a team spending $35,000/month on AI APIs. Without logging, they know they spent $35,000. With logging, they know:
- $14,200 (41%) went to their customer support agent, which makes 8 tool calls per session at an average cost of $0.23 per session
- $9,800 (28%) went to document summarization, where 12% of requests are retries triggered by timeout errors on long documents
- $6,100 (17%) went to their development environment, where engineers are testing prompts against GPT-4o instead of GPT-4o mini
- $3,400 (10%) went to a deprecated feature that was supposed to be decommissioned two months ago
- $1,500 (4%) went to a single API key that was accidentally left in a CI/CD pipeline, running integration tests on every commit
Each of those insights is actionable. Fixing the retry logic saves $1,176/month. Switching dev to GPT-4o mini saves $5,700/month. Decommissioning the deprecated feature saves $3,400/month. Removing the CI/CD key saves $1,500/month. Total savings from logging visibility: $11,776/month — a 34% cost reduction from insights that are impossible without per-request log data.
Beyond cost optimization, LLM logging supports regulatory compliance (audit trails for AI decisions), incident response (tracing a bad output back to its model and prompt configuration), capacity planning (predicting future API spend based on traffic trends), and vendor negotiation (showing actual usage patterns when negotiating enterprise pricing or committed-use discounts with providers). CostHawk ingests LLM log data from wrapped API keys and provider sync integrations, converting raw logs into actionable cost intelligence without requiring you to build the analytics pipeline yourself.
What is LLM Logging?
LLM logging is the systematic capture and storage of structured metadata for every language model API call. It extends the concept of traditional application logging — recording what happened, when, and whether it succeeded — into the AI-specific domain where the key metrics are tokens, models, costs, and latencies rather than HTTP methods, endpoints, and response codes.
An LLM log entry is fundamentally different from a traditional API log entry because the economics are different. A traditional REST API call has negligible marginal cost — the server CPU time for one more JSON response is fractions of a cent. An LLM API call has significant and variable marginal cost — a single request can cost anywhere from $0.0001 (a short GPT-4o mini completion) to $2.00+ (a long Claude 3 Opus completion with a full context window). This cost variability means that LLM logs must capture the inputs to the cost calculation (token counts, model identifier) with the same rigor that financial systems capture transaction amounts.
A well-designed LLM log entry captures data at three levels:
Level 1: Request metadata (always logged)
| Field | Type | Example | Purpose |
|---|---|---|---|
| request_id | UUID | 550e8400-e29b-41d4-a716-446655440000 | Unique identifier for correlation |
| timestamp | ISO 8601 | 2026-03-16T14:23:07.412Z | When the request was made |
| model | string | gpt-4o-2024-11-20 | Exact model version (not just family) |
| provider | string | openai | API provider |
| input_tokens | integer | 1847 | Tokens in the prompt |
| output_tokens | integer | 423 | Tokens in the completion |
| total_tokens | integer | 2270 | Sum of input + output |
| cost_usd | decimal | 0.00886 | Computed cost based on model pricing |
| latency_ms | integer | 1243 | End-to-end request duration |
| status_code | integer | 200 | HTTP response status |
| cached | boolean | true | Whether prompt caching was used |
| cache_read_tokens | integer | 1200 | Tokens served from cache (discounted) |
Level 2: Attribution metadata (recommended)
| Field | Type | Example | Purpose |
|---|---|---|---|
| api_key_id | string | key_prod_support_agent | Which key made the request |
| project | string | customer-support | Project or service name |
| environment | string | production | prod / staging / dev |
| feature | string | ticket-summarizer | Specific feature or endpoint |
| user_id | string | usr_abc123 | End user (hashed or anonymized) |
| session_id | string | sess_def456 | User session for multi-turn tracking |
| trace_id | string | trace_ghi789 | Distributed trace correlation ID |
Level 3: Diagnostic metadata (optional, for debugging)
| Field | Type | Example | Purpose |
|---|---|---|---|
| temperature | float | 0.3 | Generation temperature used |
| max_tokens | integer | 1024 | Max output token setting |
| stop_reason | string | end_turn | Why generation stopped |
| tool_calls | integer | 3 | Number of tool/function calls |
| retry_count | integer | 0 | How many retries occurred |
| time_to_first_token_ms | integer | 287 | Streaming TTFT latency |
Notice what is absent from all three levels: prompt content and response content. This is deliberate and critical for security, privacy, and compliance, as discussed in the next section.
What to Log (and What NOT to Log)
The most important decision in LLM logging is what to exclude. Prompts and completions frequently contain personally identifiable information (PII), proprietary business data, user-generated content, health information, financial details, and other sensitive material. Logging this content creates security liabilities, regulatory exposure, and storage costs that far outweigh the debugging value.
What you SHOULD log:
- All fields in Level 1 and Level 2 above. Token counts, model identifiers, costs, latencies, and attribution tags are the core data needed for cost analysis, performance monitoring, and compliance reporting. None of these contain sensitive content.
- Error codes and error types (but not full error messages, which may echo back prompt content). Log "error_type": "rate_limit_exceeded", not the full API response body.
- Request configuration parameters (temperature, max_tokens, top_p, stop sequences). These are non-sensitive and essential for debugging quality issues — if outputs suddenly degrade, knowing that someone changed the temperature from 0.3 to 1.0 is invaluable.
- Hashed or truncated identifiers. If you need to correlate logs with specific users, use one-way hashed user IDs rather than raw identifiers. A SHA-256 hash of the user ID provides correlation capability without exposing the actual identity in logs.
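The hashed-identifier approach is a one-liner in practice. Here is a minimal sketch using Node's built-in crypto module; the 16-character truncation is an arbitrary choice, and note that a plain hash of a guessable ID can be reversed by dictionary attack, so an HMAC with a secret key is safer for low-entropy identifiers:

```typescript
import { createHash } from 'node:crypto'

// One-way hash a user ID so logs can be correlated across requests
// without storing the raw identity.
function hashUserId(rawUserId: string): string {
  return createHash('sha256')
    .update(rawUserId)
    .digest('hex')
    .slice(0, 16) // truncated for readability; keep more characters if collision risk matters
}
```

The same input always produces the same output, so per-user aggregation still works, but the raw ID never reaches the log store.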
What you should NOT log by default:
- Prompt content (system messages, user messages, conversation history). This is the most sensitive data in the LLM pipeline. System prompts may contain proprietary business logic and instructions. User messages may contain PII, health information, financial data, or any other sensitive content your users share with your application. Logging these creates a honeypot — a centralized store of sensitive data that must be encrypted, access-controlled, retention-managed, and potentially subject to GDPR right-to-erasure requests.
- Model response content. The model's output may reflect, summarize, or rephrase the user's sensitive input. It may also contain hallucinated PII (names, addresses, phone numbers that the model generates). Logging responses carries the same risks as logging prompts.
- Full API request/response bodies. These contain prompt and completion content along with metadata. Log the metadata; discard the rest.
- API keys or authentication tokens. Never log credentials, even in redacted form. Log the key identifier or a hashed representation, not the key itself.
When you might choose to log content (with safeguards):
Some use cases genuinely benefit from content logging — debugging prompt regressions, training data collection, compliance audit trails for regulated industries. If you must log content, implement these safeguards:
- Separate storage: Store content logs in a different system from metadata logs, with stricter access controls, encryption at rest and in transit, and shorter retention periods.
- PII detection and redaction: Run content through a PII detection pipeline (Amazon Comprehend, Google DLP, Presidio) before logging. Redact or mask detected PII: "My SSN is [REDACTED]".
- Consent and legal review: Ensure your privacy policy covers LLM content logging and that you have legal basis (consent, legitimate interest, or contractual necessity) under applicable regulations (GDPR, CCPA, HIPAA).
- Retention limits: Set aggressive retention policies — 7–30 days for debugging content, with automatic deletion. Metadata logs can be retained for 12–24 months for trend analysis.
- Access audit: Log who accesses content logs and why. Restrict access to a minimal set of authorized personnel.
CostHawk's logging architecture follows the metadata-only approach by default. When you route API calls through CostHawk wrapped keys, CostHawk captures token counts, model identifiers, latency, cost, and attribution tags — never prompt or response content. This design eliminates PII concerns, reduces storage costs, and simplifies compliance, while providing all the data needed for comprehensive cost analysis.
Logging for Cost Analysis
The primary consumer of LLM log data in most organizations is the cost analytics pipeline. Raw logs are transformed into cost insights through aggregation, attribution, and trend analysis. Here is how each log field contributes to cost visibility:
Token counts → cost computation. The most fundamental cost calculation multiplies token counts by per-token rates:
cost = (input_tokens × input_rate) + (output_tokens × output_rate)
// With prompt caching:
cost = ((input_tokens - cache_read_tokens) × input_rate)
+ (cache_read_tokens × cached_input_rate)
+ (output_tokens × output_rate)

This computation must be performed at log ingestion time, not query time, because pricing rates change over time. If you compute cost at query time using current rates, historical costs will be wrong. Store the computed cost_usd with each log entry, along with the rate table version used for computation. CostHawk maintains a continuously updated pricing database for every model from every provider, ensuring cost computation is accurate even when rates change.
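As a sketch, the computation above might look like the following. The rate table here is illustrative only (not real provider pricing), and a production implementation would resolve rates from a versioned pricing database keyed by model and effective date:

```typescript
interface Usage {
  input_tokens: number
  output_tokens: number
  cache_read_tokens: number
}

// Illustrative per-million-token rates in USD — NOT real provider pricing.
// In production, look these up from a versioned rate table.
const RATES: Record<string, { input: number; cachedInput: number; output: number }> = {
  'example-model': { input: 2.5, cachedInput: 1.25, output: 10.0 },
}

function calculateCost(model: string, usage: Usage): number {
  const r = RATES[model]
  if (!r) throw new Error(`no pricing entry for ${model}`)
  // Cached input tokens are billed at the discounted cached rate;
  // the remainder of the prompt is billed at the full input rate.
  const uncachedInput = usage.input_tokens - usage.cache_read_tokens
  return (
    (uncachedInput * r.input +
      usage.cache_read_tokens * r.cachedInput +
      usage.output_tokens * r.output) /
    1_000_000
  )
}
```

Because the rate table changes over time, call this at ingestion and store the result, rather than recomputing from today's rates later.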
Model field → cost attribution by model. Grouping logs by model reveals which models drive the most spend. A common finding: teams discover that 15–30% of their spend goes to a frontier model for requests that a model 10x cheaper could handle equally well. Without per-model logging, this insight is invisible — the monthly invoice shows total OpenAI spend but not which models were called or how frequently.
API key / project / feature → cost attribution by owner. Attribution fields enable chargeback and accountability. When the data platform team can see that their embedding pipeline costs $4,200/month and the customer support team's agent costs $8,700/month, each team has both the visibility and the incentive to optimize their own usage. Without attribution, cost optimization is everyone's problem and therefore no one's problem.
Environment → dev/staging/prod cost split. One of the most common cost surprises is development environment spending. Engineers testing prompts, running experiments, and debugging issues often use production-tier models without realizing the cost. Logging with environment tags reveals the split. Industry data suggests that 20–40% of AI API spend goes to non-production environments — a number that shocks most engineering leaders when they first see it. Simply switching development environments to cheaper models can save 10–25% of total spend overnight.
Timestamp → temporal analysis. Time-series aggregation reveals spending patterns that point-in-time numbers miss:
- Hourly patterns: Peak spending during business hours, overnight batch processing spikes, weekend dips. Understanding these patterns enables capacity planning and rate limit management.
- Daily trends: Gradually increasing daily spend may indicate growing traffic (expected) or prompt drift/context accumulation (unexpected). A sudden step-function increase usually corresponds to a deployment.
- Weekly cycles: B2B applications often show 5x weekday vs. weekend spend differences. Consumer applications may show the reverse.
- Monthly budgeting: Cumulative daily spend plotted against the monthly budget reveals whether you are on track, trending over, or trending under. CostHawk's burn-rate projection extrapolates current spend patterns to estimate end-of-month totals, giving you 15–20 days of warning before a budget overrun.
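The core of a burn-rate projection is a linear extrapolation from month-to-date spend. A naive sketch (a real projection would weight recent days more heavily and adjust for weekly cycles):

```typescript
// Project end-of-month spend from cumulative spend so far.
// Assumes a constant daily burn rate — deliberately naive.
function projectMonthEnd(
  spendToDate: number, // cumulative spend in USD
  dayOfMonth: number,  // days elapsed so far
  daysInMonth: number
): number {
  const dailyBurn = spendToDate / dayOfMonth
  return dailyBurn * daysInMonth
}

// $12,000 spent by day 10 of a 30-day month projects to $36,000.
```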
Latency → efficiency analysis. Latency logging reveals the correlation between response time and token count — longer responses take longer to generate and cost more. Requests with anomalously high latency relative to their output token count may indicate provider-side issues (queue congestion, rate limiting), retry overhead, or network problems. Requests with anomalously low latency relative to output count are likely cache hits — confirming that your caching strategy is working.
The aggregation pipeline typically runs on a 1-minute to 5-minute cadence for real-time dashboards and on an hourly cadence for detailed analytics. CostHawk performs these aggregations automatically, presenting the results through dashboards, alerts, and API endpoints without requiring you to build or maintain the pipeline.
Log Storage Costs and Retention
LLM metadata logs are compact — a single log entry with 25 fields occupies approximately 500–800 bytes in JSON format, or 200–400 bytes in a columnar format like Parquet. But at scale, storage costs add up and must be managed deliberately.
Sizing your log storage. Here are storage estimates at various request volumes, assuming metadata-only logging (no prompt/response content):
| Daily Requests | Monthly Requests | Monthly Storage (JSON) | Monthly Storage (Parquet) | Annual Storage (Parquet) |
|---|---|---|---|---|
| 1,000 | 30,000 | 18 MB | 9 MB | 108 MB |
| 10,000 | 300,000 | 180 MB | 90 MB | 1.1 GB |
| 100,000 | 3,000,000 | 1.8 GB | 900 MB | 10.8 GB |
| 1,000,000 | 30,000,000 | 18 GB | 9 GB | 108 GB |
At cloud storage rates ($0.023/GB/month for S3 Standard, $0.004/GB/month for S3 Infrequent Access), even 1 million daily requests produce only $0.21/month in hot storage or $0.04/month in cold storage. The storage cost of metadata-only LLM logs is negligible — typically less than 0.01% of the AI API spend they help optimize.
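The figures in the table follow from simple arithmetic. A sketch, using the rough per-entry byte sizes assumed above:

```typescript
// Estimate monthly log storage in MB from daily request volume.
// bytesPerEntry: ~600 for JSON, ~300 for Parquet (rough midpoints
// of the 500–800 and 200–400 byte ranges cited above).
function monthlyStorageMB(dailyRequests: number, bytesPerEntry: number): number {
  const monthlyRequests = dailyRequests * 30
  return (monthlyRequests * bytesPerEntry) / 1_000_000
}

// 1,000 requests/day in JSON → 18 MB/month;
// 1,000,000 requests/day in Parquet → 9,000 MB (9 GB)/month.
```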
Content logging changes the equation dramatically. If you log full prompts and responses, a single entry might be 5,000–50,000 bytes instead of 500. At 100,000 daily requests with an average of 10 KB per entry, you are generating 30 GB/month in content logs — $0.69/month in S3 Standard, still modest. But content logs require encryption at rest ($0.01/10,000 requests for KMS), access controls, PII scanning ($0.50–$2.00 per GB for DLP services), and GDPR-compliant deletion processes. The operational overhead of managing content logs far exceeds the raw storage cost.
Retention policies. Not all log data has equal value over time. Implement a tiered retention strategy:
- Hot tier (0–30 days): Full-resolution log data in a fast queryable store (PostgreSQL, ClickHouse, BigQuery). This is your debugging and real-time analytics window. Keep every field at per-request granularity.
- Warm tier (30–180 days): Aggregated data — hourly or daily rollups by model, key, project, and environment. Individual request records can be moved to cold storage. This tier supports trend analysis and month-over-month comparisons.
- Cold tier (180 days–2 years): Daily or weekly aggregates in cheap object storage (S3 Glacier, GCS Archive). Useful for annual planning, year-over-year comparisons, and compliance requirements.
- Archive/delete (2+ years): Unless regulatory requirements mandate longer retention, aggregate to monthly summaries and delete individual records. The marginal value of 2-year-old per-request log data is effectively zero for cost optimization purposes.
Choosing a log storage backend. The right storage system depends on your query patterns and volume:
- PostgreSQL / Supabase: Excellent for teams processing up to 1 million requests/day. Familiar SQL query interface, easy to integrate with existing infrastructure, adequate performance for time-range queries with proper indexing. CostHawk uses Supabase PostgreSQL as its primary log store.
- ClickHouse: Purpose-built for analytical queries over time-series data. Handles billions of rows with sub-second query times. Ideal for teams processing 1–100 million requests/day. Higher operational complexity than PostgreSQL.
- BigQuery / Athena: Serverless query engines that can scan Parquet files in object storage. No infrastructure to manage, pay-per-query pricing. Good for infrequent analytical queries over large historical datasets but expensive for real-time dashboards.
- Datadog / Grafana Cloud: Managed observability platforms that ingest and query log data. Convenient but expensive at scale — Datadog charges $0.10 per GB ingested, which for 1 million daily requests (18 GB/month in JSON) adds $1.80/month in log ingestion alone, plus per-query and retention charges.
For most teams, the optimal architecture is PostgreSQL for hot-tier (0–30 days) with time-partitioned tables, S3/GCS for warm/cold tier with Parquet files, and CostHawk as the analytics and visualization layer that queries across both tiers seamlessly.
Structured Logging for LLM Requests
Structured logging — emitting log entries as machine-parseable data structures (JSON, protocol buffers, or typed objects) rather than free-text log lines — is a prerequisite for effective LLM cost analysis. Free-text logs like "INFO: Called gpt-4o, 1847 input tokens, 423 output tokens, 1243ms" are human-readable but require regex parsing to extract fields, are fragile to format changes, and cannot be efficiently queried or aggregated. Structured logs like {"model": "gpt-4o", "input_tokens": 1847, "output_tokens": 423, "latency_ms": 1243} can be directly ingested by analytics pipelines, queried with SQL, and aggregated without parsing.
Implementing structured LLM logging. The cleanest approach is to wrap your LLM API calls in a logging middleware that captures metadata automatically, without requiring each call site to manually construct log entries:
// lib/llm-logger.ts — structured logging middleware
import { logger } from './logger' // Your structured logger (Pino, Winston, etc.)
import { calculateCost } from './pricing'
interface LLMLogEntry {
request_id: string
timestamp: string
provider: string
model: string
input_tokens: number
output_tokens: number
total_tokens: number
cost_usd: number
latency_ms: number
status_code: number
cached: boolean
cache_read_tokens: number
api_key_id: string
project: string
environment: string
feature: string
temperature: number
max_tokens: number
stop_reason: string
tool_calls: number
error_type: string | null
}
export async function callLLMWithLogging(
params: LLMRequestParams,
context: RequestContext
): Promise<LLMResponse> {
const startTime = performance.now()
const requestId = crypto.randomUUID()
try {
const response = await provider.chat.completions.create(params) // provider: your configured SDK client (e.g. an OpenAI instance)
const latency = Math.round(performance.now() - startTime)
const logEntry: LLMLogEntry = {
request_id: requestId,
timestamp: new Date().toISOString(),
provider: context.provider,
model: response.model, // Use actual model from response, not requested
input_tokens: response.usage.prompt_tokens,
output_tokens: response.usage.completion_tokens,
total_tokens: response.usage.total_tokens,
cost_usd: calculateCost(response.model, response.usage),
latency_ms: latency,
status_code: 200,
cached: (response.usage.prompt_tokens_details?.cached_tokens ?? 0) > 0,
cache_read_tokens: response.usage.prompt_tokens_details?.cached_tokens ?? 0,
api_key_id: context.apiKeyId,
project: context.project,
environment: context.environment,
feature: context.feature,
temperature: params.temperature ?? 1.0,
max_tokens: params.max_tokens ?? -1,
stop_reason: response.choices[0].finish_reason,
tool_calls: response.choices[0].message.tool_calls?.length ?? 0,
error_type: null
}
logger.info(logEntry, 'llm_request_completed')
return response
} catch (error: any) { // typed as any to read provider-specific fields (status, code, type)
const latency = Math.round(performance.now() - startTime)
logger.error({
request_id: requestId,
timestamp: new Date().toISOString(),
provider: context.provider,
model: params.model,
latency_ms: latency,
status_code: error.status ?? 500,
error_type: error.code ?? error.type ?? 'unknown',
api_key_id: context.apiKeyId,
project: context.project,
environment: context.environment,
feature: context.feature
}, 'llm_request_failed')
throw error
}
}

Key implementation decisions:
- Use the model from the response, not the request. When you request gpt-4o, the API may route to a specific snapshot like gpt-4o-2024-11-20. The response contains the actual model used. Log this — it is essential for cost accuracy when model versions have different pricing.
- Compute cost at log time. Do not defer cost calculation to query time. Store the cost in the log entry using the pricing table that was active at the time of the request. This ensures historical cost data remains accurate even when pricing changes.
- Log on both success and failure paths. Failed requests still consume resources (you may be charged for the input tokens), and tracking error rates by model and provider is essential for reliability monitoring.
- Use a high-performance structured logger. Pino (Node.js), structlog (Python), and Zap (Go) are purpose-built for structured logging with minimal overhead. Avoid console.log in production — it is synchronous and slow.
- Batch log writes. Writing each log entry individually to a database adds latency to every LLM request. Instead, buffer entries in memory and flush in batches every 1–5 seconds, or use an async logging transport (Pino's pino.transport()) that writes to a queue (SQS, Kafka, Redis Stream) for async processing.
Correlation with distributed traces. If your application uses distributed tracing (OpenTelemetry, Datadog APM, Jaeger), include the trace ID and span ID in your LLM log entries. This enables drilling from a slow user request → the API handler span → the specific LLM call that consumed 3 seconds and 4,000 output tokens. Without correlation, you can see that LLM costs are high or latency is bad, but you cannot connect those observations to specific user journeys or code paths.
Logging and CostHawk
CostHawk provides two primary mechanisms for ingesting LLM log data, each designed for different architectural preferences and integration depths.
Mechanism 1: Wrapped API keys (zero-code logging). CostHawk wrapped keys act as a transparent proxy between your application and the LLM provider. You replace your provider API key with a CostHawk wrapped key and point your API base URL to CostHawk's proxy endpoint. Every request flows through CostHawk, which captures metadata in real time and forwards the request to the provider. Your application code does not change — the same SDK, the same parameters, the same error handling. The wrapped key approach provides:
- Automatic metadata capture: Token counts, model, latency, cost, cache status, and error codes are logged without any instrumentation in your code.
- Per-key attribution: Issue separate wrapped keys for different projects, teams, or environments. CostHawk automatically attributes all usage to the correct key.
- Zero prompt content logging: CostHawk's proxy captures metadata only — prompt and response content passes through but is never stored, logged, or inspected.
- Minimal latency overhead: The proxy adds less than 5ms of latency to each request — unnoticeable for LLM calls that typically take 500–5,000ms.
Mechanism 2: Provider sync (non-invasive logging). For teams that cannot or prefer not to route traffic through a proxy, CostHawk syncs usage data directly from provider APIs. Connect your OpenAI, Anthropic, or Google Cloud account, and CostHawk pulls usage and billing data on a regular cadence. This approach provides:
- No infrastructure changes: Your API calls go directly to the provider as they do today. CostHawk reads usage data after the fact.
- Provider-reported accuracy: Usage and cost data comes directly from the provider's billing system, eliminating any discrepancy between logged and billed amounts.
- Coarser granularity: Provider APIs typically report usage at the API key and model level, not at the per-request level. You get cost by model and key but lose per-request attribution, latency data, and real-time alerting.
Mechanism 3: MCP integration (for AI development tools). CostHawk's MCP (Model Context Protocol) server integrates directly with AI development environments like Claude Code and OpenAI Codex. The MCP server syncs session-level usage data — tokens consumed, models used, session duration, and cost — from local development tools into CostHawk's dashboard. This provides visibility into the often-overlooked cost of AI-assisted development, which can represent 15–30% of a team's total AI spend.
What CostHawk does with your log data:
- Real-time dashboards: Log data powers the CostHawk dashboard within seconds of ingestion. See current spend, model distribution, and key-level attribution updating in near real time.
- Cost anomaly detection: CostHawk's anomaly detection engine monitors log data for unusual patterns — spending spikes, model distribution shifts, sudden increases in average tokens per request, or new API keys appearing that were not expected. Anomalies trigger alerts via email, Slack, or webhook.
- Budget tracking and alerts: Log data feeds cumulative spend calculations that are compared against budget thresholds. When spend reaches 80%, 90%, or 100% of a configured budget, CostHawk sends alerts with context about what is driving the spend.
- Historical analytics: All log data is retained for trend analysis, month-over-month comparisons, and annual planning. Query historical data through the dashboard or export it via the CostHawk API for custom analysis.
- Savings recommendations: CostHawk analyzes log patterns to identify optimization opportunities — models that could be downgraded, prompts that could leverage caching, development environments using production-tier models, and requests that could use the Batch API for 50% savings.
The net effect is that CostHawk transforms raw LLM log data into a complete cost intelligence layer without requiring you to build logging infrastructure, aggregation pipelines, dashboards, or alerting systems. You provide the data (via wrapped keys, provider sync, or MCP); CostHawk provides the insights.
FAQ
Frequently Asked Questions
Should I log prompt and response content?
How much does it cost to store LLM logs?
How do I correlate LLM logs with application logs?
What is the difference between LLM logging and LLM tracing?
How long should I retain LLM logs?
Can logging add latency to my LLM API calls?
What logging should I add for LLM error handling and retries?
Log each attempt — the original request and every retry — as its own entry with a shared retry_group_id that links them together. The initial attempt logs the original error (rate limit, timeout, server error), and subsequent retries log their own results. This gives you visibility into retry costs — a request that succeeds on the third try costs 3x the token cost of a first-try success, because you paid for input tokens on all three attempts. Key fields for error and retry logging: error_type (rate_limit, timeout, server_error, invalid_request), retry_count (0 for first attempt, 1 for first retry, etc.), retry_group_id (shared UUID linking all attempts for one logical request), and retry_delay_ms (how long you waited before retrying). Aggregate retry data reveals patterns: if 8% of your Claude API calls are rate-limited and each retry adds 2 seconds of latency plus duplicate input token costs, that is $400–$800/month in wasted spend for a team making 500,000 monthly requests. The fix might be as simple as adding request queuing or reducing concurrency.
How do I get started with LLM logging if I have no logging infrastructure?
Related Terms
Tracing
The practice of recording the full execution path of an LLM request — from prompt construction through model inference to response delivery — with timing and cost attribution at each step. Tracing provides the granular visibility needed to understand where time and money are spent in multi-step AI pipelines.
LLM Observability
The practice of monitoring, tracing, and analyzing LLM-powered applications in production across every dimension that matters: token consumption, cost, latency, error rates, and output quality. LLM observability goes far beyond traditional APM by tracking AI-specific metrics that determine both the reliability and the economics of your AI features.
Dashboards
Visual interfaces for monitoring AI cost, usage, and performance metrics in real-time. The command center for AI cost management — dashboards aggregate token spend, model utilization, latency, and budget health into a single pane of glass.
Cost Anomaly Detection
Automated detection of unusual AI spending patterns — sudden spikes, gradual drift, and per-key anomalies — before they become budget-breaking surprises.
Alerting
Automated notifications triggered by cost thresholds, usage anomalies, or performance degradation in AI systems. The first line of defense against budget overruns — alerting ensures no cost spike goes unnoticed.
AI Cost Allocation
The practice of attributing AI API costs to specific teams, projects, features, or customers — enabling accountability, budgeting, and optimization at the organizational level.
AI Cost Glossary
Put this knowledge to work. Track your AI spend in one place.
CostHawk gives engineering teams real-time visibility into every token, every model, and every dollar across your AI stack.
