Glossary · Observability · Updated 2026-03-16

OpenTelemetry

An open-source observability framework providing a vendor-neutral standard (OTLP) for collecting traces, metrics, and logs from distributed systems. OpenTelemetry is rapidly becoming the standard instrumentation layer for LLM applications, enabling teams to track latency, token usage, cost, and quality across every inference call.

Definition

What is OpenTelemetry?

OpenTelemetry (OTel) is an open-source, vendor-neutral observability framework maintained by the Cloud Native Computing Foundation (CNCF). It provides a unified set of APIs, SDKs, and tools for generating, collecting, and exporting traces, metrics, and logs — the three pillars of observability.

OpenTelemetry defines the OpenTelemetry Protocol (OTLP), a standardized wire format for telemetry data that is supported by virtually every modern observability backend: Datadog, Grafana, Honeycomb, New Relic, Jaeger, Prometheus, and dozens more.

For AI and LLM applications, OpenTelemetry is emerging as the standard instrumentation layer. The Semantic Conventions for Generative AI (the gen_ai.* attribute namespace) define how to record model name, token counts, cost, latency, and other LLM-specific metadata in a consistent, interoperable format. By instrumenting your LLM calls with OpenTelemetry, you create a telemetry pipeline that feeds any backend — including cost monitoring platforms like CostHawk — without vendor lock-in.

Impact

Why It Matters for AI Costs

LLM applications introduce observability challenges that traditional APM tools were not designed to handle. A single LLM API call is not just an HTTP request — it carries model selection, token consumption, cost, prompt content, response quality, and latency characteristics that all need to be captured, correlated, and analyzed. Without standardized instrumentation, teams end up with fragmented visibility:

  • Latency data in Datadog, but no token counts
  • Cost data in the provider dashboard, but no correlation to application features
  • Quality evaluations in a spreadsheet, but no connection to the traces that generated them

OpenTelemetry solves this fragmentation by providing a single instrumentation layer that captures all dimensions of an LLM call and exports them to any backend. The practical benefits are substantial:

Vendor neutrality: Instrument once, export everywhere. If you switch from Datadog to Grafana, or add CostHawk as an additional backend, you change a configuration line — not your application code. The OTel SDK supports multiple exporters simultaneously, so you can send traces to Jaeger for debugging, metrics to Prometheus for alerting, and cost telemetry to CostHawk for budget tracking — all from the same instrumentation.

Correlation: OTel traces connect LLM calls to the application context that triggered them. A single trace can span a user request → application logic → LLM API call → response processing → database write, giving you end-to-end visibility into how LLM calls fit into your application's behavior. When a cost anomaly occurs, you can trace it back to the specific feature, user, or code path that generated the expensive calls.

Standardization: The gen_ai.* semantic conventions mean that every LLM instrumentation library records the same attributes in the same format. Whether you use the OpenLLMetry library, the Traceloop SDK, or custom instrumentation, the data is interoperable. This standardization is critical for building tooling, dashboards, and alerts that work across any LLM provider or framework.

For cost management specifically, OpenTelemetry provides the telemetry pipeline that connects your application code to CostHawk. Instead of relying solely on provider billing dashboards (which show aggregate spend with no application context), OTel-instrumented applications emit per-request cost data that CostHawk can attribute to projects, features, users, and teams.

How OpenTelemetry Works

OpenTelemetry is the merger of two earlier observability projects, OpenTracing (a CNCF project) and OpenCensus (originated at Google), unified in 2019 to create a single, definitive observability standard. Its tracing specification reached 1.0 in 2021, with metrics and logs stabilizing in the years since, and OpenTelemetry has become the second most active CNCF project after Kubernetes, with contributions from Google, Microsoft, Amazon, Splunk, Datadog, and hundreds of other organizations.

OpenTelemetry provides four core components:

  1. API: A set of interfaces for creating telemetry in your application code. The API is intentionally thin — calling it with no SDK configured produces no overhead (a "no-op" implementation). This means library authors can instrument their code with the OTel API without forcing users to adopt any specific observability stack.
  2. SDK: The implementation that processes telemetry created via the API. The SDK handles sampling, batching, and exporting. You configure it at application startup with your desired exporters, sampling rate, and resource attributes.
  3. OTLP (OpenTelemetry Protocol): A standardized wire protocol for transmitting telemetry data over gRPC or HTTP. OTLP is supported natively by all major observability platforms, eliminating the need for vendor-specific exporters in most cases.
  4. Collector: An optional standalone service that receives telemetry data, processes it (filtering, sampling, enriching, transforming), and exports it to one or more backends. The Collector is typically deployed as a sidecar, a DaemonSet, or a centralized gateway, and acts as a telemetry pipeline between your applications and your observability backends.

The three signal types that OpenTelemetry supports are:

  • Traces: Distributed traces that follow a request through multiple services and operations. Each trace consists of spans — units of work with a start time, duration, status, and attributes. For LLM applications, a span typically represents one inference call.
  • Metrics: Numerical measurements collected over time — counters (total tokens consumed), histograms (latency distribution), and gauges (current active requests). Metrics are lower-cardinality than traces and are ideal for dashboards and alerting.
  • Logs: Structured log records that can be correlated with traces via trace IDs and span IDs. OTel logs provide the narrative context that explains what happened during a trace.

For LLM applications, all three signals are relevant: traces capture per-request detail (model, tokens, cost, latency), metrics capture aggregate trends (total spend, average latency, error rate), and logs capture prompt/response content and evaluation results. Together, they provide complete observability for AI-powered applications.
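To make the three signals concrete, here is a deliberately simplified sketch in plain TypeScript (not the real OTel SDK; the Span, Metrics, and LogRecord shapes and the recordLlmCall helper are illustrative stand-ins) showing how a single LLM request surfaces in each signal:

```typescript
// Deliberately simplified stand-ins for the three signal types.
// The real OTel SDK has its own span, metric, and log APIs; these
// shapes are illustrative only.
type Span = {
  traceId: string
  name: string
  durationMs: number
  attributes: Record<string, unknown>
}
type Metrics = { totalTokens: number; requestCount: number }
type LogRecord = { traceId: string; body: string }

function recordLlmCall(
  traceId: string,
  inputTokens: number,
  outputTokens: number,
  durationMs: number,
  metrics: Metrics,
): { span: Span; log: LogRecord } {
  // Trace signal: one span per inference call, with per-request detail.
  const span: Span = {
    traceId,
    name: 'gen_ai.chat',
    durationMs,
    attributes: {
      'gen_ai.usage.input_tokens': inputTokens,
      'gen_ai.usage.output_tokens': outputTokens,
    },
  }
  // Metric signal: low-cardinality aggregates for dashboards and alerts.
  metrics.totalTokens += inputTokens + outputTokens
  metrics.requestCount += 1
  // Log signal: narrative context, correlated to the trace via traceId.
  const log: LogRecord = { traceId, body: `chat completed in ${durationMs}ms` }
  return { span, log }
}
```

The same request feeds all three: the span carries the per-request detail, the counters feed aggregate dashboards, and the log record is tied back to the trace by its ID.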

OTel for LLM Applications

Instrumenting LLM API calls with OpenTelemetry involves creating spans that capture the full context of each inference request. The instrumentation can be done manually, using auto-instrumentation libraries, or through LLM-specific OTel libraries like OpenLLMetry and Traceloop.

Manual instrumentation example (TypeScript with OpenAI):

import { trace, SpanKind, SpanStatusCode } from '@opentelemetry/api'
import OpenAI from 'openai'

const tracer = trace.getTracer('llm-service', '1.0.0')
const openai = new OpenAI()

async function chatCompletion(prompt: string) {
  return tracer.startActiveSpan('gen_ai.chat', {
    kind: SpanKind.CLIENT,
    attributes: {
      'gen_ai.system': 'openai',
      'gen_ai.request.model': 'gpt-4o',
      'gen_ai.request.max_tokens': 1000,
      'gen_ai.request.temperature': 0.7,
    }
  }, async (span) => {
    try {
      const response = await openai.chat.completions.create({
        model: 'gpt-4o',
        messages: [{ role: 'user', content: prompt }],
        max_tokens: 1000,
        temperature: 0.7,
      })

      const usage = response.usage!
      span.setAttributes({
        'gen_ai.response.model': response.model,
        'gen_ai.usage.input_tokens': usage.prompt_tokens,
        'gen_ai.usage.output_tokens': usage.completion_tokens,
        'gen_ai.usage.total_tokens': usage.total_tokens,
        'gen_ai.response.finish_reason':
          response.choices[0].finish_reason,
      })
      span.setStatus({ code: SpanStatusCode.OK })
      return response
    } catch (error) {
      span.setStatus({
        code: SpanStatusCode.ERROR,
        message: (error as Error).message
      })
      throw error
    } finally {
      span.end()
    }
  })
}

This instrumentation creates a span for every LLM call with standardized attributes for the model, token counts, and finish reason. The span is automatically correlated with the parent trace, so you can see the LLM call in the context of the broader request that triggered it.

Auto-instrumentation with OpenLLMetry:

For teams that want instrumentation without modifying application code, the OpenLLMetry project provides drop-in auto-instrumentation for popular LLM libraries:

import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http'
import { OpenLLMetry } from 'openllmetry'

// Initialize once at application startup.
// (Package and entry-point names vary between OpenLLMetry SDK
// versions; check the current docs for the exact init call.)
OpenLLMetry.init({
  exporter: new OTLPTraceExporter({
    url: 'https://otel.costhawk.com/v1/traces'
  }),
  appName: 'my-ai-app',
})

OpenLLMetry automatically patches the OpenAI, Anthropic, Google, Cohere, and other LLM client libraries to emit OTel spans with the correct gen_ai.* attributes. No code changes are needed beyond the initialization call.

Key instrumentation best practices for LLM applications:

  • Always capture gen_ai.usage.input_tokens and gen_ai.usage.output_tokens — these are the foundation for cost calculation
  • Add custom attributes for business context: app.feature, app.user_id, app.project — these enable cost attribution
  • Record the actual model returned by the API (not just the requested model), since providers may route to different model versions
  • Use span events to record prompt and response content when needed for debugging, but be mindful of payload size and PII considerations
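For the last practice, a hypothetical sanitization helper might truncate and redact content before it is attached to a span event (sanitizeContent and the regex patterns below are illustrative, not a complete PII solution):

```typescript
// Hypothetical helper implementing the last practice above: truncate
// and redact prompt/response content before recording it on a span
// event. The patterns are illustrative, not a complete PII solution.
const MAX_CONTENT_CHARS = 2000

function sanitizeContent(content: string): string {
  const redacted = content
    .replace(/[\w.+-]+@[\w-]+\.[\w.]+/g, '[email]')  // email addresses
    .replace(/\b\d{3}-\d{2}-\d{4}\b/g, '[ssn]')      // US SSN-style numbers
  return redacted.length > MAX_CONTENT_CHARS
    ? redacted.slice(0, MAX_CONTENT_CHARS) + '[truncated]'
    : redacted
}
```

The sanitized string would then go on a span event rather than a regular attribute, keeping content capture opt-in and bounded in size.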

OTel Semantic Conventions for GenAI

Semantic conventions are standardized attribute names and values that ensure telemetry data is consistent and interoperable across different instrumentation libraries and observability backends. The OpenTelemetry project maintains an evolving set of semantic conventions for Generative AI under the gen_ai.* namespace.

As of early 2026, the key semantic conventions for LLM observability are:

Attribute                      Type    Description                                         Example
gen_ai.system                  string  The AI provider system                              openai, anthropic, google
gen_ai.request.model           string  Model name requested                                gpt-4o, claude-3.5-sonnet
gen_ai.response.model          string  Model name actually used (may differ from request)  gpt-4o-2024-08-06
gen_ai.request.max_tokens      int     Maximum output tokens requested                     1000
gen_ai.request.temperature     float   Sampling temperature                                0.7
gen_ai.request.top_p           float   Nucleus sampling parameter                          0.95
gen_ai.usage.input_tokens      int     Number of input/prompt tokens                       1523
gen_ai.usage.output_tokens     int     Number of output/completion tokens                  487
gen_ai.usage.total_tokens      int     Total tokens (input + output)                       2010
gen_ai.response.finish_reason  string  Why generation stopped                              stop, max_tokens, tool_calls
gen_ai.prompt                  string  The prompt content (opt-in, may contain PII)        (full prompt text)
gen_ai.completion              string  The response content (opt-in, may contain PII)      (full response text)

These conventions are complemented by cost-specific attributes that CostHawk and other cost monitoring tools recognize:

Attribute                Type    Description                   Example
gen_ai.cost.input_cost   float   Cost of input tokens in USD   0.003807
gen_ai.cost.output_cost  float   Cost of output tokens in USD  0.004870
gen_ai.cost.total_cost   float   Total cost in USD             0.008677
gen_ai.cost.currency     string  Cost currency code            USD
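These cost attributes are pure arithmetic over the usage attributes and per-million-token rates. A minimal TypeScript sketch (computeCost is a hypothetical helper; the rates in the usage example are list prices that may have changed, so verify current provider pricing before reuse):

```typescript
// Derive the gen_ai.cost.* attributes from token usage and USD rates
// per million tokens. computeCost is a hypothetical helper, not a
// CostHawk or OTel API.
function computeCost(
  inputTokens: number,
  outputTokens: number,
  inputRatePerMillion: number,
  outputRatePerMillion: number,
) {
  const inputCost = (inputTokens / 1_000_000) * inputRatePerMillion
  const outputCost = (outputTokens / 1_000_000) * outputRatePerMillion
  return {
    'gen_ai.cost.input_cost': inputCost,
    'gen_ai.cost.output_cost': outputCost,
    'gen_ai.cost.total_cost': inputCost + outputCost,
    'gen_ai.cost.currency': 'USD',
  }
}

// 1,523 input and 487 output tokens at $2.50 / $10.00 per million,
// matching the worked example in the tables above.
const cost = computeCost(1523, 487, 2.5, 10.0)
```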

The standardization these conventions provide is transformative for the LLM observability ecosystem. When every instrumentation library records gen_ai.usage.input_tokens with the same semantics, any observability backend can build dashboards, alerts, and analytics on top of that data without custom parsing or normalization. If you instrument your application with these conventions today, you can switch or add observability backends tomorrow without reinstrumenting.

The conventions are still evolving — the GenAI working group within the OTel project meets regularly to refine and expand the attribute set. Notable proposals in progress include attributes for tool/function calling (gen_ai.tool.name, gen_ai.tool.parameters), multi-turn conversation tracking (gen_ai.conversation.id), and quality evaluation scores (gen_ai.eval.score). CostHawk tracks these conventions as they evolve and updates its ingestion pipeline to support new attributes as they are standardized.

OTel vs Proprietary SDKs

Teams building LLM applications face a choice between instrumenting with OpenTelemetry (open standard) or with proprietary SDKs from observability vendors. Here is a detailed comparison:

In the comparison below, "proprietary SDKs" refers to tools like LangSmith, Helicone, and Braintrust.

  • Vendor lock-in
    OpenTelemetry: None. Export to any OTLP-compatible backend; switch backends by changing config, not code.
    Proprietary SDKs: High. Instrumentation is tightly coupled to the vendor's platform; switching requires reinstrumentation.
  • Multi-backend support
    OpenTelemetry: Native. Configure multiple exporters to send data to Datadog, CostHawk, and Jaeger simultaneously.
    Proprietary SDKs: Limited. Most only send to their own backend; forwarding to other tools requires custom work.
  • Community and ecosystem
    OpenTelemetry: Massive. 1,000+ contributors, supported by every major cloud and observability vendor. CNCF graduated project.
    Proprietary SDKs: Vendor-specific. Community size depends on the vendor's user base; single-vendor roadmap.
  • GenAI coverage
    OpenTelemetry: Growing rapidly. gen_ai.* semantic conventions are stabilizing, and libraries like OpenLLMetry provide auto-instrumentation.
    Proprietary SDKs: Often more mature for LLM-specific features like prompt versioning, evaluation, and playground functionality.
  • Setup complexity
    OpenTelemetry: Moderate. Requires understanding OTel concepts (traces, spans, exporters, collectors) and more configuration steps.
    Proprietary SDKs: Low. Typically a single SDK init call and API key; faster time-to-first-insight.
  • Customization
    OpenTelemetry: Unlimited. Custom attributes, custom span processors, custom exporters; full control over the telemetry pipeline.
    Proprietary SDKs: Limited to what the vendor exposes. Custom attributes may or may not be supported.
  • Cost of telemetry
    OpenTelemetry: Infrastructure cost only (Collector hosting, backend ingestion fees). No per-seat licensing for the instrumentation layer.
    Proprietary SDKs: Vendor pricing applies. Per-seat, per-event, or per-trace pricing can become significant at scale.
  • Correlation with non-LLM telemetry
    OpenTelemetry: Native. OTel traces span LLM calls, database queries, HTTP requests, and message queues in a single distributed trace.
    Proprietary SDKs: LLM-focused only. Correlating with broader application telemetry requires manual integration work.

The recommendation for most teams in 2026 is to adopt a hybrid approach: use OpenTelemetry as the core instrumentation layer for all telemetry, then layer on specialized tools for LLM-specific capabilities that OTel does not yet cover (prompt management, evaluation suites, fine-tuning workflows). This gives you the vendor neutrality and correlation benefits of OTel while still accessing specialized LLM tooling.

For cost monitoring specifically, OpenTelemetry is the clear winner. Cost data needs to be correlated with application context (which feature? which user? which team?) and aggregated across multiple providers. OTel's distributed tracing and flexible attribute system make this natural, while proprietary SDKs typically only see their own platform's data.

Implementing OTel for Cost Tracking

Using OpenTelemetry to track LLM costs requires capturing token usage per request, enriching it with pricing data, and exporting the results to a cost monitoring backend. Here is a complete implementation pattern:

Step 1: Configure the OTel SDK with cost-aware span processing.

import { NodeSDK } from '@opentelemetry/sdk-node'
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http'
import { Resource } from '@opentelemetry/resources'
import {
  ATTR_SERVICE_NAME,
  ATTR_SERVICE_VERSION,
} from '@opentelemetry/semantic-conventions'

const sdk = new NodeSDK({
  resource: new Resource({
    [ATTR_SERVICE_NAME]: 'my-ai-service',
    [ATTR_SERVICE_VERSION]: '2.1.0',
    'deployment.environment': 'production',
    'team.name': 'ml-platform',
  }),
  traceExporter: new OTLPTraceExporter({
    url: 'https://otel.costhawk.com/v1/traces',
    headers: {
      'x-costhawk-api-key': process.env.COSTHAWK_API_KEY!,
    },
  }),
})

sdk.start()

Step 2: Create a cost-enriching span processor.

import { Context } from '@opentelemetry/api'
import { Span, SpanProcessor, ReadableSpan } from '@opentelemetry/sdk-trace-base'

// Illustrative USD prices per million tokens — check current provider
// pricing before relying on these numbers.
const MODEL_PRICING: Record<string, { input: number; output: number }> = {
  'gpt-4o':              { input: 2.50,  output: 10.00 },
  'gpt-4o-mini':         { input: 0.15,  output: 0.60  },
  'claude-3.5-sonnet':   { input: 3.00,  output: 15.00 },
  'claude-3.5-haiku':    { input: 0.80,  output: 4.00  },
  'gemini-2.0-flash':    { input: 0.10,  output: 0.40  },
}

class CostEnrichmentProcessor implements SpanProcessor {
  onEnd(span: ReadableSpan): void {
    const model = span.attributes['gen_ai.response.model'] as string | undefined
    const inputTokens = span.attributes['gen_ai.usage.input_tokens']
    const outputTokens = span.attributes['gen_ai.usage.output_tokens']

    if (!model || typeof inputTokens !== 'number' || typeof outputTokens !== 'number') return
    const pricing = MODEL_PRICING[model]
    if (!pricing) return

    const inputCost = (inputTokens / 1_000_000) * pricing.input
    const outputCost = (outputTokens / 1_000_000) * pricing.output
    // ReadableSpan is nominally read-only, but the Node SDK shares the
    // attributes object with exporters, so mutating it in onEnd is a
    // common (if unofficial) enrichment pattern. Register this processor
    // before the exporting BatchSpanProcessor so the exporter sees the
    // enriched attributes.
    const attrs = span.attributes as Record<string, unknown>
    attrs['gen_ai.cost.input_cost'] = inputCost
    attrs['gen_ai.cost.output_cost'] = outputCost
    attrs['gen_ai.cost.total_cost'] = inputCost + outputCost
    attrs['gen_ai.cost.currency'] = 'USD'
  }
  onStart(_span: Span, _parentContext: Context): void {}
  forceFlush(): Promise<void> { return Promise.resolve() }
  shutdown(): Promise<void> { return Promise.resolve() }
}

Step 3: Add business context attributes to enable cost attribution.

The real power of OTel for cost tracking emerges when you add application-level attributes that enable grouping and attribution:

span.setAttributes({
  'app.feature': 'document-summarization',
  'app.customer_id': 'cust_abc123',
  'app.project': 'enterprise-chatbot',
  'app.environment': 'production',
  'app.api_key_id': 'key_xyz789',
})

These attributes let CostHawk slice cost data by feature, customer, project, and environment — answering questions like "Which feature costs the most?" and "Which customer is driving the biggest spend increase?" that provider dashboards cannot answer.
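On the backend side, this kind of slicing reduces to a group-by over per-request cost records. A simplified sketch (CostRecord and costByDimension are illustrative, not a CostHawk API):

```typescript
// Illustrative backend-side aggregation, not a CostHawk API: sum
// per-request cost along one attribute dimension.
type CostRecord = {
  attributes: Record<string, string>
  totalCost: number
}

function costByDimension(
  records: CostRecord[],
  dimension: string,
): Map<string, number> {
  const totals = new Map<string, number>()
  for (const record of records) {
    const key = record.attributes[dimension] ?? '(unattributed)'
    totals.set(key, (totals.get(key) ?? 0) + record.totalCost)
  }
  return totals
}
```

Calling costByDimension(records, 'app.feature') answers "which feature costs the most?"; swapping in 'app.customer_id' answers the customer question.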

Step 4: Deploy an OTel Collector for pipeline flexibility.

For production deployments, route telemetry through an OTel Collector rather than exporting directly from your application. The Collector can sample, filter, batch, and fan out telemetry to multiple backends. A typical Collector configuration sends traces to both Jaeger (for debugging) and CostHawk (for cost monitoring) while filtering out low-value spans to control ingestion volume and cost.
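Such a pipeline can be sketched as a Collector configuration like the following (the CostHawk endpoint, header name, and filter expression are illustrative; consult the Collector and filter processor documentation for current syntax):

```yaml
receivers:
  otlp:
    protocols:
      grpc:
      http:

processors:
  batch:
  # Drop low-value spans (e.g. health checks) to control ingestion volume.
  filter/drop-healthchecks:
    traces:
      span:
        - 'attributes["http.route"] == "/healthz"'

exporters:
  otlp/jaeger:
    endpoint: jaeger:4317              # Jaeger's OTLP gRPC port
    tls:
      insecure: true
  otlphttp/costhawk:
    endpoint: https://otel.costhawk.com  # illustrative endpoint
    headers:
      x-costhawk-api-key: ${env:COSTHAWK_API_KEY}

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [filter/drop-healthchecks, batch]
      exporters: [otlp/jaeger, otlphttp/costhawk]
```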

OTel and CostHawk Integration

CostHawk provides native OpenTelemetry ingestion, accepting OTLP data over both gRPC and HTTP. This means any application instrumented with OpenTelemetry can send cost telemetry to CostHawk without a custom integration — just configure the OTLP exporter to point to CostHawk's endpoint.

Integration architecture:

┌────────────────────┐     OTLP/HTTP      ┌────────────────────┐
│  Your Application  │ ─────────────────▶ │  CostHawk OTLP     │
│  (OTel SDK)        │                    │  Endpoint          │
└────────────────────┘                    └─────────┬──────────┘
                                                    │
                    ┌───────────────────────────────┤
                    │                               │
                    ▼                               ▼
          ┌────────────────────┐          ┌────────────────────┐
          │  Cost Attribution  │          │  Anomaly           │
          │  Engine            │          │  Detection         │
          │  (per-feature,     │          │  (baseline +       │
          │   per-customer,    │          │   deviation)       │
          │   per-model)       │          │                    │
          └─────────┬──────────┘          └─────────┬──────────┘
                    │                               │
                    ▼                               ▼
          ┌────────────────────┐          ┌────────────────────┐
          │  Dashboard &       │          │  Webhook Alerts    │
          │  Reports           │          │  (Slack, PD, etc.) │
          └────────────────────┘          └────────────────────┘

What CostHawk extracts from OTel data:

  • Token usage: gen_ai.usage.input_tokens and gen_ai.usage.output_tokens are used to calculate per-request cost using CostHawk's pricing database (which tracks the latest rates for 200+ models across all major providers).
  • Model attribution: gen_ai.response.model identifies which model was used, enabling cost-per-model breakdowns and model routing optimization recommendations.
  • Business context: Custom attributes like app.feature, app.project, app.customer_id, and app.environment enable multi-dimensional cost attribution that provider dashboards cannot provide.
  • Latency: Span duration reveals per-request latency, enabling cost-vs-latency analysis (are you paying more for faster responses? is a cheaper model acceptable given the latency requirements?).
  • Error rates: Span status codes identify failed requests that consumed tokens but did not deliver value — wasted spend that should be minimized.
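The last point can be sketched as a simple pass over exported span records (LlmSpanRecord and wastedSpend are illustrative shapes, not a CostHawk API):

```typescript
// Illustrative shapes, not a CostHawk API: identify spend consumed by
// requests that errored and therefore delivered no value.
type LlmSpanRecord = {
  statusOk: boolean   // derived from the span status code
  totalCost: number   // derived from gen_ai.cost.total_cost
}

function wastedSpend(records: LlmSpanRecord[]): number {
  return records
    .filter(record => !record.statusOk)
    .reduce((sum, record) => sum + record.totalCost, 0)
}
```

Tracking this figure over time makes retry storms and persistent provider errors visible as a dollar amount rather than just an error rate.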

Dual-path monitoring: CostHawk supports both OTel-based monitoring and wrapped-key-based monitoring simultaneously. Teams can use wrapped keys for immediate, zero-instrumentation cost tracking and add OTel instrumentation incrementally for deeper attribution and correlation. The two data sources are merged in the CostHawk dashboard, providing a unified view regardless of how the data was collected.

Collector-based deployment: For teams already running an OTel Collector, add CostHawk as an additional exporter in the Collector configuration. This requires zero changes to application code — the Collector fans out existing telemetry to CostHawk alongside your existing observability backends. This is the lowest-friction integration path for organizations that already have OTel infrastructure in place.

FAQ

Frequently Asked Questions

What is the difference between OpenTelemetry and OpenTracing?
OpenTracing was a CNCF project that defined a vendor-neutral API for distributed tracing. OpenCensus, a Google-originated project, provided both an API and a default implementation for traces and metrics. In 2019, the two projects merged to form OpenTelemetry, combining the best aspects of both: OpenTracing's clean API design and OpenCensus's implementation completeness. OpenTelemetry supersedes both projects — OpenTracing and OpenCensus are now in maintenance mode with no new feature development. If you are starting a new project, always use OpenTelemetry. If you have existing OpenTracing instrumentation, OTel provides compatibility shims (the OpenTracing Bridge) that let you migrate incrementally without rewriting all instrumentation at once. The key improvements OTel brings over its predecessors are logs as a first-class signal (OpenTracing only handled traces), metrics with better ergonomics than OpenCensus, the OTLP wire protocol for standardized data export, and the Collector component for flexible telemetry pipelines. OpenTelemetry is now the universally recommended standard for application observability instrumentation.
Do I need to run an OpenTelemetry Collector?
No, the Collector is optional. Your application can export telemetry directly to backends using the OTLP exporter — for example, sending traces straight from your Node.js application to CostHawk's OTLP endpoint. However, the Collector provides significant operational benefits that make it worthwhile for production deployments. First, it decouples your application from your backends: if you need to add a new backend, change sampling rates, or filter sensitive attributes, you update the Collector configuration without redeploying your application. Second, it provides buffering and retry: if a backend is temporarily unavailable, the Collector queues telemetry and retries delivery, preventing data loss. Third, it enables fan-out: send the same telemetry to multiple backends simultaneously (Jaeger for debugging, CostHawk for cost tracking, Prometheus for metrics alerting). Fourth, it handles processing: filter out high-volume, low-value spans, redact PII from attributes, enrich spans with additional metadata, and sample intelligently to control telemetry volume. For teams processing more than 10,000 LLM requests per day, the Collector is strongly recommended. For smaller workloads or development environments, direct export is simpler and perfectly adequate.
How do I instrument LLM calls without modifying application code?
OpenTelemetry supports auto-instrumentation, which patches library functions at runtime to emit spans automatically. For LLM applications, the OpenLLMetry project (maintained by Traceloop) provides auto-instrumentation for all major LLM client libraries: OpenAI, Anthropic, Google Generative AI, Cohere, Mistral, LangChain, LlamaIndex, and more. Setup requires a single initialization call at application startup — OpenLLMetry.init({ exporter }) — and from that point, every LLM API call made through the supported client libraries automatically generates OTel spans with the correct gen_ai.* semantic convention attributes, including model name, token counts, temperature, and finish reason. The auto-instrumentation works by monkey-patching the client library's request methods, wrapping each call in a span. For Python applications, the equivalent is traceloop-sdk which provides identical auto-instrumentation. The tradeoff is that auto-instrumentation captures what the library knows about (model, tokens, parameters) but cannot capture application-level context (feature name, customer ID, project) without additional manual attribute setting. Most teams use auto-instrumentation as a starting point and add custom attributes incrementally for cost attribution.
What are the gen_ai semantic conventions and are they stable?
The gen_ai.* semantic conventions are a set of standardized attribute names defined by the OpenTelemetry GenAI Working Group for recording information about generative AI operations. They cover the AI system (gen_ai.system), request parameters (gen_ai.request.model, gen_ai.request.max_tokens, gen_ai.request.temperature), response metadata (gen_ai.response.model, gen_ai.response.finish_reason), and usage metrics (gen_ai.usage.input_tokens, gen_ai.usage.output_tokens). As of March 2026, the core attributes listed above are considered experimental but widely adopted — they are used by OpenLLMetry, Traceloop, LangSmith's OTel export, and CostHawk's ingestion pipeline. The OTel project uses a maturity model (experimental → stable) and the GenAI conventions are progressing toward stable status. In practice, the core token and model attributes are unlikely to change in breaking ways because of their widespread adoption. More experimental attributes covering tool calling, conversation tracking, and evaluation scores are still evolving. CostHawk tracks convention changes and maintains backward compatibility, so even if attribute names are adjusted in future versions, existing instrumentation continues to work.
How does OpenTelemetry handle sensitive prompt and response content?
By default, OpenTelemetry instrumentation for LLM applications does not capture prompt or response content. The gen_ai.prompt and gen_ai.completion attributes are opt-in, meaning you must explicitly configure your instrumentation to record them. This default-off behavior exists because prompts and responses frequently contain PII (personally identifiable information), proprietary data, or sensitive business logic that should not be transmitted to observability backends without explicit consent. When you do need content capture for debugging or quality evaluation, implement it with safeguards: (1) Use a span processor that redacts known PII patterns (email addresses, phone numbers, SSNs) before export. (2) Configure content capture only in development or staging environments, not production. (3) If using the OTel Collector, add a processing pipeline that strips content attributes before forwarding to external backends while preserving them for internal backends. (4) Set attribute size limits to prevent extremely long prompts or responses from inflating telemetry payload sizes. (5) Use OTel's sampling capabilities to capture content for only a sample of requests (e.g., 1%) rather than all traffic. CostHawk does not require prompt content for cost tracking — token counts and model metadata are sufficient. Content capture is only needed for debugging and quality evaluation workflows.
What is the performance overhead of OpenTelemetry instrumentation?
OpenTelemetry is designed for production use with minimal performance impact. The overhead depends on your sampling rate, export method, and telemetry volume, but typical numbers for LLM applications are negligible compared to the LLM API call latency itself. Creating and populating a span takes approximately 1–5 microseconds of CPU time — invisible compared to an LLM API call that takes 500ms–5 seconds. The memory overhead per active span is approximately 2–4 KB, and spans are batched and exported asynchronously so they do not block your application's critical path. The OTel SDK's batch span processor accumulates spans in memory and exports them in configurable batches (default: every 5 seconds or when 512 spans accumulate), using a background thread or async task. For high-volume applications, the primary cost driver is not the instrumentation overhead but the telemetry export volume — sending large payloads to backends consumes network bandwidth and may incur ingestion fees. Control this with sampling: a 10% sampling rate captures enough data for statistical analysis while reducing export volume by 90%. For LLM cost tracking, CostHawk recommends 100% sampling (every request) because cost data needs to be complete for accurate budget tracking. The performance overhead of 100% sampling with OTel is still under 1% of total request latency for typical LLM workloads.
Can I use OpenTelemetry with LangChain and LlamaIndex?
Yes, both LangChain and LlamaIndex have OpenTelemetry integration support. LangChain provides a LangChainInstrumentor through the OpenLLMetry/Traceloop ecosystem that automatically instruments chain executions, LLM calls, tool invocations, and retrieval operations. Each step in a LangChain chain or agent becomes a span in the OTel trace, with parent-child relationships reflecting the chain's execution flow. This gives you visibility into not just the LLM call but the entire chain — how long retrieval took, which tools were invoked, and how many LLM calls a single agent run required. LlamaIndex similarly provides OTel instrumentation that captures query engine operations, retrieval steps, and LLM calls as connected spans. The LlamaIndexInstrumentor from OpenLLMetry patches LlamaIndex's internal APIs to emit spans automatically. For cost tracking, this framework-level instrumentation is particularly valuable because it reveals the true cost of complex operations. A single LangChain agent run might make 5–10 LLM calls internally; without OTel instrumentation, you would only see aggregate token counts. With it, you see each call individually, can identify which chain steps are most expensive, and can optimize the specific steps that drive the most cost.
How does CostHawk use OpenTelemetry data for cost attribution?
CostHawk's OTLP ingestion pipeline processes incoming OTel spans and extracts cost-relevant attributes to build a multi-dimensional cost model. The pipeline works in several stages. First, it reads gen_ai.response.model and matches it against CostHawk's pricing database (which tracks current rates for 200+ models across OpenAI, Anthropic, Google, Mistral, Cohere, and other providers). Second, it multiplies gen_ai.usage.input_tokens and gen_ai.usage.output_tokens by the corresponding per-million-token rates to compute per-request cost. Third, it reads custom attributes (app.feature, app.project, app.customer_id, app.environment, app.api_key_id) and indexes the cost data along these dimensions, enabling multi-dimensional queries like 'total cost for the document-summarization feature in production this week.' Fourth, it feeds the time-series cost data into CostHawk's anomaly detection engine, which establishes baselines per dimension and alerts on deviations. The result is that OTel-instrumented applications get the same rich cost dashboards, alerts, and attribution that wrapped-key users get, plus the additional correlation and context that OTel traces provide. Teams can drill from a cost anomaly alert into the specific OTel traces that drove the spike, seeing exactly which requests, features, and users were responsible.

Related Terms

Tracing

The practice of recording the full execution path of an LLM request — from prompt construction through model inference to response delivery — with timing and cost attribution at each step. Tracing provides the granular visibility needed to understand where time and money are spent in multi-step AI pipelines.


Spans

Individual units of work within a distributed trace. Each span records a single operation — such as an LLM call, a retrieval step, or a tool invocation — with its duration, token counts, cost, metadata, and parent-child relationships that reveal the full execution graph of an AI request.


LLM Observability

The practice of monitoring, tracing, and analyzing LLM-powered applications in production across every dimension that matters: token consumption, cost, latency, error rates, and output quality. LLM observability goes far beyond traditional APM by tracking AI-specific metrics that determine both the reliability and the economics of your AI features.


Logging

Recording LLM request and response metadata — tokens consumed, model used, latency, cost, and status — for debugging, cost analysis, and compliance. Effective LLM logging captures the operational envelope of every API call without storing sensitive prompt content.


Latency

The total elapsed time between sending a request to an LLM API and receiving the complete response. LLM latency decomposes into time-to-first-token (TTFT) — the wait before streaming begins — and generation time — the duration of token-by-token output. Latency directly trades off against cost: faster models and provisioned throughput reduce latency but increase spend.


Cost Per Query

The total cost of a single end-user request to your AI-powered application, including all token consumption, tool calls, and retries.



Put this knowledge to work. Track your AI spend in one place.

CostHawk gives engineering teams real-time visibility into every token, every model, and every dollar across your AI stack.