Wrapped Keys
Proxy API keys that route provider SDK traffic through a cost tracking layer. The original provider key never leaves the server, while the wrapped key provides per-key attribution, budget enforcement, and policy controls without requiring application code changes beyond a base URL swap.
Why It Matters for AI Costs
Wrapped keys solve three fundamental problems that plague organizations managing AI API costs at scale: cost attribution, security isolation, and policy enforcement.
Problem 1: Cost attribution without key sprawl. To track costs per team or per service using provider-native keys, you must create a separate provider key (and often a separate organization or project) for each attribution boundary. For an organization with 8 teams and 15 services, that is potentially 120+ provider keys (8 teams × 15 services), each requiring separate billing configuration. Wrapped keys decouple attribution from provider-level key management: you create a handful of provider keys (one per provider, or one per billing entity) and generate unlimited wrapped keys on top of them. Each wrapped key gets its own cost tracking, budget, and usage dashboard. CostHawk customers typically reduce their provider key count by 80% while increasing their attribution granularity by 5–10x.
Problem 2: Key security. In a traditional setup, real provider API keys must be distributed to every application that calls the AI API. Each distribution point is a potential leak vector: environment variables, CI/CD secrets, developer machines, configuration files. If any one of these points is compromised, the attacker gains direct access to your provider account. With wrapped keys, the real provider key never leaves the proxy server. Applications only ever see the wrapped key, which is worthless without the proxy — it cannot be used to call the provider directly. If a wrapped key is compromised, you revoke it in the proxy dashboard and issue a replacement. The provider key remains unchanged, and all other wrapped keys continue to work. The blast radius of a compromise is reduced from "entire provider account" to "one team's cost attribution."
Problem 3: Centralized policy enforcement. Without a proxy layer, policy enforcement (budget caps, model restrictions, rate limits) must be implemented in each application independently. This means duplicated logic, inconsistent enforcement, and no central control plane. Wrapped keys centralize policy enforcement at the proxy layer: every request, regardless of which application sends it, passes through the same policy engine. A change to a budget cap takes effect immediately across all applications using that wrapped key — no code deployments, no configuration changes, no cross-team coordination required.
The financial impact is measurable. CostHawk customers deploying wrapped keys for the first time typically discover 15–25% of unattributed or misattributed spend, identify 10–20% of spend going to development and testing environments, and achieve full per-team and per-service cost visibility within their first week. The operational overhead is minimal: wrapped keys require only a base URL change and a key swap in each consuming application.
What Are Wrapped Keys?
Wrapped keys are proxy-issued API credentials that serve as intermediaries between your application and an LLM provider's API. They are called "wrapped" because the real provider key is wrapped inside the proxy's security layer — your application never sees, stores, or transmits the real key. Instead, it uses a wrapped key that the proxy translates into the real key at request time.
A wrapped key looks similar to a standard API key but with a distinct prefix that identifies it as a proxy key:
// Real OpenAI key (stays on the proxy server, encrypted)
sk-proj-abc123def456...
// CostHawk wrapped keys (distributed to applications)
ch_engineering_prod_sk_7f8a9b... → Engineering team, production
ch_engineering_dev_sk_2c3d4e... → Engineering team, development
ch_product_prod_sk_9e0f1a... → Product team, production
ch_data_science_sk_4b5c6d... → Data science team
All four wrapped keys map to the same underlying OpenAI key, but each has its own identity in CostHawk's system. When ch_engineering_prod_sk_7f8a9b is used to make a request, CostHawk logs the request against the engineering team's production cost center. When ch_data_science_sk_4b5c6d is used, it is logged against the data science team. The provider sees the same API key for both requests; CostHawk sees two distinct attribution streams.
Wrapped keys are not encrypted or derived versions of the provider key — they are entirely separate credentials generated by the proxy. There is no mathematical relationship between a wrapped key and the provider key it maps to. This is an important security property: possessing a wrapped key gives you no information about the real provider key, and brute-forcing the provider key from a wrapped key is computationally infeasible.
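To illustrate that property, minting a wrapped key could look like the following sketch (the function name and prefix layout are hypothetical, not CostHawk's actual implementation). Note that the provider key is never an input:

```typescript
import { randomBytes } from "node:crypto"

// Hypothetical sketch of wrapped-key minting. The token is pure
// randomness; the provider key is not an input, so the wrapped key
// carries no information about the credential it maps to.
function mintWrappedKey(team: string, env: string): string {
  const token = randomBytes(24).toString("hex")
  return `ch_${team}_${env}_sk_${token}`
}

const wrapped = mintWrappedKey("engineering", "prod")
// e.g. "ch_engineering_prod_sk_7f8a9b..."; the proxy stores only a
// (hashed wrapped key, provider-key id) mapping alongside it
```

Because the token comes from a cryptographic random source, two keys minted for the same team and environment never collide, and nothing about the underlying provider key can be inferred from the result.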
The concept is analogous to virtual credit card numbers in consumer finance. A virtual card number routes transactions through the card issuer to your real card, adding a layer of control (spending limits, merchant restrictions) and security (the real card number is never exposed to merchants). Wrapped keys provide the same benefits for API credentials: control, attribution, and security without changing the underlying provider relationship.
How Wrapped Keys Work
The lifecycle of a request using a wrapped key involves five steps, all of which happen transparently from the application's perspective.
Step 1: Application sends request with wrapped key. Your application uses the provider's standard SDK, with two configuration changes: the API key is set to the wrapped key, and the base URL is set to the proxy endpoint.
import Anthropic from "@anthropic-ai/sdk"
const client = new Anthropic({
apiKey: "ch_engineering_prod_sk_7f8a9b...",
baseURL: "https://proxy.costhawk.com" // CostHawk proxy
})
const response = await client.messages.create({
model: "claude-3-5-sonnet-20241022",
max_tokens: 1024,
messages: [
{ role: "user", content: "Explain load balancing for LLM APIs." }
]
})
The SDK does not know it is talking to a proxy. It constructs a standard HTTP request and sends it to the proxy URL.
Step 2: Proxy authenticates and looks up the wrapped key. The proxy receives the request, extracts the wrapped key from the Authorization header, and looks it up in its database. The lookup returns: the real provider API key (encrypted, decrypted only in memory for the forwarding step), the owning team and project, the key's policies (budget cap, model allowlist, rate limit), and the current period's usage (spend, request count).
Step 3: Proxy applies policies. Before forwarding, the proxy evaluates the request against the key's policies:
- Budget check: Estimate the request cost and verify it will not exceed the key's remaining budget. A GPT-4o request with 2,000 input tokens costs approximately $0.005 — is the key's daily budget still above $0.005?
- Model check: Is the requested model on the key's allowlist? If the key is restricted to GPT-4o mini and the request targets GPT-4o, return a 403 Forbidden error.
- Rate limit check: Has the key exceeded its configured RPM or TPM limit?
If any policy is violated, the proxy returns an appropriate error response without forwarding the request to the provider. No provider tokens are consumed and no cost is incurred.
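The three checks above can be sketched as a single evaluation function. The shapes and field names below are illustrative assumptions, not CostHawk's real policy engine:

```typescript
// Illustrative policy state attached to a wrapped key
interface KeyPolicy {
  dailyBudgetUsd: number
  spentTodayUsd: number
  modelAllowlist: string[] | null // null = no restriction
  rpmLimit: number
  requestsThisMinute: number
}

type Verdict = { allowed: true } | { allowed: false; status: number; reason: string }

function evaluatePolicies(p: KeyPolicy, model: string, estCostUsd: number): Verdict {
  // Budget check: would this request push spend past the cap?
  if (p.spentTodayUsd + estCostUsd > p.dailyBudgetUsd)
    return { allowed: false, status: 429, reason: "Budget limit exceeded for this key" }
  // Model check: is the requested model on the allowlist?
  if (p.modelAllowlist && !p.modelAllowlist.includes(model))
    return { allowed: false, status: 403, reason: `Model ${model} is not on this key's allowlist` }
  // Rate limit check: has the key exceeded its RPM cap?
  if (p.requestsThisMinute >= p.rpmLimit)
    return { allowed: false, status: 429, reason: "Rate limit exceeded" }
  return { allowed: true } // safe to forward to the provider
}
```

A rejection short-circuits before any provider call, which is why a blocked request consumes no provider tokens.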
Step 4: Proxy forwards to provider. The proxy constructs a new HTTP request to the provider's real API endpoint, substituting the wrapped key with the decrypted real provider key. The request body and headers are forwarded as-is (with the key substitution). For streaming requests, the proxy establishes an SSE connection to the provider.
Step 5: Proxy relays response and logs usage. The provider's response flows back through the proxy to the application. The proxy extracts usage metadata from the response (input tokens, output tokens, model, latency) and asynchronously writes a usage record attributed to the wrapped key. For streaming responses, the proxy forwards each chunk in real time while accumulating the token count. The usage record is available in the CostHawk dashboard within seconds.
Total overhead: 5–15ms of additional latency per request, entirely in steps 2 and 3. Steps 4 and 5 add no latency because the provider response is streamed directly to the application while logging happens asynchronously.
Wrapped Keys vs Direct API Keys
Understanding the tradeoffs between wrapped keys and direct API keys helps you decide which approach is right for each use case.
| Characteristic | Direct API Key | Wrapped Key (CostHawk) |
|---|---|---|
| Setup complexity | None — use provider key directly | Minimal — swap API key + base URL |
| Cost attribution | Per-organization or per-project at provider level | Per-key, per-team, per-service, per-environment |
| Budget enforcement | Monthly limits at provider level only | Per-key daily/monthly caps with hard enforcement |
| Model restrictions | None — any key can access any model | Per-key model allowlists |
| Rate limiting | Provider-imposed per-account limits | Custom per-key limits on top of provider limits |
| Key security | Real key distributed to all applications | Real key stays on proxy; apps see only wrapped key |
| Blast radius of compromise | Full provider account access | Only the compromised wrapped key's scope |
| Key rotation impact | Must update every consuming application | Update once in proxy; no app changes needed |
| Cross-provider analytics | Check each provider dashboard separately | Unified dashboard across all providers |
| Real-time cost tracking | Delayed (hours to days) | Per-request, within seconds |
| Latency overhead | None | 5–15ms per request |
| Dependency | Provider only | Provider + proxy service |
The tradeoff is clear: wrapped keys add a small latency overhead (5–15ms) and a dependency on the proxy service, but in return provide dramatically better cost visibility, security, and operational control. For development and experimentation, direct keys are often sufficient. For production workloads with real cost accountability requirements, wrapped keys are the standard approach.
The dependency on the proxy service is the most common objection to wrapped keys. If the proxy goes down, your LLM-powered features go down. This risk is mitigated by proxy redundancy (CostHawk runs across multiple availability zones) and client-side circuit breakers that fall back to direct provider access if the proxy is unresponsive. The fallback sacrifices cost tracking during the failover period but maintains application availability.
One often-overlooked benefit of wrapped keys is vendor flexibility. Because your applications reference wrapped keys (not provider keys), you can change the underlying provider without updating any application code. If you decide to switch from OpenAI to Anthropic for a particular workload, update the provider key mapping in CostHawk's dashboard. All applications using the relevant wrapped keys immediately route to the new provider. This decoupling of application identity from provider identity provides significant operational agility.
Cost Tracking with Wrapped Keys
Wrapped keys transform AI cost tracking from a monthly billing reconciliation exercise into a real-time, granular, actionable data stream. Every request through a wrapped key generates a cost record that can be sliced along multiple dimensions.
Per-key cost tracking. The most granular view: every wrapped key has its own running cost total. If you issue separate wrapped keys for each service, you can see that ch_chatbot_prod_sk_... has spent $3,247.18 this month while ch_code_review_prod_sk_... has spent $1,892.44. This is the foundation for all higher-level cost analytics.
Per-project aggregation. Group wrapped keys by project or product line to see total cost at the business level. The "Customer Support AI" project includes the chatbot service, the ticket classifier, and the sentiment analyzer — their combined wrapped key costs show the true cost of the project: $7,841/month, not the $4,200 that the chatbot team thought they were spending.
Per-team chargebacks. Tag wrapped keys with team identifiers and generate monthly chargeback reports. Engineering: $12,400. Product: $8,200. Data Science: $15,600. Research: $3,800. Total: $40,000 — which matches the provider invoice, confirming that all costs are attributed. Without wrapped keys, this reconciliation is guesswork.
Per-model cost breakdown. Within each wrapped key, see which models are consuming the budget. The data science team's wrapped key shows: 60% of spend on Claude 3.5 Sonnet (evaluation and generation tasks), 25% on GPT-4o (comparison benchmarks), and 15% on GPT-4o mini (preprocessing). This breakdown reveals optimization opportunities: could the preprocessing step use an even cheaper model?
Environment cost separation. With separate wrapped keys for dev, staging, and production, you can see that development accounts for 28% of total spend. Of that 28%: 40% comes from a single developer running large-scale experiments with Claude 3 Opus, 30% from integration test suites that call the real API instead of mocking, and 30% from legitimate development usage. The first two categories are immediate optimization targets — switch the experiments to a cheaper model and mock API calls in integration tests.
Time-series analysis. CostHawk stores timestamped cost records for every request, enabling time-series analysis at any granularity: hourly, daily, weekly, or monthly. This reveals patterns like: weekend traffic is 30% lower (can you scale down batch jobs?), a new feature launch on March 3 increased the chatbot's daily cost by $200 (expected?), and monthly costs have grown 12% month-over-month for the past three months (is this aligned with user growth?).
Anomaly detection. Per-key cost baselines enable statistical anomaly detection. CostHawk compares each key's current-period spend against its rolling 7-day and 30-day averages. A key that suddenly spends 3x its normal daily rate triggers an alert. This catches: runaway scripts (infinite loops calling the API), prompt injection attacks (malicious inputs that trigger expensive processing), configuration errors (wrong model selected, causing 10x cost increase), and organic traffic spikes that may need investigation.
Setting Up Wrapped Keys
Setting up CostHawk wrapped keys takes under 10 minutes per provider and requires only two code changes per consuming application. Here is the complete setup guide.
Step 1: Create a CostHawk account. Sign up at costhawk.com and complete the organization setup. CostHawk uses Clerk for authentication — you can sign up with email, Google, or GitHub.
Step 2: Add your provider API keys. Navigate to Dashboard → Integrations and click Add Provider Key. Select the provider (OpenAI, Anthropic, Google), paste your API key, and give it a label (e.g., "OpenAI Production" or "Anthropic Team Account"). CostHawk encrypts the key with AES-256-CBC using a unique initialization vector and stores it in the secure database. The raw key is never displayed again after initial entry — you will interact with it only through wrapped keys.
Step 3: Generate wrapped keys. For each provider key, generate one or more wrapped keys. Each wrapped key needs:
- Label: A human-readable identifier (e.g., "Engineering - Chatbot - Production")
- Tags: Metadata for grouping in analytics (team: engineering, service: chatbot, env: production)
- Budget cap (optional): Maximum daily or monthly spend ($100/day for dev keys, $5,000/day for production)
- Model allowlist (optional): Restrict which models this key can access (e.g., gpt-4o-mini only for dev keys)
- Rate limit (optional): Maximum RPM for this specific key
CostHawk generates a wrapped key with a ch_ prefix: ch_eng_chatbot_prod_sk_7f8a9b...
Step 4: Update your application code. In each consuming application, make two changes:
// OpenAI SDK example
import OpenAI from "openai"
// Before: direct provider key
const client = new OpenAI({
apiKey: process.env.OPENAI_API_KEY // sk-proj-abc...
})
// After: wrapped key through CostHawk proxy
const client = new OpenAI({
apiKey: process.env.COSTHAWK_WRAPPED_KEY, // ch_eng_chatbot_prod_sk_...
baseURL: "https://proxy.costhawk.com/v1"
})
// Everything else stays the same — same SDK, same methods, same parameters
const response = await client.chat.completions.create({
model: "gpt-4o",
messages: [{ role: "user", content: "Hello!" }]
})

// Anthropic SDK example
import Anthropic from "@anthropic-ai/sdk"
const client = new Anthropic({
apiKey: process.env.COSTHAWK_WRAPPED_KEY,
baseURL: "https://proxy.costhawk.com"
})
// Same API surface — messages.create, streaming, tool use — all work identically
const response = await client.messages.create({
model: "claude-3-5-sonnet-20241022",
max_tokens: 1024,
messages: [{ role: "user", content: "Hello!" }]
})
Step 5: Deploy and verify. Deploy the code changes. Within seconds, requests begin appearing in the CostHawk dashboard, attributed to your wrapped key. Verify that: requests are being logged with correct token counts, the cost computation matches your expectations, the latency overhead is within the expected 5–15ms range, and any configured budget caps or model restrictions are being enforced correctly.
Step 6: Roll out to all services. Repeat Step 4 for each service that calls an AI API. Create separate wrapped keys for each service, team, and environment to maximize attribution granularity. A typical rollout takes 1–2 days for an organization with 10–15 services.
Wrapped Key Security Model
The security model of wrapped keys is designed around the principle that the real provider API key should have the minimum possible exposure. Here is how each layer of the security model works.
Provider key isolation. The real provider API key exists in exactly one place: CostHawk's encrypted database. It is encrypted with AES-256-CBC using a unique initialization vector (IV) per key. The encryption key is stored separately from the encrypted data (in an environment variable on the proxy server, not in the database). The raw provider key is decrypted only in memory, only for the duration of a single request forwarding operation, and is never written to logs, API responses, error messages, or monitoring systems. Even CostHawk's own support team cannot access the decrypted provider key — the architecture enforces this through separation of the encryption key from the encrypted data.
Wrapped key authentication. When a request arrives with a wrapped key, the proxy validates it against the database. The validation checks: (1) the wrapped key exists and is not revoked, (2) the wrapped key is not expired, (3) the wrapped key's associated provider key is still active. If any check fails, the request is rejected with a 401 Unauthorized response. The wrapped key itself is stored as a salted hash in the database — even a database breach would not reveal usable wrapped keys.
Blast radius containment. If a wrapped key is compromised, the attacker can make requests through the proxy using that key's identity and budget. However, they cannot: access the real provider key, use models outside the key's allowlist, exceed the key's budget cap, or affect other wrapped keys. The damage is contained to the compromised key's scope. Revoking the wrapped key immediately stops all access — the attacker must go through the proxy, and the proxy rejects revoked keys. Compare this to a compromised direct provider key, where the attacker has full access to every model at any volume until the key is rotated at the provider level.
Network security. All traffic between applications and the CostHawk proxy uses TLS 1.3. All traffic between the proxy and upstream providers uses TLS 1.2 or higher (dictated by the provider). Certificate pinning is available for enterprise deployments that require it. The proxy does not accept plaintext HTTP connections.
Audit trail. Every administrative action on wrapped keys is logged in an immutable audit trail: key creation (who, when, with what policies), policy changes (who changed the budget cap from $100 to $500, and when), key revocation (who, when, reason), and usage anomalies (automated flags for unusual patterns). This audit trail supports compliance requirements and provides forensic data for security investigations. The audit log is append-only and cannot be modified or deleted by any user, including administrators.
Defense in depth. The wrapped key security model implements defense in depth — multiple independent security layers that each provide protection even if another layer fails:
| Layer | Protection | What It Prevents |
|---|---|---|
| Encryption at rest | AES-256-CBC with unique IV | Database breach exposing raw keys |
| Key isolation | Provider key never leaves proxy | Key leakage through application logs or code |
| Budget caps | Per-key spending limits | Unlimited spend from compromised keys |
| Model restrictions | Per-key model allowlists | Compromised key accessing expensive models |
| Rate limits | Per-key RPM/TPM caps | High-volume abuse from compromised keys |
| Audit logging | Immutable action log | Undetected unauthorized administrative changes |
| TLS in transit | TLS 1.3 for all connections | Man-in-the-middle interception of keys or data |
No single layer is sufficient on its own, but together they create a security posture that significantly exceeds what is possible with direct provider key management. The proxy architecture makes this layered security practical — implementing the same controls in every consuming application would require massive code duplication and constant maintenance.
Frequently Asked Questions
Are wrapped keys compatible with all LLM provider SDKs?
Yes. Wrapped keys work with OpenAI's Python and TypeScript SDKs (set base_url or baseURL), Anthropic's Python and TypeScript SDKs (set base_url or baseURL), Google's Generative AI SDK (configure endpoint), and all popular open-source libraries like LangChain, LlamaIndex, and Vercel AI SDK. The proxy accepts requests in the same format as the upstream provider and returns responses in the same format, so no code changes are needed beyond the base URL and API key configuration. For HTTP clients making raw requests (fetch, axios, curl), simply change the URL and Authorization header. Streaming, function calling, tool use, vision inputs, and all other provider features work identically through the proxy.
What happens if I exceed a wrapped key's budget cap?
Once the cap is reached, the proxy rejects further requests with a 429 Too Many Requests response and a descriptive error message: "Budget limit exceeded for this key. Daily limit: $100.00, current spend: $100.02." The rejection happens at the proxy layer before the request reaches the provider, so no additional provider tokens are consumed. In-flight requests that were forwarded before the cap was reached will complete normally — the cap is checked at request initiation, not at response completion, so a request that pushes spend $0.02 over the cap will not be cancelled mid-stream. Your application should handle this 429 response gracefully, either by surfacing a user-facing message or by falling back to an alternative key with remaining budget. CostHawk sends an alert when a key reaches 80% of its budget, giving you time to increase the cap or investigate unexpected spend before requests start being rejected.
Can I use the same wrapped key across multiple applications?
You can, but creating a separate wrapped key per application preserves attribution granularity. A naming convention like ch_{team}_{service}_{environment}_sk_... scales well and makes dashboards self-documenting. The one exception is when multiple instances of the same service (horizontal scaling) share a key — this is fine because all instances represent the same cost center.
How do wrapped keys handle provider key rotation?
Rotate the key at the provider, then update the stored provider key once in the CostHawk dashboard. Every wrapped key that maps to it keeps working unchanged; no consuming application needs a code change or a redeploy.
What is the performance overhead of using wrapped keys?
Typically 5–15ms of added latency per request, incurred during wrapped-key lookup and policy evaluation. Forwarding and usage logging add no latency: responses stream directly to the application while logging happens asynchronously.
Can I migrate from direct API keys to wrapped keys without downtime?
Yes. Direct keys keep working throughout the migration, so you can roll out service by service: generate a wrapped key, swap the API key and base URL in one service, verify requests in the dashboard, then move to the next. A typical rollout takes 1–2 days for an organization with 10–15 services.
How do wrapped keys work with serverless functions?
The same way they work everywhere else. A wrapped key is just an API key plus a base URL, so there is no agent or sidecar to run: set the two values in the function's environment variables and the standard provider SDK works unchanged.
What data does CostHawk log when I use wrapped keys?
For each request, CostHawk records usage metadata: input and output token counts, model, latency, computed cost, timestamp, and the wrapped key identity. Administrative actions on keys (creation, policy changes, revocation) are captured in the immutable audit trail.
Related Terms
API Key Management
Securing, rotating, scoping, and tracking API credentials across AI providers. Effective key management is the foundation of both cost attribution and security — every unmanaged key is a potential source of untracked spend and unauthorized access.
LLM Proxy
A transparent intermediary that sits between your application and LLM providers, forwarding requests while adding tracking, caching, or policy enforcement without code changes. Proxies intercept standard SDK traffic, log usage metadata, and optionally transform requests before relaying them upstream.
AI Cost Allocation
The practice of attributing AI API costs to specific teams, projects, features, or customers — enabling accountability, budgeting, and optimization at the organizational level.
Token Budget
Spending limits applied per project, team, or time period to prevent uncontrolled AI API costs and protect against runaway agents.
Rate Limiting
Provider-enforced caps on API requests and tokens per minute that throttle throughput and return HTTP 429 errors when exceeded.
Cost Anomaly Detection
Automated detection of unusual AI spending patterns — sudden spikes, gradual drift, and per-key anomalies — before they become budget-breaking surprises.
Put this knowledge to work. Track your AI spend in one place.
CostHawk gives engineering teams real-time visibility into every token, every model, and every dollar across your AI stack.
