Glossary > Infrastructure | Updated 2026-03-16 | By Chase Dillingham

Webhook

Q: What is the difference between a webhook and polling an API?

Polling means your application repeatedly calls an API endpoint on a schedule — every 30 seconds, every minute, every 5 minutes — to check whether anything has changed. A webhook inverts this: the source system pushes a notification to your endpoint the moment an event occurs. The practical differences are significant. Polling introduces latency equal to your polling interval — if you poll every 5 minutes, an event might go unnoticed for up to 5 minutes. Webhooks deliver within seconds. Polling generates unnecessary API calls when nothing has changed — 95% of polling requests typically return 'no new data,' wasting compute and counting against rate limits. Webhooks only fire when something actually happens. Polling requires your application to run continuously and manage state (tracking what it has already seen). Webhook receivers can be stateless serverless functions that process each event independently. The tradeoff is that webhooks require you to expose an HTTP endpoint, handle retries and idempotency, and verify signatures. For cost alerting specifically, the latency advantage of webhooks is decisive: a 5-minute polling delay during a cost spike could mean hundreds of dollars in unnecessary spend that real-time webhook delivery would have prevented.

Q: How do I secure my webhook endpoint against spoofed requests?

Webhook security requires multiple layers to prevent unauthorized actors from sending fake events to your endpoint. The primary mechanism is HMAC signature verification. CostHawk signs every webhook payload with your unique signing secret using HMAC-SHA256. When a request arrives, compute the HMAC of the raw request body using your stored signing secret and compare it to the signature in the X-CostHawk-Signature header. Use a constant-time comparison function (like Node.js crypto.timingSafeEqual) to prevent timing attacks. Beyond signature verification, implement these additional safeguards: (1) Validate that the Content-Type header is application/json. (2) Parse the payload and verify that the api_version matches expected values. (3) Check that the created_at timestamp is within an acceptable window (e.g., the last 5 minutes) to prevent replay attacks. (4) Restrict your endpoint's firewall to CostHawk's published IP ranges if your infrastructure supports IP allowlisting. (5) Use HTTPS exclusively; never accept webhooks over plain HTTP. (6) Rotate your signing secret periodically and support dual-secret validation during rotation to avoid delivery gaps.
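The replay-window check and dual-secret validation described above can be sketched as follows. This is a minimal illustration, not CostHawk's reference implementation: the helper names (sign, safeEqual, matchesAnySecret, isFresh) are hypothetical, and it assumes the signature header carries a hex-encoded HMAC-SHA256 of the raw body.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto'

// Hex-encoded HMAC-SHA256 of the raw request body under one secret.
function sign(rawBody: string, secret: string): string {
  return createHmac('sha256', secret).update(rawBody).digest('hex')
}

// Constant-time comparison; timingSafeEqual throws on unequal lengths,
// so unequal lengths are rejected before calling it.
function safeEqual(a: string, b: string): boolean {
  const bufA = Buffer.from(a)
  const bufB = Buffer.from(b)
  return bufA.length === bufB.length && timingSafeEqual(bufA, bufB)
}

// Accept the signature if it matches ANY currently valid secret,
// for example both the new and the old secret during a rotation.
function matchesAnySecret(rawBody: string, signature: string, secrets: string[]): boolean {
  return secrets.some((secret) => safeEqual(sign(rawBody, secret), signature))
}

// Reject events whose created_at falls outside the tolerance window
// (5 minutes by default, matching the example window in the text).
function isFresh(createdAt: string, nowMs: number, toleranceMs = 5 * 60 * 1000): boolean {
  const createdMs = Date.parse(createdAt)
  if (Number.isNaN(createdMs)) return false
  return Math.abs(nowMs - createdMs) <= toleranceMs
}
```

Because created_at sits inside the signed body, an attacker cannot alter the timestamp without invalidating the signature, which is what makes the freshness check meaningful.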

Q: What happens if my webhook endpoint is down when an alert fires?

CostHawk implements an automatic retry policy with exponential backoff to handle endpoint downtime. When a delivery attempt fails (timeout, connection refused, or non-2xx response), CostHawk retries up to 6 times over approximately 11 hours: the first retry at 30 seconds, then 2 minutes, 10 minutes, 30 minutes, 2 hours, and finally 8 hours. Each retry attempt is logged with the response code and error details, visible in the Webhooks dashboard under the delivery history for that endpoint. If all 6 retry attempts fail, the event is permanently marked as failed — but it is never deleted. You can view all failed deliveries in the dashboard and manually replay them once your endpoint is restored. CostHawk also sends a separate meta-alert (via email or a backup webhook endpoint, if configured) when an endpoint's failure rate exceeds 5%, so you know your alerting pipeline is degraded even if your primary notification channel is the one that is down. For critical alerting, configure at least two webhook endpoints on different infrastructure — for example, Slack plus PagerDuty — so that a single endpoint failure does not leave you blind to cost events.

Q: How do I avoid alert fatigue from too many webhook notifications?

Alert fatigue occurs when teams receive so many notifications that they start ignoring them, a dangerous outcome for cost monitoring. CostHawk provides several mechanisms to keep webhook volume manageable. First, tune your thresholds based on actual data. If your daily spend normally fluctuates between $200–$400, setting a threshold at $250 will trigger constantly. Set it at $500 (above the top of the normal range) or use percentage-based thresholds that adapt to your baseline. Second, use severity-based routing: critical alerts (budget exhausted, anomaly detected) go to PagerDuty; warnings (approaching threshold) go to Slack; informational events (usage milestones) go to a weekly email digest. Third, enable cooldown periods by configuring a minimum interval between repeated firings of the same alert rule. If a cost threshold is breached and stays breached, you likely want one alert, not one every 15 minutes. CostHawk supports cooldowns from 15 minutes to 24 hours per alert rule. Fourth, use alert grouping to batch related events. If five API keys simultaneously exceed their budgets because of a shared upstream issue, CostHawk groups them into a single webhook delivery with all five keys listed. Finally, review alert rules quarterly and remove or adjust rules that fire frequently without leading to action; those are noise, not signal.
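CostHawk applies cooldowns on its side, but the same idea can be mirrored in a receiver that forwards alerts onward. The sketch below is one way to do that; the CooldownGate class is hypothetical and keeps its state in memory, so it only illustrates the logic.

```typescript
// Suppresses repeated alerts for the same rule within a cooldown window.
class CooldownGate {
  private readonly cooldownMs: number
  private readonly lastSent = new Map<string, number>()

  constructor(cooldownMs: number) {
    this.cooldownMs = cooldownMs
  }

  // Returns true if the alert should be forwarded, false if it is suppressed.
  shouldSend(alertId: string, nowMs: number): boolean {
    const last = this.lastSent.get(alertId)
    if (last !== undefined && nowMs - last < this.cooldownMs) return false
    this.lastSent.set(alertId, nowMs)
    return true
  }
}
```

With a 15-minute cooldown, a rule that stays breached produces one forwarded alert per 15-minute window instead of one per firing, while other rules are unaffected.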

Q: Can I use webhooks to automatically pause API keys when costs spike?

Yes. Automated remediation via webhooks is one of the most powerful cost-protection strategies available. The pattern works as follows: CostHawk detects a cost anomaly or threshold breach and fires a webhook to a serverless function (AWS Lambda, Cloudflare Worker, or Vercel Edge Function). The function receives the payload, validates the signature, evaluates the severity, and if the conditions warrant it, calls the CostHawk API to pause the offending API key. The paused key immediately stops proxying requests, returning a 429 status code to callers. Here is a simplified example of an auto-pause Lambda function: receive the webhook, check if severity is 'critical' and the cost exceeds 3x the daily average, then call POST /api/v1/keys/{key_id}/pause. Important safeguards to implement: (1) Never auto-pause keys tagged as 'production-critical' without human approval; send a PagerDuty page instead. (2) Include a manual override mechanism so on-call engineers can unpause keys immediately. (3) Log every automated action for audit purposes. (4) Test the automation thoroughly in a staging environment before enabling it for production keys. (5) Set up a confirmation webhook that notifies your team whenever a key is auto-paused, so humans are always aware of automated actions. CostHawk's webhook payloads include enough context (key ID, project, model breakdown) to make informed automated decisions without additional API calls.
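The decision logic of such a function might look like the sketch below. The CostEvent shape, the keyTags field, and the function names are assumptions made for illustration; only the severity check, the 3x rule, the 'production-critical' safeguard, and the pause path come from the description above. The actual HTTP call, authentication, and audit logging are left out.

```typescript
// Hypothetical, flattened view of the fields the decision needs.
interface CostEvent {
  severity: 'critical' | 'warning' | 'info'
  keyId: string
  keyTags: string[]      // assumption: tags attached to the API key
  currentValue: number   // current daily cost in USD
  dailyAverage: number   // trailing daily average in USD
}

type Action = 'pause' | 'page_human' | 'ignore'

// Pause only on critical events above 3x the daily average, and never
// auto-pause keys tagged production-critical; page a human instead.
function decideAction(event: CostEvent): Action {
  const spiking = event.currentValue > 3 * event.dailyAverage
  if (event.severity !== 'critical' || !spiking) return 'ignore'
  if (event.keyTags.includes('production-critical')) return 'page_human'
  return 'pause'
}

// Path of the pause endpoint named in the text; host and auth are not shown.
function pausePath(keyId: string): string {
  return `/api/v1/keys/${encodeURIComponent(keyId)}/pause`
}
```

Keeping the decision in a pure function makes it easy to test in staging-like conditions before it is wired to a real pause call.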

Q: What payload format do CostHawk webhooks use?

CostHawk webhooks deliver JSON payloads over HTTP POST with a Content-Type: application/json header. Every payload follows a consistent envelope structure with five top-level fields: id (unique event identifier for idempotency), type (event type string like cost.threshold.exceeded, anomaly.detected, budget.warning, or budget.exhausted), created_at (ISO 8601 timestamp), api_version (schema version), and data (event-specific payload). The data object varies by event type but always includes alert_id, alert_name, severity (critical, warning, or info), resource (the affected project, key, or organization), and dashboard_url (a direct link to investigate in the CostHawk UI). For cost threshold events, the data includes threshold configuration, current_value, percent_over, and a breakdown object with per-model and per-key cost attribution. For anomaly events, it includes expected_value, actual_value, z_score, and anomaly_type. Payloads are typically 1–3 KB, well within the limits of any HTTP receiver. The schema is versioned via the api_version field, and CostHawk maintains backward compatibility within major versions so existing integrations continue working when new fields are added.

Q: How do I test webhooks during development?

Testing webhooks locally requires bridging the gap between CostHawk's cloud infrastructure and your development machine. The most common approach is to use a tunneling service that gives your local server a public URL. Tools like ngrok, Cloudflare Tunnel, or localtunnel create a secure tunnel from a public endpoint to your local port. Run ngrok http 3000 to get a URL like https://abc123.ngrok.io, then register that URL as a webhook endpoint in CostHawk. Incoming webhooks will be forwarded to your local development server. CostHawk also provides a built-in testing workflow: click "Send Test Event" on any webhook endpoint in the dashboard, and CostHawk delivers a realistic sample payload matching your configured event types. The test payload includes a "test": true flag so your handler can distinguish test events from real ones. For automated testing in CI/CD pipelines, use CostHawk's webhook payload schema to generate mock payloads and feed them directly to your handler function without involving the network layer at all. Finally, the CostHawk dashboard includes a real-time delivery log that shows the exact request headers, request body, response code, and response time for every delivery attempt, which is invaluable for debugging payload parsing issues, signature verification problems, or unexpected response codes.
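A handler written with its side effects injected can be exercised with mock payloads and no network, as the CI/CD suggestion above describes. The sketch below assumes the "test" flag sits at the top level of the payload, which the text does not specify; handleEvent and the notify callback are hypothetical names.

```typescript
interface WebhookEvent {
  id: string
  type: string
  test?: boolean // assumption: the test flag is a top-level field
}

// The notifier is passed in, so tests can record calls instead of
// posting to Slack or PagerDuty.
function handleEvent(
  event: WebhookEvent,
  notify: (message: string) => void
): 'skipped' | 'notified' {
  if (event.test === true) return 'skipped' // no side effects for test events
  notify(`${event.type} (${event.id})`)
  return 'notified'
}
```

In a unit test, notify can simply push messages into an array, and the assertions check that test events leave the array empty.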

Q: How many webhook endpoints can I configure in CostHawk?

CostHawk supports up to 25 webhook endpoints per organization on the Pro plan and up to 100 endpoints on the Enterprise plan. Each endpoint can subscribe to any combination of event types and apply filters for specific projects, API keys, severity levels, or models. There is no limit on the number of event types per endpoint — a single endpoint can receive all event types if desired, or you can create dedicated endpoints for specific event categories. CostHawk supports fan-out delivery, meaning a single event can be delivered to multiple endpoints simultaneously. If you have both a Slack webhook and a PagerDuty webhook subscribed to the same event type, both receive the payload independently with their own retry tracking. Delivery rate limits are generous: CostHawk can deliver up to 1,000 webhook events per minute per organization, with burst capacity for alert storms. If your alert rules generate more events than this limit, CostHawk batches them into grouped payloads to stay within the rate limit while ensuring no events are lost. For organizations with complex routing needs, CostHawk also supports webhook transformations — custom payload templates that reshape the standard payload to match your destination's expected format, eliminating the need for an intermediate relay service.

An HTTP callback that pushes real-time notifications when events occur — cost threshold breaches, anomaly detection alerts, usage milestones. Webhooks are the delivery mechanism that turns passive monitoring into active, automated response workflows across Slack, PagerDuty, Discord, and any HTTP endpoint.

Definition

What Is a Webhook?

A webhook is a user-defined HTTP callback — a URL that receives an automated POST request whenever a specific event occurs in a source system. Unlike traditional polling, where your application repeatedly asks "has anything changed?", webhooks invert the communication pattern: the source system pushes data to your endpoint the moment something happens. In the context of AI cost management, webhooks deliver real-time notifications when cost thresholds are breached, usage anomalies are detected, budget limits are approached, or usage milestones are reached. The webhook payload — typically a JSON object — contains structured event data including the event type, timestamp, affected resource, current values, threshold values, and contextual metadata. Your receiving endpoint can then trigger any downstream action: posting an alert to Slack, paging an on-call engineer via PagerDuty, updating an internal dashboard, pausing an API key, or executing a cost-saving automation. Webhooks are the connective tissue between monitoring systems and response workflows, enabling teams to react to cost events in seconds rather than discovering them hours or days later during manual review.

Impact

Why It Matters for AI Costs

AI API costs can spike dramatically and without warning. A single misconfigured batch job, a prompt injection that triggers infinite loops, or a sudden traffic surge can turn a $50/day workload into a $5,000/day emergency in under an hour. Without real-time notification, these cost events go undetected until someone reviews a billing dashboard — often days later.

Consider the timeline of a typical cost incident without webhooks:

Hour 0: A deployment introduces a bug that retries failed API calls indefinitely
Hours 0–8: The bug runs overnight, consuming 50x normal token volume
Hour 24: An engineer notices the daily cost report looks high
Hour 26: The team investigates, identifies the bug, and deploys a fix
Total damage: $12,000 in unnecessary API spend over 26 hours

Now consider the same incident with webhooks configured:

Hour 0: The same bug is deployed
Minute 15: CostHawk detects hourly spend exceeding the 2x threshold and fires a webhook
Minute 16: Slack alert reaches the #cost-alerts channel; PagerDuty pages the on-call engineer
Minute 30: The engineer identifies the bug and rolls back the deployment
Total damage: $150 in unnecessary API spend over 30 minutes

That is an 80x reduction in cost impact — from $12,000 to $150 — purely from having real-time webhook notifications. Webhooks transform cost monitoring from a retrospective reporting exercise into a real-time operational capability. They are not optional infrastructure for any team spending more than $1,000/month on AI APIs. CostHawk supports configurable webhook endpoints for all alert types, with customizable thresholds, retry logic, and payload formats.

What Are Webhooks?

Webhooks are a communication pattern where a server sends an HTTP POST request to a client-specified URL when an event occurs. The term was coined by Jeff Lindsay in 2007, combining "web" with "hook" (as in a programming hook — a point where custom code can be inserted into a process). Webhooks have become the standard mechanism for event-driven integrations across SaaS platforms, payment processors, version control systems, and monitoring tools.

The webhook lifecycle follows a simple pattern:

Registration: You provide the source system with a URL (your webhook endpoint) and specify which events you want to receive. For example, you might register https://api.yourcompany.com/webhooks/costhawk to receive cost.threshold.exceeded and anomaly.detected events.
Event occurrence: Something happens in the source system — a cost threshold is breached, a usage anomaly is detected, or a budget limit is approached.
Delivery: The source system constructs a JSON payload describing the event and sends an HTTP POST request to your registered URL. The request includes headers for authentication (typically an HMAC signature) and metadata.
Processing: Your endpoint receives the payload, verifies its authenticity, and executes whatever action is appropriate — posting to Slack, creating a Jira ticket, pausing an API key, or triggering a Lambda function.
Acknowledgment: Your endpoint returns an HTTP 200 status code to confirm receipt. If the source system receives a non-2xx response (or no response within a timeout), it retries the delivery according to its retry policy.

Webhooks are fundamentally different from APIs in their communication direction. With an API, your application pulls data by making requests on its own schedule. With a webhook, the source system pushes data to your application the moment it becomes relevant. This push model eliminates polling latency, reduces unnecessary API calls, and enables near-instantaneous response to events.

The key architectural advantage of webhooks is decoupling. The source system does not need to know what you do with the event data. It simply delivers the payload to your URL. You can change your processing logic — switching from Slack to Discord, adding a database write, or triggering an automated remediation — without any changes to the source system. This decoupling makes webhooks the most flexible and scalable integration pattern for event-driven workflows.

For AI cost management specifically, webhooks solve the fundamental timing problem: cost anomalies are time-sensitive events where every minute of delay increases financial exposure. Polling a dashboard every hour means runaway spend can go unnoticed for up to an hour. Webhooks deliver the alert within seconds of detection, compressing your response time from hours to minutes.

Webhooks in AI Cost Management

AI cost management generates several categories of events that benefit from real-time webhook delivery. Each event type serves a different operational purpose and typically routes to a different team or channel:

Cost Threshold Alerts: These fire when spending exceeds a defined dollar amount or percentage increase within a time window. Common configurations include:

Hourly spend exceeds 2x the trailing 7-day hourly average
Daily spend exceeds a fixed budget (e.g., $500/day)
Weekly spend exceeds 80% of the monthly budget with more than 7 days remaining
Per-key spend exceeds its allocated budget for the billing period

Cost threshold webhooks are the first line of defense against runaway spend. They should route to a high-visibility channel (Slack #cost-alerts or PagerDuty) and include the current spend amount, the threshold value, the percentage over threshold, and the affected resource (project, key, or model).
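To make the first configuration concrete, here is a small sketch of a baseline-relative check and of the percentage-over-threshold figure. It illustrates the arithmetic only; the function names are hypothetical and this is not a description of how CostHawk evaluates rules internally.

```typescript
// Arithmetic mean of the trailing samples.
function mean(values: number[]): number {
  if (values.length === 0) throw new Error('no baseline data')
  return values.reduce((sum, v) => sum + v, 0) / values.length
}

// True when the current hour's spend exceeds `multiplier` times the
// trailing hourly average (2x by default, as in the first rule above).
function exceedsBaseline(
  currentHourUsd: number,
  trailingHourlyUsd: number[],
  multiplier = 2
): boolean {
  return currentHourUsd > multiplier * mean(trailingHourlyUsd)
}

// Percentage by which the current value exceeds a fixed threshold,
// rounded to one decimal place.
function percentOver(current: number, threshold: number): number {
  return Math.round(((current - threshold) / threshold) * 1000) / 10
}
```

For example, a daily spend of $743.21 against a $500 threshold gives percentOver(743.21, 500) = 48.6.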

Anomaly Detection Alerts: These fire when CostHawk's anomaly detection algorithms identify statistically significant deviations from expected patterns. Unlike fixed thresholds, anomaly detection adapts to your baseline and catches subtle shifts that a static threshold might miss. Common anomaly types include:

Sudden spike in request volume from a single API key
Unusual increase in average tokens per request (suggesting prompt injection or context stuffing)
Unexpected model usage (requests hitting an expensive model that should be routed to a cheaper one)
Geographic or temporal anomalies (traffic from unexpected regions or at unusual hours)
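One common way to quantify a "statistically significant deviation" is a z-score against recent history. The sketch below uses the textbook sample standard deviation; the source does not say which algorithm CostHawk uses, so treat this as an assumption-laden illustration with hypothetical function names and a hypothetical default threshold of 3.

```typescript
// z-score of `value` relative to the mean and sample standard
// deviation of `history`.
function zScore(value: number, history: number[]): number {
  const n = history.length
  if (n < 2) throw new Error('need at least two baseline points')
  const mean = history.reduce((s, v) => s + v, 0) / n
  const variance = history.reduce((s, v) => s + (v - mean) ** 2, 0) / (n - 1)
  const std = Math.sqrt(variance)
  if (std === 0) return value === mean ? 0 : Math.sign(value - mean) * Infinity
  return (value - mean) / std
}

// Flags values whose absolute z-score meets the threshold.
function isAnomalous(value: number, history: number[], threshold = 3): boolean {
  return Math.abs(zScore(value, history)) >= threshold
}
```

Unlike a fixed dollar threshold, this check scales with the baseline: the same $10 jump is an anomaly for a stable workload and noise for a volatile one.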

Budget Warning Alerts: These provide advance warning before budgets are exhausted. Typical configurations include 50%, 75%, 90%, and 100% of budget consumption triggers. Budget warnings give teams time to adjust usage, request budget increases, or implement cost-saving measures before services are disrupted. CostHawk supports budgets at the organization, project, team, and individual API key level.
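The milestone triggers can be thought of as edge detection on budget consumption: an alert fires when spend moves from below a milestone to at or above it. A minimal sketch, with hypothetical names and the 50/75/90/100 milestones from the text:

```typescript
const MILESTONES = [50, 75, 90, 100] // percent of budget

// Milestones crossed when consumption moves from previousUsd to currentUsd.
// Comparisons are cross-multiplied to avoid dividing by the budget.
function crossedMilestones(
  previousUsd: number,
  currentUsd: number,
  budgetUsd: number
): number[] {
  if (budgetUsd <= 0) throw new Error('budget must be positive')
  return MILESTONES.filter(
    (m) => previousUsd * 100 < m * budgetUsd && currentUsd * 100 >= m * budgetUsd
  )
}
```

Firing only on the crossing, rather than on every evaluation above a milestone, keeps each budget warning to a single notification.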

Usage Milestone Notifications: These fire when cumulative usage crosses defined thresholds — 1 million requests, 1 billion tokens, first usage of a new model, or first request from a new API key. Milestones are informational rather than urgent, typically routing to a general channel or weekly digest rather than paging an engineer.

System Health Events: These notify you of changes to the monitoring infrastructure itself — a wrapped key approaching its rate limit, a provider API returning elevated error rates, or a webhook endpoint failing to acknowledge deliveries. These meta-alerts ensure your monitoring pipeline stays healthy.

A mature AI cost monitoring setup typically configures 10–20 webhook rules across these categories, routing each to the appropriate channel and team based on urgency and ownership. CostHawk's webhook configuration interface lets you define rules declaratively, test them with sample payloads, and monitor delivery success rates from a single dashboard.

Webhook Delivery Platforms

Webhooks become actionable when they reach the platforms where your team already works. Here is how CostHawk webhook payloads integrate with the most popular notification platforms:

Platform	Integration Method	Payload Format	Best For	Typical Latency
Slack	Incoming Webhook URL or Slack App	Slack Block Kit JSON with rich formatting, buttons, and links	Team-wide cost alerts, daily digests, non-urgent notifications	< 1 second
Discord	Discord Webhook URL	Discord embed objects with color-coded severity	Developer teams, open-source projects, smaller teams	< 1 second
PagerDuty	Events API v2	PagerDuty alert payload with severity, routing key, and dedup key	Critical cost incidents requiring immediate human response	< 2 seconds
Microsoft Teams	Incoming Webhook connector or Power Automate	Adaptive Card JSON with sections, facts, and action buttons	Enterprise teams, organizations using Microsoft 365	< 2 seconds
Opsgenie	REST API integration	Opsgenie alert payload with priority and tags	On-call management, alert routing and escalation	< 2 seconds
Custom HTTP	Any HTTP endpoint	Standard CostHawk JSON payload	Internal dashboards, databases, serverless functions, custom automation	< 1 second
Email (via webhook)	SendGrid, Mailgun, or SES webhook relay	Formatted HTML email body	Executive summaries, compliance records, stakeholders without Slack	5–30 seconds
Zapier / Make	Zapier Webhook trigger or Make HTTP module	Standard JSON (Zapier parses automatically)	No-code automation, multi-step workflows, CRM updates	1–5 seconds

Platform selection guidelines:

For immediate human response (cost spikes, anomalies): Use PagerDuty or Opsgenie. These platforms provide on-call scheduling, escalation policies, and acknowledgment tracking that ensure critical alerts get seen and acted upon — even at 3 AM.
For team awareness (budget warnings, daily summaries): Use Slack or Teams. These reach the entire team in a shared channel without the urgency of a page. Configure CostHawk to send rich, formatted messages with charts and drill-down links.
For automation (auto-pause keys, auto-scale, auto-route): Use custom HTTP endpoints backed by serverless functions (AWS Lambda, Cloudflare Workers, or Vercel Edge Functions). These process the webhook payload programmatically and take automated action without human intervention.
For compliance and audit: Route webhooks to a logging endpoint that stores every alert in a durable, append-only log. This provides an audit trail for cost governance and can be queried during incident reviews.

Most teams configure multiple destinations per event type — for example, a cost anomaly might simultaneously notify Slack (for team awareness), PagerDuty (for on-call response), and a Lambda function (for automated key throttling). CostHawk supports fan-out delivery to multiple endpoints per webhook rule.
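The routing guidelines above amount to a small lookup from severity to destinations. The table below is one possible arrangement, not a CostHawk default; the destination names are placeholders for endpoints you would register yourself.

```typescript
type Severity = 'critical' | 'warning' | 'info'
type Destination = 'pagerduty' | 'slack' | 'lambda' | 'audit-log'

// Hypothetical routing table: critical events fan out widely,
// informational events only reach the audit log.
const ROUTES: Record<Severity, Destination[]> = {
  critical: ['pagerduty', 'slack', 'lambda', 'audit-log'],
  warning: ['slack', 'audit-log'],
  info: ['audit-log'],
}

// Destinations that should receive an event of the given severity.
function destinationsFor(severity: Severity): Destination[] {
  return ROUTES[severity]
}
```

Keeping the mapping in data rather than in branching code makes it straightforward to review and adjust as ownership changes.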

Webhook Payload Design

A well-designed webhook payload contains everything the receiver needs to understand and act on the event without making additional API calls. CostHawk webhook payloads follow a consistent structure across all event types:

{
  "id": "evt_01HZ3K7M9N2P4Q5R6S7T8U9V0W",
  "type": "cost.threshold.exceeded",
  "created_at": "2026-03-16T14:32:07.000Z",
  "api_version": "2026-03-01",
  "data": {
    "alert_id": "alt_01HZ3K7M9N2P4Q5R6S7T8U9V0W",
    "alert_name": "Daily spend exceeds $500",
    "severity": "critical",
    "resource": {
      "type": "project",
      "id": "proj_abc123",
      "name": "Production Chatbot"
    },
    "threshold": {
      "metric": "daily_cost_usd",
      "operator": "greater_than",
      "value": 500.00,
      "window": "24h"
    },
    "current_value": 743.21,
    "percent_over": 48.6,
    "breakdown": {
      "by_model": [
        { "model": "gpt-4o", "cost": 512.40, "requests": 34200 },
        { "model": "claude-3.5-sonnet", "cost": 189.30, "requests": 8400 },
        { "model": "gpt-4o-mini", "cost": 41.51, "requests": 92100 }
      ],
      "by_key": [
        { "key_id": "key_xyz789", "key_name": "chatbot-prod", "cost": 623.80 },
        { "key_id": "key_def456", "key_name": "search-prod", "cost": 119.41 }
      ]
    },
    "trend": {
      "previous_day": 312.50,
      "seven_day_average": 298.73,
      "percent_change_vs_average": 148.8
    },
    "dashboard_url": "https://app.costhawk.com/dashboard/alerts/alt_01HZ3K7M9N2P4Q5R6S7T8U9V0W"
  }
}

Key design principles in this payload:

Self-contained: The payload includes the alert name, threshold details, current value, breakdown, and trend data. A Slack integration can render a complete, actionable message without querying the CostHawk API for additional context.
Typed and versioned: The type field enables receivers to route different event types to different handlers. The api_version field ensures backward compatibility as the payload schema evolves.
Idempotent: The id field is a unique event identifier. Receivers should deduplicate on this field to handle retries gracefully — if the same event is delivered twice (due to a retry), processing it twice should not create duplicate alerts or actions.
Actionable: The dashboard_url provides a direct link to investigate the alert in the CostHawk UI. The breakdown object identifies which models and keys are driving the cost spike, enabling targeted response.
Severity-tagged: The severity field (critical, warning, info) lets receivers prioritize their response. A Slack integration might use red for critical, yellow for warning, and blue for info. A PagerDuty integration might only page for critical severity.

For anomaly detection events, the payload additionally includes the expected value, the standard deviation, and the z-score that triggered the anomaly, giving engineers the statistical context to assess whether the anomaly warrants action or is a benign fluctuation. For budget events, the payload includes the budget amount, consumed amount, remaining amount, projected exhaustion date, and burn rate.
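To illustrate the self-contained principle, the sketch below types a subset of the threshold payload shown earlier and renders a plain-text Slack message from it without any follow-up API call. The interface covers only the fields it uses, toSlackMessage is a hypothetical helper, and the output uses the simple { text } body that Slack incoming webhooks accept rather than Block Kit.

```typescript
interface ModelCost { model: string; cost: number; requests: number }

// Subset of the cost.threshold.exceeded payload used by the formatter.
interface ThresholdEvent {
  id: string
  type: string
  created_at: string
  api_version: string
  data: {
    alert_name: string
    severity: 'critical' | 'warning' | 'info'
    current_value: number
    percent_over: number
    breakdown: { by_model: ModelCost[] }
    dashboard_url: string
  }
}

// Builds a Slack incoming-webhook body from the event alone.
function toSlackMessage(event: ThresholdEvent): { text: string } {
  const d = event.data
  // Copy before sorting so the payload itself is not mutated.
  const top = [...d.breakdown.by_model].sort((a, b) => b.cost - a.cost)[0]
  const lines = [
    `[${d.severity.toUpperCase()}] ${d.alert_name}`,
    `Current: $${d.current_value.toFixed(2)} (${d.percent_over}% over threshold)`,
  ]
  if (top !== undefined) lines.push(`Top model: ${top.model} ($${top.cost.toFixed(2)})`)
  lines.push(d.dashboard_url)
  return { text: lines.join('\n') }
}
```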

Reliability and Retry Strategies

Webhook delivery operates over HTTP, which means network failures, endpoint downtime, and processing errors can all prevent successful delivery. A robust webhook system must handle these failures gracefully to ensure no critical alert is lost.

Retry policies: When a webhook delivery fails (non-2xx response or timeout), CostHawk retries the delivery using an exponential backoff schedule:

Attempt	Delay After Failure	Cumulative Time
1st retry	30 seconds	30 seconds
2nd retry	2 minutes	2.5 minutes
3rd retry	10 minutes	12.5 minutes
4th retry	30 minutes	42.5 minutes
5th retry	2 hours	2 hours 42.5 minutes
6th retry (final)	8 hours	10 hours 42.5 minutes

After 6 failed retries spanning approximately 11 hours, the delivery is marked as failed and the event is logged in the webhook delivery history. CostHawk sends a separate notification (via email or a backup channel) when a webhook endpoint has been failing consistently, so you know your alert pipeline is degraded.
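The cumulative column follows directly from the delays: each entry is the running sum of all delays so far. A short sketch of that arithmetic, with the delays from the schedule above expressed in seconds:

```typescript
// Retry delays from the schedule above, in seconds:
// 30 s, 2 min, 10 min, 30 min, 2 h, 8 h.
const RETRY_DELAYS_SECONDS = [30, 120, 600, 1800, 7200, 28800]

// Running total of elapsed time after each retry.
function cumulativeSeconds(delays: number[]): number[] {
  let total = 0
  return delays.map((delay) => (total += delay))
}

// Total span of the retry schedule in hours.
function totalHours(delays: number[]): number {
  return delays.reduce((sum, d) => sum + d, 0) / 3600
}
```

The final running total is 38,550 seconds, or about 10.7 hours, which is where the "approximately 11 hours" figure comes from.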

Timeout handling: CostHawk waits up to 10 seconds for your endpoint to respond. If your endpoint does not return an HTTP response within this window, the attempt is treated as a failure and the retry schedule begins. To avoid timeouts, your webhook endpoint should acknowledge receipt immediately (return 200) and process the payload asynchronously. A common pattern is to write the payload to a queue (SQS, Redis, or a database) and return 200 within milliseconds, then process the event in a background worker.

Idempotency: Because retries can deliver the same event multiple times, your webhook handler must be idempotent — processing the same event twice should produce the same result as processing it once. Use the id field in the webhook payload as a deduplication key. Before processing, check whether you have already handled an event with this ID. If so, return 200 without taking any action.
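The acknowledge-then-process pattern and the deduplication check can be combined in a small receiver component. In the sketch below, an in-memory Set and array stand in for a durable store and a real queue (SQS, Redis, or a database); the WebhookInbox class is hypothetical and would lose its state on restart, so it only demonstrates the control flow.

```typescript
interface IncomingEvent {
  id: string
  type: string
}

class WebhookInbox {
  private readonly seen = new Set<string>()
  // Stand-in for a real queue; a background worker would drain it.
  readonly queue: IncomingEvent[] = []

  // Returns the HTTP status to send back. New and duplicate events both
  // get 200, but a duplicate is not enqueued a second time.
  receive(event: IncomingEvent): 200 {
    if (!this.seen.has(event.id)) {
      this.seen.add(event.id)
      this.queue.push(event)
    }
    return 200
  }
}
```

Returning 200 for duplicates matters: a non-2xx response would tell the sender the delivery failed and trigger further retries of an event that was already handled.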

Signature verification: Every CostHawk webhook delivery includes an X-CostHawk-Signature header containing an HMAC-SHA256 signature of the payload, computed using your webhook signing secret. Always verify this signature before processing the payload to prevent malicious actors from sending fake webhook events to your endpoint:

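import crypto from 'crypto'

// Returns true only if `signature` matches the HMAC-SHA256 of the raw body.
function verifyWebhookSignature(
  payload: string,   // raw request body, exactly as received
  signature: string, // X-CostHawk-Signature header (hex, as the digest below assumes)
  secret: string     // your webhook signing secret
): boolean {
  const expected = crypto
    .createHmac('sha256', secret)
    .update(payload)
    .digest('hex')
  const received = Buffer.from(signature)
  const computed = Buffer.from(expected)
  // timingSafeEqual throws if the lengths differ, so reject those first
  if (received.length !== computed.length) return false
  return crypto.timingSafeEqual(received, computed)
}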

Monitoring webhook health: CostHawk tracks delivery success rate, average response time, and error codes for each webhook endpoint. If your endpoint's success rate drops below 95%, CostHawk flags it in the dashboard and sends a health alert. You can view delivery logs showing every attempt, the response code, response time, and payload size. This observability is critical for maintaining confidence that your alert pipeline is functioning correctly — the worst outcome is believing you are monitored when your webhook endpoint has been silently failing for days.

CostHawk Webhook Configuration

CostHawk provides a declarative webhook configuration system that lets you define, test, and monitor webhook endpoints from the dashboard or via the API. Here is how to set up a complete webhook-based alerting pipeline:

Step 1: Create a webhook endpoint. Navigate to Settings → Webhooks → Add Endpoint, or use the API:

POST /api/v1/webhooks
{
  "url": "https://api.yourcompany.com/webhooks/costhawk",
  "description": "Production cost alerts",
  "events": [
    "cost.threshold.exceeded",
    "anomaly.detected",
    "budget.warning",
    "budget.exhausted"
  ],
  "filters": {
    "projects": ["proj_abc123", "proj_def456"],
    "severity": ["critical", "warning"]
  }
}

CostHawk returns a webhook ID and a signing secret. Store the signing secret securely — you will use it to verify incoming payloads.

Step 2: Test the endpoint. Click "Send Test Event" in the dashboard or call the test endpoint. CostHawk sends a sample payload matching your configured event types so you can verify your endpoint receives, validates, and processes it correctly. The test payload includes a "test": true field so your handler can skip side effects during testing.

Step 3: Configure alert rules. Webhook endpoints receive events generated by alert rules. Configure rules in the Alerts section:

Cost threshold rules: Define a metric (hourly, daily, or weekly spend), a comparison operator (greater than, percent increase over baseline), and a threshold value. Example: "Alert when daily spend for project Production Chatbot exceeds $500."
Anomaly detection rules: Enable CostHawk's anomaly detection engine, which automatically establishes baselines and detects statistically significant deviations. Configure sensitivity (low, medium, high) to control the z-score threshold for triggering.
Budget rules: Attach webhooks to budget milestones. Example: "Alert at 50%, 75%, 90%, and 100% of the $10,000 monthly budget for the Engineering team."

Step 4: Monitor delivery health. The Webhooks dashboard shows real-time delivery metrics for each endpoint:

Success rate: Percentage of deliveries that received a 2xx response within the timeout window. Target: 99.5%+
Average response time: How quickly your endpoint acknowledges deliveries. Target: under 500ms
Recent failures: List of failed deliveries with response codes, timestamps, and retry status
Event volume: Number of events delivered per hour/day, useful for understanding alert frequency and tuning thresholds to avoid alert fatigue

You can also use the CostHawk MCP server to manage webhooks from within your AI coding assistant. The costhawk_create_webhook and costhawk_list_webhooks tools let you configure and monitor webhooks without leaving your editor — a natural fit for engineering teams that live in their terminal.

Best practices for webhook configuration:

Create separate endpoints for different severity levels — route critical alerts to PagerDuty and warnings to Slack
Use filters to avoid noise — only subscribe to events for projects and severity levels you care about
Set up a catch-all endpoint that logs all events to a database for audit and retrospective analysis
Review and tune thresholds monthly — as your usage grows, static thresholds may need adjustment
Test webhook endpoints after any infrastructure change (DNS, load balancer, firewall rules) to confirm continued connectivity

Frequently Asked Questions

What is the difference between a webhook and polling an API?

Polling means your application repeatedly calls an API endpoint on a schedule (every 30 seconds, every minute, every 5 minutes) to check whether anything has changed. A webhook inverts this: the source system pushes a notification to your endpoint the moment an event occurs. The practical differences are significant. Polling introduces latency of up to one polling interval: if you poll every 5 minutes, an event might go unnoticed for up to 5 minutes. Webhooks deliver within seconds. Polling generates unnecessary API calls when nothing has changed; 95% of polling requests typically return 'no new data,' wasting compute and counting against rate limits. Webhooks only fire when something actually happens. Polling requires your application to run continuously and manage state (tracking what it has already seen). Webhook receivers can be stateless serverless functions that process each event independently. The tradeoff is that webhooks require you to expose an HTTP endpoint, handle retries and idempotency, and verify signatures. For cost alerting specifically, the latency advantage of webhooks is decisive: a 5-minute polling delay during a cost spike could mean hundreds of dollars in unnecessary spend that real-time webhook delivery would have prevented.

How do I secure my webhook endpoint against spoofed requests?

Webhook security requires multiple layers to prevent unauthorized actors from sending fake events to your endpoint. The primary mechanism is HMAC signature verification. CostHawk signs every webhook payload with your unique signing secret using HMAC-SHA256. When a request arrives, compute the HMAC of the raw request body using your stored signing secret and compare it to the signature in the X-CostHawk-Signature header. Use a constant-time comparison function (like Node.js crypto.timingSafeEqual) to prevent timing attacks. Beyond signature verification, implement these additional safeguards: (1) Validate the Content-Type header is application/json. (2) Parse the payload and verify the api_version matches expected values. (3) Check the created_at timestamp is within an acceptable window (e.g., last 5 minutes) to prevent replay attacks. (4) Restrict your endpoint's firewall to CostHawk's published IP ranges if your infrastructure supports IP allowlisting. (5) Use HTTPS exclusively — never accept webhooks over plain HTTP. (6) Rotate your signing secret periodically and support dual-secret validation during rotation to avoid delivery gaps.
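The signature check, dual-secret rotation, and timestamp window can be sketched in Python as follows. This is a sketch under stated assumptions rather than CostHawk's reference code: it assumes the X-CostHawk-Signature header carries a bare hex-encoded HMAC-SHA256 digest of the raw body, and the function names and clock-skew allowance are hypothetical. Python's `hmac.compare_digest` plays the role of Node.js `crypto.timingSafeEqual`.

```python
import hashlib
import hmac
from datetime import datetime, timezone
from typing import List


def sign(secret: bytes, raw_body: bytes) -> str:
    """HMAC-SHA256 of the raw body, hex-encoded (the encoding is an assumption)."""
    return hmac.new(secret, raw_body, hashlib.sha256).hexdigest()


def verify_signature(secrets: List[bytes], raw_body: bytes, header_value: str) -> bool:
    """Accept the request if any active secret matches (supports rotation)."""
    received = header_value.encode("utf-8")
    # compare_digest is a constant-time comparison, which avoids timing leaks.
    return any(hmac.compare_digest(sign(s, raw_body).encode("ascii"), received)
               for s in secrets)


def is_fresh(created_at: str, now: datetime, max_age_s: int = 300,
             skew_s: int = 30) -> bool:
    """Reject events outside the replay window (default: last 5 minutes)."""
    ts = datetime.fromisoformat(created_at.replace("Z", "+00:00"))
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)  # treat naive timestamps as UTC
    age = (now - ts).total_seconds()
    return -skew_s <= age <= max_age_s


old_secret, new_secret = b"whsec_old", b"whsec_new"  # placeholder secrets
body = b'{"id": "evt_1", "type": "budget.warning"}'
header = sign(new_secret, body)
print(verify_signature([old_secret, new_secret], body, header))
print(is_fresh("2026-03-16T12:01:00Z", datetime(2026, 3, 16, 12, 5, tzinfo=timezone.utc)))
```

Verify against the raw request bytes before parsing JSON; re-serializing a parsed body can change whitespace or key order and produce a different digest.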

What happens if my webhook endpoint is down when an alert fires?

CostHawk implements an automatic retry policy with exponential backoff to handle endpoint downtime. When a delivery attempt fails (timeout, connection refused, or non-2xx response), CostHawk retries up to 6 times over approximately 11 hours: the first retry at 30 seconds, then 2 minutes, 10 minutes, 30 minutes, 2 hours, and finally 8 hours. Each retry attempt is logged with the response code and error details, visible in the Webhooks dashboard under the delivery history for that endpoint. If all 6 retry attempts fail, the event is permanently marked as failed — but it is never deleted. You can view all failed deliveries in the dashboard and manually replay them once your endpoint is restored. CostHawk also sends a separate meta-alert (via email or a backup webhook endpoint, if configured) when an endpoint's failure rate exceeds 5%, so you know your alerting pipeline is degraded even if your primary notification channel is the one that is down. For critical alerting, configure at least two webhook endpoints on different infrastructure — for example, Slack plus PagerDuty — so that a single endpoint failure does not leave you blind to cost events.
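The retry timeline can be worked out with a few lines of arithmetic. The sketch below assumes each listed delay is measured from the previous attempt (the text does not say whether delays are cumulative); under that reading the final retry lands about 10 hours 43 minutes after the first failure, consistent with "approximately 11 hours".

```python
from datetime import timedelta
from typing import List

# Delays between consecutive attempts, taken from the schedule described above.
RETRY_DELAYS = [
    timedelta(seconds=30),
    timedelta(minutes=2),
    timedelta(minutes=10),
    timedelta(minutes=30),
    timedelta(hours=2),
    timedelta(hours=8),
]


def retry_offsets(delays: List[timedelta]) -> List[timedelta]:
    """Return when each retry fires, measured from the initial failure."""
    offsets, elapsed = [], timedelta()
    for delay in delays:
        elapsed += delay
        offsets.append(elapsed)
    return offsets


for attempt, offset in enumerate(retry_offsets(RETRY_DELAYS), start=1):
    print(f"retry {attempt}: +{offset}")
# The last line printed is "retry 6: +10:42:30".
```

Knowing the total window helps you size your own outage tolerance: an endpoint restored within that window should still receive the event without a manual replay.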

How do I avoid alert fatigue from too many webhook notifications?

Alert fatigue occurs when teams receive so many notifications that they start ignoring them, a dangerous outcome for cost monitoring. CostHawk provides several mechanisms to keep webhook volume manageable. First, tune your thresholds based on actual data. If your daily spend normally fluctuates between $200 and $400, setting a threshold at $250 will trigger constantly. Set it at $500 (25% above the top of your normal range) or use percentage-based thresholds that adapt to your baseline. Second, use severity-based routing: critical alerts (budget exhausted, anomaly detected) go to PagerDuty; warnings (approaching threshold) go to Slack; informational events (usage milestones) go to a weekly email digest. Third, enable cooldown periods: configure a minimum interval between repeated firings of the same alert rule. If a cost threshold is breached and stays breached, you likely want one alert, not one every 15 minutes. CostHawk supports cooldowns from 15 minutes to 24 hours per alert rule. Fourth, use alert grouping to batch related events. If five API keys simultaneously exceed their budgets because of a shared upstream issue, CostHawk groups them into a single webhook delivery with all five keys listed. Finally, review alert rules quarterly and remove or adjust rules that fire frequently without leading to action; those are noise, not signal.
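To illustrate what a cooldown does, here is a small receiver-side sketch that suppresses repeat firings of the same rule within a minimum interval. It is not CostHawk's implementation; the `CooldownGate` class and its in-memory state are hypothetical, and a real deployment would need shared storage if the receiver is stateless.

```python
from datetime import datetime, timedelta
from typing import Dict


class CooldownGate:
    """Allow at most one delivery per rule within the cooldown interval."""

    def __init__(self, cooldown: timedelta) -> None:
        self.cooldown = cooldown
        self._last_fired: Dict[str, datetime] = {}

    def should_deliver(self, rule_id: str, now: datetime) -> bool:
        last = self._last_fired.get(rule_id)
        if last is not None and now - last < self.cooldown:
            return False  # still cooling down: suppress the repeat
        self._last_fired[rule_id] = now
        return True


gate = CooldownGate(timedelta(minutes=15))
t0 = datetime(2026, 3, 16, 12, 0)
print(gate.should_deliver("daily-spend", t0))                          # first firing
print(gate.should_deliver("daily-spend", t0 + timedelta(minutes=5)))   # suppressed
print(gate.should_deliver("daily-spend", t0 + timedelta(minutes=15)))  # allowed again
```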

Can I use webhooks to automatically pause API keys when costs spike?

Yes — automated remediation via webhooks is one of the most powerful cost-protection strategies available. The pattern works as follows: CostHawk detects a cost anomaly or threshold breach and fires a webhook to a serverless function (AWS Lambda, Cloudflare Worker, or Vercel Edge Function). The function receives the payload, validates the signature, evaluates the severity, and if the conditions warrant it, calls the CostHawk API to pause the offending API key. The paused key immediately stops proxying requests, returning a 429 status code to callers. Here is a simplified example of an auto-pause Lambda function: receive the webhook, check if severity is 'critical' and the cost exceeds 3x the daily average, then call POST /api/v1/keys/{key_id}/pause. Important safeguards to implement: (1) Never auto-pause keys tagged as 'production-critical' without human approval — send a PagerDuty page instead. (2) Include a manual override mechanism so on-call engineers can unpause keys immediately. (3) Log every automated action for audit purposes. (4) Test the automation thoroughly in a staging environment before enabling it for production keys. (5) Set up a confirmation webhook that notifies your team whenever a key is auto-paused, so humans are always aware of automated actions. CostHawk's webhook payloads include enough context (key ID, project, model breakdown) to make informed automated decisions without additional API calls.
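The decision logic described above might be sketched like this. It covers only the decision and the request path, not the HTTP call, signature validation, or audit logging. The `severity` and `current_value` fields follow the payload description in this FAQ; the `tags` field, the action names, and the function names are assumptions made for illustration, and the daily average is supplied by the caller.

```python
from typing import Any, Dict


def decide_action(event: Dict[str, Any], daily_average: float,
                  protected_tag: str = "production-critical") -> str:
    """Return 'pause_key', 'page_human', or 'ignore' for one webhook event."""
    data = event["data"]
    if data.get("severity") != "critical":
        return "ignore"
    if data.get("current_value", 0.0) <= 3 * daily_average:
        return "ignore"
    # Hypothetical "tags" field: protected keys go to a human instead.
    if protected_tag in data.get("tags", []):
        return "page_human"
    return "pause_key"


def pause_path(key_id: str) -> str:
    """Path for the pause call, POST /api/v1/keys/{key_id}/pause."""
    return f"/api/v1/keys/{key_id}/pause"


event = {"data": {"severity": "critical", "current_value": 1000.0, "tags": []}}
action = decide_action(event, daily_average=300.0)
print(action, pause_path("key_123") if action == "pause_key" else "")
```

Keeping the decision a pure function makes it straightforward to test in staging with recorded payloads before wiring it to the real pause call.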

What payload format do CostHawk webhooks use?

CostHawk webhooks deliver JSON payloads over HTTP POST with a Content-Type: application/json header. Every payload follows a consistent envelope structure with four top-level fields: id (unique event identifier for idempotency), type (event type string like cost.threshold.exceeded, anomaly.detected, budget.warning, or budget.exhausted), created_at (ISO 8601 timestamp), and data (event-specific payload). The data object varies by event type but always includes alert_id, alert_name, severity (critical, warning, or info), resource (the affected project, key, or organization), and dashboard_url (a direct link to investigate in the CostHawk UI). For cost threshold events, the data includes threshold configuration, current_value, percent_over, and a breakdown object with per-model and per-key cost attribution. For anomaly events, it includes expected_value, actual_value, z_score, and anomaly_type. Payloads are typically 1–3 KB, well within the limits of any HTTP receiver. The schema is versioned via the api_version field, and CostHawk maintains backward compatibility within major versions so existing integrations continue working when new fields are added.
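As a sketch of consuming this envelope, the snippet below checks the four envelope fields and uses the event id for idempotency. The field names come from the description above; every value in the example event is invented for illustration, and the in-memory `Deduplicator` is a hypothetical stand-in for a durable store.

```python
import json
from typing import Any, Dict, Set

REQUIRED_FIELDS = ("id", "type", "created_at", "data")

# Illustrative event: field names follow the text, values are made up.
EXAMPLE_EVENT = {
    "id": "evt_example_1",
    "type": "cost.threshold.exceeded",
    "created_at": "2026-03-16T12:00:00Z",
    "data": {"alert_name": "Daily spend", "severity": "critical",
             "current_value": 620.0},
}


def parse_envelope(raw_body: bytes) -> Dict[str, Any]:
    """Parse the JSON body and confirm the envelope fields are present."""
    event = json.loads(raw_body)
    if not isinstance(event, dict):
        raise ValueError("payload must be a JSON object")
    missing = [f for f in REQUIRED_FIELDS if f not in event]
    if missing:
        raise ValueError(f"missing envelope fields: {missing}")
    return event


class Deduplicator:
    """Remember event ids so a retried delivery is processed only once."""

    def __init__(self) -> None:
        self._seen: Set[str] = set()

    def first_time(self, event_id: str) -> bool:
        if event_id in self._seen:
            return False
        self._seen.add(event_id)
        return True


event = parse_envelope(json.dumps(EXAMPLE_EVENT).encode("utf-8"))
dedup = Deduplicator()
print(event["type"], dedup.first_time(event["id"]), dedup.first_time(event["id"]))
```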

How do I test webhooks during development?

Testing webhooks locally requires bridging the gap between CostHawk's cloud infrastructure and your development machine. The most common approach is to use a tunneling service that gives your local server a public URL. Tools like ngrok, Cloudflare Tunnel, or localtunnel create a secure tunnel from a public endpoint to your local port. Run ngrok http 3000 to get a URL like https://abc123.ngrok.io, then register that URL as a webhook endpoint in CostHawk. Incoming webhooks will be forwarded to your local development server. CostHawk also provides a built-in testing workflow: click "Send Test Event" on any webhook endpoint in the dashboard, and CostHawk delivers a realistic sample payload matching your configured event types. The test payload includes a "test": true flag so your handler can distinguish test events from real ones. For automated testing in CI/CD pipelines, use CostHawk's webhook payload schema to generate mock payloads and feed them directly to your handler function without involving the network layer at all. Finally, the CostHawk dashboard includes a real-time delivery log that shows the exact request headers, request body, response code, and response time for every delivery attempt — invaluable for debugging payload parsing issues, signature verification problems, or unexpected response codes.
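For the network-free CI approach, a test might build mock payloads and call the handler function directly. The builder and the `route` handler below are hypothetical examples; they assume the envelope fields described in this FAQ and a top-level "test" flag, and the destination names are placeholders.

```python
import itertools
from typing import Any, Dict

_ids = itertools.count(1)


def make_mock_event(event_type: str, severity: str, test: bool = False,
                    **data: Any) -> Dict[str, Any]:
    """Build a mock payload shaped like the documented envelope."""
    event: Dict[str, Any] = {
        "id": f"evt_mock_{next(_ids)}",
        "type": event_type,
        "created_at": "2026-03-16T12:00:00Z",  # fixed value for repeatable tests
        "data": {"severity": severity, **data},
    }
    if test:
        event["test"] = True
    return event


def route(event: Dict[str, Any]) -> str:
    """Example handler under test: pick a destination by severity."""
    if event.get("test"):
        return "skip"
    routes = {"critical": "pagerduty", "warning": "slack"}
    return routes.get(event["data"]["severity"], "digest")


# Feed mock payloads straight to the handler, no HTTP server involved.
print(route(make_mock_event("budget.exhausted", "critical")))
print(route(make_mock_event("budget.warning", "warning")))
print(route(make_mock_event("cost.threshold.exceeded", "critical", test=True)))
```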

How many webhook endpoints can I configure in CostHawk?

CostHawk supports up to 25 webhook endpoints per organization on the Pro plan and up to 100 endpoints on the Enterprise plan. Each endpoint can subscribe to any combination of event types and apply filters for specific projects, API keys, severity levels, or models. There is no limit on the number of event types per endpoint — a single endpoint can receive all event types if desired, or you can create dedicated endpoints for specific event categories. CostHawk supports fan-out delivery, meaning a single event can be delivered to multiple endpoints simultaneously. If you have both a Slack webhook and a PagerDuty webhook subscribed to the same event type, both receive the payload independently with their own retry tracking. Delivery rate limits are generous: CostHawk can deliver up to 1,000 webhook events per minute per organization, with burst capacity for alert storms. If your alert rules generate more events than this limit, CostHawk batches them into grouped payloads to stay within the rate limit while ensuring no events are lost. For organizations with complex routing needs, CostHawk also supports webhook transformations — custom payload templates that reshape the standard payload to match your destination's expected format, eliminating the need for an intermediate relay service.

Related Terms

Alerting

Automated notifications triggered by cost thresholds, usage anomalies, or performance degradation in AI systems. The first line of defense against budget overruns — alerting ensures no cost spike goes unnoticed.

Cost Anomaly Detection

Automated detection of unusual AI spending patterns — sudden spikes, gradual drift, and per-key anomalies — before they become budget-breaking surprises.

Dashboards

Visual interfaces for monitoring AI cost, usage, and performance metrics in real-time. The command center for AI cost management — dashboards aggregate token spend, model utilization, latency, and budget health into a single pane of glass.

Token Budget

Spending limits applied per project, team, or time period to prevent uncontrolled AI API costs and protect against runaway agents.

Logging

Recording LLM request and response metadata — tokens consumed, model used, latency, cost, and status — for debugging, cost analysis, and compliance. Effective LLM logging captures the operational envelope of every API call without storing sensitive prompt content.

LLM Observability

The practice of monitoring, tracing, and analyzing LLM-powered applications in production across every dimension that matters: token consumption, cost, latency, error rates, and output quality. LLM observability goes far beyond traditional APM by tracking AI-specific metrics that determine both the reliability and the economics of your AI features.
