Webhook
An HTTP callback that pushes real-time notifications when events occur — cost threshold breaches, anomaly detection alerts, usage milestones. Webhooks are the delivery mechanism that turns passive monitoring into active, automated response workflows across Slack, PagerDuty, Discord, and any HTTP endpoint.
Why It Matters for AI Costs
AI API costs can spike dramatically and without warning. A single misconfigured batch job, a prompt injection that triggers infinite loops, or a sudden traffic surge can turn a $50/day workload into a $5,000/day emergency in under an hour. Without real-time notification, these cost events go undetected until someone reviews a billing dashboard — often days later.
Consider the timeline of a typical cost incident without webhooks:
- Hour 0: A deployment introduces a bug that retries failed API calls indefinitely
- Hours 0–8: The bug runs overnight, consuming 50x normal token volume
- Hour 24: An engineer notices the daily cost report looks high
- Hour 26: The team investigates, identifies the bug, and deploys a fix
- Total damage: $12,000 in unnecessary API spend over 26 hours
Now consider the same incident with webhooks configured:
- Hour 0: The same bug is deployed
- Minute 15: CostHawk detects hourly spend exceeding the 2x threshold and fires a webhook
- Minute 16: Slack alert reaches the #cost-alerts channel; PagerDuty pages the on-call engineer
- Minute 30: The engineer identifies the bug and rolls back the deployment
- Total damage: $150 in unnecessary API spend over 30 minutes
That is an 80x reduction in cost impact — from $12,000 to $150 — purely from having real-time webhook notifications. Webhooks transform cost monitoring from a retrospective reporting exercise into a real-time operational capability. They are not optional infrastructure for any team spending more than $1,000/month on AI APIs. CostHawk supports configurable webhook endpoints for all alert types, with customizable thresholds, retry logic, and payload formats.
What Are Webhooks?
Webhooks are a communication pattern where a server sends an HTTP POST request to a client-specified URL when an event occurs. The term was coined by Jeff Lindsay in 2007, combining "web" with "hook" (as in a programming hook — a point where custom code can be inserted into a process). Webhooks have become the standard mechanism for event-driven integrations across SaaS platforms, payment processors, version control systems, and monitoring tools.
The webhook lifecycle follows a simple pattern:
- Registration: You provide the source system with a URL (your webhook endpoint) and specify which events you want to receive. For example, you might register https://api.yourcompany.com/webhooks/costhawk to receive cost.threshold.exceeded and anomaly.detected events.
- Event occurrence: Something happens in the source system: a cost threshold is breached, a usage anomaly is detected, or a budget limit is approached.
- Delivery: The source system constructs a JSON payload describing the event and sends an HTTP POST request to your registered URL. The request includes headers for authentication (typically an HMAC signature) and metadata.
- Processing: Your endpoint receives the payload, verifies its authenticity, and executes whatever action is appropriate — posting to Slack, creating a Jira ticket, pausing an API key, or triggering a Lambda function.
- Acknowledgment: Your endpoint returns an HTTP 200 status code to confirm receipt. If the source system receives a non-2xx response (or no response within a timeout), it retries the delivery according to its retry policy.
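The processing and acknowledgment steps can be sketched as a small handler. This is a minimal illustration rather than a reference implementation: the `handleDelivery` function, the `Delivery` shape, and the 401/400 status codes for rejected requests are assumptions made for the example. It also assumes the signature header carries a hex-encoded HMAC-SHA256 of the raw body.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto'

// Hypothetical shape of an incoming delivery: raw body plus signature header.
interface Delivery {
  rawBody: string
  signature: string
}

// Returns the HTTP status the endpoint would send back to the source system.
function handleDelivery(
  delivery: Delivery,
  secret: string,
  onEvent: (event: { id: string; type: string }) => void
): number {
  // Verify authenticity before trusting anything in the body.
  const expected = Buffer.from(
    createHmac('sha256', secret).update(delivery.rawBody).digest('hex')
  )
  const received = Buffer.from(delivery.signature)
  if (received.length !== expected.length || !timingSafeEqual(received, expected)) {
    return 401 // reject unsigned or tampered requests (status choice is an assumption)
  }

  let event: { id: string; type: string } | undefined
  try {
    event = JSON.parse(delivery.rawBody)
  } catch {
    event = undefined
  }
  if (!event) return 400 // malformed or empty JSON

  onEvent(event) // e.g. post to Slack, open a ticket, pause a key
  return 200 // acknowledge receipt so the sender does not retry
}
```

Returning a plain status code keeps the sketch independent of any web framework; in practice the same logic would sit inside an Express, Lambda, or Workers handler.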
Webhooks are fundamentally different from APIs in their communication direction. With an API, your application pulls data by making requests on its own schedule. With a webhook, the source system pushes data to your application the moment it becomes relevant. This push model eliminates polling latency, reduces unnecessary API calls, and enables near-instantaneous response to events.
The key architectural advantage of webhooks is decoupling. The source system does not need to know what you do with the event data. It simply delivers the payload to your URL. You can change your processing logic — switching from Slack to Discord, adding a database write, or triggering an automated remediation — without any changes to the source system. This decoupling makes webhooks the most flexible and scalable integration pattern for event-driven workflows.
For AI cost management specifically, webhooks solve the fundamental timing problem: cost anomalies are time-sensitive events where every minute of delay increases financial exposure. Polling a dashboard every hour means you could miss up to 59 minutes of runaway spend. Webhooks deliver the alert within seconds of detection, compressing your response time from hours to minutes.
Webhooks in AI Cost Management
AI cost management generates several categories of events that benefit from real-time webhook delivery. Each event type serves a different operational purpose and typically routes to a different team or channel:
Cost Threshold Alerts: These fire when spending exceeds a defined dollar amount or percentage increase within a time window. Common configurations include:
- Hourly spend exceeds 2x the trailing 7-day hourly average
- Daily spend exceeds a fixed budget (e.g., $500/day)
- Weekly spend exceeds 80% of the monthly budget with more than 7 days remaining
- Per-key spend exceeds its allocated budget for the billing period
Cost threshold webhooks are the first line of defense against runaway spend. They should route to a high-visibility channel (Slack #cost-alerts or PagerDuty) and include the current spend amount, the threshold value, the percentage over threshold, and the affected resource (project, key, or model).
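To make the first configuration concrete, here is a rough sketch of a relative threshold check. The function names and the use of a simple mean over the trailing samples are assumptions for illustration; the text does not describe how CostHawk computes its baseline.

```typescript
// Returns true when the current hour's spend exceeds `multiplier` times
// the mean of the trailing hourly samples (e.g. 7 days = 168 values).
function exceedsBaseline(
  currentHourlyUsd: number,
  trailingHourlyUsd: number[],
  multiplier = 2
): boolean {
  if (trailingHourlyUsd.length === 0) return false // no baseline yet
  const mean =
    trailingHourlyUsd.reduce((sum, v) => sum + v, 0) / trailingHourlyUsd.length
  return currentHourlyUsd > multiplier * mean
}

// Percentage by which a value exceeds its threshold, rounded to one decimal.
function percentOver(currentValue: number, threshold: number): number {
  return Math.round(((currentValue - threshold) / threshold) * 1000) / 10
}
```

For example, a current daily spend of $743.21 against a $500 threshold gives `percentOver(743.21, 500) === 48.6`.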
Anomaly Detection Alerts: These fire when CostHawk's anomaly detection algorithms identify statistically significant deviations from expected patterns. Unlike fixed thresholds, anomaly detection adapts to your baseline and catches subtle shifts that a static threshold might miss. Common anomaly types include:
- Sudden spike in request volume from a single API key
- Unusual increase in average tokens per request (suggesting prompt injection or context stuffing)
- Unexpected model usage (requests hitting an expensive model that should be routed to a cheaper one)
- Geographic or temporal anomalies (traffic from unexpected regions or at unusual hours)
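As a rough illustration of how a statistical deviation can be scored, the sketch below computes a plain z-score against a baseline window. This is a textbook technique chosen for the example, not a description of CostHawk's detection algorithm; the function names and the default threshold of 3 are assumptions.

```typescript
// z-score of an observation against the population mean and standard
// deviation of a baseline window.
function zScore(observed: number, baseline: number[]): number {
  if (baseline.length === 0) return 0 // no baseline, nothing to compare against
  const mean = baseline.reduce((sum, v) => sum + v, 0) / baseline.length
  const variance =
    baseline.reduce((sum, v) => sum + (v - mean) ** 2, 0) / baseline.length
  const stdDev = Math.sqrt(variance)
  // A perfectly flat baseline has no spread: any deviation is infinitely unusual.
  if (stdDev === 0) return observed === mean ? 0 : Math.sign(observed - mean) * Infinity
  return (observed - mean) / stdDev
}

// Flags observations whose absolute z-score reaches the threshold.
function isAnomalous(observed: number, baseline: number[], threshold = 3): boolean {
  return Math.abs(zScore(observed, baseline)) >= threshold
}
```

Unlike a fixed dollar threshold, the same code adapts automatically as the baseline window shifts.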
Budget Warning Alerts: These provide advance warning before budgets are exhausted. Typical configurations include 50%, 75%, 90%, and 100% of budget consumption triggers. Budget warnings give teams time to adjust usage, request budget increases, or implement cost-saving measures before services are disrupted. CostHawk supports budgets at the organization, project, team, and individual API key level.
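A budget warning rule of this kind reduces to checking which milestone percentages were crossed between two readings. The helper below is a hypothetical sketch; the name `crossedMilestones` and its parameters are not part of any CostHawk API.

```typescript
// Returns the milestone percentages newly crossed between two consumption
// readings, so each milestone fires once rather than on every check.
function crossedMilestones(
  previousUsd: number,
  currentUsd: number,
  budgetUsd: number,
  milestones: number[] = [50, 75, 90, 100]
): number[] {
  const before = (previousUsd / budgetUsd) * 100
  const after = (currentUsd / budgetUsd) * 100
  return milestones.filter((m) => before < m && after >= m)
}
```

With a $10,000 budget, moving from $4,000 to $7,600 consumed crosses both the 50% and 75% milestones in a single check.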
Usage Milestone Notifications: These fire when cumulative usage crosses defined thresholds — 1 million requests, 1 billion tokens, first usage of a new model, or first request from a new API key. Milestones are informational rather than urgent, typically routing to a general channel or weekly digest rather than paging an engineer.
System Health Events: These notify you of changes to the monitoring infrastructure itself — a wrapped key approaching its rate limit, a provider API returning elevated error rates, or a webhook endpoint failing to acknowledge deliveries. These meta-alerts ensure your monitoring pipeline stays healthy.
A mature AI cost monitoring setup typically configures 10–20 webhook rules across these categories, routing each to the appropriate channel and team based on urgency and ownership. CostHawk's webhook configuration interface lets you define rules declaratively, test them with sample payloads, and monitor delivery success rates from a single dashboard.
Webhook Delivery Platforms
Webhooks become actionable when they reach the platforms where your team already works. Here is how CostHawk webhook payloads integrate with the most popular notification platforms:
| Platform | Integration Method | Payload Format | Best For | Typical Latency |
|---|---|---|---|---|
| Slack | Incoming Webhook URL or Slack App | Slack Block Kit JSON with rich formatting, buttons, and links | Team-wide cost alerts, daily digests, non-urgent notifications | < 1 second |
| Discord | Discord Webhook URL | Discord embed objects with color-coded severity | Developer teams, open-source projects, smaller teams | < 1 second |
| PagerDuty | Events API v2 | PagerDuty alert payload with severity, routing key, and dedup key | Critical cost incidents requiring immediate human response | < 2 seconds |
| Microsoft Teams | Incoming Webhook connector or Power Automate | Adaptive Card JSON with sections, facts, and action buttons | Enterprise teams, organizations using Microsoft 365 | < 2 seconds |
| Opsgenie | REST API integration | Opsgenie alert payload with priority and tags | On-call management, alert routing and escalation | < 2 seconds |
| Custom HTTP | Any HTTP endpoint | Standard CostHawk JSON payload | Internal dashboards, databases, serverless functions, custom automation | < 1 second |
| Email (via webhook) | SendGrid, Mailgun, or SES webhook relay | Formatted HTML email body | Executive summaries, compliance records, stakeholders without Slack | 5–30 seconds |
| Zapier / Make | Zapier Webhook trigger or Make HTTP module | Standard JSON (Zapier parses automatically) | No-code automation, multi-step workflows, CRM updates | 1–5 seconds |
Platform selection guidelines:
- For immediate human response (cost spikes, anomalies): Use PagerDuty or Opsgenie. These platforms provide on-call scheduling, escalation policies, and acknowledgment tracking that ensure critical alerts get seen and acted upon — even at 3 AM.
- For team awareness (budget warnings, daily summaries): Use Slack or Teams. These reach the entire team in a shared channel without the urgency of a page. Configure CostHawk to send rich, formatted messages with charts and drill-down links.
- For automation (auto-pause keys, auto-scale, auto-route): Use custom HTTP endpoints backed by serverless functions (AWS Lambda, Cloudflare Workers, or Vercel Edge Functions). These process the webhook payload programmatically and take automated action without human intervention.
- For compliance and audit: Route webhooks to a logging endpoint that stores every alert in a durable, append-only log. This provides an audit trail for cost governance and can be queried during incident reviews.
Most teams configure multiple destinations per event type — for example, a cost anomaly might simultaneously notify Slack (for team awareness), PagerDuty (for on-call response), and a Lambda function (for automated key throttling). CostHawk supports fan-out delivery to multiple endpoints per webhook rule.
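On the receiving side, the same fan-out idea can be sketched as a routing table plus parallel delivery. The destination names and the severity-to-destination mapping below are illustrative assumptions that loosely follow the selection guidelines above; they are not CostHawk configuration.

```typescript
type Severity = 'critical' | 'warning' | 'info'
type Destination = 'pagerduty' | 'slack' | 'automation' | 'audit-log'

// Hypothetical routing table: everything is logged, warnings and above reach
// Slack, and only critical events page on-call and trigger automation.
function destinationsFor(severity: Severity): Destination[] {
  const targets: Destination[] = ['audit-log']
  if (severity === 'critical') targets.push('pagerduty', 'automation')
  if (severity !== 'info') targets.push('slack')
  return targets
}

// Deliver to every destination in parallel; one failure does not block the others.
async function fanOut(
  severity: Severity,
  send: (destination: Destination) => Promise<void>
): Promise<{ destination: Destination; ok: boolean }[]> {
  return Promise.all(
    destinationsFor(severity).map(async (destination) => {
      try {
        await send(destination)
        return { destination, ok: true }
      } catch {
        return { destination, ok: false }
      }
    })
  )
}
```

Capturing per-destination results makes it possible to retry only the deliveries that failed.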
Webhook Payload Design
A well-designed webhook payload contains everything the receiver needs to understand and act on the event without making additional API calls. CostHawk webhook payloads follow a consistent structure across all event types:
```json
{
  "id": "evt_01HZ3K7M9N2P4Q5R6S7T8U9V0W",
  "type": "cost.threshold.exceeded",
  "created_at": "2026-03-16T14:32:07.000Z",
  "api_version": "2026-03-01",
  "data": {
    "alert_id": "alt_01HZ3K7M9N2P4Q5R6S7T8U9V0W",
    "alert_name": "Daily spend exceeds $500",
    "severity": "critical",
    "resource": {
      "type": "project",
      "id": "proj_abc123",
      "name": "Production Chatbot"
    },
    "threshold": {
      "metric": "daily_cost_usd",
      "operator": "greater_than",
      "value": 500.00,
      "window": "24h"
    },
    "current_value": 743.21,
    "percent_over": 48.6,
    "breakdown": {
      "by_model": [
        { "model": "gpt-4o", "cost": 512.40, "requests": 34200 },
        { "model": "claude-3.5-sonnet", "cost": 189.30, "requests": 8400 },
        { "model": "gpt-4o-mini", "cost": 41.51, "requests": 92100 }
      ],
      "by_key": [
        { "key_id": "key_xyz789", "key_name": "chatbot-prod", "cost": 623.80 },
        { "key_id": "key_def456", "key_name": "search-prod", "cost": 119.41 }
      ]
    },
    "trend": {
      "previous_day": 312.50,
      "seven_day_average": 298.73,
      "percent_change_vs_average": 148.8
    },
    "dashboard_url": "https://app.costhawk.com/dashboard/alerts/alt_01HZ3K7M9N2P4Q5R6S7T8U9V0W"
  }
}
```
Key design principles in this payload:
- Self-contained: The payload includes the alert name, threshold details, current value, breakdown, and trend data. A Slack integration can render a complete, actionable message without querying the CostHawk API for additional context.
- Typed and versioned: The type field enables receivers to route different event types to different handlers. The api_version field ensures backward compatibility as the payload schema evolves.
- Idempotent: The id field is a unique event identifier. Receivers should deduplicate on this field to handle retries gracefully: if the same event is delivered twice (due to a retry), processing it twice should not create duplicate alerts or actions.
- Actionable: The dashboard_url provides a direct link to investigate the alert in the CostHawk UI. The breakdown object identifies which models and keys are driving the cost spike, enabling targeted response.
- Severity-tagged: The severity field (critical, warning, info) lets receivers prioritize their response. A Slack integration might use red for critical, yellow for warning, and blue for info. A PagerDuty integration might only page for critical severity.
For anomaly detection events, the payload additionally includes the expected value, the standard deviation, and the z-score that triggered the anomaly, giving engineers the statistical context to assess whether the anomaly warrants action or is a benign fluctuation. For budget events, the payload includes the budget amount, consumed amount, remaining amount, projected exhaustion date, and burn rate.
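Because the payload is self-contained, a receiver can turn it into a chat message with no further lookups. The sketch below types a subset of the envelope and builds a Slack incoming-webhook body with a single Block Kit section. The `CostHawkEvent` interface and `toSlackMessage` function are hypothetical names, and the message wording is only one possible layout.

```typescript
type Severity = 'critical' | 'warning' | 'info'

// Subset of the envelope shown above; only the fields used here are typed.
interface CostHawkEvent {
  id: string
  type: string
  created_at: string
  api_version: string
  data: {
    alert_name: string
    severity: Severity
    current_value: number
    percent_over: number
    dashboard_url: string
  }
}

// Builds a Slack message body: a plain-text fallback plus one mrkdwn section
// that links back to the alert in the dashboard.
function toSlackMessage(event: CostHawkEvent): { text: string; blocks: object[] } {
  const { alert_name, severity, current_value, percent_over, dashboard_url } = event.data
  const summary =
    `[${severity.toUpperCase()}] ${alert_name}: ` +
    `$${current_value.toFixed(2)} (${percent_over}% over threshold)`
  return {
    text: summary,
    blocks: [
      {
        type: 'section',
        text: { type: 'mrkdwn', text: `${summary}\n<${dashboard_url}|Open in CostHawk>` },
      },
    ],
  }
}
```

A PagerDuty receiver could reuse the same typed event and simply skip anything whose severity is not critical.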
Reliability and Retry Strategies
Webhook delivery operates over HTTP, which means network failures, endpoint downtime, and processing errors can all prevent successful delivery. A robust webhook system must handle these failures gracefully to ensure no critical alert is lost.
Retry policies: When a webhook delivery fails (non-2xx response or timeout), CostHawk retries the delivery using an exponential backoff schedule:
| Attempt | Delay After Failure | Cumulative Time |
|---|---|---|
| 1st retry | 30 seconds | 30 seconds |
| 2nd retry | 2 minutes | 2.5 minutes |
| 3rd retry | 10 minutes | 12.5 minutes |
| 4th retry | 30 minutes | 42.5 minutes |
| 5th retry | 2 hours | 2 hours 42 minutes |
| 6th retry (final) | 8 hours | 10 hours 42 minutes |
After 6 failed retries spanning approximately 11 hours, the delivery is marked as failed and the event is logged in the webhook delivery history. CostHawk sends a separate notification (via email or a backup channel) when a webhook endpoint has been failing consistently, so you know your alert pipeline is degraded.
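The cumulative column in the table can be reproduced with a running sum over the delays. This small sketch only restates the table; the constant and function names are made up for the example.

```typescript
// Delays from the retry table, in seconds: 30 s, 2 min, 10 min, 30 min, 2 h, 8 h.
const RETRY_DELAYS_SECONDS = [30, 120, 600, 1800, 7200, 28800]

// Running total of elapsed time after each retry attempt.
function cumulativeSeconds(delays: number[]): number[] {
  let total = 0
  return delays.map((d) => (total += d))
}

const elapsed = cumulativeSeconds(RETRY_DELAYS_SECONDS)
// elapsed is [30, 150, 750, 2550, 9750, 38550]
```

The final value, 38,550 seconds, is 10 hours 42.5 minutes, which the table rounds down to 10 hours 42 minutes.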
Timeout handling: CostHawk waits up to 10 seconds for your endpoint to respond. If your endpoint does not return an HTTP response within this window, the attempt is treated as a failure and the retry schedule begins. To avoid timeouts, your webhook endpoint should acknowledge receipt immediately (return 200) and process the payload asynchronously. A common pattern is to write the payload to a queue (SQS, Redis, or a database) and return 200 within milliseconds, then process the event in a background worker.
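The acknowledge-then-process pattern might look like the sketch below. An in-memory array stands in for SQS, Redis, or a database table, which is an assumption made to keep the example self-contained; `receive` and `drain` are hypothetical names.

```typescript
// In-memory stand-in for a durable queue (SQS, Redis, or a database table).
const queue: string[] = []

// Called from the HTTP handler: store the raw payload and acknowledge at once.
function receive(rawBody: string): number {
  queue.push(rawBody)
  return 200
}

// Called from a background worker: process queued payloads in arrival order.
// Returns how many events were handled.
async function drain(handle: (event: unknown) => Promise<void>): Promise<number> {
  let handled = 0
  while (queue.length > 0) {
    const raw = queue.shift() as string
    await handle(JSON.parse(raw))
    handled++
  }
  return handled
}
```

Because `receive` does no parsing or network I/O, the endpoint responds well inside the 10-second window regardless of how slow downstream processing is.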
Idempotency: Because retries can deliver the same event multiple times, your webhook handler must be idempotent — processing the same event twice should produce the same result as processing it once. Use the id field in the webhook payload as a deduplication key. Before processing, check whether you have already handled an event with this ID. If so, return 200 without taking any action.
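A minimal deduplication sketch, assuming a single process: the in-memory `Set` and the `processOnce` helper are illustrative, and a real deployment would more likely use a shared store with an expiry.

```typescript
// Event IDs that have already been handled by this process.
const seenEventIds = new Set<string>()

// Runs `action` at most once per event ID; returns false for duplicates.
function processOnce(eventId: string, action: () => void): boolean {
  if (seenEventIds.has(eventId)) return false // duplicate delivery: acknowledge, do nothing
  action() // if this throws, the ID stays unmarked so a later retry can succeed
  seenEventIds.add(eventId)
  return true
}
```

Marking the ID only after the action succeeds favors at-least-once processing; marking it before would favor at-most-once. Which is preferable depends on whether a missed alert or a duplicate action is the worse failure.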
Signature verification: Every CostHawk webhook delivery includes an X-CostHawk-Signature header containing an HMAC-SHA256 signature of the payload, computed using your webhook signing secret. Always verify this signature before processing the payload to prevent malicious actors from sending fake webhook events to your endpoint:
```typescript
import crypto from 'crypto'

// Verify the X-CostHawk-Signature header against the raw request body.
function verifyWebhookSignature(
  payload: string,
  signature: string,
  secret: string
): boolean {
  // Recompute the HMAC-SHA256 of the raw payload as a hex string.
  const expected = crypto
    .createHmac('sha256', secret)
    .update(payload)
    .digest('hex')
  const received = Buffer.from(signature)
  const computed = Buffer.from(expected)
  // timingSafeEqual throws if the lengths differ, so reject those up front.
  if (received.length !== computed.length) return false
  return crypto.timingSafeEqual(received, computed)
}
```
Monitoring webhook health: CostHawk tracks delivery success rate, average response time, and error codes for each webhook endpoint. If your endpoint's success rate drops below 95%, CostHawk flags it in the dashboard and sends a health alert. You can view delivery logs showing every attempt, the response code, response time, and payload size. This observability is critical for maintaining confidence that your alert pipeline is functioning correctly: the worst outcome is believing you are monitored when your webhook endpoint has been silently failing for days.
CostHawk Webhook Configuration
CostHawk provides a declarative webhook configuration system that lets you define, test, and monitor webhook endpoints from the dashboard or via the API. Here is how to set up a complete webhook-based alerting pipeline:
Step 1: Create a webhook endpoint. Navigate to Settings → Webhooks → Add Endpoint, or use the API:
```http
POST /api/v1/webhooks
{
  "url": "https://api.yourcompany.com/webhooks/costhawk",
  "description": "Production cost alerts",
  "events": [
    "cost.threshold.exceeded",
    "anomaly.detected",
    "budget.warning",
    "budget.exhausted"
  ],
  "filters": {
    "projects": ["proj_abc123", "proj_def456"],
    "severity": ["critical", "warning"]
  }
}
```
CostHawk returns a webhook ID and a signing secret. Store the signing secret securely; you will use it to verify incoming payloads.
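The registration request could be assembled in code roughly as follows. Only the path and body fields come from the example request; the base URL parameter, the bearer-token `Authorization` header, and the `buildCreateWebhookRequest` helper are assumptions, so check the CostHawk API reference for the actual authentication scheme.

```typescript
interface CreateWebhookBody {
  url: string
  description: string
  events: string[]
  filters?: { projects?: string[]; severity?: string[] }
}

// Builds (but does not send) the registration request.
// Assumption: the API accepts a bearer token in the Authorization header.
function buildCreateWebhookRequest(
  baseUrl: string,
  apiToken: string,
  body: CreateWebhookBody
) {
  return {
    url: `${baseUrl}/api/v1/webhooks`,
    init: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${apiToken}`,
      },
      body: JSON.stringify(body),
    },
  }
}
```

The result can be passed straight to `fetch(request.url, request.init)`; keeping construction separate from sending makes the request easy to inspect in tests.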
Step 2: Test the endpoint. Click "Send Test Event" in the dashboard or call the test endpoint. CostHawk sends a sample payload matching your configured event types so you can verify your endpoint receives, validates, and processes it correctly. The test payload includes a "test": true field so your handler can skip side effects during testing.
Step 3: Configure alert rules. Webhook endpoints receive events generated by alert rules. Configure rules in the Alerts section:
- Cost threshold rules: Define a metric (hourly, daily, or weekly spend), a comparison operator (greater than, percent increase over baseline), and a threshold value. Example: "Alert when daily spend for project Production Chatbot exceeds $500."
- Anomaly detection rules: Enable CostHawk's anomaly detection engine, which automatically establishes baselines and detects statistically significant deviations. Configure sensitivity (low, medium, high) to control the z-score threshold for triggering.
- Budget rules: Attach webhooks to budget milestones. Example: "Alert at 50%, 75%, 90%, and 100% of the $10,000 monthly budget for the Engineering team."
Step 4: Monitor delivery health. The Webhooks dashboard shows real-time delivery metrics for each endpoint:
- Success rate: Percentage of deliveries that received a 2xx response within the timeout window. Target: 99.5%+
- Average response time: How quickly your endpoint acknowledges deliveries. Target: under 500ms
- Recent failures: List of failed deliveries with response codes, timestamps, and retry status
- Event volume: Number of events delivered per hour/day, useful for understanding alert frequency and tuning thresholds to avoid alert fatigue
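If you also keep your own delivery log, the first two metrics are straightforward to compute. The `DeliveryAttempt` shape and `summarize` function below are hypothetical; they mirror the definitions in the list above rather than any CostHawk export format.

```typescript
interface DeliveryAttempt {
  statusCode: number | null // null when the request timed out
  responseTimeMs: number
}

// Success rate (share of 2xx responses) and mean response time for a set of attempts.
function summarize(
  attempts: DeliveryAttempt[]
): { successRate: number; avgResponseMs: number } {
  if (attempts.length === 0) return { successRate: 1, avgResponseMs: 0 }
  const ok = attempts.filter(
    (a) => a.statusCode !== null && a.statusCode >= 200 && a.statusCode < 300
  ).length
  const totalMs = attempts.reduce((sum, a) => sum + a.responseTimeMs, 0)
  return {
    successRate: ok / attempts.length,
    avgResponseMs: totalMs / attempts.length,
  }
}
```

Comparing `successRate` against the 99.5% target on a schedule gives an independent check that does not rely on the monitored pipeline itself.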
You can also use the CostHawk MCP server to manage webhooks from within your AI coding assistant. The costhawk_create_webhook and costhawk_list_webhooks tools let you configure and monitor webhooks without leaving your editor — a natural fit for engineering teams that live in their terminal.
Best practices for webhook configuration:
- Create separate endpoints for different severity levels — route critical alerts to PagerDuty and warnings to Slack
- Use filters to avoid noise — only subscribe to events for projects and severity levels you care about
- Set up a catch-all endpoint that logs all events to a database for audit and retrospective analysis
- Review and tune thresholds monthly — as your usage grows, static thresholds may need adjustment
- Test webhook endpoints after any infrastructure change (DNS, load balancer, firewall rules) to confirm continued connectivity
Frequently Asked Questions
What is the difference between a webhook and polling an API?
How do I secure my webhook endpoint against spoofed requests?
Verify the HMAC-SHA256 signature that CostHawk sends in the X-CostHawk-Signature header. Use a constant-time comparison function (like Node.js crypto.timingSafeEqual) to prevent timing attacks. Beyond signature verification, implement these additional safeguards: (1) Validate the Content-Type header is application/json. (2) Parse the payload and verify the api_version matches expected values. (3) Check the created_at timestamp is within an acceptable window (e.g., last 5 minutes) to prevent replay attacks. (4) Restrict your endpoint's firewall to CostHawk's published IP ranges if your infrastructure supports IP allowlisting. (5) Use HTTPS exclusively; never accept webhooks over plain HTTP. (6) Rotate your signing secret periodically and support dual-secret validation during rotation to avoid delivery gaps.
What happens if my webhook endpoint is down when an alert fires?
How do I avoid alert fatigue from too many webhook notifications?
Can I use webhooks to automatically pause API keys when costs spike?
Yes. An automation endpoint can respond to a cost alert by calling POST /api/v1/keys/{key_id}/pause. Important safeguards to implement: (1) Never auto-pause keys tagged as 'production-critical' without human approval; send a PagerDuty page instead. (2) Include a manual override mechanism so on-call engineers can unpause keys immediately. (3) Log every automated action for audit purposes. (4) Test the automation thoroughly in a staging environment before enabling it for production keys. (5) Set up a confirmation webhook that notifies your team whenever a key is auto-paused, so humans are always aware of automated actions. CostHawk's webhook payloads include enough context (key ID, project, model breakdown) to make informed automated decisions without additional API calls.
What payload format do CostHawk webhooks use?
CostHawk webhooks send JSON payloads via HTTP POST with a Content-Type: application/json header. Every payload follows a consistent envelope structure with five top-level fields: id (unique event identifier for idempotency), type (event type string like cost.threshold.exceeded, anomaly.detected, budget.warning, or budget.exhausted), created_at (ISO 8601 timestamp), api_version (schema version), and data (event-specific payload). The data object varies by event type but always includes alert_id, alert_name, severity (critical, warning, or info), resource (the affected project, key, or organization), and dashboard_url (a direct link to investigate in the CostHawk UI). For cost threshold events, the data includes threshold configuration, current_value, percent_over, and a breakdown object with per-model and per-key cost attribution. For anomaly events, it includes expected_value, actual_value, z_score, and anomaly_type. Payloads are typically 1–3 KB, well within the limits of any HTTP receiver. CostHawk maintains backward compatibility within major versions of the schema, so existing integrations continue working when new fields are added.
How do I test webhooks during development?
Expose your local server with a tunneling tool such as ngrok: run ngrok http 3000 to get a URL like https://abc123.ngrok.io, then register that URL as a webhook endpoint in CostHawk. Incoming webhooks will be forwarded to your local development server. CostHawk also provides a built-in testing workflow: click "Send Test Event" on any webhook endpoint in the dashboard, and CostHawk delivers a realistic sample payload matching your configured event types. The test payload includes a "test": true flag so your handler can distinguish test events from real ones. For automated testing in CI/CD pipelines, use CostHawk's webhook payload schema to generate mock payloads and feed them directly to your handler function without involving the network layer at all. Finally, the CostHawk dashboard includes a real-time delivery log that shows the exact request headers, request body, response code, and response time for every delivery attempt, which is invaluable for debugging payload parsing issues, signature verification problems, or unexpected response codes.
How many webhook endpoints can I configure in CostHawk?
Related Terms
Alerting
Automated notifications triggered by cost thresholds, usage anomalies, or performance degradation in AI systems. The first line of defense against budget overruns — alerting ensures no cost spike goes unnoticed.
Cost Anomaly Detection
Automated detection of unusual AI spending patterns — sudden spikes, gradual drift, and per-key anomalies — before they become budget-breaking surprises.
Dashboards
Visual interfaces for monitoring AI cost, usage, and performance metrics in real-time. The command center for AI cost management — dashboards aggregate token spend, model utilization, latency, and budget health into a single pane of glass.
Token Budget
Spending limits applied per project, team, or time period to prevent uncontrolled AI API costs and protect against runaway agents.
Logging
Recording LLM request and response metadata — tokens consumed, model used, latency, cost, and status — for debugging, cost analysis, and compliance. Effective LLM logging captures the operational envelope of every API call without storing sensitive prompt content.
LLM Observability
The practice of monitoring, tracing, and analyzing LLM-powered applications in production across every dimension that matters: token consumption, cost, latency, error rates, and output quality. LLM observability goes far beyond traditional APM by tracking AI-specific metrics that determine both the reliability and the economics of your AI features.
