Model Context Protocol (MCP)
An open protocol for connecting AI assistants to external tools and data sources via a standardized client-server architecture. MCP enables AI coding assistants like Claude Code and GitHub Copilot to query cost data, run analyses, set budgets, and take actions without leaving the development environment.
Impact
Why It Matters for AI Costs
The way engineering teams interact with their tools is fundamentally changing. AI coding assistants — Claude Code, GitHub Copilot, Cursor, Windsurf, Cline — are becoming the primary interface for development workflows. Engineers increasingly ask their AI assistant to check deployment status, review pull requests, query databases, and manage infrastructure rather than switching between browser tabs and terminal windows.
MCP makes this possible by providing a standard protocol for tool access. Without MCP, every AI assistant would need custom integrations with every tool — Anthropic building a Slack connector, OpenAI building a Jira connector, each vendor reinventing the wheel. With MCP, a tool vendor builds one MCP server, and it works with every MCP-compatible AI assistant.
For AI cost management, this shift has profound implications:
- Context-aware cost queries: An engineer working on a feature can ask "How much did this feature cost in the last week?" and get an answer within their coding environment, with the AI assistant using MCP tools to query CostHawk's data. No tab switching, no dashboard navigation, no context loss.
- Proactive cost awareness: AI assistants can automatically check cost impact before suggesting expensive operations. "Before I recommend using Claude Opus for this task, let me check your budget" — the assistant invokes CostHawk's MCP tools to verify budget availability.
- Inline optimization: When reviewing code that makes LLM API calls, an AI assistant with CostHawk MCP access can analyze the actual cost data for those calls and suggest optimizations based on real production numbers, not theoretical estimates.
- Operational agility: During an incident, an engineer can ask their AI assistant to "check if there's a cost anomaly in the last hour" and get an immediate answer without logging into a dashboard. The assistant calls costhawk_detect_anomalies and presents the results inline.
MCP transforms cost monitoring from a destination (a dashboard you visit) into a capability (data you access wherever you work). This ambient availability means cost data gets consulted more frequently, leading to better cost decisions across the engineering organization. CostHawk's MCP server makes this concrete: install it once, and every AI assistant in your organization gains the ability to query costs, generate reports, and manage budgets.
What is MCP?
The Model Context Protocol is a specification for structured communication between AI assistants and external services. Think of it as the USB-C of AI tool integration: a universal connector that lets any AI assistant work with any tool, as long as both speak MCP.
MCP was created to solve a specific problem: the N×M integration problem. If there are N AI assistants and M tools, building custom integrations requires N×M connectors. With a standard protocol, each AI assistant implements the MCP client (N implementations) and each tool implements the MCP server (M implementations), and every combination works — N+M total implementations instead of N×M.
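To make the N×M versus N+M arithmetic concrete, here is a minimal sketch. The function name and the example counts (5 assistants, 20 tools) are illustrative assumptions, not figures from any survey.

```python
def connectors_needed(assistants: int, tools: int) -> tuple[int, int]:
    """Return (custom point-to-point integrations, shared-protocol implementations)."""
    custom = assistants * tools   # every assistant wired to every tool
    shared = assistants + tools   # one MCP client per assistant, one MCP server per tool
    return custom, shared


# Hypothetical ecosystem: 5 assistants and 20 tools.
custom, shared = connectors_needed(assistants=5, tools=20)
print(f"custom integrations: {custom}, MCP implementations: {shared}")
```

The gap widens as either side grows, which is the core argument for a shared protocol.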
The protocol defines three core primitives:
- Tools: Executable functions that the AI assistant can invoke. A tool has a name, a description (used by the AI to decide when to invoke it), and a JSON Schema defining its parameters. When invoked, the tool executes server-side and returns a result. Example: costhawk_get_usage_summary, which takes a date range and returns total spend, token counts, and request volume.
- Resources: Read-only data that the AI assistant can access. Resources are identified by URIs and return content that the assistant can incorporate into its context. Example: costhawk://reports/monthly/2026-02, which returns a formatted monthly cost report.
- Prompts: Reusable prompt templates that the AI assistant can use for common workflows. Prompts are parameterized and can reference tools and resources. Example: a "cost review" prompt that guides the assistant through analyzing cost trends, identifying anomalies, and suggesting optimizations.
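One way to picture the three primitives is as plain data records. The sketch below is only an illustration: the class and field names are assumptions loosely modeled on the descriptions above, not the official SDK types, and the `cost_review` prompt name and its arguments are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Tool:
    name: str
    description: str               # read by the model to decide when to call the tool
    input_schema: dict[str, Any]   # JSON Schema describing the parameters


@dataclass
class Resource:
    uri: str                       # read-only content addressed by URI
    name: str
    mime_type: str = "text/plain"


@dataclass
class Prompt:
    name: str
    description: str
    arguments: list[str] = field(default_factory=list)  # template parameters


usage_tool = Tool(
    name="costhawk_get_usage_summary",
    description="Get a summary of AI API usage and costs for a date range.",
    input_schema={
        "type": "object",
        "properties": {
            "start_date": {"type": "string", "format": "date"},
            "end_date": {"type": "string", "format": "date"},
        },
        "required": ["start_date", "end_date"],
    },
)
monthly_report = Resource(uri="costhawk://reports/monthly/2026-02", name="Monthly cost report")
cost_review = Prompt(
    name="cost_review",  # hypothetical identifier for the "cost review" prompt
    description="Analyze cost trends, identify anomalies, suggest optimizations.",
    arguments=["start_date", "end_date"],
)
```

The key difference between the three is who acts on them: tools are executed, resources are read, and prompts are filled in and handed to the model.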
The protocol operates over two transport layers:
- stdio transport: The MCP server runs as a local subprocess, communicating with the AI assistant via standard input/output. This is the most common deployment for developer tools — the server runs on the engineer's machine alongside the AI assistant. It is secure by default (no network exposure) and fast (no network latency).
- HTTP transport (Streamable HTTP): The MCP server runs as a remote web service, communicating over HTTP with server-sent events for streaming. This enables shared servers that multiple users can connect to, cloud-hosted deployments, and integration with web-based AI assistants.
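For the stdio transport, messages travel as newline-delimited JSON: one JSON-RPC message per line, with no raw newlines inside a message. The sketch below simulates that framing with an in-memory buffer standing in for the subprocess pipe; the helper names are assumptions for illustration.

```python
import io
import json


def write_message(stream, message: dict) -> None:
    """Serialize one JSON-RPC message as a single newline-terminated line."""
    # json.dumps escapes newlines inside strings, so the line stays intact.
    stream.write(json.dumps(message, separators=(",", ":")) + "\n")


def read_messages(stream):
    """Yield one parsed JSON-RPC message per non-empty line."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


# An in-memory buffer stands in for the server's stdin/stdout pipe.
pipe = io.StringIO()
write_message(pipe, {"jsonrpc": "2.0", "id": 1, "method": "tools/list"})
write_message(pipe, {"jsonrpc": "2.0", "id": 1, "result": {"content": [
    {"type": "text", "text": "Total Cost: $2,847.32\nTotal Requests: 142,891"}
]}})

pipe.seek(0)
received = list(read_messages(pipe))
print(len(received), "messages read")
```

Because the framing is just lines of text, a local server needs no ports, sockets, or TLS setup, which is part of why stdio is the common choice for developer tools.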
MCP sessions are stateful: the client connects to the server, performs a capability negotiation (the server declares which tools, resources, and prompts it offers), and then the client can invoke any declared capability throughout the session. The AI assistant's language model decides when and how to invoke tools based on the user's request and the tool descriptions — the human user does not need to know the specific tool names or parameters.
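The capability negotiation described above can be sketched from the client's side as a small piece of session state. This is an illustrative model only; the `McpSession` class and its method names are assumptions, not part of any SDK.

```python
class McpSession:
    """Remembers which capabilities the server declared during initialization."""

    def __init__(self) -> None:
        self.initialized = False
        self.server_capabilities: dict = {}

    def complete_handshake(self, initialize_result: dict) -> None:
        """Store the capabilities from the server's initialize response."""
        self.server_capabilities = initialize_result.get("capabilities", {})
        self.initialized = True

    def supports(self, feature: str) -> bool:
        """True only after the handshake, and only for declared features."""
        return self.initialized and feature in self.server_capabilities

    def require(self, feature: str) -> None:
        """Raise if the client is about to use something the server never offered."""
        if not self.initialized:
            raise RuntimeError("session not initialized")
        if feature not in self.server_capabilities:
            raise RuntimeError(f"server did not declare {feature!r}")


session = McpSession()
session.complete_handshake({
    "protocolVersion": "2025-03-26",
    "capabilities": {"tools": {"listChanged": True}, "resources": {"listChanged": True}},
})
print(session.supports("tools"), session.supports("prompts"))
```

A client built this way would call `require("tools")` before sending `tools/list`, so a server that offers only resources is never asked for tools.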
How MCP Works
An MCP interaction follows a well-defined lifecycle from connection to tool execution. Here is the complete flow:
1. Connection and Initialization:
// Client connects and sends initialize request
→ { "jsonrpc": "2.0", "id": 1, "method": "initialize", "params": {
    "protocolVersion": "2025-03-26",
    "capabilities": { "roots": { "listChanged": true } },
    "clientInfo": { "name": "claude-code", "version": "1.0.0" }
}}
// Server responds with its capabilities
← { "jsonrpc": "2.0", "id": 1, "result": {
    "protocolVersion": "2025-03-26",
    "capabilities": {
      "tools": { "listChanged": true },
      "resources": { "listChanged": true }
    },
    "serverInfo": { "name": "costhawk-mcp", "version": "2.3.0" }
}}
// Client acknowledges initialization (a notification, so it carries no id)
→ { "jsonrpc": "2.0", "method": "notifications/initialized" }
2. Tool Discovery: After initialization, the client requests the list of available tools. Each tool declaration includes a name, description, and JSON Schema for parameters:
→ { "jsonrpc": "2.0", "id": 2, "method": "tools/list" }
← { "jsonrpc": "2.0", "id": 2, "result": { "tools": [
    {
      "name": "costhawk_get_usage_summary",
      "description": "Get a summary of AI API usage and costs for a date range.",
      "inputSchema": {
        "type": "object",
        "properties": {
          "start_date": { "type": "string", "format": "date" },
          "end_date": { "type": "string", "format": "date" },
          "group_by": { "type": "string", "enum": ["model", "provider", "key", "project"] }
        },
        "required": ["start_date", "end_date"]
      }
    },
    // ... more tools
]}}
3. Tool Invocation: When the AI model decides a tool is needed (based on the user's question and the tool's description), it generates a tool call that the client sends to the server:
→ { "jsonrpc": "2.0", "id": 3, "method": "tools/call", "params": {
    "name": "costhawk_get_usage_summary",
    "arguments": {
      "start_date": "2026-03-09",
      "end_date": "2026-03-16",
      "group_by": "model"
    }
}}
← { "jsonrpc": "2.0", "id": 3, "result": {
    "content": [{
      "type": "text",
      "text": "Usage Summary (Mar 9-16, 2026):\n\nTotal Cost: $2,847.32\nTotal Requests: 142,891\nTotal Tokens: 891,204,553\n\nBy Model:\n gpt-4o: $1,523.40 (53.5%)\n claude-3.5-sonnet: $891.20 (31.3%)\n gpt-4o-mini: $312.50 (11.0%)\n gemini-2.0-flash: $120.22 (4.2%)"
    }]
}}
4. AI Processes and Responds: The AI assistant receives the tool result, incorporates it into its context, and generates a natural language response for the user. The user sees a coherent answer that integrates live data, and may not even realize a tool call was made behind the scenes.
This entire cycle — discovery, invocation, result processing — happens in milliseconds for local MCP servers and typically under a second for remote servers. The user experience is conversational: they ask a question in natural language, and the AI assistant handles all the protocol mechanics transparently.
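The lifecycle above can be sketched from the server's side as a small JSON-RPC dispatcher. This is a toy illustration under stated assumptions, not how CostHawk's server is implemented: the `demo-server` name, the stub handler, and the helper functions are all hypothetical, and transport handling is left out.

```python
from typing import Optional

# One tool declaration, mirroring the tools/list example; the handler is a stub.
TOOLS = [{
    "name": "costhawk_get_usage_summary",
    "description": "Get a summary of AI API usage and costs for a date range.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "start_date": {"type": "string", "format": "date"},
            "end_date": {"type": "string", "format": "date"},
        },
        "required": ["start_date", "end_date"],
    },
}]


def stub_usage_summary(arguments: dict) -> str:
    """Hypothetical handler: returns canned text instead of querying real data."""
    return f"Usage summary for {arguments['start_date']} to {arguments['end_date']} (stub)"


HANDLERS = {"costhawk_get_usage_summary": stub_usage_summary}


def reply(request: dict, result: dict) -> dict:
    return {"jsonrpc": "2.0", "id": request["id"], "result": result}


def reply_error(request: dict, code: int, message: str) -> dict:
    return {"jsonrpc": "2.0", "id": request["id"], "error": {"code": code, "message": message}}


def handle(request: dict) -> Optional[dict]:
    """Dispatch one JSON-RPC message; notifications (no id) get no response."""
    if "id" not in request:
        return None
    method = request.get("method")
    if method == "initialize":
        return reply(request, {
            "protocolVersion": "2025-03-26",
            "capabilities": {"tools": {"listChanged": True}},
            "serverInfo": {"name": "demo-server", "version": "0.0.1"},
        })
    if method == "tools/list":
        return reply(request, {"tools": TOOLS})
    if method == "tools/call":
        params = request.get("params", {})
        name = params.get("name")
        tool = next((t for t in TOOLS if t["name"] == name), None)
        if tool is None or name not in HANDLERS:
            return reply_error(request, -32602, "Unknown tool")  # JSON-RPC invalid params
        arguments = params.get("arguments", {})
        missing = [key for key in tool["inputSchema"]["required"] if key not in arguments]
        if missing:
            # Tool-level failures are reported inside the result with isError.
            text = "Missing required arguments: " + ", ".join(missing)
            return reply(request, {"content": [{"type": "text", "text": text}], "isError": True})
        return reply(request, {"content": [{"type": "text", "text": HANDLERS[name](arguments)}]})
    return reply_error(request, -32601, "Method not found")  # JSON-RPC method not found


# Walk through the lifecycle in order: initialize, acknowledge, discover, invoke.
init = handle({"jsonrpc": "2.0", "id": 1, "method": "initialize", "params": {}})
ack = handle({"jsonrpc": "2.0", "method": "notifications/initialized"})
listing = handle({"jsonrpc": "2.0", "id": 2, "method": "tools/list"})
call = handle({"jsonrpc": "2.0", "id": 3, "method": "tools/call", "params": {
    "name": "costhawk_get_usage_summary",
    "arguments": {"start_date": "2026-03-09", "end_date": "2026-03-16"},
}})
print(call["result"]["content"][0]["text"])
```

Note the two kinds of failure: an unknown method or tool is a protocol-level JSON-RPC error, while a tool that runs but cannot complete reports `isError` inside an otherwise normal result so the model can read the message and react.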
MCP and AI Cost Management
AI cost management is one of the most natural applications for MCP because cost data needs to be accessible in the context where cost decisions are made — the development environment. Here are the key use cases where MCP transforms cost management from a dashboard activity into an ambient capability:
Querying usage data: Engineers can ask natural language questions about their AI spend and get immediate answers without leaving their editor:
- "What did we spend on AI APIs this week?" → costhawk_get_usage_summary
- "Which model is costing us the most?" → costhawk_get_usage_summary with group_by: "model"
- "How much did the search feature cost in February?" → costhawk_get_usage_by_tag
- "Show me a breakdown of spending by API key" → costhawk_get_usage_summary with group_by: "key"
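Behind each of these questions, the assistant ends up producing a `tools/call` request. As a rough sketch, "Which model is costing us the most?" for the week of March 9 might become the message below. The `build_tool_call` helper and the id counter are assumptions for illustration; in practice the client library constructs this message.

```python
import itertools
import json

_next_id = itertools.count(1)  # JSON-RPC requests need a unique id per session


def build_tool_call(name: str, arguments: dict) -> dict:
    """Wrap a tool name and its arguments in a JSON-RPC tools/call request."""
    return {
        "jsonrpc": "2.0",
        "id": next(_next_id),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


request = build_tool_call(
    "costhawk_get_usage_summary",
    {"start_date": "2026-03-09", "end_date": "2026-03-16", "group_by": "model"},
)
print(json.dumps(request, indent=2))
```

The model's job is choosing the tool and filling in `arguments` from the conversation; everything else in the envelope is mechanical.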
Running cost analyses: More sophisticated analyses that would require multiple dashboard interactions become single conversational requests:
- "Are there any cost anomalies in the last 24 hours?" → costhawk_detect_anomalies
- "What's our ROI on the prompt optimization we did last month?" → costhawk_get_local_roi_report
- "Compare our Claude Code usage across the engineering team" → costhawk_list_claude_code_sessions
- "How much could we save by routing simple queries to GPT-4o mini?" → costhawk_get_savings
Managing budgets and alerts: Configuration tasks that require navigating settings pages become conversational:
- "Set up a webhook to alert me when daily spend exceeds $500" → costhawk_create_webhook
- "What alerts are currently configured?" → costhawk_list_alerts
- "Show me all active webhooks" → costhawk_list_webhooks
Syncing usage data: Engineers can trigger data synchronization from within their assistant:
- "Sync my Claude Code usage data" → costhawk_sync_claude_code_usage
- "Sync Codex usage for the team" → costhawk_sync_codex_usage
Contextual cost awareness: Perhaps the most powerful use case is proactive cost context during development. When an engineer is writing code that calls LLM APIs, their AI assistant can provide cost context unprompted: "Based on CostHawk data, this endpoint averages 2,400 tokens per request at $0.024. At your projected volume of 50,000 requests/day, that is $1,200/day. Consider using GPT-4o mini for the classification step to reduce this by 80%." This kind of contextual advice is only possible when cost data is accessible via MCP in the same environment where code is being written.
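The arithmetic in that example is easy to check. The sketch below reproduces it with exact decimal math, assuming the quoted 80% reduction applies to the full daily figure; the function name is illustrative.

```python
from decimal import Decimal


def daily_cost(requests_per_day: int, cost_per_request: Decimal) -> Decimal:
    """Projected daily spend for an endpoint at a given request volume."""
    return requests_per_day * cost_per_request


# Figures quoted in the example: $0.024 per request, 50,000 requests/day.
baseline = daily_cost(50_000, Decimal("0.024"))
# Assumption: the 80% saving applies to the whole daily cost.
optimized = baseline * (1 - Decimal("0.80"))

print(f"baseline: ${baseline:,.2f}/day, after 80% reduction: ${optimized:,.2f}/day")
```

`Decimal` is used instead of floats so that currency values multiply without binary rounding noise.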
MCP vs Traditional APIs
MCP and traditional REST/GraphQL APIs both enable programmatic access to services, but they serve fundamentally different purposes and interaction patterns. Here is a detailed comparison:
| Dimension | MCP | Traditional REST/GraphQL API |
|---|---|---|
| Primary consumer | AI assistants (LLMs with tool-use capability) | Application code written by humans |
| Discovery | Dynamic. Client queries server for available tools at runtime. Tools can change between sessions. | Static. Developer reads API documentation, writes integration code, deploys it. |
| Invocation decision | AI model decides when to call tools based on natural language context and tool descriptions. | Developer writes explicit code that calls specific endpoints under specific conditions. |
| Parameter construction | AI model constructs parameters from natural language input using the JSON Schema. | Developer hard-codes parameters or maps them from application state. |
| Response handling | AI model interprets the response and generates a natural language summary for the user. | Developer writes parsing code to extract and display specific fields. |
| Authentication | Configured once during MCP server setup. The AI assistant handles auth transparently. | Developer implements auth (API keys, OAuth) in application code. |
| Statefulness | Session-based. Server can maintain context across multiple tool calls within a session. | Typically stateless (REST). Each request is independent. |
| Transport | JSON-RPC 2.0 over stdio (local) or HTTP (remote). | HTTP with REST conventions or GraphQL query language. |
| Best for | Interactive, exploratory use cases where a human asks questions and an AI fetches answers. | Programmatic, automated use cases where code makes predictable, repeatable API calls. |
MCP does not replace traditional APIs — it complements them. A platform like CostHawk offers both:
- REST API for programmatic integrations: CI/CD pipelines that check budget before deploying, dashboards that poll for latest data, automated scripts that pause keys when budgets are exceeded.
- MCP server for AI-assisted workflows: engineers querying cost data conversationally, AI assistants providing proactive cost advice during code review, on-call engineers investigating cost anomalies through their AI assistant.
The two interfaces often share the same underlying data layer but optimize for different interaction patterns. The REST API optimizes for precision and reliability (exact endpoints, typed responses, pagination). The MCP server optimizes for discoverability and natural language interaction (tool descriptions that help the AI choose the right tool, response formats that the AI can summarize for humans).
As AI assistants become more capable and more central to engineering workflows, MCP usage is growing rapidly. Many teams find that MCP becomes their primary interface for ad-hoc data queries, while the REST API handles automated, scheduled, and programmatic access.
CostHawk MCP Server
The CostHawk MCP server (costhawk-mcp-server on npm) provides comprehensive AI cost management capabilities to any MCP-compatible AI assistant. It is the fastest way to access CostHawk data without leaving your development environment.
Installation:
npm install -g costhawk-mcp-server
Or add to your AI assistant's MCP configuration (e.g., .mcp.json for Claude Code):
{
  "mcpServers": {
    "costhawk": {
      "command": "npx",
      "args": ["-y", "costhawk-mcp-server"],
      "env": {
        "COSTHAWK_API_KEY": "your-api-key"
      }
    }
  }
}
Available tools:
| Tool | Purpose | Key Parameters |
|---|---|---|
| costhawk_get_usage_summary | Overall usage and cost summary | Date range, group_by (model/provider/key) |
| costhawk_get_usage_by_tag | Usage broken down by custom tags | Date range, tag name |
| costhawk_get_savings | Potential savings analysis | Date range |
| costhawk_get_savings_breakdown | Detailed savings by optimization type | Date range |
| costhawk_detect_anomalies | Detect cost anomalies | Lookback window, sensitivity |
| costhawk_get_model_pricing | Current pricing for all models | Provider filter (optional) |
| costhawk_list_alerts | View configured alert rules | Status filter |
| costhawk_create_webhook | Create a new webhook endpoint | URL, events, filters |
| costhawk_list_webhooks | View configured webhooks | Status filter |
| costhawk_list_integrations | View connected integrations | None |
| costhawk_list_subscriptions | View billing subscriptions | None |
| costhawk_sync_claude_code_usage | Sync Claude Code usage data | Date range |
| costhawk_sync_codex_usage | Sync OpenAI Codex usage data | Date range |
| costhawk_list_claude_code_sessions | View Claude Code session history | Date range, user filter |
| costhawk_list_codex_sessions | View Codex session history | Date range, user filter |
| costhawk_get_local_claude_code_usage | Read local Claude Code usage from disk | None |
| costhawk_get_local_codex_usage | Read local Codex usage from disk | None |
| costhawk_get_local_roi_report | Generate ROI analysis from local data | Date range |
| costhawk_get_proxy_guide | Instructions for setting up wrapped keys | Provider filter |
Example conversation with CostHawk MCP:
User: What's our AI spend looking like this week?
Assistant: [calls costhawk_get_usage_summary]
This week (Mar 9-16), your total AI API spend is $2,847.32
across 142,891 requests:
• gpt-4o: $1,523.40 (53.5%) — 48,200 requests
• claude-3.5-sonnet: $891.20 (31.3%) — 12,400 requests
• gpt-4o-mini: $312.50 (11.0%) — 72,100 requests
• gemini-2.0-flash: $120.22 (4.2%) — 10,191 requests
That's up 12% from last week ($2,541.80). The increase is
primarily driven by a 23% jump in claude-3.5-sonnet usage.
User: Any anomalies?
Assistant: [calls costhawk_detect_anomalies]
One anomaly detected: claude-3.5-sonnet usage spiked 3.2
standard deviations above baseline on March 14th, with 4,200
requests vs. the 7-day average of 1,770/day. The spike
correlates with API key "code-review-prod" — likely the new
automated code review pipeline that shipped that day.
The Future of MCP
MCP is evolving rapidly. Since its initial release in late 2024, the protocol has gone through several revisions, with the March 2025 specification introducing streamable HTTP transport, improved authentication, and better error handling. Adoption is accelerating across the AI ecosystem:
Growing client support: As of early 2026, MCP is supported by Claude Code (Anthropic), Claude Desktop, Cursor, Windsurf, Cline, Continue, Zed, and a growing list of AI coding assistants. GitHub Copilot has announced MCP support in preview. This means building an MCP server provides reach across the majority of AI coding tool users without building separate integrations for each platform.
Enterprise features: The protocol is adding features critical for enterprise adoption:
- OAuth-based authorization: A standardized authorization flow for remote MCP servers (the specification builds on OAuth 2.1), enabling multi-user deployments with per-user permissions. This replaces the current pattern of embedding API keys in local configuration.
- Elicitation: A mechanism for MCP servers to ask the user for additional information during tool execution, enabling interactive workflows like multi-step wizards or confirmation prompts for destructive actions.
- Audit logging: Standardized logging of tool invocations for compliance and security audit trails.
Beyond coding assistants: While MCP started in the developer tools space, the protocol is generic enough to work with any AI assistant. Customer support chatbots can use MCP to access CRM data. Sales assistants can use MCP to query pipeline metrics. Operations assistants can use MCP to check infrastructure status. As general-purpose AI assistants become more capable, MCP becomes the universal protocol for giving them access to enterprise data and tools.
Ecosystem growth: The number of available MCP servers is growing quickly. There are now MCP servers for databases (Postgres, MongoDB), cloud providers (AWS, GCP), developer tools (GitHub, Jira, Linear), communication platforms (Slack, Discord), and monitoring tools (Datadog, Sentry, CostHawk). This network effect makes MCP increasingly valuable: each new server expands what AI assistants can do, driving more client adoption, which drives more server development.
Implications for cost management: As MCP matures, CostHawk's MCP server will evolve from a query interface into a full operational platform. Future capabilities include:
- Real-time cost streaming: MCP resources that stream live cost data, enabling AI assistants to monitor spend continuously and alert proactively
- Automated optimization: Tools that not only identify savings opportunities but execute them — re-routing models, adjusting max_tokens, enabling caching — with human approval via the elicitation mechanism
- Cross-tool orchestration: AI assistants that combine CostHawk MCP tools with GitHub MCP tools to correlate cost changes with specific deployments, automatically identifying which code change caused a cost increase
- Team-wide deployment: Remote MCP servers with OAuth authentication that give every engineer on a team access to cost data through their AI assistant, with role-based permissions controlling who can view vs. modify settings
MCP represents a fundamental shift in how humans interact with software tools. For AI cost management, this shift means cost awareness becomes ambient and continuous rather than episodic and dashboard-driven. CostHawk is committed to being at the forefront of this shift, ensuring that cost data is available wherever engineers work.
FAQ
Frequently Asked Questions
What AI assistants support MCP?
Is MCP secure? How does authentication work?
How is MCP different from function calling / tool use in LLMs?
Can I build my own MCP server?
[…] @modelcontextprotocol/sdk), which provides a high-level API for creating MCP servers. A minimal MCP server requires defining tools (name, description, input schema, handler function) and starting a transport (stdio for local, HTTP for remote). Here is the conceptual structure: create a server instance, register tools with their schemas and handlers, and connect the transport. A simple tool like 'get current weather' can be implemented in under 50 lines of code. For cost management use cases, you might build custom MCP servers that expose internal metrics, cost allocation data, or team-specific budget information that is not available in CostHawk's standard server. The CostHawk MCP server is open source, so you can also extend it with custom tools specific to your organization's needs. The MCP ecosystem encourages composability: an AI assistant can connect to multiple MCP servers simultaneously, combining tools from CostHawk, your internal server, GitHub, and other providers in a single conversation.
What is the latency of MCP tool calls?
How does MCP handle errors and failures?
[…] isError: true result with a descriptive message explaining what went wrong and how to fix it. The AI assistant then communicates this to the user in natural language: 'I was unable to fetch your usage summary because your API key has expired. Please update your CostHawk API key in the MCP configuration.' For transient failures (network timeouts, temporary server issues), the AI assistant can automatically retry the tool call; most MCP clients implement retry logic with backoff. For permanent failures (invalid credentials, insufficient permissions), the assistant explains the issue and suggests remediation steps. CostHawk's MCP server also implements input validation using JSON Schema, catching malformed parameters before they reach the tool handler. This means common errors like invalid date formats or missing required fields are caught and reported with specific, actionable error messages.
Can MCP tools modify data or only read it?
[…] costhawk_create_webhook. Most MCP clients implement this pattern; Claude Code, for instance, shows tool calls to the user and waits for approval before executing them. CostHawk's MCP server also supports a read-only mode for environments where write operations should be restricted, configurable via an environment variable at server startup. For enterprise deployments with OAuth authentication, the server can enforce per-user permissions so that only authorized users can execute write tools while everyone can use read tools.
How does the CostHawk MCP server compare to using the CostHawk dashboard?
Related Terms
LLM Observability
The practice of monitoring, tracing, and analyzing LLM-powered applications in production across every dimension that matters: token consumption, cost, latency, error rates, and output quality. LLM observability goes far beyond traditional APM by tracking AI-specific metrics that determine both the reliability and the economics of your AI features.
Agentic AI
AI systems that autonomously plan, reason, and execute multi-step tasks by chaining multiple LLM calls, tool invocations, and decision loops. Agentic workflows generate unpredictable and often enormous token consumption — 10x to 100x more than single-turn queries — making them the highest-cost AI pattern in production. Without per-session monitoring and cost guardrails, agent runs can consume hundreds of dollars in minutes.
Dashboards
Visual interfaces for monitoring AI cost, usage, and performance metrics in real-time. The command center for AI cost management — dashboards aggregate token spend, model utilization, latency, and budget health into a single pane of glass.
Webhook
An HTTP callback that pushes real-time notifications when events occur — cost threshold breaches, anomaly detection alerts, usage milestones. Webhooks are the delivery mechanism that turns passive monitoring into active, automated response workflows across Slack, PagerDuty, Discord, and any HTTP endpoint.
API Gateway
A centralized entry point for API traffic that handles routing, authentication, rate limiting, and request transformation. For LLM APIs, gateways add cost tracking, policy enforcement, and provider abstraction.
Wrapped Keys
Proxy API keys that route provider SDK traffic through a cost tracking layer. The original provider key never leaves the server, while the wrapped key provides per-key attribution, budget enforcement, and policy controls without requiring application code changes beyond a base URL swap.
