Claude + Exa search usage tracking: two bills, two meters

When you wire Exa MCP into Claude Code, you start paying twice for every research call. Exa charges per search and surfaces it on costDollars. Claude charges tokens for ingesting the result text and surfaces it on five_hour.utilization on the server. No single tool today shows both. The honest setup is two meters running in parallel.

Matthew Diakonov
9 min read

Direct answer (verified 2026-04-29)

There is no single tracker for “Claude + Exa search” usage. Read costDollars on every Exa MCP response for the per-search bill, and run ClaudeMeter for the Claude plan side, because Exa search results feed back as Claude input tokens that count against your rolling 5-hour window and weekly quota at claude.ai/settings/usage.

Why one tracker is not enough

Exa search via MCP is one of the most useful tools to wire into Claude Code. It is also the fastest way to burn through a Claude Pro or Max plan window without realizing it. The reason is straightforward: every search returns text, that text gets read into the next Claude turn as input tokens, and those tokens land on your server-side quota with the same weights as any other prompt. A two-paragraph follow-up question that triggers an Exa research call can chew up more of your 5-hour window than the previous fifteen minutes of code review.

The trackers people reach for cover only one edge each. Exa's dashboard and the Exa Cost Optimizer skill see Exa's per-call bill in dollars. ccusage reads local JSONL files under ~/.claude/projects and tots up token counts. The Claude /usage slash command shows API-side token spend. None of those see the weighted server quota that actually trips the rate limiter.

Bill 1: Exa's per-call cost lives in costDollars

Exa MCP responses include a costDollars object on every search response. It breaks the call into search type, content fetch, and any livecrawl charges. Sum these across a session for your Exa-side spend:

exa.search response (truncated)

ClaudeMeter does not read this. It has nothing to do with your MCP traffic. If you want a record of what each search cost, wrap your MCP client and log costDollars.total alongside the prompt that fired it.
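A minimal sketch of that logging step, assuming only what the text above states: each response carries a costDollars object with a total. The ExaCostLedger class and its method names are hypothetical, and the breakdown fields are left out because their exact names are not shown here.

```typescript
// Hypothetical shapes: only `costDollars.total` comes from the text above.
interface CostDollars {
  total: number; // dollars billed by Exa for one search call
}

interface ExaCostEntry {
  prompt: string; // the prompt that fired the search
  costDollars: CostDollars;
}

class ExaCostLedger {
  private entries: ExaCostEntry[] = [];

  // Record one search response next to the prompt that triggered it.
  record(prompt: string, costDollars: CostDollars): void {
    this.entries.push({ prompt, costDollars });
  }

  // Number of searches logged so far in this session.
  count(): number {
    return this.entries.length;
  }

  // Exa-side spend for the session, in dollars.
  totalDollars(): number {
    return this.entries.reduce((sum, e) => sum + e.costDollars.total, 0);
  }
}
```

Call record() from wherever your MCP client hands back the search response, then read totalDollars() at the end of the session. Keeping the prompt next to each entry makes it possible to see which questions were expensive.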

Bill 2: Claude's plan-side cost lives in five_hour.utilization

The Anthropic rate limiter looks at one weighted utilization float per window in the response from GET /api/organizations/{org_uuid}/usage. Same endpoint claude.ai/settings/usage hits to draw the bar:

claude.ai/api/organizations/{org_uuid}/usage

The response carries one utilization float per window. The 429 fires when utilization >= 1.0. Every Exa search-result text chunk lands here as part of the next turn's input, weighted by peak-hour multiplier, model choice, and per-tool-call cost. ClaudeMeter polls this endpoint every 60 seconds via your existing browser session and renders the 5-hour and 7-day percentages in the macOS menu bar.
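As a rough illustration of reading that payload, the sketch below pulls the two fractions out of an already-parsed JSON body and applies the >= 1.0 check. It assumes the body has five_hour.utilization and seven_day.utilization as numbers, which is all the text names; the endpoint is undocumented, so treat the shape as an assumption. The function and type names are hypothetical.

```typescript
// Assumed shape: { five_hour: { utilization }, seven_day: { utilization } }.
// Any other fields in the real response are ignored here.
interface UsageSnapshot {
  fiveHour: number; // fraction of the rolling 5-hour window (0.86 = 86%)
  sevenDay: number; // fraction of the weekly quota
}

// Read `<key>.utilization` and fail loudly if the shape is not what we expect.
function readUtilization(body: unknown, key: string): number {
  const win = (body as Record<string, unknown> | null)?.[key];
  const value = (win as Record<string, unknown> | null | undefined)?.utilization;
  if (typeof value !== "number" || Number.isNaN(value)) {
    throw new Error(`unexpected usage shape: missing ${key}.utilization`);
  }
  return value;
}

function parseUsage(body: unknown): UsageSnapshot {
  return {
    fiveHour: readUtilization(body, "five_hour"),
    sevenDay: readUtilization(body, "seven_day"),
  };
}

// The text above says the 429 fires once utilization reaches 1.0.
function isRateLimited(snapshot: UsageSnapshot): boolean {
  return snapshot.fiveHour >= 1.0 || snapshot.sevenDay >= 1.0;
}
```

Throwing on an unexpected shape is deliberate: if the internal endpoint changes, a loud failure is easier to notice than a silently wrong percentage.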

One Exa call, two ledgers

The same MCP search hits two billing systems at once. Most tools see one. Both numbers matter.

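Exa MCP search through Claude Code (sequence diagram; participants: You, Claude Code, Exa MCP, claude.ai server, five_hour.utilization, ClaudeMeter)

  • You send a research-style prompt to Claude Code.
  • Claude Code calls exa.search(query) on Exa MCP.
  • Exa MCP returns 10 results + costDollars: $0.0250.
  • Claude Code sends POST /completions to the claude.ai server, with the results in the input.
  • The server increments five_hour.utilization by weight(input, model, peak).
  • ClaudeMeter sends GET /usage every 60s and reads five_hour.utilization = 0.86.
  • The menu bar updates: 86% / 5h, $0.025 spent on Exa.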

What lands on each meter, every search

Exa search call, fanned out across both surfaces (diagram): the search query, Exa MCP, the result text, the Claude turn, and the peak-hour multiplier feed the two ledgers, which surface as Exa costDollars, Claude five_hour.utilization, and the ClaudeMeter menu bar.

Reproduce both bills in one minute

You do not need anything new. Snapshot the Claude side with one curl, run the Exa-heavy turn, snapshot again, and read the delta. ClaudeMeter automates this in the menu bar; the manual version is fine for a sanity check:

Exa MCP call vs ClaudeMeter snapshot

One Exa research call: $0.025 to Exa, +0.08 utilization on Claude. The first number is the per-call money. The second is eight percentage points of your 5-hour budget gone, with another four hours of refactor still ahead. Two numbers, one decision.

What lands on which ledger, summarized

Five things to keep straight

  • Exa charges per search; the per-call cost is in costDollars.total on every MCP response. Sum these for your Exa bill.
  • Exa's result text becomes input_tokens on the next Claude turn; long research calls can move five_hour.utilization by 5-10 percentage points before Claude drafts a single line of code.
  • Anthropic's rate limiter only checks five_hour.utilization (and seven_day.utilization for the weekly cap); ccusage and local JSONL counts cannot see the weights.
  • ClaudeMeter polls /api/organizations/{uuid}/usage every 60 seconds and POSTs to localhost:63762; the menu bar reads the same JSON claude.ai/settings/usage renders.
  • Pro and Max plans share this rolling-window enforcement; metered extra usage (April 2026) lands as a separate dollar balance, also visible in ClaudeMeter.

A small wrapper that captures both

If you want the two numbers next to each other in code, drop a wrapper around your Exa MCP client. Read the Claude utilization before, run the search, read it after, return everything:

lib/tracked-exa-search.ts

The readClaudeUtilization helper is one fetch call. Authenticate with your local claude.ai cookies (the same way ClaudeMeter does), parse the JSON, return the two utilization fractions. Now every Exa search has a paired Claude-plan delta you can log, plot, or alert on.
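One possible shape for such a wrapper is sketched below. It is not the original lib/tracked-exa-search.ts. The search function and readClaudeUtilization are passed in as parameters so the sketch does not assume any particular Exa client API; trackedExaSearch, pairLedgers, and the interfaces are hypothetical names, and only costDollars.total and the two utilization fractions come from the text.

```typescript
// All names here are illustrative, not the article's original file.
interface Utilization {
  fiveHour: number; // five_hour.utilization as a fraction
  sevenDay: number; // seven_day.utilization as a fraction
}

interface ExaSearchResponse<R> {
  results: R[];
  costDollars: { total: number };
}

interface TrackedSearch<R> {
  results: R[];
  exaDollars: number;    // Exa ledger, in dollars
  before: Utilization;
  after: Utilization;
  fiveHourDelta: number; // Claude ledger, as a fraction of the 5-hour window
  sevenDayDelta: number;
}

// Pure pairing step: combine one search response with two snapshots.
function pairLedgers<R>(
  before: Utilization,
  response: ExaSearchResponse<R>,
  after: Utilization,
): TrackedSearch<R> {
  return {
    results: response.results,
    exaDollars: response.costDollars.total,
    before,
    after,
    fiveHourDelta: after.fiveHour - before.fiveHour,
    sevenDayDelta: after.sevenDay - before.sevenDay,
  };
}

// Read utilization, run the search, read utilization again, return everything.
async function trackedExaSearch<R>(
  query: string,
  search: (query: string) => Promise<ExaSearchResponse<R>>,
  readClaudeUtilization: () => Promise<Utilization>,
): Promise<TrackedSearch<R>> {
  const before = await readClaudeUtilization();
  const response = await search(query);
  const after = await readClaudeUtilization();
  return pairLedgers(before, response, after);
}
```

The two units stay in separate fields on purpose. Note that the delta only reflects the search once the Claude turn that ingests the results has been counted on the server, so in practice the second read may need to happen after that turn rather than immediately after the search returns.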

The two-meter session, end to end

1. Pin both meters before the session

Open ClaudeMeter in the menu bar so the 5-hour and 7-day percentages are visible. Open Exa's dashboard (or your MCP wrapper log) so you can see costDollars per call. Two surfaces, one screen.

2. Note the starting utilization

Record five_hour.utilization at the moment you start. ClaudeMeter shows it; Anthropic's settings page shows the same number. You will need this to compute deltas after each Exa-heavy turn.

3. Run the Exa search call

Send whatever prompt triggers the search. The Exa MCP server fetches results, returns them into your conversation context, and the next assistant turn ingests every character as input tokens. Capture costDollars.total from the response if your wrapper exposes it.

4. Wait one poll cycle on the Claude side

ClaudeMeter polls every 60 seconds (configurable 30s to 5m). Wait until the next snapshot. The new utilization minus the old gives you the weighted server-side cost of the Exa turn.

5. Report the two bills side by side

Exa: costDollars.total in dollars. Claude: utilization delta as a fraction of your 5-hour budget. They are not the same unit, so do not sum them. Keep them separate and report both. The combined view is the only honest one.

ClaudeMeter vs Exa-side and local-token tools

Different data source, different question. Use them together.

Feature | Exa Cost Optimizer + ccusage | ClaudeMeter (plan side)
Tracks Exa per-search dollars | yes (Exa Cost Optimizer, Exa dashboard) | no (out of scope)
Tracks Claude plan utilization (server-truth) | no | yes (five_hour and seven_day, every 60s)
Tracks local Claude Code tokens (~/.claude/projects) | yes (ccusage, Claude-Code-Usage-Monitor) | no (different data source)
Sees Exa-result text inflate 5-hour window | no (Exa-side tools see only their bill) | yes (because it reads the server quota)
Sees peak-hour and model weights | no (local logs cannot see weights) | yes (baked into utilization)
Predicts a 429 on the next prompt | no (different denominator) | yes (utilization >= 1.0 trips the limiter)
Polling cadence | varies (per-call for Exa, on JSONL write for ccusage) | 60s default, 30s-5m configurable
Cookie paste required | n/a (different data source) | no (browser extension carries the session)

Numbers that matter

From the implementation. No invented benchmarks.

60s: ClaudeMeter poll cadence (configurable 30s-5m)
1: field the Anthropic rate limiter actually checks per window
2: ledgers per Exa search call
0: cookies you have to paste

When to throttle Exa to save the Claude window

Once you can see both meters, the throttle decision becomes cheap. A research call typically moves the 5-hour bar by 4 to 10 percentage points, depending on result count and model. If ClaudeMeter shows you low in the window with three more hours of refactor ahead, an Exa call that adds a few points is probably fine. The same call when you are already close to the limit is not.

The point of two meters is being able to make that trade-off in real time, not at the moment Anthropic returns a 429.
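That trade-off can be written down as a small check. This is a sketch under stated assumptions: shouldRunExaCall and its reserve parameter are hypothetical, and the expected delta is your own estimate (the 4 to 10 percentage point range above would be 0.04 to 0.10 as a fraction).

```typescript
interface ThrottleDecision {
  projected: number; // utilization fraction expected after the call
  run: boolean;      // true if the call still leaves the reserve untouched
}

// current:       five_hour.utilization right now, as a fraction
// expectedDelta: estimated cost of the Exa-heavy turn, as a fraction
// reserve:       fraction of the window to keep free for the remaining work
function shouldRunExaCall(
  current: number,
  expectedDelta: number,
  reserve: number,
): ThrottleDecision {
  const projected = current + expectedDelta;
  return { projected, run: projected <= 1.0 - reserve };
}
```

With a reserve of 0.1, a call expected to move the bar from 0.78 to about 0.92 is declined, while the same call starting from 0.20 goes through. The reserve is the knob: set it to however much of the window the rest of the session needs.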

The honest caveats

The Anthropic /api/organizations/{uuid}/usage endpoint is internal and undocumented. The shape has been stable for many months but Anthropic could rename or reshape it in any release. ClaudeMeter deserializes into a strict Rust struct and surfaces a parse error if the shape changes; a fix ships the same day. Exa's costDollars field is documented and stable, but Exa's pricing tiers do change, so the per-call numbers you see in old logs may not match current tariffs.

ClaudeMeter is macOS only (12+). Browser extensions are available for Chrome, Arc, Brave, and Edge. Safari is not supported yet. Linux and Windows are not on the roadmap. If you are not on macOS, the curl-and-jq path above still works.

Watch both meters while you research

ClaudeMeter sits in your macOS menu bar, refreshes every 60 seconds, and reads the same JSON claude.ai/settings/usage renders. Free, MIT licensed, no cookie paste. Pair it with Exa's costDollars for full coverage.


Frequently asked questions

Is there a single tracker for Claude usage when I use Exa search?

Not as of April 2026. The honest answer is two meters. Exa returns a costDollars field on every MCP response (their per-search billing). Claude's plan-side cost is a weighted utilization fraction at five_hour.utilization on the Anthropic server, plus a separate seven_day fraction. ClaudeMeter polls the latter once a minute and renders it in the macOS menu bar. Exa's dashboard renders the former. If you want one number that covers both, you have to do the addition yourself.

Why do Exa search results show up on my Claude Pro/Max usage at all?

Because Claude does not search. Exa does. The MCP server fetches search results, returns the text into your Claude conversation, and the next assistant turn ingests every chunk of that text as input tokens. Anthropic weights those tokens into five_hour.utilization the same as any other prompt, including peak-hour multiplier, attachment cost, and per-model weight. A long Exa research query that returns 30 pages of result text can move the 5-hour bar by several percentage points before Claude even drafts a response.

Does ccusage count Exa search tokens?

Sort of, but in a way that does not predict your rate limit. ccusage reads ~/.claude/projects/<project>/<session>.jsonl and sums input_tokens + output_tokens per turn. The result text Exa MCP returns is part of input_tokens on the next assistant turn, so it shows up in the local count. But ccusage does not see the server-side weights (peak hour, model, attachments), so its percent and Anthropic's percent diverge by tens of percentage points during heavy research sessions. Treat ccusage as a token-flow gauge, not a quota gauge.
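To make the token-flow side concrete, here is a sketch of that kind of local sum. It is not ccusage's actual code. It assumes each JSONL line is a JSON object whose message.usage holds input_tokens and output_tokens; that field path is an assumption about the log format, and sumSessionTokens is a hypothetical name.

```typescript
interface TokenTotals {
  inputTokens: number;
  outputTokens: number;
}

// Assumed per-line shape: { message: { usage: { input_tokens, output_tokens } } }.
interface AssumedLogLine {
  message?: { usage?: { input_tokens?: unknown; output_tokens?: unknown } };
}

// Sum raw token counts over the contents of one session .jsonl file.
function sumSessionTokens(jsonl: string): TokenTotals {
  const totals: TokenTotals = { inputTokens: 0, outputTokens: 0 };
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue;
    let entry: unknown;
    try {
      entry = JSON.parse(line);
    } catch (_e) {
      continue; // skip lines that are not valid JSON
    }
    const usage = (entry as AssumedLogLine | null)?.message?.usage;
    if (!usage) continue;
    const input = usage.input_tokens;
    const output = usage.output_tokens;
    if (typeof input === "number") totals.inputTokens += input;
    if (typeof output === "number") totals.outputTokens += output;
  }
  return totals;
}
```

Nothing in this sum knows about peak-hour or model weights, which is why a count like this can track token flow accurately and still disagree with the server-side percentage.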

Where is Exa's per-search cost in the MCP response?

Every Exa MCP search response carries a costDollars field. It is a small JSON object that breaks down the cost of that one call (search type, contents fetched, livecrawl, etc.). Sum these across your session to get your Exa-side spend. This is the field you would feed into a per-project cost report. ClaudeMeter does not read Exa responses (it has nothing to do with your MCP traffic) so this is on you to capture if you want it.

Can I see how much one Exa research call burned on my Claude plan?

Yes, with one observation. Note ClaudeMeter's five_hour.utilization right before the Exa-heavy turn. Run the call. Wait one poll cycle (60 seconds). Note the new utilization. The delta is the cost of that turn against your 5-hour quota. The number is weighted, so it folds in peak-hour and model multipliers. Pair it with the Exa costDollars from the MCP response for the per-search money side. Two numbers, one prompt.

Does the Anthropic /usage CLI command cover this?

Different surface. The /usage command in Claude Code (and the Agent SDK cost docs) shows API-side token spend on a per-call basis, mostly aimed at developers paying via API credits. It does not show Pro/Max plan utilization. If you are on a Claude Max ($100-$200/month) plan running Claude Code in agentic loops, the rate limit you keep hitting is not API spend, it is five_hour.utilization on the server. /usage will not surface that field. ClaudeMeter does, by reading the same JSON claude.ai/settings/usage already renders.

Why not just disable Exa MCP when my window gets close to full?

That is the rational move. The point of seeing both meters is being able to make exactly that trade-off. With ClaudeMeter visible in the menu bar and Exa's costDollars logged from your MCP wrapper, you can see when the next Exa research call will move you from 78 to 92 percent on the 5-hour bar and decide whether the question is worth the cost. The whole point of the two-meter setup is to have the data the moment you need to choose.

Wiring Exa MCP into a heavy Claude Code loop?

If your two meters are diverging in a shape we have not seen, or you want a sanity check on a wrapper, send a snapshot. 15 minutes is plenty.