The rolling 5-hour wall in Claude Code is mostly your MCP servers, billed many times.
The 5-hour bar in claude.ai/settings/usage is one number. It blends your prompts, your tool outputs, your thinking tokens, and the system prompt that Claude Code auto-loads at session start. It also, on every cold turn, re-bills your full MCP tool-definition payload at the cache-write rate. With a heavy MCP setup and a normal day's worth of model switches, that overhead alone is a sizable fraction of the bucket. This page walks through the math.
Yes. MCP tool definitions run roughly 30,000 to 100,000 tokens for a typical multi-server Claude Code setup, and they bill at Anthropic's cache-write rate (1.25x base input) on every cold turn. A normal Claude Code 5-hour window has 8 to 15 cold turns (model switch, /compact, MCP toggle, an idle gap longer than the 5-minute default cache TTL). The product of those two numbers is a real fraction of the five_hour bucket. The Settings page bar does not split it out, and ccusage cannot apply the 1.25x server reweighting. Component sizes are from Anthropic's own simulator at code.claude.com/docs/en/context-window; cache pricing is documented in Anthropic's prompt-caching docs at docs.anthropic.com.
Step 1. The cold-turn arithmetic
You can derive the tax with pencil and paper. The component sizes come from Anthropic. The cache multipliers come from Anthropic. The one variable is how many cold turns you actually have in a 5-hour stretch, and that is largely about how often you switch models and how long you pause between prompts.
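The whole derivation, as a sketch you can run. The 1.25x cache-write multiplier is Anthropic's documented rate; the payload sizes and cold-turn counts are this page's illustrative numbers, not measured values.

```rust
// Pencil-and-paper cold-turn tax: overhead payload, times the 1.25x
// cache-write rate, times however many cold turns the window had.
const CACHE_WRITE_RATE: f64 = 1.25;

fn cold_turn_tax(overhead_tokens: u64, cold_turns: u32) -> f64 {
    overhead_tokens as f64 * CACHE_WRITE_RATE * cold_turns as f64
}

fn main() {
    println!("{}", cold_turn_tax(75_000, 1));  // 93750  — one cold turn, 5-server setup
    println!("{}", cold_turn_tax(75_000, 5));  // 468750 — the Step 3 timeline below
    println!("{}", cold_turn_tax(30_000, 10)); // 375000 — the FAQ's leaner setup
}
```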
Step 2. The bucket the math fills
When the math above charges five_hour, it is charging this struct. The field that decides whether the next prompt returns content or 429s is the first one. Source: github.com/m13v/claude-meter.
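The shape, as this page describes it; a sketch reconstructed from the descriptions here (the real src/models.rs declares seven Window fields plus extra_usage, and only the four windows discussed on this page are spelled out; the resets_at and extra_usage types are assumptions). Requires serde with the derive feature, plus serde_json.

```rust
use serde::Deserialize;

#[derive(Debug, Deserialize)]
pub struct Window {
    pub utilization: f64,          // 0.0..=1.0; at 1.0 the next prompt 429s
    pub resets_at: Option<String>, // ISO timestamp: when the bucket ages back below the wall
}

#[derive(Debug, Deserialize)]
pub struct UsageResponse {
    pub five_hour: Option<Window>,            // the rolling 5-hour wall
    pub seven_day: Option<Window>,
    pub seven_day_opus: Option<Window>,
    pub seven_day_oauth_apps: Option<Window>, // OAuth-only weekly bucket
    // ...three more Window fields in the real struct, plus:
    pub extra_usage: Option<serde_json::Value>, // shape not covered on this page
}
```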
Two fields are worth noticing. The Option<Window> shape with utilization: f64 and resets_at is the entire signal: a fraction between 0 and 1, and an ISO timestamp for when it ages back below the wall. The other useful one is seven_day_oauth_apps, the OAuth-only weekly bucket that Claude Code + MCP traffic charges (browser chat does not). MCP overhead lands on both buckets at once.
Step 3. A timeline of one 5-hour window
A concrete session, told in cache-state moments. Every line below is a place where the prefix cache flipped from warm to cold or back again. The 5-hour bar in /settings/usage aggregates all of them.
T+0:00 Cold start. The full payload bills at 1.25x.
First message of the rolling 5-hour window. Claude Code sends system prompt + auto memory + env + MCP tool definitions + global CLAUDE.md + project CLAUDE.md + your prompt. With a 5-server MCP setup that is ~75,000 input tokens, billed at the cache-write rate (1.25x base input). The five_hour float ticks up by roughly 93,750 input-equivalent tokens of overhead alone, before any of the actual prompt content.
T+0:30 Warm turns. Cache reads at 0.10x.
You send 6 follow-up prompts in 5 minutes. The cached prefix (system prompt, tool defs, CLAUDE.md) reads at 10% of input rate. New bytes (your incremental prompts, file reads) bill at full rate. Cheap. The five_hour slope is shallow.
T+0:35 You switch Sonnet -> Opus 4.7 for the hard part. Cold turn.
Model switch invalidates the prompt cache. The next turn re-sends the same ~75,000-token payload and re-bills it at 1.25x. five_hour ticks up by another ~93,750 input-equivalent tokens of overhead. ccusage faithfully records 75,000 cache_creation tokens. Anthropic's float records the weighted server cost.
T+1:10 Window pressure. You run /compact.
/compact rewrites the conversation segment to fit, which invalidates the cache. Next turn is cold. Another 93,750 input-equivalent tokens of overhead charged.
T+1:45 You step away for lunch. TTL expires.
Default cache TTL is 5 minutes. After a 30-minute break you return. Next turn is cold. Another cold-turn re-bill. (The extended-cache flag pushes TTL out, which is exactly the kind of move that flattens the five_hour slope.)
T+2:30 You add a new MCP server to test it. Cold turn.
Any change to the tool index forces a cache rewrite. Cold turn. ~93,750 input-equivalent tokens of overhead, again.
T+4:00 five_hour at 74%. The bucket math is mostly overhead.
Five cold turns in ~4 hours, each re-billing ~93,750 input-equivalent tokens of overhead, equals ~468,750 input-equivalent tokens of pure tool-definition + system-prompt weight against the 5-hour bucket. Your actual prompt content and the model's thinking tokens are on top of that. The Settings bar reads 74% and you have not done anything that feels expensive.
T+4:42 five_hour hits 1.0. Next prompt 429s.
ClaudeMeter showed 92%, 95%, 99% over the last 8 minutes as warm turns continued to add small slices. You either wait for resets_at to pass, drop to claude.ai web chat (which does not touch seven_day_oauth_apps), or enable extra usage. None of that lets you see, on the Settings page, that the wall was mostly your MCP servers.
Step 4. The poll that watches the bucket
The 5-hour wall is decided by a single float on a single endpoint. The Settings page reads it. ClaudeMeter reads it. The same fetch works from a plain curl in a terminal. There is nothing undocumented about the request itself, only about the response schema, which Anthropic can change. The Rust struct above wraps every field in Option so a renamed field degrades to a single missing row instead of a crashed app.
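A minimal one-shot of that poll, reusing the UsageResponse sketch from Step 2. Assumptions: your org UUID (visible in the /settings URL) and an existing claude.ai session cookie are supplied via env vars, and reqwest is built with the blocking and json features.

```rust
use std::{env, error::Error};

fn fetch_usage(org_uuid: &str, cookie: &str) -> Result<UsageResponse, Box<dyn Error>> {
    let url = format!("https://claude.ai/api/organizations/{org_uuid}/usage");
    let resp = reqwest::blocking::Client::new()
        .get(&url)
        .header(reqwest::header::COOKIE, cookie) // same session the browser uses
        .send()?
        .error_for_status()?
        .json::<UsageResponse>()?; // Option fields degrade gracefully on schema drift
    Ok(resp)
}

fn main() -> Result<(), Box<dyn Error>> {
    let org = env::var("CLAUDE_ORG_UUID")?;
    let cookie = env::var("CLAUDE_SESSION_COOKIE")?;
    if let Some(w) = fetch_usage(&org, &cookie)?.five_hour {
        println!("five_hour: {:.1}%, resets {:?}", w.utilization * 100.0, w.resets_at);
    }
    Ok(())
}
```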
Step 5. What that looks like live
One claude-meter status one-shot, with five_hour climbing well past the other clocks for the same week of work. This is the shape of an MCP-heavy session approaching the rolling-window wall: the 5-hour bar is the loud one, and the OAuth-apps weekly bucket is high enough to bite next.
Step 6. ClaudeMeter and ccusage answer different questions
Both are correct. They read different sources and they reweight differently, on purpose.
| Feature | ccusage (local JSONL sum) | ClaudeMeter (server-truth /usage poll) |
|---|---|---|
| What it measures | Local-disk sum of input + output + cache tokens written to ~/.claude/projects/*.jsonl. | Server-side rolling 5-hour utilization, post-reweighting. |
| Sees MCP overhead? | Yes, as cache_creation_input_tokens per turn. Per-server attribution requires reading the JSONL. | Yes, lumped into five_hour. Cannot isolate the MCP fraction from prompts and thinking. |
| Sees the 1.25x cache-write reweighting? | No. JSONL records raw token counts before the rate-limit math runs. | Yes implicitly: five_hour climbs by the server-weighted amount. |
| Sees browser-chat usage that fills the same bucket? | No. ccusage reads Claude Code's local logs only. | Yes. claude.ai chat hits five_hour too. |
| Updates without interrupting Claude Code? | Yes, if you run it in a tmux pane. /usage inside Claude Code interrupts the loop. | Yes. Polled out-of-process by the menu bar app every 60 seconds. |
| Cost | Free, MIT, cross-platform. | Free, MIT, no telemetry. macOS only. |
Step 7. Three moves that flatten the slope
None of these change the math; they reduce the inputs. After each one, the five_hour slope visibly flattens over the next few hours if it was the wall.
- Stop manual model switches mid-session. Every Sonnet -> Opus or Opus -> Sonnet flip is a guaranteed cold turn that re-bills your full MCP overhead at 1.25x. If you already know the next 6 messages need Opus, switch once and stay.
- Prune MCP servers you are not using this week. A Jira-class server is roughly 17,000 tokens by itself; removing it shrinks every cold-turn re-bill by 17,000 tokens at 1.25x (~21,250 input-equivalent). Use `claude mcp list` to audit, and remove anything you have not invoked in a week. The sketch after this list puts numbers on this move and the previous one.
- Leave `ENABLE_TOOL_SEARCH` on auto. The default defers MCP schemas to on-demand loading when the eager set would exceed 10% of the window; forcing it to `false` eager-loads everything and re-bills the worst case on every cold turn.
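The first two moves plug straight into the cold_turn_tax sketch from Step 1; the payload sizes and cold-turn counts are again this page's illustrative numbers.

```rust
// Marginal effect of each lever, using cold_turn_tax() from the Step 1 sketch.
fn main() {
    let base = cold_turn_tax(75_000, 10); // 937,500 input-equivalent: heavy setup, 10 cold turns
    // Move 1: fewer model switches, say 10 cold turns down to 6.
    println!("fewer switches saves {}", base - cold_turn_tax(75_000, 6)); // 375000
    // Move 2: prune the ~17,000-token Jira-class server, 75K payload down to 58K.
    println!("pruning saves {}", base - cold_turn_tax(58_000, 10)); // 212500
}
```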
Watching the 5-hour bar climb and not sure if MCP is the cause?
A 20-minute call to walk through your Claude Code setup and see which lever flattens the slope first.
Frequently asked questions
Does loading MCP servers actually make me hit the Claude Code rolling 5-hour wall sooner?
Yes, but only on cache-miss turns. Anthropic's prompt cache reads cached prefixes at 10% of the input rate, so a warm turn pays almost nothing for the system prompt and MCP tool definitions. The trap is the cold turn: model switch (Sonnet to Opus), /compact, MCP server toggle, or just a 5-minute idle gap (the default cache TTL) invalidates the cache. The next turn re-sends every byte of overhead and bills it at 1.25x input (the cache-write rate). 30K to 100K tokens of MCP definitions, billed several times per 5-hour window, is a real fraction of the five_hour bucket that ccusage cannot show you because the JSONL log lines are written before the server-side reweighting lands.
How many cache misses does a typical Claude Code 5-hour window have?
Anecdotal but consistent across heavy users: 8 to 15. Model switches cause one each: a refactor that starts on Sonnet (one cold start), escalates to Opus 4.7 for the hard part, and drops back to Sonnet for cleanup is three cold turns. /compact is another, common in long sessions where window pressure forces a flush. Toggling an MCP server (claude mcp add or remove) invalidates the tool-index region. And the default cache TTL is 5 minutes, so any pause longer than that (lunch, a Slack thread, a meeting) returns you to a cold prefix. Multiply 30K of MCP overhead by 10 cold turns at the 1.25x cache-write rate and you have charged the rolling 5-hour bucket for 375K input-equivalent tokens of overhead before counting a single one of your prompts.
Does the 5-hour bar in claude.ai/settings/usage show the MCP fraction separately?
No. The Settings page renders a single bar for the five_hour field, period. Inside that float, MCP overhead, system prompt, file reads, your prompts, tool outputs, and thinking tokens are all blended into one utilization percentage. The endpoint returns no per-component breakdown. /context (inside Claude Code) shows the live breakdown of the 200K context window for the current turn only, but that is a different surface than the 5-hour bucket and it has nothing to say about the past 4 hours and 59 minutes that the bucket actually remembers. So you cannot directly read 'MCP is X percent of my 5-hour fill.' What you can do is watch the bar climb before and after pruning MCP servers and see the slope flatten.
What is the cheapest way to confirm MCP overhead is what is killing my 5-hour bucket?
Two-step: run /context inside Claude Code right now and note the tool-definition line; then poll GET https://claude.ai/api/organizations/{org_uuid}/usage and note five_hour.utilization. Run /context again 10 cold turns later (cause cold turns deliberately by switching model). The MCP tool definitions line will be the same; five_hour.utilization will have moved more than the ccusage local sum would predict. The delta minus the prompt-byte delta is roughly your MCP-overhead-times-cache-miss tax. Or skip the manual loop and watch ClaudeMeter, which polls /usage every minute and pins five_hour to the menu bar.
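The manual loop from this answer, as a sketch reusing fetch_usage and the env-var assumptions from the Step 4 sketch: take a reading, cause cold turns deliberately, take another, and compare the jump to what your raw prompt bytes would predict.

```rust
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let org = std::env::var("CLAUDE_ORG_UUID")?;
    let cookie = std::env::var("CLAUDE_SESSION_COOKIE")?;
    let before = fetch_usage(&org, &cookie)?; // Step 4 sketch
    println!("Reading taken. Cause some cold turns (switch models), then press Enter.");
    std::io::stdin().read_line(&mut String::new())?;
    let after = fetch_usage(&org, &cookie)?;
    if let (Some(b), Some(a)) = (before.five_hour, after.five_hour) {
        println!(
            "five_hour: {:.1}% -> {:.1}% ({:+.1} points)",
            b.utilization * 100.0,
            a.utilization * 100.0,
            (a.utilization - b.utilization) * 100.0
        );
    }
    Ok(())
}
```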
Does ENABLE_TOOL_SEARCH=auto solve the MCP overhead problem?
Mostly yes for the context window, partly yes for the 5-hour bucket. With auto, Anthropic defers MCP schemas to on-demand loading once the eager set would exceed 10% of the context window (about 20,000 tokens). So a one-or-two-server setup gets eager loading and a five-or-more-server setup falls back to deferred (names only, ~120 tokens, schemas fetched on demand). For window pressure that is huge. For the 5-hour bucket the tax is still real but smaller: when the model actually invokes a tool whose schema was deferred, that turn fetches and bills the schema, and that turn is by definition a cold turn for that schema. The net effect is fewer schema bytes flowing per session than the worst case, but the bytes that flow still re-bill on cache miss.
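A toy model of that heuristic under this page's numbers (the 10% budget, the ~120-token deferred stubs); the real decision logic lives inside Claude Code and may differ.

```rust
const CONTEXT_WINDOW: u32 = 200_000;
const EAGER_BUDGET: u32 = CONTEXT_WINDOW / 10; // ~20,000 tokens
const DEFERRED_STUB: u32 = 120; // name-only stub; schema fetched on first invoke

// Tokens of MCP schema weight a cold turn carries under auto.
fn cold_turn_schema_tokens(server_schemas: &[u32]) -> u32 {
    let eager_total: u32 = server_schemas.iter().sum();
    if eager_total <= EAGER_BUDGET {
        eager_total // one-or-two-server setups: everything loads eagerly
    } else {
        DEFERRED_STUB * server_schemas.len() as u32 // big setups: stubs only, pay per invoke
    }
}

fn main() {
    println!("{}", cold_turn_schema_tokens(&[8_000, 9_000])); // 17000: eager
    println!("{}", cold_turn_schema_tokens(&[17_000, 15_000, 14_000, 16_000, 13_000])); // 600: deferred
}
```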
How is this different from the seven_day_oauth_apps bucket?
Different clock, same underlying tokens. seven_day_oauth_apps is the rolling 168-hour bucket that only fills from OAuth-authenticated traffic (Claude Code + MCP, not claude.ai browser chat). five_hour is the 5-hour bucket that fills from every client on the account. MCP overhead lands in both: it charges five_hour because it is part of every Claude Code turn, and it charges seven_day_oauth_apps because the Claude Code turn is OAuth-authenticated. So MCP-heavy users see both buckets climb. The 5-hour wall bites first inside one session, the OAuth weekly bucket bites first across a full week of work.
Will the May 6, 2026 doubled 5-hour limit make MCP overhead a non-issue?
It buys headroom, not invisibility. Anthropic doubled the rolling 5-hour limit on Pro, Max, Team, and seat-based Enterprise plans on May 6, 2026. So the same MCP-overhead-times-cache-miss math now fills a 2x larger bucket; you hit the wall later, but at the same fill rate. And the weekly caps were not doubled, so the OAuth weekly bucket still climbs at the original rate. If you noticed the 5-hour wall less in mid-May and started noticing seven_day_oauth_apps walls earlier in the week, this is the reason.
Where does ClaudeMeter watch the 5-hour bucket from?
From the same endpoint claude.ai/settings/usage renders. The Rust struct in src/models.rs of the claude-meter repo declares UsageResponse with seven Window fields plus extra_usage; five_hour is the first one. The browser extension's extension/background.js polls https://claude.ai/api/organizations/{org_uuid}/usage every 60 seconds with credentials: 'include' (your existing claude.ai session cookie), POSTs the JSON to the menu-bar app on localhost:63762, and the SwiftUI popover renders five_hour.utilization plus its resets_at as a relative duration. No cookie paste, no telemetry, a single HTTPS request per minute to claude.ai. Source: github.com/m13v/claude-meter, MIT license.
Why can ccusage not see this overhead-times-cache-miss tax?
ccusage walks ~/.claude/projects/*.jsonl and sums the input_tokens, output_tokens, cache_creation_input_tokens, and cache_read_input_tokens that Claude Code wrote during streaming. That is a faithful local-token measurement. It does not include the server-side tokenizer expansion, the peak-hour multiplier, or full thinking-token spend, all of which land on the five_hour float on the server. So on a session with several cache-miss turns and an MCP-heavy setup, ccusage's percentage typically lags the server's by 15 to 30 points. Both are right about what they measure; the server's number is the one that decides your next 429.
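What that local sum looks like mechanically; a sketch assuming each JSONL line may carry a message.usage object with the four fields named above (the exact nesting in Claude Code's logs is an assumption, and the real ccusage handles more cases). Requires serde_json.

```rust
use std::{env, fs};
use serde_json::Value;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let dir = format!("{}/.claude/projects", env::var("HOME")?);
    let keys = ["input_tokens", "output_tokens",
                "cache_creation_input_tokens", "cache_read_input_tokens"];
    let mut totals = [0u64; 4];
    for entry in fs::read_dir(&dir)? {
        let path = entry?.path();
        if path.extension().map_or(false, |e| e == "jsonl") {
            for line in fs::read_to_string(&path)?.lines() {
                let Ok(v) = serde_json::from_str::<Value>(line) else { continue };
                for (i, key) in keys.into_iter().enumerate() {
                    totals[i] += v["message"]["usage"][key].as_u64().unwrap_or(0);
                }
            }
        }
    }
    // Raw counts only: no 1.25x cache-write reweighting, no server-side
    // multipliers. That gap is why this number lags the five_hour float.
    println!("input={} output={} cache_write={} cache_read={}",
             totals[0], totals[1], totals[2], totals[3]);
    Ok(())
}
```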
If I want to keep my MCP servers but not pay the 5-hour tax, what helps?
Three concrete moves. First, set the extended cache TTL flag so the cache stays warm across 5-minute idle gaps; this turns a lot of would-be cold turns warm. Second, stop manually switching models mid-session unless the next turn genuinely needs the bigger model; every switch is a guaranteed cold turn that pays the full MCP overhead. Third, drop MCP servers you do not actively use this week; each pruned heavy server (Jira-class is ~17,000 tokens by itself) shrinks the per-cold-turn re-bill. ClaudeMeter does not change any of these for you; it lets you see whether each move actually flattens the five_hour slope.
Adjacent guides
The other facets of the same enforcement surface.
Claude Code context overhead: ~7,850 tokens before you type
The per-component breakdown from Anthropic's own simulator, what eager MCPs push it to, and why the same overhead lands on both the 200K window and the weekly quota.
Claude Code rolling 5h + weekly quota: it is four clocks, not two
five_hour, seven_day, seven_day_opus, seven_day_oauth_apps. Any one at 1.0 fires the next 429. The Settings page bar shows only one of them.
Claude Code rolling 5-hour usage: three ledgers, three answers
/usage is a snapshot. ccusage reads local JSONL. The float that 429s your loop is on the server and counts browser chat too. Which tool reads which.