Claude Code context overhead is ~7,850 tokens before you type. The part that surprises people: it bills against your weekly quota too.
Anthropic publishes the per-component sizes in its own interactive simulator. Sum them and a stock Claude Code session starts at 7,850 tokens, well before your first prompt. With eager-loaded MCPs, real setups land at 30,000 to 100,000 tokens. That overhead occupies the 200,000-token context window on every turn. It also, separately, gets billed against the seven-day quota meter on every cache-miss turn. The two meters are easy to confuse, and ccusage only sees one of them.
Stock Claude Code overhead is approximately 7,850 tokens before the first user prompt: 4,200 system prompt + 680 auto memory + 280 environment info + 120 MCP tool index + 450 skill descriptions + 320 global ~/.claude/CLAUDE.md + 1,800 project CLAUDE.md. Numbers per Anthropic's simulator at code.claude.com/docs/en/context-window. Eager-loaded MCP servers push overhead into the 30K to 100K range; a five-server setup typically lands near 55,000 tokens of tool definitions alone. The same overhead bytes are also counted by Anthropic's seven-day quota meter, at full cache-write rate on every cache-miss turn (model switch, /compact, MCP toggle, TTL expiry).
What loads before your first keystroke
Anthropic ships an interactive simulator on its own docs site that enumerates every auto-loaded component in a Claude Code session, with a token count attached to each. The numbers below are pulled straight from the EVENTS array on that page. Treat them as Anthropic-authoritative for a stock setup; the only thing you can do to make them smaller is shrink your CLAUDE.md files.
Two of these line items are worth a second look. First, the system prompt at 4,200 tokens is fixed by Anthropic; you cannot trim it. It carries the identity instructions, the tool-use rules, the response-formatting rules, and the safety scaffolding. Second, the MCP tool index at 120 tokens is the deferred-loading default: only tool names ship up front, schemas load on demand when a tool is actually needed. That 120-token line is the lazy path. Force the eager path and that one number changes a lot.
What pushes 7,850 to 100,000+: eager MCP loads
Anthropic auto-enables MCP Tool Search when your eager-loaded tool definitions would consume more than 10 percent of the context window (above 20,000 tokens). That guardrail keeps the worst case off most clean installs. But three cases bypass it: setting ENABLE_TOOL_SEARCH=false, using a Claude Code release that predates the feature, or running a single MCP server that ships its full schema set as a non-search payload. Stack a few of those and you get the numbers below.
A user running Jira plus three other heavy MCPs with eager loading easily lands at 100,000 tokens of overhead before the conversation starts. That is half the 200,000-token window, gone. It is also, separately, 100,000 input tokens billed against the seven-day quota on the next cache-miss turn. The two costs do not feel like the same problem until you look at the meter.
Two meters, one input
The 200K context window and the seven-day quota are not the same constraint. They look at the same overhead bytes from two different angles, and they fail in two different ways. The window is a hard ceiling per turn; the quota is a rolling utilization fraction across all turns from all devices on this account in the last seven days. Hitting the first triggers /compact or transcript truncation. Hitting the second returns a 429 with no client-side warning.
Anthropic does not expose per-component breakdowns on the quota side. The response shape is plain, as the sketch below shows.
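Here is a minimal Rust rendering of that shape, consistent with the fields this article names. Only five_hour and seven_day_opus are confirmed bucket names; the serde/chrono derive setup (chrono with its serde feature) is an assumption, not ClaudeMeter's actual source.

```rust
use chrono::{DateTime, Utc};
use serde::Deserialize;

// One quota bucket: a utilization fraction and an optional reset time.
#[derive(Debug, Deserialize)]
struct Window {
    utilization: f64,                 // 1.0 is the cap
    resets_at: Option<DateTime<Utc>>, // when the oldest traffic ages out
}

// Sketch of the /api/organizations/{org}/usage response. Only five_hour
// and seven_day_opus are named in this article; the real struct carries
// seven Window-shaped fields in total.
#[derive(Debug, Deserialize)]
struct UsageResponse {
    five_hour: Window,
    seven_day_opus: Window,
    // ...five more Window-shaped buckets omitted here
}
```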
Each of those Window structs is exactly two fields: utilization (a float from 0 upward, where 1.0 is the cap) and resets_at (when the oldest chargeable traffic in the bucket ages out). Whatever overhead just landed in your last turn shows up as a delta on seven_day_opus.utilization, with no telemetry pointing back at "system prompt" or "tool definitions" or "file read." That is why context-overhead choices feel disconnected from quota outcomes: the server collapses them into one float.
The 200K window vs the seven-day quota, side by side
| Feature | Seven-day quota | 200K context window |
|---|---|---|
| What it bounds | Rolling-window utilization across all turns and all devices on this account. | Total tokens loaded into a single turn (200,000 hard ceiling). |
| Where you read it | claude.ai/settings/usage (or its source endpoint /api/organizations/{org}/usage). Polled by ClaudeMeter every 60 seconds. | /context inside Claude Code, plus context_window.used_percentage on the status line. |
| How it fails | Returns 429 from the API path that Claude Code hits, with no client-side warning before the wall. | Triggers /compact or transcript truncation; never returns an error. |
| How overhead lands on it | Once at cache-write rate on cold turns; once at one-tenth rate on warm turns. Multiple cold turns per session means the same overhead is billed several times in a week. | Once per turn, regardless of cache state. 7,850 tokens of stock load is 7,850 tokens of window pressure on every turn. |
| What ClaudeMeter shows | Live worst-bucket utilization plus the soonest resets_at, color-coded green/amber/red, in the macOS menu bar. | Nothing. /context already covers it. |
| What ccusage shows | A local token sum, useful as a flow measurement; it structurally cannot see server-side weighting or browser-chat traffic. | Indirectly: token sums per JSONL log let you compute per-turn input size. |
The cache-miss tax: why your overhead bills more than once
Anthropic's prompt cache is what makes the per-turn overhead tolerable. Cache reads bill at 10 percent of the input rate; cache writes bill at 125 percent of plain input. So the first cold turn of a session pays full price on the overhead, the next handful of warm turns pay a tenth of it, and anything that invalidates the cache restarts the cycle. The five things that invalidate it are worth memorizing.
Cold turn (first prompt or after a cache miss).
Claude Code sends the full input: system prompt + auto memory + env + MCP tool index + skill descriptions + global CLAUDE.md + project CLAUDE.md + transcript so far + your prompt. All of it bills at the cache-write rate, which is 25 percent more expensive than plain input. On a stock setup that is 7,850 tokens of overhead; with eager MCPs, 30K-100K. The seven_day_opus float ticks up by the full amount.
Warm turns (within the cache TTL, 5 minutes by default).
The system prompt prefix and tool definitions are read from cache at 10 percent of the input rate. Your incremental prompt and any new file reads bill at full rate. A run of 4-6 follow-up prompts inside the cache window costs an order of magnitude less than the cold turn that started it. ccusage and the seven_day_opus fraction agree closely here.
Cache miss (model switch, /compact, MCP toggle or reconnection, TTL expiry).
Switch from Sonnet to Opus mid-session, run /compact to free window space, toggle an MCP server, or simply pause for more than 5 minutes (or whatever you set the TTL to). The next turn is cold again. The full overhead re-bills at cache-write rate. Three cache misses in an afternoon equals three full re-bills of whatever your overhead currently is.
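To put numbers on that, here is a small sketch using the rates above (1.25x writes, 0.10x reads, per Anthropic's published cache pricing). The overhead sizes and turn counts are illustrative, not measured.

```rust
// Billed-token sketch for one afternoon of turns under Anthropic's
// published cache multipliers: writes 1.25x plain input, reads 0.10x.
const WRITE_RATE: f64 = 1.25; // cache-write multiplier vs plain input
const READ_RATE: f64 = 0.10;  // cache-read multiplier vs plain input

fn session_cost(overhead: f64, cold_turns: u32, warm_turns: u32) -> f64 {
    // Every cold turn re-writes the full overhead into the cache;
    // every warm turn reads it back at a tenth of the rate.
    f64::from(cold_turns) * overhead * WRITE_RATE
        + f64::from(warm_turns) * overhead * READ_RATE
}

fn main() {
    // Stock overhead vs an eager five-server MCP setup: three cache
    // misses (cold turns) and twelve warm turns in an afternoon.
    for overhead in [7_850.0, 55_000.0] {
        let billed = session_cost(overhead, 3, 12);
        println!("overhead {overhead:>8.0} -> ~{billed:>9.0} billed input-token equivalents");
    }
}
```

At stock overhead that afternoon bills roughly 39K token equivalents; at 55K of eager MCP definitions it bills roughly 272K, with the three cold turns dominating.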
Where the meters disagree.
ccusage faithfully sums what your Claude Code wrote to JSONL, which includes the cache-creation and cache-read tokens but not the server-side tokenizer expansion or the full thinking-token spend. seven_day_opus, deserialized by ClaudeMeter into a Window with utilization (f64) and resets_at, reflects what the server actually counted. After several cache-miss turns in a session, the gap between the two becomes visible: ccusage may say you spent 12 percent of plan today while the menu bar reads 18 percent.
Where to spend the savings.
Trim project CLAUDE.md, leave ENABLE_TOOL_SEARCH on auto, prune unused MCP servers. Each of those reduces the overhead that gets re-billed on every cache-miss turn. The 200K window comes back too, but the bigger win is the seven-day quota fraction moving slower over a week. Watch the menu bar and the trade-off becomes visible.
Read your own breakdown with /context
The numbers above are Anthropic's baseline. Yours will differ based on your CLAUDE.md size, MCP load, and how much of the conversation has accumulated. Inside an active Claude Code session, type /context. The output is a per-component breakdown of the live window with token counts and percentages.
Run /context twice, ten minutes apart, and the deltas tell you which file reads and tool outputs are doing the work. If the stock-overhead lines at the top look much larger than the simulator numbers above, that is your CLAUDE.md or your MCP load, and that is where the easy savings live.
How a single Claude Code turn flows through both meters
One input, two meters, two failure modes
Auto-loaded at session start
System prompt, auto memory, env, MCP tool index, skill descriptions, global CLAUDE.md, project CLAUDE.md. ~7,850 tokens stock; 30K-100K with eager MCPs.
Sent on every turn
Full transcript so far + auto-loaded overhead + your new prompt. Subject to Anthropic's prompt cache (TTL 5 minutes by default; longer with the extended-cache flag).
Meter 1: 200K context window
Hard ceiling on a single turn. Fails by triggering /compact or transcript truncation. Visible via /context inside the CLI.
Meter 2: seven-day quota
Rolling utilization fraction at /api/organizations/{org}/usage on claude.ai. Counts overhead bytes again on every cache-miss turn at cache-write rate. Fails by 429. ClaudeMeter polls it once a minute.
The quota side is what ClaudeMeter shows
Anthropic ships /context for the window side. It does not ship anything for the quota side in the CLI. To see whether your last cold turn actually moved seven_day_opus.utilization from 0.62 to 0.71, you have to either curl the endpoint by hand or run something that polls it for you. ClaudeMeter polls the same /api/organizations/{org}/usage endpoint claude.ai/settings/usage renders, every 60 seconds, through your existing browser session. The macOS menu bar shows the worst bucket (five_hour or seven_day_opus, whichever is higher) plus the soonest resets_at.
That makes the cache-miss tax visible in real time. Toggle an MCP server, watch the next turn land, watch the bar tick. Run /compact mid-session, watch the bar tick again. Trim CLAUDE.md, watch the bar tick more slowly over the next afternoon. It is the closed loop between context-overhead choices and quota-meter outcomes that /context alone cannot show you because it does not look at the server.
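If you would rather hand-roll the check before installing anything, a sketch along these lines works. It assumes reqwest (with its blocking and json features) and chrono's serde feature; ORG_ID and SESSION_COOKIE are stand-in environment variables, and the hand-copied cookie is exactly the step ClaudeMeter's browser extension removes.

```rust
use chrono::{DateTime, Utc};
use serde::Deserialize;
use std::{thread, time::Duration};

#[derive(Deserialize)]
struct Window {
    utilization: f64,
    resets_at: Option<DateTime<Utc>>,
}

#[derive(Deserialize)]
struct UsageResponse {
    seven_day_opus: Window, // other buckets ignored by serde
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let org = std::env::var("ORG_ID")?;            // your organization id
    let cookie = std::env::var("SESSION_COOKIE")?; // claude.ai session cookie
    let url = format!("https://claude.ai/api/organizations/{org}/usage");
    let client = reqwest::blocking::Client::new();
    loop {
        let usage: UsageResponse = client
            .get(&url)
            .header("Cookie", &cookie)
            .send()?
            .error_for_status()? // a 429 here is the quota wall itself
            .json()?;
        println!(
            "seven_day_opus: {:.0}% (resets {:?})",
            usage.seven_day_opus.utilization * 100.0,
            usage.seven_day_opus.resets_at
        );
        thread::sleep(Duration::from_secs(60)); // ClaudeMeter's cadence
    }
}
```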
One install (brew install --cask m13v/tap/claude-meter) plus the browser extension, no cookie paste, MIT licensed, no telemetry, single HTTPS request per minute to claude.ai using your own session. The extension picks up the session that is already in your browser; the menu-bar app listens on 127.0.0.1:63762 for snapshots.
Your Claude Code overhead looks fine in /context but you keep hitting the weekly wall. Want a 15-minute walkthrough?
Bring a session log and a screenshot of your menu bar. We'll trace which cache-miss pattern is moving your seven_day_opus fraction faster than ccusage suggests.
Frequently asked questions
How big is Claude Code's context overhead before I type anything?
On a stock Claude Code session, ~7,850 tokens. The number comes from Anthropic's own interactive simulator at code.claude.com/docs/en/context-window, where the EVENTS array lists each auto-loaded component with its token count: System prompt 4,200, MEMORY.md 680, environment info 280, MCP tool index (deferred names only) 120, skill descriptions 450, ~/.claude/CLAUDE.md 320, project CLAUDE.md 1,800. That is roughly 4 percent of the 200,000-token context window. Eager-loaded MCP servers stack on top of this and routinely push real-world overhead into the 30,000 to 100,000 range.
Where does that overhead come from? I never asked for it.
Seven things load before your first prompt: the system prompt (Claude Code's identity, tool-use rules, formatting rules), MEMORY.md (notes Claude wrote to itself in past sessions), environment info (working directory, OS, shell, git branch and recent commits), an index of MCP tool names so the model knows what is available, one-line skill descriptions, your global ~/.claude/CLAUDE.md, and the project's CLAUDE.md. By default MCP tool schemas stay deferred (names only, ~120 tokens). Setting ENABLE_TOOL_SEARCH=auto loads schemas if they fit in 10 percent of the window; ENABLE_TOOL_SEARCH=false loads them all upfront, regardless of size.
What pushes the overhead from 7,850 to 30K or 100K?
MCP servers loaded eagerly. A five-server setup (filesystem, GitHub, search, postgres, slack) typically lands around 55,000 tokens of tool definitions. A heavier server like Jira ships ~17,000 tokens by itself. Stack three or four heavy servers and the eager load reaches 100,000+. Anthropic now auto-enables MCP Tool Search whenever your eager-loaded tools would exceed 10 percent of the window (so above 20,000 tokens), which is why a clean install rarely shows the worst case anymore. But if you forced ENABLE_TOOL_SEARCH=false, or you are on an older Claude Code release, you pay it all upfront.
Does the 200K context window have anything to do with my weekly quota?
They are two different meters that share an input. The 200K context window is a hard ceiling on what a single turn can carry; if your transcript exceeds it Claude Code compacts or truncates. The seven-day quota is a rolling utilization fraction that Anthropic computes server-side from billed input and output tokens, including everything in the system prompt, the tool definitions, the auto memory, the file reads, and the conversation history. Both meters look at the same overhead. They count it differently and they fail differently. Hitting the 200K wall produces a /compact, hitting the seven-day wall produces a 429.
If overhead lives in the system prompt and Claude caches that, why does it cost anything?
It is much cheaper, not free, and only when the cache hits. Anthropic's prompt cache reads at 10 percent of the input rate; cache writes are 25 percent more expensive than uncached input. So a cache-hit turn bills the overhead at one-tenth of full rate. The trap is cache misses. The cache invalidates on a model switch, on /compact, on a tool toggle, on certain MCP reconnections, and on TTL expiry (5 minutes by default unless you set the extended TTL). After any of those, the next turn re-bills the full overhead at the cache-write rate. Several of those misses in an hour and your seven-day quota fraction moves visibly faster than ccusage's local sum predicts.
Where does ccusage see all of this?
ccusage walks ~/.claude/projects/**/*.jsonl and sums the input, output, cache_creation, and cache_read counts your Claude Code CLI wrote during streaming. It is correct as a token-flow measurement on this device. It cannot see browser-chat traffic on claude.ai. It cannot see traffic from another device on the same account. And it does not always reflect the server's billing weights cleanly: the seven_day_opus utilization fraction includes thinking tokens that are not always written to JSONL in full, plus tokenizer expansion that runs server-side after Claude Code wrote the line. So ccusage reads what was sent. It does not read what the rate limiter is actually counting.
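For intuition, here is a toy version of that local sum over one directory of logs. The flat field names and the single-directory path are simplifying assumptions; the real logs live under ~/.claude/projects/**/*.jsonl with a versioned schema, so treat this as the shape of the computation, not ccusage's actual code.

```rust
use std::{fs, io::{BufRead, BufReader}};

// Toy version of ccusage's local sum: walk one directory of JSONL logs
// and add up per-line token counts. Field names here are illustrative
// assumptions; the real log schema varies by Claude Code version.
fn main() -> std::io::Result<()> {
    let fields = ["input", "output", "cache_creation", "cache_read"];
    let mut totals = [0u64; 4];

    for entry in fs::read_dir(".claude-logs")? { // stand-in path
        let file = fs::File::open(entry?.path())?;
        for line in BufReader::new(file).lines() {
            let v: serde_json::Value = match serde_json::from_str(&line?) {
                Ok(v) => v,
                Err(_) => continue, // skip non-JSON lines
            };
            for (i, f) in fields.iter().enumerate() {
                totals[i] += v[*f].as_u64().unwrap_or(0);
            }
        }
    }
    for (f, t) in fields.iter().zip(totals) {
        println!("{f:>16}: {t}");
    }
    Ok(())
}
```

Nothing in that loop can observe another device's traffic or the server's billing weights, which is the structural gap the answer above describes.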
How do I see where the overhead is going right now?
Inside an active Claude Code session, type /context. It prints a breakdown of every component currently in the window, with token counts and percentages: system prompt, tools, memory, skills, conversation history, and free space. That is the live source of truth for context utilization. It is a one-shot snapshot, not a continuous meter, but it is exactly the breakdown that drives the simulator numbers above. Run it twice, ten minutes apart in the same session, and the deltas show you what file reads and tool outputs are doing to your window.
Does /context show the seven-day quota too?
No. /context reads internal CLI state about the local context window. The seven-day quota lives at GET https://claude.ai/api/organizations/{org}/usage on the claude.ai host (the same endpoint claude.ai/settings/usage renders behind the bar charts). The response shape, deserialized by ClaudeMeter's Rust UsageResponse struct in src/models.rs lines 18-28, has seven Window-shaped fields, each carrying just utilization (f64) and resets_at (Option<DateTime<Utc>>). Nothing in /context touches that endpoint, and nothing in /api/organizations/{org}/usage exposes per-component context overhead. They are different surfaces with different cookies.
How do I actually reduce the overhead?
The cheap wins, in priority order:

- Keep project CLAUDE.md under 200 lines. The simulator caps it at 1,800 tokens, but a 5,000-line one will load every relevant byte and dominate the budget.
- Leave ENABLE_TOOL_SEARCH on auto so MCP schemas load lazily.
- Prune MCP servers you don't actually use this week; each removed Jira-class server is roughly 17,000 tokens back.
- Move per-directory rules into .claude/rules/ with paths: matchers so they only load when Claude reads a matching file (sketch below).
- Run /compact early, when you notice the window past 50 percent, rather than waiting for the 200K wall.

The single biggest free win is auditing project CLAUDE.md.
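For the .claude/rules/ item, a minimal sketch of the shape described above. The file location is as the list states; the exact front-matter syntax (string vs list for paths:) is an assumption to verify against current Claude Code documentation before relying on it.

```markdown
---
paths:
  - "src/api/**"
---
Rules that load only when Claude reads a file matching src/api/**:
keep handlers thin and never log request bodies.
```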
Will trimming overhead actually save quota, or just window space?
Both, and the quota saving is bigger than people realize. Every cache-miss turn pays for whatever overhead is loaded right now, billed at the cache-write rate (25 percent above plain input). A 50,000-token MCP definition that you never invoke this session lands as roughly 62,500 plain-input-token equivalents on every cold turn, even though you never used a tool from that server. Trim it to deferred or remove the server, and those tokens stop landing in your seven-day bucket. Window space comes back too, but the quota effect is what makes the seven_day_opus fraction move slower over a week.
Where does ClaudeMeter fit in this picture?
ClaudeMeter watches the second meter, not the first. The /context command and the status line cover the local 200K window well. ClaudeMeter polls /api/organizations/{org}/usage once a minute through your existing claude.ai browser session and shows the live five_hour and seven_day_opus utilization in the macOS menu bar. When you trim project CLAUDE.md or disable an unused MCP, you can watch the seven-day fraction climb slower over the next few hours instead of guessing. It is the single signal that closes the loop between context-overhead choices and quota-meter behavior.
More on the gap between local and server
Why ccusage's Opus tokens never match the server fraction
Two ledgers. ccusage reads ~/.claude/projects/**/*.jsonl. seven_day_opus reads what the rate limiter is actually counting. Here is the field map and why the gap is structural, not a bug.
Hung Claude Code: which signal tells you it's a rate-limit
The status line, /status, and the spinner all read local state. Only /api/organizations/{org}/usage can confirm a server-side wall. How to read each signal in under a minute.
Rolling 5-hour window, not a calendar window
Why utilization drifts minute to minute even when you stop sending messages. The bucket ages out continuously; a snapshot from 30 minutes ago is usually wrong.