Context compression vs plan quota: two different walls people keep mixing up
Context compression is the auto-compact mechanic in Claude Code that summarizes your conversation when it gets near the 200K-token window. Plan quota is the rolling 5-hour and 7-day budgets Anthropic enforces against your whole account. They are independent. The thing every guide on this misses is that compacting does not save plan quota; it costs more of it, because the summarization itself is a billed prompt against the same week.
Direct answer (verified 2026-05-08)
No, they are not the same. Context compression (auto-compact, /compact) summarizes earlier conversation turns when a single chat approaches the model's 200K-token context window. Plan quota is the pair of rolling 5-hour and 7-day server-side buckets Anthropic enforces account-wide, across every device. Compacting does not refund plan quota. Per Anthropic's help center: “Longer conversations that trigger automatic context management consume more of your usage limit.” Source: support.claude.com, “How do usage and length limits work.”
Two systems, side by side
The names sound similar and both feel like “Claude is running out of room,” but they live in different places and respond to different commands.
| Feature | Context compression (200K window) | Plan quota (rolling 5h / 7d buckets) |
|---|---|---|
| What it caps | The total tokens in this single conversation, against the 200K-token context window of the model. | The total compute on your whole Anthropic account, weighted, in two sliding bands (5 hours and 7 days). |
| Where it lives | Inside the model server's prompt-construction step. It is local to the conversation, scoped to one chat session. | On a separate server-side enforcement layer that 429s your account, scoped to your org_uuid. Account-wide, every device, every browser tab, every Claude Code instance shares it. |
| How you see it | /usage inside Claude Code (a one-shot dump). The “Context left until auto-compact” line in the Claude Code header. The auto-compact warning at ~95%. | claude.ai/settings/usage in a browser, /usage inside Claude Code, or a tool that reads /api/organizations/{org_uuid}/usage continuously (claude-meter polls every 30s). |
| What triggers a wall | Conversation tokens approach 200K. Compaction triggers automatically; the session continues with summarized history. | The five_hour or seven_day utilization float crosses 1.0. Anthropic returns 429 on the next request from your org until the bucket bleeds back down. |
| Effect of compacting | Frees up tokens in the conversation. The 200K wall is now further away. | Spends more of it. The summarization call is a billed prompt; per Anthropic, “automatic context management consumes more of your usage limit.” The post-compaction turns also miss the prompt cache, paying full input rates. |
| Effect of /clear | Resets the context window to empty. No 200K problem. | No effect. The rolling 5-hour and 7-day buckets are server-side; they do not see /clear. |
| Time band | Lifetime of one conversation. A new chat starts at zero tokens. | Sliding 5 hours (resets continuously as old prompts age out) and sliding 7 days (resets gradually). |
| Does ccusage see it? | Indirectly. ccusage totals input + output tokens per session, so a long session shows up as a large token count. | No. The server-side weighting, peak-hour multiplier, and browser-chat usage are not in ~/.claude/projects/*.jsonl. |
The 200K wall is local. The 429 wall is account-wide.
Picture two doors in Claude Code, one in front of the other. The first door is the 200K-token context window. It is the size of one conversation. When this single chat's prompt + response + file context approaches that ceiling, the model can no longer fit everything. Auto-compact triggers; the model summarizes the older turns and the conversation continues with a shorter prefix. A new chat starts at zero tokens, so this door resets every time you run /clear or open a new project.
The second door is plan quota. It does not care how many tokens are in your current conversation. It cares about the total weighted compute your whole Anthropic account has spent in the last 5 hours and the last 7 days, summed across every Claude Code instance, every browser tab on claude.ai, every IDE extension, every machine you are signed in on. Anthropic publishes this as JSON at /api/organizations/{org_uuid}/usage, the same endpoint that powers claude.ai/settings/usage. When the float in five_hour or seven_day crosses 1.0, every further request from your org returns 429 until the bucket bleeds back down.
A new chat does nothing to this door. /clear does nothing to it. Restarting Claude Code does nothing to it. Logging into a different machine does nothing to it. The buckets are server-side and per-account.
The data shape: there is no “context_window” field
Here is the shape of the JSON Anthropic returns when you, your browser, or a tool like ClaudeMeter calls /api/organizations/{org_uuid}/usage using the cookie session you already have from claude.ai:
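The payload below is an illustrative sketch, not a captured response: the field names follow the claude-meter structs shown next, the utilization values are made up, and the contents of the extra_usage block are omitted because the endpoint is undocumented.

```json
{
  "five_hour":            { "utilization": 0.62, "resets_at": "2026-05-08T19:00:00Z" },
  "seven_day":            { "utilization": 0.47, "resets_at": "2026-05-12T07:00:00Z" },
  "seven_day_sonnet":     { "utilization": 0.31, "resets_at": "2026-05-12T07:00:00Z" },
  "seven_day_opus":       { "utilization": 0.83, "resets_at": "2026-05-12T07:00:00Z" },
  "seven_day_oauth_apps": { "utilization": 0.02, "resets_at": "2026-05-12T07:00:00Z" },
  "extra_usage":          {}
}
```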
Notice what is not there. There is no context_window field. There is no “tokens left in this chat” field. There is no “auto-compact threshold” field. The 200K context window is the model's problem, not the account's, and the account-level quota endpoint genuinely has nothing to say about it. The Rust struct that backs ClaudeMeter has the same shape:
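A sketch of those structs, reconstructed from the field list on this page rather than copied from src/models.rs; derives, visibility, and which sub-buckets are optional may differ in the real source:

```rust
use serde::Deserialize;

#[derive(Debug, Deserialize)]
pub struct Window {
    pub utilization: f64,          // 0.0..=1.0; crossing 1.0 is when the 429s start
    pub resets_at: Option<String>, // reset timestamp, when the server sends one
}

#[derive(Debug, Deserialize)]
pub struct UsageResponse {
    pub five_hour: Window,
    pub seven_day: Window,
    pub seven_day_sonnet: Option<Window>,
    pub seven_day_opus: Option<Window>,
    pub seven_day_oauth_apps: Option<Window>,
    pub extra_usage: Option<serde_json::Value>, // shape not pinned down here
}
```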
That is the whole data model on the menu-bar side. The 5-hour float, the 7-day float, the per-model sub-buckets, the extra-usage block. Nothing about the conversation length, because conversation length is a different problem solved at a different layer. If a tool claims to show both your context-window position and your plan-quota position in one number, it is doing something synthetic; the server does not return them as the same thing.
The trick everyone misses: compaction itself spends quota
The most common misconception about /compact is that it “saves usage” or “extends the limit.” It does neither. Compaction is implemented as another LLM call: the model is asked to summarize the conversation so far, and the summary replaces the original transcript. That summarization call counts against your plan quota the same way any other prompt does. Anthropic states this directly:
“Longer conversations that trigger automatic context management consume more of your usage limit.”
Anthropic Help Center, How do usage and length limits work
You can watch this happen on the server in real time. ClaudeMeter polls /api/organizations/{org_uuid}/usage every 30 seconds (POLL_INTERVAL at src/bin/menubar.rs:18). Trigger a /compact in one window and watch the menu bar in another. The seven_day_opus float ticks up, often by 1 to 2 percent on a heavy conversation, because Opus is weighted more heavily than Sonnet against the weekly cap and the summarization runs on the same model you were using.
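If you want to watch it without the menu bar, here is a minimal stand-in for that loop. It assumes the UsageResponse sketch above plus reqwest's blocking client (with the json feature); the org UUID and cookie are placeholders you would lift from an authenticated claude.ai session, and the host and path simply follow the endpoint named on this page, so treat it as a sketch rather than claude-meter's actual implementation.

```rust
use std::{thread, time::Duration};

const POLL_INTERVAL: Duration = Duration::from_secs(30); // mirrors the menu bar's cadence

fn fetch_usage(
    client: &reqwest::blocking::Client,
    org_uuid: &str,
    cookie: &str,
) -> Result<UsageResponse, Box<dyn std::error::Error>> {
    let url = format!("https://claude.ai/api/organizations/{org_uuid}/usage");
    let resp = client.get(&url).header("Cookie", cookie).send()?.error_for_status()?;
    Ok(resp.json()?)
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = reqwest::blocking::Client::new();
    let org_uuid = "YOUR_ORG_UUID"; // placeholder
    let cookie = "sessionKey=...";  // placeholder, copied from an authenticated claude.ai session
    let mut last: Option<f64> = None;
    loop {
        let usage = fetch_usage(&client, org_uuid, cookie)?;
        let opus = usage.seven_day_opus.as_ref().map(|w| w.utilization).unwrap_or(0.0);
        if let Some(prev) = last {
            if opus > prev {
                // A /compact on an Opus session shows up as a small upward step here.
                println!("seven_day_opus: {:.1}% -> {:.1}%", prev * 100.0, opus * 100.0);
            }
        }
        last = Some(opus);
        thread::sleep(POLL_INTERVAL);
    }
}
```

Run it in one terminal, trigger /compact in another, and the printed step is the quota cost landing.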
A real /compact event measured against the server-truth quota
Take a session in this state: 62% of the rolling 5-hour bucket used, 83% of the weekly Opus bucket used, and the conversation at 198K tokens, about to hit the 200K context wall. The natural assumption is that /compact will free up room and that is the end of the cost.
- Context: 198K of 200K. Auto-compact warning fires.
- Plan quota: five_hour 62%, seven_day_opus 83%.
- ccusage shows the session at 4.1M tokens.
The takeaway is not “don't compact.” The takeaway is that compaction is a quota tax you pay to keep the conversation alive past the 200K wall. It is the right move when you genuinely need the earlier turns in summary form. It is the wrong move when you only need the last few turns; in that case /clear and a fresh chat is free against plan quota and free against the context window.
Watching it happen on the menu bar
Here is a real session, two terminals open: Claude Code on the left, ClaudeMeter status on the right. Notice that the server-truth quota numbers move while the local conversation is being compacted, and the move is in the wrong direction for “saving” quota.
The /compact freed 186K tokens of context window, which was real and necessary. It also spent 1.2% of the weekly Opus bucket and 2.7% of the rolling 5-hour bucket on the way. If you only had the context-left indicator to look at, you would think compaction was pure profit. The plan-quota numbers are the half of the picture Claude Code does not show you.
The cache-miss tail that nobody mentions
Anthropic's prompt cache keys on an exact prefix match of the input tokens. As long as your conversation prefix is stable between turns, you pay heavily discounted cache-hit rates on the re-sent context. That is the only reason a long Claude Code session is economically tolerable; on a 198K-token transcript, the un-cached input cost would be brutal every turn.
/compact rewrites the conversation prefix. The new prefix (a summary plus the most recent turns) does not match anything in the cache. The next 3 to 5 turns post-compaction therefore get cache misses, billing at the full input-token rate. This shows up on the server as another 1 to 2 percent dent in the weekly bucket on top of the summarization itself, just from the cache-miss tail.
ccusage cannot disambiguate this from any other usage; in JSONL all input tokens look the same. ClaudeMeter cannot disambiguate it either; only the server can, and the server does not surface cache-hit/cache-miss attribution per turn. But the aggregate shows up on the seven_day float, and you can correlate it temporally with the compaction event by watching the menu bar.
So which one should you actually watch?
Both, on different timescales. The 200K context window is a per-chat concern; you keep an eye on it inside Claude Code via the “context left” indicator and you act on it with /compact or /clear. The plan quota is a per-week concern; you keep an eye on it via claude.ai/settings/usage or a tool that polls the same endpoint, and you act on it by switching models, deferring work to off-peak hours (Anthropic's peak-hour multiplier on the 5-hour bucket runs 5 to 11 a.m. Pacific weekdays per their March 2026 statement), or topping up extra-usage credits.
The trap is treating one as a substitute for the other. Compacting does not buy you plan quota; clearing does not buy you context; starting a new chat does not reset your weekly bucket; topping up extra-usage does not give you a bigger context window. Each lever moves a different ceiling.
ClaudeMeter is open source and MIT licensed; the relevant code is in src/models.rs (data shape), src/api.rs (the GET to /api/organizations/{org_uuid}/usage), and src/bin/menubar.rs (the 30-second polling loop). It only knows about plan quota, not the context window, on purpose: those are different problems, and lumping them into one indicator is the original confusion this page is trying to undo. Repo: github.com/m13v/claude-meter.
The honest caveat
The percentage numbers in the example session above are typical for an Opus-heavy refactor session, not a guaranteed measurement; the exact compaction cost depends on conversation length, the model in use, and Anthropic's internal weighting, which is not published as a formula. The directional claim, that /compact consumes plan quota rather than refunding it, is from Anthropic's own help center. The /api/organizations/{org_uuid}/usage endpoint is undocumented; the published 200K context window and the published 5-hour / 7-day plan-quota numbers come from Help Center articles, not API contracts. The only thing you can fully trust is the float in the JSON, and that is what claude-meter pins to the menu bar.
Confused about which limit you are actually hitting? I will look at it with you.
15 minutes. Walk me through your Claude Code week. I will tell you whether the wall you keep hitting is the 200K context window, the rolling 5-hour, the weekly Opus sub-bucket, or extra-usage spillover, and what to switch to so you stop spending quota on the wrong door.
Frequently asked questions
Are context compression and plan quota the same thing?
No. They are independent systems that both look like “Claude is running out of room,” which is why people conflate them. Context compression (auto-compact, /compact) summarizes earlier turns of a single conversation as it approaches the 200K-token context window. Plan quota is the rolling 5-hour and 7-day budgets Anthropic enforces against your whole Anthropic account, across every device, browser tab, and Claude Code session. The compression mechanic lives in the model server's prompt-construction step; the quota mechanic lives in a different server-side check that 429s your request based on the JSON Anthropic returns at /api/organizations/{org_uuid}/usage.
Does running /compact save plan quota?
No, it costs more of it. Anthropic's help center is explicit: “Longer conversations that trigger automatic context management consume more of your usage limit.” Compaction is itself an LLM call (the model summarizes your transcript), so it bills against the same rolling 5-hour and 7-day buckets. Compaction is good for keeping a long Claude Code session coherent, not for saving quota. If anything, it is a quota tax you accept in exchange for not hitting the 200K-token wall.
Why does the prompt cache disappear after a compaction?
Prompt caching keys on an exact prefix match of the input tokens. When /compact rewrites the conversation history into a summary, the prefix changes, so the next several turns get cache misses and pay full input-token rates. On a long Claude Code session with file context, that can mean tens of thousands of uncached input tokens flowing through the model right after compaction, none of which would have happened on the same conversation un-compacted.
If compaction debits plan quota, can I see it happen on the server?
Yes. Open claude.ai/settings/usage in a browser tab and watch the seven_day percentage and (on Max) seven_day_opus and seven_day_sonnet sub-buckets before and after a /compact. The float ticks up. ClaudeMeter polls /api/organizations/{org_uuid}/usage every 30 seconds (POLL_INTERVAL at src/bin/menubar.rs:18) so a compaction event shows up in the menu bar within a minute. The Anthropic-internal weighting on Opus prompts means an Opus compaction can move seven_day_opus by a measurable percentage on a single event.
When does Claude Code auto-compact?
Around 95% of the 200K-token context window, with about 20% of headroom reserved for the summarization call itself. The /compact slash command is the manual trigger. Either way it is the same mechanic: summarize earlier turns into a shorter recap, replace the original transcript with the recap plus the most recent turns. None of this touches the rolling 5-hour or 7-day buckets directly; the side effect is that the summarization itself is a billed prompt that does.
If my context fills before my plan quota does, am I being inefficient?
Not necessarily. Context window and plan quota measure different shapes of waste. A single very-long Claude Code session can blow through context before it dents the weekly bucket, especially on Pro. A bunch of short sessions can blow through the weekly bucket without any of them ever needing compaction. The right move depends on which one is your actual ceiling that week. ClaudeMeter shows the weekly float so you can tell which limit is the real constraint; the in-session /usage dump shows the context window so you can tell when you are near the 200K wall.
Why does ccusage not show this?
ccusage walks ~/.claude/projects/*.jsonl and totals input_tokens + output_tokens. The compaction summarization call writes to that JSONL like any other turn, so ccusage sees its raw token cost. What ccusage cannot see is the server-side weighting that turns those tokens into a fraction of seven_day or seven_day_opus, the peak-hour multiplier on the rolling 5-hour bucket, the prompt-cache miss penalty on the post-compaction turns, or browser-chat usage that depleted the same buckets without writing to JSONL. ccusage answers “how much did this machine spend?” ClaudeMeter answers “how close am I to the wall the server is going to enforce?”
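For contrast, the local-ledger half is easy to sketch. This is not ccusage's code, just the approach it takes: walk the transcript JSONL and add up raw token counts. The message.usage field path and the dirs crate are assumptions here; adjust to whatever your transcript files actually contain.

```rust
use std::fs;
use serde_json::Value;

fn main() -> std::io::Result<()> {
    // Assumed transcript location; Claude Code keeps one directory per project.
    let root = dirs::home_dir().expect("no home dir").join(".claude/projects");
    let (mut input, mut output) = (0u64, 0u64);
    for project in fs::read_dir(&root)? {
        let project = project?.path();
        if !project.is_dir() { continue; }
        for entry in fs::read_dir(&project)? {
            let path = entry?.path();
            if path.extension().and_then(|e| e.to_str()) != Some("jsonl") { continue; }
            for line in fs::read_to_string(&path)?.lines() {
                let Ok(v) = serde_json::from_str::<Value>(line) else { continue };
                // Assumed field path for the per-turn usage block.
                if let Some(u) = v.pointer("/message/usage") {
                    input += u["input_tokens"].as_u64().unwrap_or(0);
                    output += u["output_tokens"].as_u64().unwrap_or(0);
                }
            }
        }
    }
    // Raw tokens only: no server-side weighting, no peak-hour multiplier,
    // no browser-chat usage -- exactly the blind spots described above.
    println!("local ledger: {input} input + {output} output tokens");
    Ok(())
}
```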
Can I just disable auto-compaction to avoid the quota cost?
Some users have asked for a toggle (anthropics/claude-code#9540) but it is not currently exposed. The trade-off without compaction is harder: you hit the 200K context wall and the session stops cold. The practical workflow most heavy users settle on is /clear at phase boundaries (cheap, no LLM call) instead of /compact, and only let auto-compact run when you genuinely need to keep the prior turns in summarized form.
Does Claude Pro get auto-compaction the same way Max does?
Yes, the mechanic is identical; only the plan quota numbers differ. The 200K context window is the same on Pro and Max. The seven_day rolling bucket is roughly 40 to 80 hours per week on Pro, 140 to 280 on Max 5x, 240 to 480 on Max 20x, so a single compaction event eats a larger percentage of the Pro week than the Max week. On Pro you feel compaction's quota cost more, but only because there is less quota to start with.
Where in claude-meter is the code that reads the plan quota?
The Window struct at src/models.rs:3-7 is the data shape (utilization float, optional resets_at). The UsageResponse struct at src/models.rs:18-28 is what /api/organizations/{org_uuid}/usage returns: five_hour, seven_day, seven_day_sonnet, seven_day_opus, seven_day_oauth_apps, plus the extra_usage block. The menu bar reads it on POLL_INTERVAL=30s (src/bin/menubar.rs:18) and the browser extension reads it on POLL_MINUTES=1m (extension/background.js). All open source, MIT, github.com/m13v/claude-meter.
Keep reading
Rolling 5-hour vs weekly quota: same JSON, different walls
The 5-hour and 7-day buckets are sibling fields on the same /usage payload. Either one independently 429s your account.
Claude Code rolling 5-hour usage: three ledgers, three answers
Built-in /usage prints a snapshot. ccusage reads local JSONL. The float that 429s your loop is on claude.ai's server. Which tool reads which.
ClaudeMeter vs ccusage
ccusage measures local Claude Code tokens off disk. ClaudeMeter measures plan quota off claude.ai. Different questions, different answers.