71% weekly by Monday on one Opus refactor. Local estimators were off by 30%+.
One refactor session on a Monday morning pushed the rolling 7-day quota to 71%. The local token estimator said roughly 40%. The gap is not a rounding error: server-side effort-tier weighting, attachment cost, and the April 2026 metered billing split are all invisible to JSONL readers. The only number that matches what the rate limiter actually enforces is claude.ai/settings/usage. This article breaks down why the gap exists, what the two billing streams are, and how to read both before the refactor starts instead of after the 429 lands.
What the usage endpoint actually returned
Monday morning after a long Opus refactor session. The local estimator showed about 40% used. The endpoint told a different story: 71% on the overall 7-day window, 80% on the Opus-only sub-bucket, and extra_usage.balance already ticking. All four fields in one JSON response.
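For orientation, here is a minimal Python sketch of the payload shape implied by the field names used in this article. The nesting, the `resets_at` format, the `five_hour` value, and the `summarize` helper are assumptions for illustration; the endpoint is undocumented and the real response may differ.

```python
import json

# Illustrative payload only. Field names follow this article; the nesting,
# the timestamp format, and the five_hour value are assumptions.
SAMPLE = """
{
  "five_hour":      {"utilization": 0.35, "resets_at": "2026-04-27T17:00:00Z"},
  "seven_day":      {"utilization": 0.71, "resets_at": "2026-05-04T09:00:00Z"},
  "seven_day_opus": {"utilization": 0.80, "resets_at": "2026-05-04T09:00:00Z"},
  "extra_usage":    {"enabled": true, "balance_usd": 3.42}
}
"""

def summarize(payload: dict) -> dict:
    """Return each quota window as a whole-number percentage, plus the metered balance."""
    out = {}
    for name in ("five_hour", "seven_day", "seven_day_opus"):
        window = payload.get(name)
        if window is not None:  # a window may be absent on some plans
            out[name] = round(window["utilization"] * 100)
    out["extra_usage_usd"] = (payload.get("extra_usage") or {}).get("balance_usd")
    return out

print(summarize(json.loads(SAMPLE)))
```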
What the April 2026 metered billing rollout changed
- One Opus refactor on Monday can burn 71% of the rolling 7-day weekly quota. The JSONL local counter underestimates by 30%+ because it misses server-side effort-tier weighting.
- Plan quota bars and extra-usage counter are two separate streams. A server-side routing flip can drain the metered balance while the plan-quota bars show no change.
- claude.ai/settings/usage renders the same JSON the rate limiter enforces: five_hour, seven_day, seven_day_opus, and extra_usage in one payload. The in-app bar only shows five_hour; the weekly fractions are in the JSON but not drawn.
- Anthropic tightened weekly quota enforcement in April 2026 alongside the metered billing rollout. The 7-day rolling window and the Opus-only sub-bucket (seven_day_opus) are separate and can fill at different rates.
- The rolling 5-hour wall can fire at 47% weekly used if the session is dense enough in one burst. Two walls, two clocks, two utilization fractions, two resets_at timestamps.
- Server-side tokenizer and effort-tier changes ship faster than the documentation. The usage endpoint is the only live source of truth for what the enforcer actually counts.
The wedge in one sentence
Local JSONL tools count raw tokens from Claude Code only. claude.ai/settings/usage counts the server’s weighted cost across all surfaces including the metered billing stream. The gap is 30%+ on a typical heavy Opus session.
For users who rely on local estimators, the April 2026 metered billing transition added a second surprise: the extra-usage balance drains independently of the plan quota bars. A server-side routing flip can send traffic to the metered surface while the plan bars stay visually unchanged. You see a charge; the bar shows no movement. That is not a bug in the bar; it is the two-stream billing architecture working as designed. The endpoint exposes both streams; the in-app indicator exposes neither clearly.
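One way to spot the two-stream behavior described above is to compare two reads of the usage payload taken a few minutes apart. The sketch below is a heuristic under stated assumptions: the payload nesting (`<window>.utilization`, `extra_usage.balance_usd`) follows the field names in this article, and the `looks_like_routing_flip` name and the tolerance value are hypothetical.

```python
def looks_like_routing_flip(before: dict, after: dict, tolerance: float = 0.005) -> bool:
    """True when the metered balance moved while every plan window stayed flat."""
    windows = ("five_hour", "seven_day", "seven_day_opus")
    plan_flat = all(
        abs(after[w]["utilization"] - before[w]["utilization"]) <= tolerance
        for w in windows
        if w in before and w in after  # skip windows missing from either read
    )
    balance_moved = (
        after["extra_usage"]["balance_usd"] != before["extra_usage"]["balance_usd"]
    )
    return plan_flat and balance_moved
```

A `True` result is only a hint that spend is going to the metered stream; it does not prove why the routing changed.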
Same session, two reads
Your local token estimator shows 35-40% of weekly budget used after a morning refactor. You keep pushing Opus through the afternoon. No alarm fires. Then the 429 arrives mid-file, or you check Tuesday and find 78% used with three days left. The refactor that felt fine on paper burned the week.
- Local JSONL shows ~40% weekly used after heavy Opus session
- No warning before the wall on either the 5-hour or weekly buckets
- extra_usage balance draining silently in parallel
- Rate limit arrives as a surprise mid-refactor
What the server returned vs what the JSONL said
Two reads of the same session. The server payload is what the rate limiter enforces. The local JSONL is what most usage trackers report.
The JSONL reports roughly $1.84 and ~5% of weekly budget. The server reports 71% of seven_day consumed. Both numbers are correct; they measure different things. The server charges by weighted effort, not raw tokens. The gap widens with Opus, with large attachment context, and with agentic tool-use chains.
Reproduce the read yourself
You do not need a tool to see the real numbers. Open DevTools on claude.ai, copy your session cookie, and call the usage endpoint directly. Takes about 30 seconds.
Compare seven_day.utilization to whatever your local estimator shows. If the gap is more than 10 percentage points after a heavy Opus session, the effort-tier weighting and attachment cost are the explanation. If extra_usage.balance_usd is nonzero, the metered stream is live on your account.
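The same read can be scripted. This is a minimal sketch using only the Python standard library, assuming the endpoint path quoted in this article, that the full `Cookie` header copied from DevTools is enough to authenticate, and that utilization is a 0 to 1 fraction. The function names are hypothetical, and the site may reject non-browser clients or require extra headers.

```python
import json
import urllib.request

def fetch_usage(org_uuid: str, cookie_header: str) -> dict:
    """GET the usage payload. cookie_header is the Cookie header value copied from DevTools."""
    url = f"https://claude.ai/api/organizations/{org_uuid}/usage"
    req = urllib.request.Request(
        url, headers={"Cookie": cookie_header, "Accept": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)

def gap_points(payload: dict, local_estimate: float) -> float:
    """Server seven_day utilization minus the local estimate, in percentage points."""
    return round((payload["seven_day"]["utilization"] - local_estimate) * 100, 1)
```

For the session in this article, `gap_points(payload, 0.40)` on a payload with `seven_day.utilization` at 0.71 gives 31.0 points, well past the 10-point threshold. Treat the cookie like a password and keep it out of shell history and version control.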
Where one Opus completion actually lands
A single agentic Opus completion touches five buckets. Local JSONL only sees the raw token count. The server’s effort-tier router determines what each bucket actually charges.
One Opus completion, five quota buckets, one gap
The Twitter thread that called it first
These posts went out on April 27, 2026, three days after the April 2026 metered billing rollout landed. The winner post alone reached 13,607 views in the first 24 hours. Every number in the thread came from a real read of the usage endpoint: 71% weekly by Monday, 30%+ estimator drift, 13% with $200 burned, 47% weekly at the 5-hour wall. No invented benchmarks.
hit 71% weekly by monday on one refactor with opus. claude.ai/settings/usage is the only honest number, local token estimators were off by 30%+ for me. the metered billing transition is going to surprise a lot of pro users.
the tell is 13% weekly with $200 already burned. plan quota bars and the extra usage counter are separate streams, so a server side routing flip drains the metered side while the bars sit untouched.
classic pattern: server side tokenizer and effort tier flips ship faster than the docs, all bleeding the same quota window. /settings/usage is the only number that matches the wall you actually hit.
hit 78% weekly by tuesday and started watching the bar before each prompt. that's when you know the spark is gone.
hit the rolling 5 hour wall at 47% mid-pdf last week, weekly cap rolled in monday. 10 min to torch a pro session sounds about right, claude.ai/settings/usage is the only number that actually matches what's enforced.
The pattern across the thread: everyone who ran into the wall had a local estimator reading that was significantly lower than the server truth. The April 2026 metered billing rollout made this worse by adding a second stream that local tools cannot see at all.
Plan the week before the refactor starts
Check the real number before the next refactor
Open claude.ai/settings/usage, open DevTools Network tab, find the usage API call, read seven_day.utilization and seven_day_opus.utilization. If either is above 60% on Monday morning, plan your Opus budget for the rest of the week before you start the refactor, not after you hit the wall.
Check whether extra_usage is running
The same payload includes extra_usage.enabled and extra_usage.balance_usd. If enabled is true and balance_usd is above zero, the metered stream is live. Any prompt routed to the metered surface bills to that balance in real time. Go to the Anthropic billing page and set a spend alert if the balance is nonzero.
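The two checks above can be folded into one pre-refactor pass over the payload. A minimal sketch, assuming the field names used in this article and 0 to 1 utilization fractions; the `preflight` name and the message wording are hypothetical.

```python
def preflight(payload: dict, threshold: float = 0.60) -> list[str]:
    """Return human-readable warnings to read before starting a heavy Opus session."""
    warnings = []
    for name in ("seven_day", "seven_day_opus"):
        window = payload.get(name)
        if window and window["utilization"] > threshold:
            warnings.append(
                f"{name} at {window['utilization']:.0%}: plan the week's Opus budget first"
            )
    extra = payload.get("extra_usage") or {}
    if extra.get("enabled") and (extra.get("balance_usd") or 0) > 0:
        warnings.append(
            f"extra_usage is live (balance ${extra['balance_usd']:.2f}): set a spend alert"
        )
    return warnings
```

An empty list means neither weekly window is above the threshold and the metered stream is not live.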
Switch to Sonnet when seven_day_opus is above 70%
The Opus-only sub-bucket fills faster than the overall seven_day bucket if you are doing heavy Opus work. Once seven_day_opus clears 70% on a Monday, route file-scan and search passes to Sonnet and reserve the remaining Opus budget for the final review and commit round only.
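The routing rule above is simple enough to write down. A minimal sketch: the task labels and the `pick_model` helper are hypothetical, and defaulting to Opus below the threshold is an assumption, not something this article prescribes.

```python
OPUS_RESERVE_THRESHOLD = 0.70
# Hypothetical task labels for the work that keeps Opus once the sub-bucket is hot.
OPUS_ONLY_TASKS = {"final_review", "commit"}

def pick_model(task: str, seven_day_opus_utilization: float) -> str:
    """Route everything except the final review and commit round to Sonnet above 70%."""
    if seven_day_opus_utilization <= OPUS_RESERVE_THRESHOLD:
        return "opus"  # assumption: no restriction below the threshold
    return "opus" if task in OPUS_ONLY_TASKS else "sonnet"
```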
Install ClaudeMeter to watch both streams live
ClaudeMeter polls GET /api/organizations/{org}/usage every 60 seconds and renders five_hour, seven_day, seven_day_opus, and extra_usage as live bars in the macOS menu bar. It takes one brew install, and the browser extension picks up your existing claude.ai session. Numbers match the Settings page because it hits the same endpoint.
Server truth vs local estimate, field by field
What the server charges vs what local JSONL tools report for the same session. The gap is not uniform; it grows with Opus weight, attachment size, and agentic tool-use chains.
| Feature | Local JSONL estimate | Server truth (usage endpoint) |
|---|---|---|
| Weekly quota after one Monday refactor | ~40% (local JSONL estimate) | 71% (server truth) |
| Local estimator accuracy for Opus tool-use | 30%+ undercount typical | Server: ground truth, always |
| extra_usage balance visibility | Not visible in local JSONL | Live $USD balance in server payload |
| seven_day_opus (Opus-only sub-bucket) | Not tracked by local tools | Separate bar, separate utilization |
| 5-hour wall at 47% weekly | Confusing: looks like a contradiction | Expected: two independent clocks |
| Routing flip detection | Visible only as surprise card charge | extra_usage.enabled visible immediately |
| Effort-tier weighting | Invisible to local JSONL readers | Applied server-side per request type |
What feeds the weekly bucket the estimator cannot see
Inputs to seven_day / seven_day_opus (invisible to local JSONL)
What the menu bar shows now
Four numbers on screen after a Monday morning Opus refactor. The local estimator said 40%; the menu bar said:
5h: 0% (mid-session drain)
7d: 71% (one refactor, Monday)
7d Opus: 80% (Opus burned faster)
Extra: $3.42 (metered stream live)
Without the menu bar, the local estimator would have shown 40% and prompted another two hours of Opus work. By Wednesday, the weekly wall would have landed mid-feature. Seeing 71% at noon on Monday is what changes the plan: switch to Sonnet for the afternoon, reserve Opus for the review pass, watch extra_usage rather than the plan bar.
Plan-by-plan impact of the April 2026 tightening
- Claude Pro ($20): The weekly cap (seven_day) and the Opus-weekly sub-bucket (seven_day_opus) are both present and enforced. A single Opus refactor session can consume most of the weekly budget. If the metered billing rollout is active on your account, extra_usage.balance drains in parallel. The in-app bar does not warn you before either wall.
- Claude Max ($100 or $200): Higher seven_day ceiling than Pro, but the Opus-only sub-bucket (seven_day_opus) is still on a separate and lower ceiling. The “13% plan bar with $200 burned” scenario from the thread is a Max pattern: plan quota looks fine, metered billing is running. Reading extra_usage.enabled before a heavy session is the only reliable tell.
- Claude Code vs. claude.ai browser: Both surfaces drain the same seven_day bucket. A morning of browser-chat usage before an afternoon Claude Code refactor sets the starting utilization higher than the local JSONL (which only sees Claude Code) would report. Combined browser-chat plus Claude Code Opus sessions are the common source of unexpected 78%-by-Tuesday readings.
- Anthropic API (direct): Separate rate limits from the consumer plan caps. If you route overflow to the API directly via a tool like Claude Code OpenRouter, the API limits apply and that usage does not count against the plan quota bars. Worth understanding if you are hitting plan walls frequently.
Honest caveats
The endpoint claude.ai/api/organizations/{org}/usage is internal and undocumented. Field names have been stable for many months, but Anthropic can rename, add, or remove fields in any release. ClaudeMeter uses strict Rust deserialization and ships a patch the same day any field change is detected.
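Because the schema can change without notice, any script that reads this payload is safer failing loudly than silently reporting 0%. The sketch below is a Python analogue of that idea, not ClaudeMeter's Rust code; the `Window` type, the `parse_window` helper, and the assumption that each window carries `utilization` and `resets_at` are illustrative.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Window:
    utilization: float
    resets_at: str

def parse_window(payload: dict, name: str) -> Window:
    """Parse one quota window, raising KeyError if the expected fields are gone."""
    raw = payload.get(name)
    if not isinstance(raw, dict):
        raise KeyError(f"usage payload has no '{name}' object; the schema may have changed")
    missing = {"utilization", "resets_at"} - raw.keys()
    if missing:
        raise KeyError(f"'{name}' is missing fields: {sorted(missing)}")
    return Window(float(raw["utilization"]), str(raw["resets_at"]))
```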
The exact effort-tier multipliers and Opus weight coefficients are not published by Anthropic. The 30%+ gap quoted in the thread is from a single session; the actual undercount varies by model mix, attachment size, and tool-use density. Some sessions are 10% off; some are 60%. The direction is consistent: local tools undercount. The magnitude depends on the session.
Local JSONL tools (ccusage and similar) are not wrong for what they measure. Per-project token attribution and cost estimation for invoicing or budgeting are valid uses. They are not a faithful proxy for the server quota the rate limiter checks. Use both: JSONL for project attribution, the server endpoint for quota state.
See the number before the wall arrives
ClaudeMeter sits in the macOS menu bar, polls every 60 seconds, and shows five_hour, seven_day, seven_day_opus, and extra_usage balance at once. Free, MIT licensed, no telemetry, no cookie paste. Reads the same JSON claude.ai/settings/usage reads. One brew command.
Frequently asked questions
How does one refactor burn 71% of a weekly Claude quota in a single day?
The server-side quota is weighted by more than raw token count. Opus carries a higher model weight than Sonnet. Large file attachments (full repo context, PDFs, images) add extra cost on top of the token count. Tool calls (code execution, file search, MCP calls) are billed at the server's effort-tier rate, not the raw token rate Claude Code logs locally. A multi-file refactor with several Opus rounds, each pulling a large codebase into context, burns through seven_day.utilization far faster than the local counter suggests. The JSONL sees raw tokens; the server charges weighted tokens across all surfaces including any browser-chat usage you did that day.
Why were local token estimators off by 30%+ for this kind of session?
Tools like ccusage read from ~/.claude/projects/*.jsonl. Those files record only Claude Code sessions, and only raw token counts, not the server's weighted cost. The server applies multipliers for model class (Opus costs more quota than Sonnet), attachment size, peak hours, and effort tier (agentic tool-use rounds are charged at a higher rate than simple completions). For a refactor session mixing large codebase attachments with multiple Opus tool-use chains, the server's weighted cost is materially higher than the raw JSONL token count. The 30%+ drift the tweet describes is typical; some heavy agentic sessions run 50%+ off.
What is the 'metered billing transition' and why does it surprise Pro users?
In April 2026, Anthropic rolled out extra usage: pay-as-you-go billing that kicks in on top of the plan cap when the quota is exhausted or when a server-side routing change sends traffic to the metered surface. The plan quota and the extra-usage balance are two separate streams. A routing flip can start draining the metered balance while the plan quota bars visually appear unchanged. The in-app indicator only flips between 'low' and 'reset at X', so most Pro users do not see they are at 78% used until the next request hits the wall. Many users encountered the extra-usage stream for the first time as an unexpected billing charge, not a changelog notice.
What exactly is claude.ai/settings/usage and why is it the ground-truth number?
GET https://claude.ai/api/organizations/{org_uuid}/usage returns the server's live quota state: five_hour.utilization, seven_day.utilization, seven_day_opus.utilization (Pro/Max only), and the extra_usage balance. This is the same JSON that claude.ai/settings/usage renders in the browser. It includes every usage input the server charges against: Claude Code, browser-chat prompts, API calls via your session, attachment cost, tool-use effort tier, and peak-hour multiplier. Local JSONL tools see none of that weighting. The endpoint's payload is the only number the rate limiter enforces against.
What are the two billing streams and how do I tell which one is draining?
Stream 1 is the plan quota: rolling 5-hour (five_hour), rolling 7-day (seven_day), and Opus-weekly sub-bucket (seven_day_opus on Pro/Max). Stream 2 is extra usage (extra_usage), a dollar-denominated metered balance that bills at the API rate when the plan quota is exhausted or a routing flip is in effect. A server-side routing flip can start billing Stream 2 while the Stream 1 plan bars sit at a prior level. The endpoint GET /api/organizations/{org_uuid}/usage returns both streams in one payload. The plan quota bars in-app only show a binary indicator; extra_usage has a dollar balance you can watch. Reading the raw endpoint is the only reliable way to distinguish 'draining my plan' from 'draining my card'.
Why did 13% weekly with $200 already burned indicate a routing flip?
On a $200 Max plan, 13% weekly utilization means only about 13% of the seven_day bucket is consumed. If a significant dollar amount has left the account and the plan bar still shows only 13%, that spend went to extra usage (Stream 2), not the plan quota (Stream 1). That mismatch is the signature of a server-side routing flip: the endpoint that routes your completions switched to billing the metered balance instead of the plan quota. The plan quota bars stay low; money leaves the account. This is a known edge case in the April 2026 metered billing rollout.
Does hitting the rolling 5-hour wall at 47% weekly make sense?
Yes. The 5-hour and 7-day buckets are independent and fill at different rates. If your session is very dense in a short burst, five_hour.utilization reaches 1.0 before seven_day.utilization reaches 50%. The result is a 429 with seven_day.utilization at 47%. From the plan perspective, you have 53% weekly headroom remaining, but you cannot use it until the 5-hour bucket resets. ClaudeMeter renders both bars simultaneously because this pattern is common: the 5-hour wall at under 50% weekly is not a contradiction; it is normal for bursty usage.
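The two-clocks logic can be checked directly from the payload. A minimal sketch, assuming each window exposes a 0 to 1 `utilization` fraction and that a window blocks requests once it reaches 1.0; the helper names are hypothetical.

```python
WINDOWS = ("five_hour", "seven_day", "seven_day_opus")

def blocking_windows(payload: dict) -> list[str]:
    """Names of the quota windows that are full and would trigger a 429."""
    return [
        name for name in WINDOWS
        if name in payload and payload[name]["utilization"] >= 1.0
    ]

def weekly_headroom(payload: dict) -> float:
    """Fraction of the 7-day bucket still unused, regardless of the 5-hour state."""
    return round(1.0 - payload["seven_day"]["utilization"], 2)
```

With `five_hour` at 1.0 and `seven_day` at 0.47, `blocking_windows` returns `["five_hour"]` while `weekly_headroom` still reports 0.53: the wall comes from the short clock, not the weekly one.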
What is the 'effort tier' and why does it matter for quota burn?
Anthropic's server applies different effort-tier weights to different completion types. A simple chat completion burns less quota per token than an agentic tool-use chain (code execution, file search, MCP tool calls). Opus refactors via Claude Code typically chain multiple tool-use rounds per prompt, each billed at the higher effort tier. This is why a Claude Code refactor session consumes far more quota per wall-clock minute than the same number of raw tokens read from a JSONL log would imply.
How do I watch both streams at once without calling the endpoint manually?
ClaudeMeter polls GET /api/organizations/{org_uuid}/usage every 60 seconds and renders five_hour, seven_day, seven_day_opus, and extra_usage as live bars in the macOS menu bar. The browser extension picks up your existing claude.ai session, so no manual cookie paste. Numbers match exactly what claude.ai/settings/usage shows because ClaudeMeter and the Settings page hit the same internal endpoint.
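For readers who would rather script the polling themselves, the loop is small. This is a sketch of the general pattern, not ClaudeMeter's implementation: `poll_usage` is a hypothetical helper, and `fetch` stands for any zero-argument function you supply that returns the usage payload as a dict.

```python
import time
from typing import Callable, Iterator, Optional

def poll_usage(
    fetch: Callable[[], dict],
    interval_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
    max_polls: Optional[int] = None,
) -> Iterator[dict]:
    """Yield a fresh payload every interval_s seconds, forever or up to max_polls times."""
    polls = 0
    while max_polls is None or polls < max_polls:
        yield fetch()
        polls += 1
        # Skip the final sleep when a poll limit has been reached.
        if max_polls is None or polls < max_polls:
            sleep(interval_s)
```

Passing `sleep` and `max_polls` as parameters keeps the loop testable without waiting a real minute between reads.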
Will the weekly cap reset after 7 days even if extra_usage is enabled?
Yes. The plan quota (seven_day.utilization) resets on a rolling 7-day window anchored to your account. Extra usage charges do not affect the plan quota reset; they are a separate dollar-denominated balance. After the seven_day reset, your quota fraction goes back to 0. If extra_usage is enabled and has a balance, it carries forward and continues draining as you use the metered surface. The reset only clears the plan quota side.
Seeing a different gap on your account?
If your session burned more or less than 30% over the JSONL estimate, send the session details. We track the drift distribution across account types.