BREAKING · April 27, 2026 · metered billing tightening

71% weekly by Monday on one Opus refactor. Local estimators were off by 30%+.

One refactor session on a Monday morning pushed the rolling 7-day quota to 71%. The local token estimator said roughly 40%. The gap is not a rounding error: server-side effort-tier weighting, attachment cost, and the April 2026 metered billing split are all invisible to JSONL readers. The only number that matches what the rate limiter actually enforces is claude.ai/settings/usage. This article breaks down why the gap exists, what the two billing streams are, and how to read both before the refactor starts instead of after the 429 lands.

Matthew Diakonov
10 min read
  • Sourced from the live claude.ai usage endpoint and the April 2026 tweet thread (13k+ views)
  • Field values verified from GET /api/organizations/{org}/usage
  • No invented benchmarks: 71%, 30%+, 13%, 47% all from the original thread
  • Same JSON the Settings page fetches and renders

What the usage endpoint actually returned

Monday morning after a long Opus refactor session. The local estimator showed about 40% used. The endpoint told a different story: 71% on the overall 7-day window, 80% on the Opus-only sub-bucket, and extra_usage.balance already ticking. All four fields in one JSON response.

What the April 2026 metered billing rollout changed

  • One Opus refactor on Monday can burn 71% of the rolling 7-day weekly quota. The JSONL local counter underestimates by 30%+ because it misses server-side effort-tier weighting.
  • Plan quota bars and extra-usage counter are two separate streams. A server-side routing flip can drain the metered balance while the plan-quota bars show no change.
  • claude.ai/settings/usage renders the same JSON the rate limiter enforces: five_hour, seven_day, seven_day_opus, and extra_usage in one payload. The in-app bar only shows five_hour; the weekly fractions are in the JSON but not drawn.
  • Anthropic tightened weekly quota enforcement in April 2026 alongside the metered billing rollout. The 7-day rolling window and the Opus-only sub-bucket (seven_day_opus) are separate and can fill at different rates.
  • The rolling 5-hour wall can fire at 47% weekly used if the session is dense enough in one burst. Two walls, two clocks, two utilization fractions, two resets_at timestamps.
  • Server-side tokenizer and effort-tier changes ship faster than the documentation. The usage endpoint is the only live source of truth for what the enforcer actually counts.

The wedge in one sentence

Local JSONL tools count raw tokens from Claude Code only. claude.ai/settings/usage counts the server’s weighted cost across all surfaces including the metered billing stream. The gap is 30%+ on a typical heavy Opus session.

For users who rely on local estimators, the April 2026 metered billing transition added a second surprise: the extra-usage balance drains independently of the plan quota bars. A server-side routing flip can send traffic to the metered surface while the plan bars stay visually unchanged. You see a charge; the bar shows no movement. That is not a bug in the bar, it is the two-stream billing architecture working as designed. The endpoint exposes both streams; the in-app indicator exposes neither clearly.

Same session, two reads

Your local token estimator shows 35-40% of weekly budget used after a morning refactor. You keep pushing Opus through the afternoon. No alarm fires. Then the 429 arrives mid-file, or you check Tuesday and find 78% used with three days left. The refactor that felt fine on paper burned the week.

  • Local JSONL shows ~40% weekly used after heavy Opus session
  • No warning before the wall on either the 5-hour or weekly buckets
  • extra_usage balance draining silently in parallel
  • Rate limit arrives as a surprise mid-refactor

What the server returned vs what the JSONL said

Two reads of the same session. The server payload is what the rate limiter enforces. The local JSONL is what most usage trackers report.

  • GET /api/organizations/{org}/usage — server truth
  • ~/.claude/projects/*.jsonl — local JSONL read

The JSONL reports roughly $1.84 and ~5% of weekly budget. The server reports 71% of seven_day consumed. Both numbers are correct; they measure different things. The server charges by weighted effort, not raw tokens. The gap widens with Opus, with large attachment context, and with agentic tool-use chains.
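The direction of the gap falls out of the arithmetic. The sketch below is a toy model only: Anthropic does not publish the effort-tier or model-weight coefficients, so every multiplier here is an illustrative assumption, not a measured value.

```python
# Toy model of why a raw-token count undercounts the server's weighted cost.
# All multipliers below are ILLUSTRATIVE assumptions; Anthropic does not
# publish the real effort-tier or model-weight coefficients.

RAW_TOKENS = 1_000_000          # what a local JSONL reader would sum

OPUS_MODEL_WEIGHT = 1.4         # assumed: Opus charged above the base rate
EFFORT_TIER_WEIGHT = 1.15       # assumed: agentic tool-use weighted higher
ATTACHMENT_OVERHEAD = 120_000   # assumed: extra weighted cost of attachments

def weighted_cost(raw_tokens: int) -> float:
    """Server-style charge: raw tokens times stacked multipliers, plus
    attachment overhead the local log never sees."""
    return raw_tokens * OPUS_MODEL_WEIGHT * EFFORT_TIER_WEIGHT + ATTACHMENT_OVERHEAD

charged = weighted_cost(RAW_TOKENS)
undercount = 1 - RAW_TOKENS / charged
print(f"local reader sees {RAW_TOKENS:,}, server charges {charged:,.0f}")
print(f"undercount: {undercount:.0%}")
```

With these made-up weights the undercount lands in the same 30–50% band the thread reports; the point is not the exact numbers but that every multiplier stacks in the same direction, so the local count is always the floor.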

Reproduce the read yourself

You do not need a tool to see the real numbers. Open DevTools on claude.ai, copy your session cookie, and call the usage endpoint directly. Takes about 30 seconds.

claude.ai/api/organizations/{org_uuid}/usage

Compare seven_day.utilization to whatever your local estimator shows. If the gap is more than 10 percentage points after a heavy Opus session, the effort-tier weighting and attachment cost are the explanation. If extra_usage.balance_usd is nonzero, the metered stream is live on your account.
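If you would rather script the read than squint at DevTools, a minimal sketch follows. The endpoint is internal and undocumented, so the field names (`five_hour`, `seven_day`, `seven_day_opus`, `extra_usage`) are taken from the payload described in this article and may change without notice; the org UUID and cookie placeholders are yours to fill in.

```python
# Sketch: read the quota payload the Settings page renders.
# Field names are assumptions based on the payload described above.
import json
import urllib.request

def fetch_usage(org_uuid: str, session_cookie: str) -> dict:
    """Live read of the internal usage endpoint (needs your session cookie)."""
    url = f"https://claude.ai/api/organizations/{org_uuid}/usage"
    req = urllib.request.Request(url, headers={"Cookie": session_cookie})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def summarize(payload: dict) -> dict:
    """Pull out the fractions the rate limiter actually enforces."""
    return {
        "five_hour": payload["five_hour"]["utilization"],
        "seven_day": payload["seven_day"]["utilization"],
        "seven_day_opus": payload.get("seven_day_opus", {}).get("utilization"),
        "extra_usage_usd": payload.get("extra_usage", {}).get("balance_usd", 0),
    }

# Live call (fill in the org UUID and cookie copied from DevTools):
# print(summarize(fetch_usage("YOUR-ORG-UUID", "sessionKey=...")))

# Shape check against a sample payload matching the fields described above:
sample = {"five_hour": {"utilization": 0.32},
          "seven_day": {"utilization": 0.71},
          "seven_day_opus": {"utilization": 0.80},
          "extra_usage": {"enabled": True, "balance_usd": 3.42}}
print(summarize(sample))
```

The `.get(...)` fallbacks matter: `seven_day_opus` is absent on some plans, and `extra_usage` may not exist on accounts where the metered rollout has not landed.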

Where one Opus completion actually lands

A single agentic Opus completion touches five buckets. Local JSONL only sees the raw token count. The server’s effort-tier router determines what each bucket actually charges.

One Opus completion, five quota buckets, one gap

Participants: you (Pro/Max) → claude.ai server → effort-tier router → quota buckets (five_hour, seven_day, seven_day_opus, extra_usage) → rate limiter.

  1. POST /completions (Opus + large codebase context)
  2. Router classifies the request: agentic tool-use, high effort tier
  3. five_hour bucket incremented by the effort-weighted cost
  4. seven_day bucket incremented by the same weighted cost
  5. Opus requests also increment seven_day_opus
  6. If a routing flip is active, extra_usage ticks as well
  7. Rate limiter returns 429 when any bucket reaches 1.0
  8. GET /usage returns every bucket's utilization fraction
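The same flow can be walked through in a few lines. This is a toy: the bucket ceilings (1.0 = full) mirror the payload's utilization fractions, but the cost fraction per completion and the metered dollar rate are illustrative assumptions.

```python
# Toy walk-through of the flow above: one effort-weighted charge touching
# every bucket it applies to. Cost fractions and the metered $/fraction
# rate are illustrative assumptions, not published values.

buckets = {"five_hour": 0.0, "seven_day": 0.0, "seven_day_opus": 0.0}
extra_usage_usd = 0.0

def charge(cost, model, routing_flip=False):
    """Apply one completion's weighted cost; return '429' if any bucket fills."""
    global extra_usage_usd
    buckets["five_hour"] += cost
    buckets["seven_day"] += cost
    if model == "opus":
        buckets["seven_day_opus"] += cost      # Opus-only sub-bucket fills too
    if routing_flip:
        extra_usage_usd += cost * 15.0         # assumed metered rate, illustrative
    return "429" if any(v >= 1.0 for v in buckets.values()) else None

print(charge(0.35, "opus"))   # one heavy agentic Opus round
print(buckets)
```

Three such rounds and the 429 fires, while the local JSONL would still be summing raw tokens with no idea which bucket tripped.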

The Twitter thread that called it first

These posts went out on April 27, 2026, three days after the April 2026 metered billing rollout landed. The winner post alone reached 13,607 views in the first 24 hours. Every number in the thread came from a real read of the usage endpoint: 71% weekly by Monday, 30%+ estimator drift, 13% with $200 burned, 47% weekly at the 5-hour wall. No invented benchmarks.

The pattern across the thread: everyone who ran into the wall had a local estimator reading that was significantly lower than the server truth. The April 2026 metered billing rollout made this worse by adding a second stream that local tools cannot see at all.

Plan the week before the refactor starts

1. Check the real number before the next refactor

Open claude.ai/settings/usage, open DevTools Network tab, find the usage API call, read seven_day.utilization and seven_day_opus.utilization. If either is above 60% on Monday morning, plan your Opus budget for the rest of the week before you start the refactor, not after you hit the wall.

2. Check whether extra_usage is running

The same payload includes extra_usage.enabled and extra_usage.balance_usd. If enabled is true and balance_usd is above zero, the metered stream is live. Any prompt routed to the metered surface bills to that balance in real time. Go to the Anthropic billing page and set a spend alert if the balance is nonzero.

3. Switch to Sonnet when seven_day_opus is above 70%

The Opus-only sub-bucket fills faster than the overall seven_day bucket if you are doing heavy Opus work. Once seven_day_opus clears 70% on a Monday, route file-scan and search passes to Sonnet and reserve the remaining Opus budget for the final review and commit round only.
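The rule above is mechanical enough to encode. A minimal sketch, where the 0.70 threshold is the article's own rule of thumb and the task labels are hypothetical names for illustration:

```python
# Sketch of the routing rule above: downgrade scan/search passes to Sonnet
# once the Opus-only sub-bucket clears the threshold. The 0.70 cutoff is
# this article's rule of thumb; task labels are hypothetical.

OPUS_WEEKLY_THRESHOLD = 0.70

def pick_model(task: str, seven_day_opus_utilization: float) -> str:
    """Reserve the remaining Opus budget for the review/commit round once
    the sub-bucket is above threshold; everything else goes to Sonnet."""
    if seven_day_opus_utilization < OPUS_WEEKLY_THRESHOLD:
        return "opus"
    return "opus" if task in {"final-review", "commit"} else "sonnet"

print(pick_model("file-scan", 0.45))     # below threshold: Opus is fine
print(pick_model("file-scan", 0.80))     # above threshold: downgrade
print(pick_model("final-review", 0.80))  # review pass keeps Opus
```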

4. Install ClaudeMeter to watch both streams live

ClaudeMeter polls GET /api/organizations/{org}/usage every 60 seconds and renders five_hour, seven_day, seven_day_opus, and extra_usage as live bars in the macOS menu bar. One brew install; the browser extension picks up your existing claude.ai session. Numbers match the Settings page because it hits the same endpoint.

Server truth vs local estimate, field by field

What the server charges vs what local JSONL tools report for the same session. The gap is not uniform; it grows with Opus weight, attachment size, and agentic tool-use chains.

Feature | Local JSONL estimate | Server truth (usage endpoint)
Weekly quota after one Monday refactor | ~40% | 71%
Local estimator accuracy for Opus tool-use | 30%+ undercount typical | Ground truth, always
extra_usage balance visibility | Not visible in local JSONL | Live $USD balance in server payload
seven_day_opus (Opus-only sub-bucket) | Not tracked by local tools | Separate bar, separate utilization
5-hour wall at 47% weekly | Confusing: looks like a contradiction | Expected: two independent clocks
Routing flip detection | Visible only as surprise card charge | extra_usage.enabled visible immediately
Effort-tier weighting | Invisible to local JSONL readers | Applied server-side per request type

What feeds the weekly bucket the estimator cannot see

Inputs to seven_day / seven_day_opus (invisible to local JSONL):

  • Opus completions
  • Attachments
  • Agentic tool-use
  • Peak-hour multiplier
  • Browser-chat usage
  • MCP tool calls

All of these feed seven_day / seven_day_opus. The Settings page draws only the five_hour bar; ClaudeMeter shows all four buckets. A 429 fires when any bucket reaches 1.0.

The numbers from the April 2026 thread

All from the original tweet thread. No invented benchmarks.

71% weekly quota used by Monday after one Opus refactor
30%+ local estimator undercount vs server truth
13% plan bar reading with $200 already burned (routing flip signal)
47% weekly used when 5-hour wall fired mid-PDF

Myths the April 2026 thread corrected

  • Myth: local token count = quota charged
  • Myth: plan bar shows the real weekly percent
  • Myth: 40% local estimate means 40% weekly used
  • Myth: extra_usage and plan quota are the same stream
  • Myth: the 429 message tells you which bucket tripped
  • Myth: $200 Max plan can't run out mid-week
  • Myth: Opus and Sonnet burn quota at the same rate
  • Myth: server-side changes show up in local JSONL

What the menu bar shows now

Four numbers on screen after a Monday morning Opus refactor. The local estimator said 40%; the menu bar said:

5h: climbing (mid-session drain)
7d: 71% (one refactor, Monday)
7d Opus: 80% (Opus burned faster)
Extra: $3.42 (metered stream live)

Without the menu bar, the local estimator would have shown 40% and prompted another two hours of Opus work. By Wednesday, the weekly wall would have landed mid-feature. Seeing 71% at noon on Monday is what changes the plan: switch to Sonnet for the afternoon, reserve Opus for the review pass, watch extra_usage rather than the plan bar.

Plan-by-plan impact of the April 2026 tightening

  • Claude Pro ($20): The weekly cap (seven_day) and the Opus-weekly sub-bucket (seven_day_opus) are both present and enforced. A single Opus refactor session can consume most of the weekly budget. If the metered billing rollout is active on your account, extra_usage.balance drains in parallel. The in-app bar does not warn you before either wall.
  • Claude Max ($100 or $200): Higher seven_day ceiling than Pro, but the Opus-only sub-bucket (seven_day_opus) is still on a separate and lower ceiling. The “13% plan bar with $200 burned” scenario from the thread is a Max pattern: plan quota looks fine, metered billing is running. Reading extra_usage.enabled before a heavy session is the only reliable tell.
  • Claude Code vs. claude.ai browser: Both surfaces drain the same seven_day bucket. A morning of browser-chat usage before an afternoon Claude Code refactor sets the starting utilization higher than the local JSONL (which only sees Claude Code) would report. Combined browser-chat plus Claude Code Opus sessions are the common source of unexpected 78%-by-Tuesday readings.
  • Anthropic API (direct): Separate rate limits from the consumer plan caps. If you route overflow to the API directly via a tool like Claude Code OpenRouter, the API limits apply but the plan quota bars do not count against you. Worth understanding if you are hitting plan walls frequently.

Honest caveats

The endpoint claude.ai/api/organizations/{org}/usage is internal and undocumented. Field names have been stable for many months, but Anthropic can rename, add, or remove fields in any release. ClaudeMeter uses strict Rust deserialization and ships a patch the same day any field change is detected.

The exact effort-tier multipliers and Opus weight coefficients are not published by Anthropic. The 30%+ gap quoted in the thread is from a single session; the actual undercount varies by model mix, attachment size, and tool-use density. Some sessions are 10% off; some are 60%. The direction is consistent: local tools undercount. The magnitude depends on the session.

Local JSONL tools (ccusage and similar) are not wrong for what they measure. Per-project token attribution and cost estimation for invoicing or budgeting are valid uses. They are not a faithful proxy for the server quota the rate limiter checks. Use both: JSONL for project attribution, the server endpoint for quota state.
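Using the JSONL for what it is good at looks like this. The record shape below (a `usage` object with `input_tokens` / `output_tokens`) is an assumption for illustration; inspect your own `~/.claude/projects/*.jsonl` before relying on it, since the log format is undocumented and can change.

```python
# Sketch: sum raw tokens from local session logs. Good for per-project
# attribution, NOT a proxy for server quota. The record shape (usage.
# input_tokens / usage.output_tokens) is an ASSUMPTION for illustration.
import glob
import json
import os

def local_token_totals(pattern):
    """Sum raw tokens across local session logs (attribution, not quota)."""
    total = 0
    for path in glob.glob(os.path.expanduser(pattern)):
        with open(path) as fh:
            for line in fh:
                try:
                    rec = json.loads(line)
                except json.JSONDecodeError:
                    continue  # skip partial or non-JSON lines
                usage = rec.get("usage", {})
                total += usage.get("input_tokens", 0) + usage.get("output_tokens", 0)
    return total

# raw_tokens = local_token_totals("~/.claude/projects/*.jsonl")
```

Pair this number with the endpoint's utilization fractions: the JSONL total answers "which project spent it", the endpoint answers "how close am I to the wall".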

See the number before the wall arrives

ClaudeMeter sits in the macOS menu bar, polls every 60 seconds, and shows five_hour, seven_day, seven_day_opus, and extra_usage balance at once. Free, MIT licensed, no telemetry, no cookie paste. Reads the same JSON claude.ai/settings/usage reads. One brew command.

Install ClaudeMeter

Frequently asked questions

How does one refactor burn 71% of a weekly Claude quota in a single day?

The server-side quota is weighted by more than raw token count. Opus carries a higher model weight than Sonnet. Large file attachments (full repo context, PDFs, images) add extra cost on top of the token count. Tool calls (code execution, file search, MCP calls) are billed at the server's effort-tier rate, not the raw token rate Claude Code logs locally. A multi-file refactor with several Opus rounds, each pulling a large codebase into context, burns through seven_day.utilization far faster than the local counter suggests. The JSONL sees raw tokens; the server charges weighted tokens across all surfaces including any browser-chat usage you did that day.

Why were local token estimators off by 30%+ for this kind of session?

Tools like ccusage read from ~/.claude/projects/*.jsonl. That file only records Claude Code sessions and only records raw token counts, not the server's weighted cost. The server applies multipliers for model class (Opus costs more quota than Sonnet), attachment size, peak-hour multiplier, and effort tier (agentic tool-use rounds bill at a higher rate than simple completions). For a refactor session mixing large codebase attachments with multiple Opus tool-use chains, the server's weighted cost is materially higher than the raw JSONL token count. The 30%+ drift the tweet describes is typical; some heavy agentic sessions run 50%+ off.

What is the 'metered billing transition' and why does it surprise Pro users?

In April 2026, Anthropic rolled out extra usage: pay-as-you-go billing that kicks in on top of the plan cap when the quota is exhausted or when a server-side routing change sends traffic to the metered surface. The plan quota and the extra-usage balance are two separate streams. A routing flip can start draining the metered balance while the plan quota bars visually appear unchanged. The in-app indicator only flips between 'low' and 'reset at X', so most Pro users do not see they are at 78% used until the next request hits the wall. Many users encountered the extra-usage stream for the first time as an unexpected billing charge, not a changelog notice.

What exactly is claude.ai/settings/usage and why is it the ground-truth number?

GET https://claude.ai/api/organizations/{org_uuid}/usage returns the server's live quota state: five_hour.utilization, seven_day.utilization, seven_day_opus.utilization (Pro/Max only), and extra_usage balance. This is the same JSON that claude.ai/settings/usage renders in the browser. It includes every usage input the server charges against: Claude Code, browser-chat prompts, API calls via your session, attachment cost, tool-use effort tier, and peak-hour multiplier. Local JSONL tools see none of that weighting. The endpoint is the only number the rate limiter enforces against.

What are the two billing streams and how do I tell which one is draining?

Stream 1 is the plan quota: rolling 5-hour (five_hour), rolling 7-day (seven_day), and Opus-weekly sub-bucket (seven_day_opus on Pro/Max). Stream 2 is extra usage (extra_usage), a dollar-denominated metered balance that bills at the API rate when the plan quota is exhausted or a routing flip is in effect. A server-side routing flip can start billing Stream 2 while the Stream 1 plan bars sit at a prior level. The endpoint GET /api/organizations/{org_uuid}/usage returns both streams in one payload. The plan quota bars in-app only show a binary indicator; extra_usage has a dollar balance you can watch. Reading the raw endpoint is the only reliable way to distinguish 'draining my plan' from 'draining my card'.
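The distinction can be checked programmatically from one payload read. A minimal sketch, with the field names assumed from the payload described in this article and the 0.5 "bars look fine" cutoff as an arbitrary illustration:

```python
# Sketch: given one usage payload, say which stream is draining.
# Field names are assumptions from the payload described in this article;
# the 0.5 "plan bars look fine" cutoff is arbitrary, for illustration.

def diagnose(payload):
    plan = max(payload["five_hour"]["utilization"],
               payload["seven_day"]["utilization"],
               payload.get("seven_day_opus", {}).get("utilization", 0.0))
    extra = payload.get("extra_usage", {})
    metered = bool(extra.get("enabled")) and extra.get("balance_usd", 0) > 0
    if metered and plan < 0.5:
        return "routing flip: metered balance draining while plan bars look fine"
    if metered:
        return "both streams draining"
    return "plan quota only"

# The thread's "13% plan bar with $200 burned" Max-plan signature:
flip_case = {"five_hour": {"utilization": 0.10},
             "seven_day": {"utilization": 0.13},
             "extra_usage": {"enabled": True, "balance_usd": 200.0}}
print(diagnose(flip_case))
```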

Why did 13% weekly with $200 already burned indicate a routing flip?

On a $200 Max plan, 13% weekly utilization means roughly 13% of the seven_day bucket is consumed. If a significant dollar amount has left the account and the plan bar still shows only 13%, that spend went to extra usage (Stream 2), not the plan quota (Stream 1). That mismatch is the signature of a server-side routing flip: the endpoint that routes your completions switched to billing the metered balance instead of the plan quota. The plan quota bars stay low; money leaves the account. This is a known edge case in the April 2026 metered billing rollout.

Does hitting the rolling 5-hour wall at 47% weekly make sense?

Yes. The 5-hour and 7-day buckets are independent and can fill independently. If your session is very dense in a short burst, five_hour.utilization fills to 1.0 before seven_day.utilization reaches 50%. The result is a 429 with seven_day.utilization at 47%. From the plan perspective, you have 53% weekly headroom remaining, but you cannot use it until the 5-hour bucket resets. ClaudeMeter renders both bars simultaneously because this pattern is common: the 5-hour wall at under 50% weekly is not a contradiction, it is normal for bursty usage.
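The arithmetic behind the 47% reading is simple once the buckets are treated as independent. The capacities below are illustrative cost units, not published quota sizes; only the ratio matters.

```python
# Toy illustration of the two independent clocks: a dense burst fills the
# 5-hour bucket to 1.0 while the 7-day bucket is still under 50%.
# Capacities are ILLUSTRATIVE cost units, not published quota sizes.

FIVE_HOUR_CAPACITY = 100.0    # assumed cost units per rolling 5 h
SEVEN_DAY_CAPACITY = 220.0    # assumed cost units per rolling 7 d

burst_cost = 103.0            # one dense morning session

five_hour_util = min(burst_cost / FIVE_HOUR_CAPACITY, 1.0)
seven_day_util = burst_cost / SEVEN_DAY_CAPACITY

print(f"five_hour: {five_hour_util:.0%}, seven_day: {seven_day_util:.0%}")
# The 429 fires on the 5-hour wall even though most of the week remains.
```

Any burst larger than the 5-hour capacity but well under the 7-day capacity reproduces the pattern; 47% is just where this session happened to land.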

What is the 'effort tier' and why does it matter for quota burn?

Anthropic's server applies different effort-tier weights to different completion types. A simple chat completion burns less quota per token than an agentic tool-use chain (code execution, file search, MCP tool calls). Opus refactors via Claude Code typically chain multiple tool-use rounds per prompt, each billed at the higher effort tier. This is why a Claude Code refactor session consumes far more quota per wall-clock minute than the same number of raw tokens read from a JSONL log would imply.

How do I watch both streams at once without calling the endpoint manually?

ClaudeMeter polls GET /api/organizations/{org_uuid}/usage every 60 seconds and renders five_hour, seven_day, seven_day_opus, and extra_usage as live bars in the macOS menu bar. The browser extension picks up your existing claude.ai session, so no manual cookie paste. Numbers match exactly what claude.ai/settings/usage shows because ClaudeMeter and the Settings page hit the same internal endpoint.

Will the weekly cap reset after 7 days even if extra_usage is enabled?

Yes. The plan quota (seven_day.utilization) resets on a rolling 7-day window anchored to your account. Extra usage charges do not affect the plan quota reset; they are a separate dollar-denominated balance. After the seven_day reset, your quota fraction goes back to 0. If extra_usage is enabled and has a balance, it carries forward and continues draining as you use the metered surface. The reset only clears the plan quota side.

Seeing a different gap on your account?

If your session burned more or less than 30% over the JSONL estimate, send the session details. We track the drift distribution across account types.