Server quota is a fraction with a private denominator. Your token counter can't see it.
What claude.ai/settings/usage renders is one field: utilization, a dimensionless scalar the server computes against a ceiling it never returns on the wire. Every tool that counts tokens from your local log has the numerator. None of them have the denominator. That is why those numbers drift from what the settings page shows, and why the only way to see what Anthropic actually enforces is to read the server's own field directly.
The one field that answers "am I about to be rate-limited"
The consumer plan's quota lives on a private endpoint at https://claude.ai/api/organizations/{org_uuid}/usage. The response is a JSON body of seven objects, each shaped the same way. That shape is a two-field struct called Window. There is no tokens_used. There is no tokens_remaining. There is no limit. The whole quota story is one floating-point number per bucket, and a timestamp for when that bucket's oldest chargeable traffic ages out.
If the struct only has two fields and neither of them is a ceiling, the ceiling is not in the response. A client that wants to compute "tokens left" has to invent the ceiling. That inventing is where every local-log tracker goes wrong.
Anchor fact: the entire quota contract fits in two fields
This is the complete server-side shape. No hidden sibling field, no pagination, no expansion param. Two primitives per bucket, repeated seven times in the same JSON body. Every decision claude-meter makes downstream derives from this struct.
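For illustration only, a body with that shape looks like the following. The seven bucket names are the UsageResponse fields named later in the FAQ; every utilization and resets_at value here is invented:

```json
{
  "five_hour":            { "utilization": 0.42, "resets_at": "2026-04-24T18:05:00Z" },
  "seven_day":            { "utilization": 0.97, "resets_at": "2026-04-27T09:30:00Z" },
  "seven_day_sonnet":     { "utilization": 0.31, "resets_at": "2026-04-27T09:30:00Z" },
  "seven_day_opus":       { "utilization": 1.02, "resets_at": "2026-04-26T14:00:00Z" },
  "seven_day_oauth_apps": { "utilization": 0.05, "resets_at": null },
  "seven_day_omelette":   { "utilization": 0.0,  "resets_at": null },
  "seven_day_cowork":     { "utilization": 0.0,  "resets_at": null }
}
```

Note what is absent: no tokens_used, no limit, no dollar amounts. Two primitives per bucket and nothing else.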
What actually goes over the wire on one poll
One poll, three endpoints, one shared cookie
No external service is in this path. The only public host called is claude.ai, which you are already logged into. The bridge is localhost.
Why token counters structurally can't answer this
A token counter like ccusage or Claude-Code-Usage-Monitor walks your local JSONL transcript and sums inputTokens and outputTokens. That sum is a real number: "on this device, in this session, I consumed N tokens". But the thing the server enforces is not N tokens. It is utilization = f(traffic across all your devices, across all contexts, under current bucket weights). The function f is not public. Its denominator is not public. Its weights were adjusted server-side on 2026-03-26 and again on several deploys since. A local counter is a numerator, which is a useful diagnostic, but it is not the thing the server throttles on.
This is why the same account can show "1.4M tokens used" in a local tool and "97 percent" on the settings page. Both numbers are right; they answer different questions. Only the percent is the number the 429 is enforced against.
The five lines that turn the server's fraction into a bar
The Chrome extension does almost nothing with the raw utilization field. It branches on whether the server sent the fraction as 0..1 or as already-scaled 0..100, and then fills a bar. The whole helper is five lines. That is the point: once you have the server's number, there is nothing clever left to do.
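A reconstruction of that helper, using the branch the extension applies (pctFromWindow in extension/background.js):

```javascript
// The server sends utilization either as a 0..1 fraction or as an
// already-scaled percent, so branch on <= 1 before filling the bar.
function pctFromWindow(w) {
  const u = w.utilization;
  return u <= 1 ? u * 100 : u;
}
```

A wire value of 0.25 and a wire value of 25 both render as 25 percent; the caller never needs to know which shape the server chose on this poll.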
Three endpoints, one cookie
The full quota picture is built from these: three organization-scoped endpoints that carry quota and billing, plus the account lookup used to discover your org UUIDs. Each is a GET with a session cookie and a Referer: https://claude.ai/settings/usage header. All undocumented; all match what the product itself reads.
GET /api/organizations/{org}/usage
The only source of truth for rolling-window utilization. Returns seven Window objects, each with a utilization fraction and a resets_at timestamp. No tokens, no dollars, just fractions.
GET /api/organizations/{org}/overage_spend_limit
Companion endpoint for metered billing on top of the plan. Returns used_credits in cents and a monthly_credit_limit. Independent of utilization above.
GET /api/organizations/{org}/subscription_details
Plan status, next_charge_date, and the last four of the card on file. Not needed for quota, used to render 'next charge' in the menu bar.
GET /api/account
Returns your email and every organization membership. The extension iterates memberships so multi-org accounts show every quota, not just the default org.
Referer header is load-bearing
The three organization-scoped endpoints return 403 without Referer: https://claude.ai/settings/usage. The server checks Referer as part of its CSRF story; both claude-meter routes set it explicitly.
Where quota data flows on every poll
The hub reads three endpoints, the fan-out is what you see
The left side is the only secret: your cookie. The middle is five lines of Rust. The right side is what you read.
Exactly what the Rust caller does
Three GETs, all with the same cookie header and the same Referer. The Referer is not decorative. Drop it and every endpoint returns 403. This is the single most common reason a hand-rolled curl script fails on the first try.
Reproduce it in one curl, then watch it fail without Referer
You do not need to install anything to verify the shape. Grab your claude.ai session cookie from DevTools, export $COOKIE and $ORG, and hit the endpoint. The second call shows what happens when the Referer header is missing.
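A sketch of that pair of calls, assuming $COOKIE and $ORG are exported as described above:

```shell
# Succeeds: session cookie plus the load-bearing Referer header.
curl -s "https://claude.ai/api/organizations/$ORG/usage" \
  -H "Cookie: $COOKIE" \
  -H "Referer: https://claude.ai/settings/usage" \
  -H "Accept: */*"

# Fails: same cookie, Referer dropped. Prints the status code (403).
curl -s -o /dev/null -w "%{http_code}\n" \
  "https://claude.ai/api/organizations/$ORG/usage" \
  -H "Cookie: $COOKIE" \
  -H "Accept: */*"
```

This cannot run without your own live session cookie, which is the point: the only credential in the whole path is one you already hold.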
Five steps to read your real utilization
1. Find your org UUID.
GET https://claude.ai/api/account with your session cookie. The response has a memberships array; every entry has organization.uuid. Pick the org you care about (or iterate all of them the way the extension does in background.js lines 18-22).
2. Hit /usage with Cookie and Referer.
Cookie: your full claude.ai cookie. Referer: https://claude.ai/settings/usage. Accept: */*. Nothing else is required. Omit Referer and you get 403.
3. Read utilization, branch on <= 1.
For each Window-shaped field in the response, treat utilization <= 1 as a fraction (multiply by 100) and > 1 as already a percent. extension/background.js does this in pctFromWindow, which is five lines of logic.
4. Ignore tokens_used. It isn't there.
The server never returns a raw token count on this endpoint. If your tool or dashboard is displaying one, it was computed client-side against an invented denominator. Fall back to utilization.
5. Poll every 60 seconds while you care.
Utilization drifts even with zero new messages, because old traffic continuously ages out of the rolling window. 60 seconds matches the cadence at which humans act on the number; anything longer and you are reading stale state.
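The read-and-branch steps above condense into one small pure function. This is a sketch, not ClaudeMeter's code: the name summarizeUsage is invented, but the <= 1 branch matches the extension's pctFromWindow.

```javascript
// Given a parsed /usage body, return each bucket's percent and the
// bucket closest to its ceiling. Skips any non-Window-shaped fields.
function summarizeUsage(body) {
  const buckets = Object.entries(body)
    .filter(([, w]) => w && typeof w.utilization === "number")
    .map(([name, w]) => ({
      name,
      pct: w.utilization <= 1 ? w.utilization * 100 : w.utilization,
      resetsAt: w.resets_at ?? null,
    }));
  buckets.sort((a, b) => b.pct - a.pct);
  return { buckets, hottest: buckets[0] ?? null };
}
```

The hottest bucket is the one whose resets_at you actually care about; the rest are context.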
Server utilization vs local token counts
Both answer real questions. Only one answers the quota question. Tokens-on-disk and server-enforced utilization are not interchangeable, and pretending they are is how you get 'but I still had quota left' at 100 percent.
| Feature | Local-log token counters (ccusage, Claude-Code-Usage-Monitor) | ClaudeMeter (server utilization) |
|---|---|---|
| Knows the denominator | No. Token counters see the numerator only. | Yes. The server returns utilization directly, denominator is implicit. |
| Matches claude.ai/settings/usage byte for byte | No. Approximated from local files; off by unknown margin. | Yes. Reads the exact endpoint that page renders from. |
| Includes traffic from other devices on the account | No. Local files only cover the device they ran on. | Yes. Server aggregates across devices before computing utilization. |
| Counts OAuth-app and cowork traffic toward quota | No. Those paths never write JSONL your client can read. | Yes. seven_day_oauth_apps and seven_day_cowork are separate Window fields. |
| Updates as the rolling window slides | Partial. Recomputes from the local log; denominator guessed. | Yes. Every 60 seconds, from POLL_MINUTES = 1 in background.js. |
| Works across multiple organizations on one account | No. | Yes. Iterates /api/account.memberships and polls each org. |
| Requires a cookie paste | Varies. Several tools ask you to paste a sessionKey manually. | No. Extension uses credentials: 'include'; binary reads Chrome Safe Storage. |
| Telemetry to third parties | Varies. | None. Everything runs on localhost; the bridge binds 127.0.0.1:63762. |
What a utilization stream looks like on a normal afternoon
The wire value of utilization is what you see here. A bar renders by multiplying by 100 (if it's ≤ 1) and clamping at 100. 1.02 is legal and means you're over the ceiling for that bucket.
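The clamp step, taking an already-scaled percent, is a one-liner (function name invented):

```javascript
// Bar fill never exceeds 100 even when the wire reports over-ceiling.
const barFill = (pct) => Math.min(Math.max(pct, 0), 100);
```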
The honest caveats
The endpoint is undocumented. Anthropic can rename fields without warning; both the Rust struct and the extension's JSON parse would fail loudly on the next poll if they did, so the break is observable. Session cookies expire; when they do, the binary shows ! until you re-login in your browser. Safari's cookie store is not supported yet. The whole stack is macOS-only for now. And because utilization is a fraction with a private denominator, the server can change the denominator at any moment (it did on 2026-03-26) and your percent will shift without any of your behaviour changing.
See what the server actually enforces
ClaudeMeter is free, MIT-licensed, and reads the same endpoint the settings page renders from. Install the menu-bar app and the Chrome extension, and every bucket's utilization shows up with its own live reset timestamp, no cookie paste.
Need help wiring a custom caller to the /usage endpoint?
Send us a sample response and we'll help you parse it the same way ClaudeMeter does, field by field.
Frequently asked questions
Why can't a local token counter equal the quota the server enforces?
Because the server expresses quota as utilization, a dimensionless fraction, not a token count. In /src/models.rs the Window struct has exactly two fields: utilization: f64 and resets_at: Option<DateTime<Utc>>. There is no tokens_used field. The denominator Anthropic divides by (your plan's effective ceiling for that bucket, at that moment, under the current weighting) is not returned on the wire and is not published. A local counter can tell you 'I sent 1.4M input tokens this session' but cannot convert that into utilization because the denominator is private. The only way to know your server-side utilization is to read the server's own number.
Where exactly does claude-meter read server quota from?
Three endpoints, all under /api/organizations/{org_uuid}/, called with your existing claude.ai session cookie: /usage returns utilization and resets_at per rolling bucket, /overage_spend_limit returns metered dollars used against a monthly cap, and /subscription_details returns next_charge_date and payment method. You can see the exact calls in src/api.rs lines 16-60 of the Rust binary and in extension/background.js lines 24-29 of the browser extension. Both parse the same JSON into the same Rust structs defined in src/models.rs.
Why is utilization sometimes a fraction and sometimes a percent?
The server is inconsistent and the extension handles both shapes. In extension/background.js lines 58-63 the helper pctFromWindow does: const u = w.utilization; return u <= 1 ? u * 100 : u. So a value of 0.97 means 97 percent, and a value of 97 also means 97 percent. This matters if you write your own caller: do not assume one or the other, branch on <= 1. The Rust side stores utilization: f64 and prints '{:>5.1}%' directly, which works because downstream code expects already-scaled percents from the CLI formatter.
What about ccusage, Claude-Code-Usage-Monitor, and similar tools?
They read the local JSONL transcript on disk and sum tokens. That sum is an accurate numerator. It is not utilization. For one thing, not every token on disk was chargeable against every bucket (the per-bucket weightings are invisible to the client). For another, server-side adjustments from before you started logging, from other devices on the same account, or from OAuth app traffic never appear in your local files. A token counter is an answer to 'how much did my session cost locally'. It is not an answer to 'am I about to be rate-limited by claude.ai'.
Does the API docs usage and cost endpoint give me the same number?
No. platform.claude.com's Usage and Cost API is for Console API customers and returns spend broken down by workspace and model for paid API usage. Claude Pro and Max plans ship through claude.ai with different quota semantics (rolling windows, bucketed weights, extra-usage credit on top). The claude.ai/settings/usage page renders from /api/organizations/{uuid}/usage, which is a different, undocumented endpoint on a different host, returning utilization fractions rather than token or dollar counts. claude-meter targets that endpoint specifically because it is what the product itself uses.
The endpoint is undocumented. How stable is it in practice?
Stable enough that the shape has not changed through 2026-04-24, but Anthropic can and occasionally does rename fields on deploys. The mitigation is that ClaudeMeter is open source (MIT) and deserializes into a strongly typed struct. If a field is renamed, serde fails loudly on the next poll and the error bubbles to the menu bar as '!' with the parse message. You would see the break in one git diff of src/models.rs rather than in a silently wrong number.
Do I need to paste a cookie anywhere?
With the browser extension route, no. The extension runs inside Chrome (or Arc, Brave, Edge) and calls the endpoint with credentials: 'include', which reuses your already-logged-in claude.ai session automatically. With the menu-bar-only route, the app reads Chrome Safe Storage via keychain and decrypts the session cookie on your machine. No cookie value ever leaves localhost. Both routes match the byte-for-byte view the settings page renders.
Why does claude-meter poll every 60 seconds?
Because utilization slides. The rolling windows recompute continuously on the server: as old traffic ages out, the charged amount inside the window shrinks, and utilization drifts even without any new messages. Sampling every 60 seconds matches the temporal resolution a human can act on and is below the rate at which the number typically changes in a heavy session. POLL_MINUTES = 1 in extension/background.js line 3.
If the endpoint doesn't return a token count, what does it return?
For each rolling bucket, a Window object with utilization: f64 (the fraction) and resets_at: Option<DateTime<Utc>> (when this bucket's oldest charged traffic ages out of the window). There are seven such buckets in the UsageResponse struct: five_hour, seven_day, seven_day_sonnet, seven_day_opus, seven_day_oauth_apps, seven_day_omelette, seven_day_cowork. No token integers, no message counts, no dollar amounts on this endpoint. The dollar numbers live on the companion /overage_spend_limit endpoint and are for metered billing only.
Can I get utilization from the anthropic-ratelimit-* HTTP headers?
Those headers are on API responses, not claude.ai responses. They give you the most restrictive currently-active API rate limit (tokens per minute, requests per minute, input tokens remaining). They do not expose the rolling 5-hour or 7-day consumer-plan utilization. The consumer plan's utilization is returned only by the private /api/organizations/{uuid}/usage endpoint. Different surface, different contract, not interchangeable.
Can the denominator ever be inferred?
Only indirectly, and only for a fixed workload held constant. If you send a known set of messages across a fresh window and watch utilization climb, you can estimate the tokens-per-percent ratio for that bucket during that hour. That ratio is not constant across buckets, across models, or across weekday peaks. We saw it change after the 2026-03-26 server-side tightening. Any tool pretending to publish 'your remaining tokens' by inverting utilization is guessing.
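As a worked sketch of that estimate, with every number invented: if a bucket climbs from 0.40 to 0.47 while you send roughly 70,000 tokens, the implied ceiling for that bucket, in that hour, is about 1M tokens. A helper for that arithmetic (the name impliedCeiling is mine, and the result is exactly as unstable as the paragraph above says):

```javascript
// Rough bucket-ceiling estimate from two utilization samples and the
// token count sent between them. Illustration only: the ratio is not
// stable across buckets, models, or server-side reweightings.
function impliedCeiling(tokensSent, utilBefore, utilAfter) {
  const delta = utilAfter - utilBefore; // fraction of the ceiling consumed
  if (delta <= 0) return null;          // window slid too much to tell
  return tokensSent / delta;            // implied ceiling, in tokens
}
```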
If claude-meter reads the exact same thing as claude.ai/settings/usage, why install it?
Because the settings page does not stay open and does not alert when you approach a limit. ClaudeMeter runs in the menu bar, refreshes every 60 seconds, and color-codes the badge (green under 80 percent, amber 80 to 100, red at 100). The underlying data is identical; the ergonomics are different. If you want the raw number in a terminal, the same binary ships a CLI: /Applications/ClaudeMeter.app/Contents/MacOS/claude-meter --json.
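Those badge thresholds reduce to a three-way branch. A sketch (the function name is mine; the cutoffs are the ones stated above):

```javascript
// Green under 80 percent, amber from 80 up to 100, red at 100 and over.
function badgeColor(pct) {
  if (pct >= 100) return "red";
  if (pct >= 80) return "amber";
  return "green";
}
```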
Keep reading
The 5-hour wall is server-side, not client-side
Why a local counter cannot predict when the 5-hour bucket trips, and what to watch instead.
Your plan has seven reset clocks, not one
Every Window field in /usage ships its own resets_at. The one at 100 percent is your real countdown.
Burn rate against a rolling window, not a calendar window
How utilization drifts minute to minute and why a sample from 30 minutes ago is usually wrong.