AI coding cost glossary

Plain definitions of the token, caching, and rate-limit terms that drive what you pay.

Prompt caching: Reusing a previously-sent context prefix instead of paying full input…
Cache read vs cache write: The difference between paying to store context (write) and paying to …
Ephemeral 5-minute vs 1-hour cache: Claude's two cache-write durations, billed at different rates, the s…
Context window: The maximum tokens a model can consider at once, including your whole…
Context cliff: The point where a request gets billed at a higher per-token tier for …
Tokenizer: How text is split into tokens, the unit you're billed on.
Notional vs real cost: Whether a dollar figure is what you actually spent or what usage woul…
List price: A provider's published per-token API rate, before any subscription or…
Used percent: How much of a rate-limit window you've consumed, as reported by the p…
Burn rate: How fast you're spending tokens or dollars over a window.
Reasoning tokens: Hidden 'thinking' tokens a model generates before its visible answer.
5-hour window: A rolling usage window that starts on your first message and caps how…
Weekly window: A 7-day usage cap that resets at a fixed time regardless of the 5-hou…
Blended $/MTok: A single effective per-million-token price across a mix of input, out…
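
As a rough illustration of how two of these terms turn into numbers, the sketch below computes a blended $/MTok and a burn rate. Everything in it is an assumption made for the example: the four token categories, the function names, and above all the prices, which are round placeholders and not any provider's list price.

```python
# Hypothetical per-million-token prices in USD. Placeholders only,
# not taken from any provider's published rates.
HYPOTHETICAL_PRICES = {
    "input": 2.0,
    "output": 10.0,
    "cache_read": 0.2,
    "cache_write": 2.5,
}

# Made-up token counts for one session, by category.
EXAMPLE_USAGE = {
    "input": 100_000,
    "output": 20_000,
    "cache_read": 800_000,
    "cache_write": 80_000,
}


def total_cost_usd(usage, prices):
    """Sum the cost of each token category at its per-million-token price."""
    return sum(tokens * prices[kind] / 1_000_000 for kind, tokens in usage.items())


def blended_price_per_mtok(usage, prices):
    """Effective price per million tokens across the whole mix."""
    total_tokens = sum(usage.values())
    if total_tokens == 0:
        raise ValueError("no tokens recorded, blended price is undefined")
    return total_cost_usd(usage, prices) / total_tokens * 1_000_000


def burn_rate_usd_per_hour(cost_usd, window_seconds):
    """Dollars spent per hour over a window of the given length."""
    if window_seconds <= 0:
        raise ValueError("window must be longer than zero seconds")
    return cost_usd / window_seconds * 3600


if __name__ == "__main__":
    cost = total_cost_usd(EXAMPLE_USAGE, HYPOTHETICAL_PRICES)
    blended = blended_price_per_mtok(EXAMPLE_USAGE, HYPOTHETICAL_PRICES)
    rate = burn_rate_usd_per_hour(cost, window_seconds=1800)
    print(f"cost: ${cost:.2f}")
    print(f"blended: ${blended:.2f}/MTok")
    print(f"burn rate: ${rate:.2f}/hour")
```

With these placeholder numbers the session totals one million tokens, so the blended price works out to roughly the same figure as the session cost. Cheap cache reads make up most of the mix, which pulls the blended price well below the input price.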