AI coding cost glossary
Plain definitions of the tokens, caching, and rate-limit terms that drive what you pay.
- Prompt cachingReusing a previously-sent context prefix instead of paying full input…
- Cache read vs cache writeThe difference between paying to store context (write) and paying to …
- Ephemeral 5-minute vs 1-hour cacheClaude's two cache-write durations, billed at different rates — the s…
- Context windowThe maximum tokens a model can consider at once, including your whole…
- Context cliffThe point where a request gets billed at a higher per-token tier for …
- TokenizerHow text is split into tokens — the unit you're billed on.
- Notional vs real costWhether a dollar figure is what you actually spent or what usage woul…
- List priceA provider's published per-token API rate, before any subscription or…
- Used percentHow much of a rate-limit window you've consumed, as reported by the p…
- Burn rateHow fast you're spending tokens or dollars over a window.
- Reasoning tokensHidden 'thinking' tokens a model generates before its visible answer.
- 5-hour windowA rolling usage window that starts on your first message and caps how…
- Weekly windowA 7-day usage cap that resets at a fixed time regardless of the 5-hou…
- Blended $/MTokA single effective per-million-token price across a mix of input, out…