Skip to main content

Overview

How Provider Prompt Cache Works

Provider prompt caching stores the processed representation of your prompt prefix on the provider’s servers. When you send a request with the same prefix, the provider can skip reprocessing those tokens.

Key Characteristics

  • Prefix-based: Only the beginning of your prompt can be cached
  • Exact match: Requires identical tokens (not semantic similarity)
  • Time-limited: Cache entries expire (typically 5-60 minutes)
  • Automatic: No special configuration needed
Request 1: [System prompt + Context A + Question 1]
           ^^^^^^^^^^^^^^^^^^^^^^^^
           This prefix gets cached

Request 2: [System prompt + Context A + Question 2]
           ^^^^^^^^^^^^^^^^^^^^^^^^
           Cache hit! Only Question 2 is processed

Supported Providers

ProviderCache Read DiscountCache Write CostMin Tokens
Anthropic90% off25% premium1024
OpenAI50% offSame as input1024
DeepSeek90% offSame as input64
Google75% off25% premium32768
Discounts are applied automatically. AI Sonar passes through the provider’s cache pricing to you.

Identifying Cache Usage

In Usage Logs

Your usage logs show detailed cache token breakdown:
FieldDescription
cacheReadTokensTokens served from provider cache (discounted)
cacheWriteTokensTokens written to cache (for future requests)
nonCachedPromptTokensTokens processed without cache

In Transactions

Transactions show a Provider Cache label when upstream caching was used:
  • Provider Cache (teal): Upstream prompt cache hit - discounted rates

Cost Calculation Example

For a request with 10,000 input tokens to Claude (Anthropic): Without cache:
10,000 tokens × $3.00/1M = $0.030
With provider cache (8,000 cached + 2,000 new):
Cache read:  8,000 tokens × $0.30/1M = $0.0024  (90% off)
Cache write: 2,000 tokens × $3.75/1M = $0.0075  (25% premium)
Total: $0.0099 (67% savings)

Best Practices

Place your system prompt and static context at the beginning of your messages. This maximizes cache hit potential.
Send requests with the same prefix close together in time to benefit from cache before it expires.
Ensure your cacheable prefix meets the provider’s minimum (e.g., 1024 tokens for Anthropic/OpenAI).
Check your dashboard usage statistics for cache hit rates and savings.

Checking Cache Status

Response Headers

X-Upstream-Cache-Read: 8000   # Provider cache read tokens
X-Upstream-Cache-Write: 2000  # Provider cache write tokens

Usage API

Usage logs API is not public yet. Check cache breakdown via response headers and the dashboard:
GET /v1/usage/logs is currently not a public endpoint.
Use X-Upstream-Cache-* response headers, plus the dashboard usage page.
Response includes:
{
  "promptTokens": 10000,
  "cacheReadTokens": 8000,
  "cacheWriteTokens": 2000,
  "nonCachedPromptTokens": 0,
  "completionTokens": 500,
  "cost": 0.0099
}

FAQ

Provider caching is automatic and cannot be disabled. However, it only benefits you (lower costs), so there’s no reason to disable it.
Common reasons:
  • Prefix changed (even one token difference)
  • Cache expired (typically 5-60 minutes)
  • Prefix too short (below minimum tokens)
  • Different API key used
Yes! When using your own API keys (BYOK), provider caching works the same way. The cache is tied to your upstream API key.
  1. Structure prompts with static content first
  2. Keep system prompts consistent across requests
  3. Send related requests in quick succession