Prompt Caching
Leverage provider-side prompt caching for significant cost and latency savings on large, repeated prompts.
To save on inference costs, you can leverage prompt caching on supported providers and models. When a provider supports it, LangDB makes a best-effort attempt to route subsequent requests to the same provider so they can take advantage of the warm cache.
Most providers automatically enable prompt caching for large prompts, but some, like Anthropic, require you to enable it on a per-message basis.
How Caching Works
Automatic Caching
Providers like OpenAI, Grok, and DeepSeek (and soon Google Gemini) enable caching by default once your prompt exceeds a certain length (e.g., 1,024 tokens).
- Activation: No change needed. Any prompt over the length threshold is written to cache.
- Best practice: Put your static content (system prompts, RAG context, long instructions) first in the message so it can be reused.
- Pricing:
  - Cache write: mostly free or heavily discounted.
  - Cache read: deep discounts vs. fresh inference.
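For illustration, here is the shape of a request that benefits from automatic caching (the model name is illustrative; any automatic-caching provider behaves the same way). The long static block leads the conversation and stays byte-identical across requests, so the cached prefix can be reused:

```json
{
  "model": "openai/gpt-4o",
  "messages": [
    {
      "role": "system",
      "content": "LONG STATIC INSTRUCTIONS AND RAG CONTEXT... (identical on every request, and longer than the caching threshold)"
    },
    {
      "role": "user",
      "content": "Only this part changes between requests."
    }
  ]
}
```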

Manual Caching
Anthropic's Claude family requires you to mark which parts of the message are cacheable by adding a `cache_control` object. You can also set a TTL to control how long the block stays in cache.
- Activation: You must wrap static blocks in a `content` array and give them a `cache_control` entry.
- TTL: Use `{"ttl": "5m"}` or `{"ttl": "1h"}` to control expiration (default 5 minutes).
- Best for: Huge documents, long backstories, or repeated system instructions.
- Pricing:
  - Cache write: 1.25× the normal per-token rate.
  - Cache read: 0.1× (10%) of the normal per-token rate.
- Limitations: Ephemeral (expires after the TTL) and limited to a small number of cacheable blocks (Anthropic currently allows up to 4 `cache_control` breakpoints per request).
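In isolation, a cacheable block is just a text part with a `cache_control` entry attached (a minimal sketch using the 5-minute default TTL; the full request below shows it in context):

```json
{
  "type": "text",
  "text": "LARGE STATIC CONTENT...",
  "cache_control": { "type": "ephemeral", "ttl": "5m" }
}
```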

Caching Example (Anthropic)
Here is an example of caching a large document. This can be done in either the `system` or `user` message.
```json
{
  "model": "anthropic/claude-3.5-sonnet",
  "messages": [
    {
      "role": "system",
      "content": [
        {
          "type": "text",
          "text": "You are a helpful assistant that analyzes legal documents. The following is a terms of service document:"
        },
        {
          "type": "text",
          "text": "HUGE DOCUMENT TEXT...",
          "cache_control": {
            "type": "ephemeral",
            "ttl": "1h"
          }
        }
      ]
    },
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "Summarize the key points about data privacy."
        }
      ]
    }
  ]
}
```
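To confirm that the cache is being hit, inspect the `usage` section of the response. The exact field names vary by provider; an OpenAI-style response, for example, reports cached tokens roughly like this (values are illustrative):

```json
{
  "usage": {
    "prompt_tokens": 2048,
    "completion_tokens": 256,
    "total_tokens": 2304,
    "prompt_tokens_details": {
      "cached_tokens": 1920
    }
  }
}
```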
Provider Support Matrix
| Provider | Automatic Caching | Manual Caching | TTL | Cache Write Price | Cache Read Price |
| --- | --- | --- | --- | --- | --- |
| OpenAI | ✅ | ❌ | N/A | standard | 0.25× or 0.5× |
| Grok | ✅ | ❌ | N/A | standard | 0.25× |
| DeepSeek | ✅ | ❌ | N/A | standard | 0.25× |
| Anthropic Claude | ❌ | `cache_control` + TTL | 5 min / 1 h | 1.25× | 0.1× |
For the most up-to-date information on a specific model or provider's caching policy, pricing, and limitations, please refer to the model page on LangDB.