Response Caching
Enable response caching in LangDB for faster, lower-cost results on repeated LLM queries.
Response caching delivers faster response times, reduced compute cost, and consistent outputs when handling repeated or identical prompts. It is well suited to dashboards, agents, and endpoints with predictable queries.
Benefits
Faster responses for identical requests (cache hit)
Reduced model/token usage for repeated inputs
Consistent outputs for the same input and parameters
Using Response Caching
Through Virtual Model
Toggle Response Caching ON.
Select the cache type:
Exact match (default): returns the cached response only when the prompt matches exactly.
(Distance-based matching is coming soon.)
Set Cache expiration time in seconds (default: 1200).
Once enabled, identical requests will reuse the cached output as long as it hasn’t expired.
Through API Calls
You can use caching on a per-request basis by including a cache field in your API body:
type: Currently only exact is supported.
expiration_time: Time in seconds (e.g., 1200 for 20 minutes).
If caching is enabled in both the virtual model and the request, the API payload takes priority.
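The sketch below shows what a per-request cache configuration might look like in a chat completion call. The endpoint URL, model name, and API key are placeholders, not values from this guide; only the cache field (type and expiration_time) reflects the options described above.

```python
import requests

# Placeholder endpoint and credentials; replace with your LangDB project URL and API key.
LANGDB_API_URL = "https://example.langdb.ai/v1/chat/completions"
API_KEY = "YOUR_LANGDB_API_KEY"

payload = {
    "model": "gpt-4o-mini",  # any model available in your project
    "messages": [
        {"role": "user", "content": "Summarize today's sales dashboard."}
    ],
    # Per-request cache settings; if the virtual model also has caching
    # enabled, these values take priority.
    "cache": {
        "type": "exact",          # only exact matching is currently supported
        "expiration_time": 1200,  # seconds; 1200 = 20 minutes
    },
}

response = requests.post(
    LANGDB_API_URL,
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    json=payload,
    timeout=30,
)
print(response.json())
```

Sending the same payload again before the expiration time elapses should return the cached output instead of invoking the model.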