Response Caching

Enable response caching in LangDB for faster, lower-cost results on repeated LLM queries.

Response caching delivers faster response times, reduced compute cost, and consistent outputs when handling repeated or identical prompts. It is well suited to dashboards, agents, and endpoints with predictable queries.

Benefits

  • Faster responses for identical requests (cache hit)

  • Reduced model/token usage for repeated inputs

  • Consistent outputs for the same input and parameters

Using Response Caching

Through Virtual Model

  1. In the virtual model configuration, toggle Response Caching ON.

  2. Select the cache type:

    • Exact match (default): serves the cached response when the prompt matches a previous request exactly.

    • Distance-based matching is coming soon.

  3. Set Cache expiration time in seconds (default: 1200).

Once enabled, identical requests will reuse the cached output as long as it hasn’t expired.
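As a quick check, you can send the same request twice and compare latencies; with caching enabled on the virtual model, the second call should be served from the cache. The sketch below assumes an OpenAI-compatible chat completions endpoint exposed by your LangDB gateway; the base URL and API key are placeholders to replace with your own values.

import time
import requests

# Placeholders: substitute your LangDB gateway URL and API key.
BASE_URL = "https://your-langdb-gateway/v1/chat/completions"
API_KEY = "YOUR_API_KEY"

payload = {
    "model": "openai/gpt-4.1",
    "messages": [{"role": "user", "content": "Summarize the news today"}],
}
headers = {"Authorization": f"Bearer {API_KEY}"}

for attempt in (1, 2):
    start = time.time()
    response = requests.post(BASE_URL, json=payload, headers=headers, timeout=60)
    elapsed = time.time() - start
    print(f"Attempt {attempt}: {elapsed:.2f}s, status {response.status_code}")

# With virtual-model caching enabled, the second attempt should return
# noticeably faster because the response is reused from the cache.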

Through API Calls

You can enable caching on a per-request basis by including a cache field in your API request body:

{
  "model": "openai/gpt-4.1",
  "messages": [
    {"role": "user", "content": "Summarize the news today"}
  ],
  "cache": {
    "type": "exact",
    "expiration_time": 1200
  }
}
  • type: Currently only exact is supported.

  • expiration_time: Cache lifetime in seconds (e.g., 1200 for 20 minutes).

If caching is enabled in both the virtual model and the request, the API payload takes priority.
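The sketch below shows one way to send this payload with Python's requests library; the gateway URL and API key are placeholders for your own values, while the cache object mirrors the fields documented above.

import requests

# Placeholders: substitute your LangDB gateway URL and API key.
BASE_URL = "https://your-langdb-gateway/v1/chat/completions"
API_KEY = "YOUR_API_KEY"

payload = {
    "model": "openai/gpt-4.1",
    "messages": [{"role": "user", "content": "Summarize the news today"}],
    "cache": {
        "type": "exact",           # only exact matching is currently supported
        "expiration_time": 1200,   # cache lifetime in seconds (20 minutes)
    },
}

response = requests.post(
    BASE_URL,
    json=payload,
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=60,
)
print(response.json())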
