# LangDB Documentation
> Unified AI Gateway Documentation
This file contains all documentation content in a single document following the llmstxt.org standard.
## Documentation
Quick Start
A full-featured, managed AI gateway that provides instant access to 250+ LLMs with enterprise-ready features.
Self Hosted
A self-hosted option for organizations that require complete control over their AI infrastructure.
## Getting Started
Govern, secure, and optimize your AI traffic. LangDB AI Gateway provides a unified interface to all LLMs using the OpenAI API format. It is built with performance and reliability in mind.
```python
from openai import OpenAI

api_key = "your-langdb-token"  # Replace with your LangDB token

client = OpenAI(
    base_url="https://api.us-east-1.langdb.ai",  # LangDB API base URL
    api_key=api_key,
)
# Make the API call to LangDB's Completions API
response = client.chat.completions.create(
    model="gpt-4o",  # Model to use
    messages=[
        {"role": "developer", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    extra_headers={"x-project-id": "xxxxx"},  # LangDB Project ID
)
```
Discover LangDB features and capabilities.
What is AI Gateway?
Introduction to LangDB's AI Gateway and its capabilities.
Working with API
Guide to integrating LangDB using its API.
Tracing Multiple Agents
Learn how to track and analyze multiple AI agents in LangDB.
Python SDK
Explore the Python SDK for seamless integration with LangDB.
## Features
Tracing
Monitor and debug LLM workflows.
Auto Router
Automatically picks the best model for your query.
Routing
Optimize LLM selection dynamically based on cost, speed, and availability.
MCP Support
Enable compatibility and integration with MCP systems.
Usage
Track costs effectively.
Analytics
Gain insights into your API calls.
Guardrails
Ensure safe, reliable, and compliant AI interactions.
User Roles
Manage users in your organisation.
---
## AI Gateway API
Complete API reference for LangDB unified AI gateway services.
## Introduction
This API is documented in **OpenAPI format** and provides access to LangDB's unified AI gateway services.
LangDB offers a comprehensive platform for managing and interacting with 250+ LLMs through a single unified interface. The API supports:
- **Multiple Model Providers**: Access models from OpenAI, Anthropic, Google, Meta, and many more
- **Unified Interface**: Use OpenAI-compatible API format for all models
- **Enterprise Features**: Advanced routing, analytics, rate limiting, and more
- **Thread Management**: Track and manage conversation threads across models
- **Usage Analytics**: Monitor your API usage and costs in real-time
All API endpoints use **Bearer Authentication** with JWT tokens. Include your API key in the `Authorization` header:
```
Authorization: Bearer langDBAPIKey
```
Additionally, include your **Project ID** in the `x-project-id` header for all requests:
```
x-project-id: langDBProjectID
```
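As a minimal sketch, the two headers can be assembled in Python before handing them to any HTTP client. The helper name `langdb_headers` and the placeholder values are illustrative assumptions, not part of the LangDB API.

```python
def langdb_headers(api_key: str, project_id: str) -> dict:
    """Build the two headers described above (helper name is hypothetical)."""
    if not api_key or not project_id:
        raise ValueError("both api_key and project_id are required")
    return {
        "Authorization": f"Bearer {api_key}",
        "x-project-id": project_id,
    }


# Placeholder credentials; substitute your own key and Project ID.
headers = langdb_headers("langDBAPIKey", "langDBProjectID")
```

The resulting dictionary can be passed as the `headers` argument of most Python HTTP clients.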
Analytics API
Analytics and usage tracking endpoints for monitoring your LangDB usage.
View OpenAPI Specification
Completions API
Chat completions and text generation endpoints for interacting with LLM models.
View OpenAPI Specification
Enterprise API
Enterprise features including model management and admin operations.
View OpenAPI Specification
Misc API
Miscellaneous endpoints including embeddings and other utility functions.
View OpenAPI Specification
Threads API
Thread management endpoints for managing conversation threads.
View OpenAPI Specification
---
## Create a new model
Register and configure a new LLM under your LangDB project
---
## Create chat completion
Create chat completion
---
## Fetch analytics data
Fetch analytics data
---
## Fetch analytics summary
Fetch analytics summary
---
## Create embeddings
Creates an embedding vector representing the input text or token arrays.
---
## Retrieve pricing information
Returns the pricing details for LangDB services.
---
## Retrieve the total cost for a specific thread
Retrieve the total cost for a specific thread
---
## Get total usage
Get total usage
---
## Get usage by model
Get usage by model
---
## List models
List models
---
## Retrieve messages for a specific thread
Retrieve messages for a specific thread
---
## Retrieve a list of threads
Retrieve a list of threads
---
## Set custom prices for imported models
Set custom pricing for models imported from providers like Bedrock, Azure, Vertex that do not have built-in pricing
---
## Set custom prices for imported models(Api-reference)
Set custom pricing for models imported from providers like Bedrock, Azure, Vertex that do not have built-in pricing
---
## Label
A **Label** in LangDB defines an LLM instance with a unique identifier for categorization and tracking.

### Core Features
* **Model Categorization**: Assign labels to LLM instances.
* **Observability**: Track models by label.
### Headers for Label:
* **x-label**: Defines a label for an LLM instance.
```json
{
"x-label" : "research-agent"
}
```
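A small sketch of how the header might be attached on the client side: the label is merged into whatever headers a request already carries. The helper name `labelled` and the placeholder Project ID are assumptions for illustration.

```python
import json

# Placeholder Project ID, in the same style as the other examples.
base_headers = {"x-project-id": "xxxxx"}


def labelled(headers, label):
    """Return a copy of `headers` with the x-label header set."""
    return {**headers, "x-label": label}


research_headers = labelled(base_headers, "research-agent")
print(json.dumps(research_headers, indent=2))
```

With the OpenAI SDK, a dictionary like `research_headers` can be passed as `extra_headers` on a request.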
---
## Message
A **Message** is the **basic unit of communication** in LangDB workflows. Messages define the interaction between the user, the system, and the model. Every workflow is built around exchanging and processing messages.
**Core Features:**
* **Structured Interactions**: Messages define roles (`user`, `system`, `assistant`) to organize interactions clearly.
* **Multi-Role Flexibility**: Different roles (e.g., `system` for instructions, `user` for queries) enable complex workflows.
* **Dynamic Responses**: Messages form the backbone of LangDB’s chat-based interactions.
**Example:**
A simple interaction to generate a poem might look like this:
```json
[
{ "role": "system", "content": "You are a helpful assistant" },
{ "role": "user", "content": "Write me a poem about celluloids." }
]
```
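The same structure can be built and sanity-checked in Python before sending. This is a client-side sketch only: the `validate_messages` helper is hypothetical, and the allowed-role set simply mirrors the roles named in this section (the gateway may accept others, such as the `developer` role used in the quick-start example).

```python
ALLOWED_ROLES = {"system", "user", "assistant"}  # roles named in this section


def validate_messages(messages):
    """Raise ValueError if a message lacks a known role or string content."""
    for i, msg in enumerate(messages):
        role = msg.get("role")
        if role not in ALLOWED_ROLES:
            raise ValueError(f"message {i}: unknown role {role!r}")
        if not isinstance(msg.get("content"), str):
            raise ValueError(f"message {i}: content must be a string")
    return messages


messages = validate_messages([
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "Write me a poem about celluloids."},
])
```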
---
## Run
A **Run** represents a **single workflow or operation** executed within LangDB. This could be a model invocation, a tool call, or any other discrete task. Each Run is independent and can be tracked separately, making it easier to analyze and debug individual workflows.
**Core Features:**
* **Granular Tracking**: Analyze and optimize the performance and cost of individual Runs.
* **Independent Execution**: Each Run has a distinct lifecycle, enabling precise observability.
**Example:**
Generating a summary of a document, analyzing a dataset, or fetching information from an external API – each is a **Run**.
**Headers for Run:**
* `x-run-id`: Identifies a specific Run for tracking and debugging purposes.
---
## Thread
A **Thread** is simply a grouping of **Message History** that maintains context in a conversation or workflow. Threads are useful for keeping track of past messages and ensuring continuity across multiple exchanges.
**Core Features:**
* **Contextual Continuity**: Ensures all related Runs are grouped for better observability.
* **Multi-Turn Support**: Simplifies managing interactions that require maintaining state across multiple Runs.
**Example:**
A user interacting with a chatbot over multiple turns (e.g., asking follow-up questions) generates several messages, but all are grouped under a single **Thread** to maintain continuity.
**Headers for Thread:**
* `x-thread-id`: Links all Runs in the same context or conversations.
* `x-thread-title`: Assigns a custom, human-readable title to the thread, making it easier to identify.
* `x-thread-public`: Makes the thread publicly accessible via a shareable link by setting its value to `1` or `true`.
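As a rough sketch of how these headers combine with `x-run-id`, the helper below reuses one thread ID across turns while giving each turn its own run ID. Using UUIDs for the IDs and the name `turn_headers` are assumptions made for illustration; the source does not specify an ID format.

```python
import uuid


def turn_headers(thread_id, title=None, public=False):
    """Headers for one turn: shared thread ID, fresh run ID."""
    headers = {
        "x-thread-id": thread_id,
        "x-run-id": str(uuid.uuid4()),  # assumed format: any unique string
    }
    if title is not None:
        headers["x-thread-title"] = title
    if public:
        headers["x-thread-public"] = "true"
    return headers


thread_id = str(uuid.uuid4())
first_turn = turn_headers(thread_id, title="Support chat")
follow_up = turn_headers(thread_id)
```

Both dictionaries carry the same `x-thread-id`, so the two turns would be grouped under one Thread while remaining separate Runs.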
---
## Trace
A **Trace** represents the **complete lifecycle of a workflow**, spanning all components and systems involved.
**Core Features:**
* **End-to-End Visibility**: Tracks model calls and tool invocations across the entire workflow.
* **Multi-Agent Ready**: Well suited to workflows that involve multiple services, APIs, or tools.
* **Error Diagnosis**: Quickly identify bottlenecks, failures, or inefficiencies in complex workflows.
**Parent-Trace:**
For workflows with nested operations (e.g., a workflow that triggers multiple sub-workflows), LangDB introduces the concept of a **Parent-Trace**, which links the parent workflow to its dependent sub-workflows. This hierarchical structure ensures you can analyze workflows at both macro and micro levels.
**Headers for Trace:**
* `trace-id`: Tracks the parent workflow.
* `parent-trace-id`: Links sub-workflows to the main workflow for hierarchical tracing.
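One reading of these two headers, sketched in Python: every workflow gets its own `trace-id`, and a sub-workflow additionally sends its parent's ID as `parent-trace-id`. The helper name and the use of UUIDs are assumptions for illustration only.

```python
import uuid


def trace_headers(parent_trace_id=None):
    """Headers for a workflow, optionally linked to a parent workflow."""
    headers = {"trace-id": str(uuid.uuid4())}  # assumed format: any unique string
    if parent_trace_id is not None:
        headers["parent-trace-id"] = parent_trace_id
    return headers


parent = trace_headers()
# Two sub-workflows that point back at the same parent trace.
children = [trace_headers(parent["trace-id"]) for _ in range(2)]
```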
---
## Virtual MCP Servers
A **Virtual MCP Server** lets you create a customized set of MCP tools by combining functions from multiple MCP servers — all with scoped access, unified auth, and full observability.
**Video:** https://youtu.be/cpoKSdHpsms
Quick Guide on how to use virtual MCP Server
## Why Use a Virtual MCP?
* **Selective Tools**: Pick only the tools you need from existing MCP servers (e.g. Airtable's `list_records`, GitHub's `create_issue`, etc.)
* **Clean Auth Handling**: Add your API keys only if needed. Otherwise, LangDB handles OAuth for you.
* **Full Tracing**: Every call is traced on LangDB, with logs, latencies, input/output, and error metrics.
* **Easy Integration**: Works out of the box with Cursor, Claude, Windsurf, and more.
* **Version Lock-in**: Virtual MCPs are pinned to a specific server version to avoid breaking changes.
* **Poisoning Safety**: Prevents injection or override by malicious tool definitions from source MCPs.
## How to Set It Up
1. Go to the Virtual MCP Servers section of your LangDB project.
2. Select the tools you want to include.
3. (Optional) Add API keys or use LangDB-managed auth.
4. Click `Generate secure MCP URL`.
### Install in Cursor / Windsurf / Claude
Once you have the MCP URL:
```bash
npx @langdb/mcp setup figma https://api.staging.langdb.ai/mcp/xxxxx --client cursor
```
You're now ready to use your selected tools directly inside the editor.
### Try it in the playground
You can also try Virtual MCP servers by adding the server to the [virtual model](../virtual-models/) config.
---
## Custom MCP Servers
While LangDB provides a rich library of pre-built MCP servers, you can also bring your own. By connecting a custom MCP server, you can leverage all the benefits of a [Virtual MCP Server](./), including:
* **Unified Interface**: Combine your custom tools with tools from other LangDB-managed servers.
* **Clean Auth Handling**: Let LangDB manage authentication, or provide your own API keys and headers.
* **Full Observability**: Get complete tracing for every call, with logs, latencies, and metrics.
* **Seamless Integration**: Works out-of-the-box with clients like Cursor, Claude, and Windsurf.
* **Enhanced Security**: Benefit from version pinning and protection against tool definition poisoning.
This guide explains how to connect your own custom MCP server, whether it uses an HTTP (REST API) or SSE (Server-Sent Events) transport.
## Connecting Your Custom Server
When creating a [Virtual MCP Server](./), you can add your own server alongside the pre-built integrations.
### Steps to Configure a Custom Server
1. **Navigate to Virtual MCP Servers**: Go to the "Virtual MCP Servers" section in your LangDB project and click "Create Virtual MCP Server".
2. **Add a Custom Server**: In the "Server Configuration" section, click the "+ Add Server" button on the right and select "Custom" from the list.
3. **Configure Server Details**: A new "Custom Server" block will appear on the left. Fill in the following details:

* **Server Name**: Give your custom server a descriptive name.
* **Transport Type**: Choose either `HTTP (REST API)` or `SSE (Server-Sent Events)` from the dropdown.
* **HTTP/SSE URL**: Enter the endpoint URL for your custom MCP server. LangDB will attempt to connect to this URL to validate the server and fetch the available tools.
* **(Optional) HTTP Headers**: If your server requires specific HTTP headers for authentication or other purposes, you can add them here.
* **(Optional) Environment Variables**: If your server requires specific configuration via environment variables, you can add them.
4. **Select Tools**: Once LangDB successfully connects to your server, it will display a list of all the tools exposed by your MCP server. You can select which tools you want to include in your Virtual MCP Server.
5. **Generate URL**: After configuring your custom server and selecting the tools, you can generate the secure URL for your Virtual MCP Server and start using it in your applications.
---
## Virtual Models
LangDB’s **Virtual Models** let you save, share, and reuse model configurations—combining prompts, parameters, tools, and routing logic into a single named unit. This simplifies workflows and ensures consistent behavior across your apps, agents, and API calls.
Once saved, these configurations can be quickly accessed and reused across multiple applications.
## Why do you need Virtual Models
Virtual models in LangDB are more than just model aliases. They are **fully configurable AI agents** that:
* Let you define **system/user messages** upfront
* Support **routing logic** to dynamically choose between models
* Include **MCP integrations** and **guardrails**
* Are callable from **UI playground**, **API**, and **LangChain/OpenAI SDKs**
Use virtual models to manage:
* Prompt versioning and reuse
* Consistent testing across different models
* Precision tuning with per-model parameters
* Seamless integration of tools and control logic
* Routing using strategies like fallback, percentage-based, latency-based, optimized, and script-based selection
## Setting Up Virtual Model
1. Go to the **Models** section.
2. Click **Create Virtual Model**.
3. **Set prompt messages** — define system and user messages to guide model behavior
4. **Set variables** (optional) — useful if your prompts require dynamic values
5. **Select router type**
* `None`: Use a single model only
* `Fallback`, `Random`, `Cost`, `Percentage`, `Latency`, `Optimized`: Configure smart routing across targets. Check out all [Routing Strategies](../../features/routing/).
6. **Add one or more targets**
* Each target defines a model, MCP servers, guardrails, system and user messages, response format, and its parameters (e.g. temperature, max\_tokens, top\_p, penalties)
7. **Select MCP Servers** — connect tools like LangDB Search, Code Execution, or others
8. **Add guardrails** (optional) — for validation, transformation, or filtering logic
9. **Set response format** — choose between `text`, `json_object`, or `json_schema`
10. Give your virtual model a name and Save.
Your virtual model now appears in the **Models** section of your project, ready to be used anywhere a model is accepted.
## Updating and Versioning
You can edit virtual models anytime. LangDB supports formal versioning via the `@version` syntax:
* `langdb/my-model@latest` or `langdb/my-model` → resolves to the latest version
* `langdb/my-model@v1` or `langdb/my-model@1` → resolves to version 1
This allows you to safely test new versions, roll back to older ones, or maintain multiple stable variants of a model in parallel.
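To make the naming rule concrete, here is a small client-side parser for the `@version` syntax described above. It is a sketch of the convention as documented here, not LangDB's own resolver, and the function name `parse_model_ref` is hypothetical.

```python
def parse_model_ref(ref):
    """Split 'name@version' into (name, version).

    The version is the string 'latest' or an integer version number.
    """
    name, sep, tag = ref.partition("@")
    if not sep or tag == "latest":
        return name, "latest"
    number = tag[1:] if tag.startswith("v") else tag
    if not number.isdigit():
        raise ValueError(f"unrecognised version tag: {tag!r}")
    return name, int(number)


print(parse_model_ref("langdb/my-model"))
print(parse_model_ref("langdb/my-model@v1"))
```

Both `@v1` and `@1` map to the same integer, matching the two spellings listed above.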
## Using Your Virtual Model
Once saved, your virtual model is fully available across all LangDB interfaces:
* **Chat Playground**: Select it from the model dropdown and test interactively.
* **OpenAI-Compatible SDKs**: Works seamlessly with OpenAI clients by changing only the `model` name.
* **LangChain / CrewAI / other frameworks**: Call it just like any base model by using `model="langdb/my-model@latest"` or a specific version like `@v1`.
This makes virtual models a portable, modular building block across all parts of your AI stack.
---
## Draft Mode
LangDB’s Virtual Models support a **draft mode** that streamlines version management and ensures safe, iterative changes. In draft mode, modifications are isolated from the published version until you explicitly publish, giving you confidence that live traffic is unaffected by in-progress edits.
## Version Workflow
1. **Edit in Draft**
* Making any change (e.g., adjusting parameters, adding guardrails, modifying messages) flips the version into a **Modified** draft.
2. **Save Draft**
* Click **Save** to record your changes. The draft is saved as a new version at the top of the version list, without affecting the live version.
* Live API traffic remains pointed at the last published version.
3. **Publish Draft**
* Once validated, click **Publish**:
* Saves the version as the new latest version.
* Directs all live chat completion traffic to this version.
* Keeps the previous published version visible in the list so you can reselect and republish if needed.
4. **Restore & Edit Previous Version**
* Open the version dropdown and select any listed version.
* The selected version loads into the editor.
* You can further modify this draft and click **Save** to create a new version entry.
5. **Re-Publish Any Version**
* To make any saved version live, select it from the dropdown and click **Publish**.
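The workflow above can be pictured with a toy in-memory model. It only illustrates the save/publish semantics described in this section; the class and method names are hypothetical and say nothing about how LangDB stores versions.

```python
class VirtualModelVersions:
    """Toy model of the save/publish workflow (not LangDB's implementation)."""

    def __init__(self, initial_config):
        self.versions = [dict(initial_config)]  # list index 0 holds version 1
        self.published = 1  # version number currently serving live traffic

    def save(self, config):
        """Save a draft as a new version; live traffic is unaffected."""
        self.versions.append(dict(config))
        return len(self.versions)

    def publish(self, version):
        """Point live traffic at an existing saved version."""
        if not 1 <= version <= len(self.versions):
            raise ValueError(f"no such version: {version}")
        self.published = version

    def live_config(self):
        """Configuration that live requests would currently use."""
        return self.versions[self.published - 1]


vm = VirtualModelVersions({"temperature": 0.7})
draft = vm.save({"temperature": 0.2})       # saved, but not live yet
before_publish = vm.live_config()["temperature"]
vm.publish(draft)                            # now live
after_publish = vm.live_config()["temperature"]
```

Publishing version 1 again afterwards would model the "Re-Publish Any Version" step.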
## API Behavior
All `chatCompletions` requests to a Virtual Model endpoint automatically target the **latest published version**. Drafts and restored drafts never receive live traffic until published.
```python
from openai import OpenAI

api_key = "your-langdb-token"  # Replace with your LangDB token

client = OpenAI(
    base_url="https://api.us-east-1.langdb.ai",
    api_key=api_key,
)
# Always hits the current published version
response = client.chat.completions.create(
    model="openai/langdb/my-virtual-model@latest",
    messages=[{"role": "user", "content": "Hello!"}],  # example message
)
```
To preview changes in a draft or restored draft, switch the UI or JSON view selector to that draft and experiment in the Virtual Model Editor — all without impacting production calls.
## Best Practices
* **Iterate Safely**: Leverage drafts for experimental guardrails or parameter tuning without risking production stability.
* **Frequent Publishing**: Keep version history granular—publish stable drafts regularly to simplify tracking and rollbacks.
* **Use Restore Thoughtfully**: Before restoring, ensure any important unsaved draft work is committed or intentionally discarded.
---
## Routing with Virtual Model
In LangDB AI Gateway, any virtual model can act as a router. Define a strategy and a list of target models, and it will route requests based on metrics like cost, latency, percentage, or custom rules.
## Setting up Routing
Setting up routing in a virtual model is straightforward:
1. Open any virtual model in the Chat Playground and click **Show Config**
2. Choose a **routing strategy** (like `fallback`, `optimized`, `percentage`, etc.)
3. Add your **target models**—each one can be configured just like the virtual models you set up in the previous section
Each target defines:
* Which model to use
* Prompt
* MCP Servers
* Guardrails
* Response Format
* Custom parameters like temperature, max\_tokens, penalties, etc.
All routing options are available directly in the virtual model config panel.
Check more about the [Routing Strategies](../../features/routing/).
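To make the `fallback` strategy concrete, here is a client-side sketch of the idea: try targets in order and use the first one that succeeds. LangDB performs routing inside the gateway, so this is only an illustration; the function names, model names, and simulated failure are all hypothetical.

```python
def call_with_fallback(targets, call):
    """Try each target in order and return (target, result) for the first success."""
    errors = []
    for target in targets:
        try:
            return target, call(target)
        except Exception as exc:  # a real router would catch narrower errors
            errors.append((target, exc))
    raise RuntimeError(f"all targets failed: {errors}")


def fake_call(model):
    """Stand-in for a model invocation; the first model always fails."""
    if model == "model-a":
        raise TimeoutError("simulated outage")
    return f"response from {model}"


used, result = call_with_fallback(["model-a", "model-b"], fake_call)
print(used, "->", result)
```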
---
## Setting Up Provider Keys
LangDB supports bringing your own provider API keys (BYOK) to give you direct control over rate limits and costs via your provider account.
## Overview
When you use provider keys, your API keys are securely encrypted and used for all requests routed through the specified provider. This enables:
* Direct control over rate limits and costs
* Access to your provider's specific features and models
* Seamless integration with existing provider accounts
## Provider Configurations
### OpenAI Provider
The OpenAI provider can be configured to work with both OpenAI's direct API and Azure OpenAI services through the same configuration interface.
#### Configuration Fields
The OpenAI provider configuration modal includes:
* **API Key**: Your OpenAI API key (required field)
* **Endpoint**: Your custom endpoint URL (optional - leave blank for standard OpenAI, or enter your Azure OpenAI endpoint)
* **Description**: Optional notes about this configuration
#### Standard OpenAI Configuration
For direct OpenAI API access, simply enter your OpenAI API key in the API Key field and leave the Endpoint field blank.
#### Azure OpenAI Configuration
To use Azure OpenAI, enter your OpenAI API key in the API Key field and your Azure OpenAI endpoint URL in the Endpoint field. This lets you call OpenAI models hosted on Azure OpenAI through the same provider configuration.
### Google AI Studio
Google AI Studio requires your Google API key for authentication. Enter your API key in the configuration form.
### AWS Bedrock
AWS Bedrock supports two authentication methods through the UI:
#### Option 1: Bedrock API Keys (Recommended)
Amazon Bedrock API keys provide a simpler authentication method. Simply enter your Bedrock API key in the designated field.
**Note:** Bedrock API keys are tied to a specific AWS region and cannot be used to change regions.
#### Option 2: AWS Credentials
Alternatively, you can use traditional AWS credentials for more flexibility. The UI provides separate input fields for:
* **Access Key**: Your AWS Access Key ID
* **Access Secret**: Your AWS Secret Access Key
* **AWS Region**: A dropdown to select your preferred AWS region
* **Description**: An optional field to add notes about this configuration
**Important:** AWS will automatically fetch available models when using either authentication method, so you don't need to manually specify model configurations.
## Custom Pricing for Imported Models
Some models imported from providers like AWS Bedrock, Azure, and Vertex AI don't have built-in pricing information. For these models, you can set custom pricing using the LangDB API.
### Setting Custom Prices
Use the following API endpoint to configure custom pricing for imported models:
```bash
curl -X POST "https://api.us-east-1.langdb.ai/projects/{project_id}/custom_prices" \
-H "Authorization: Bearer YOUR_SECRET_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"bedrock/twelvelabs.pegasus-1-2-v1:0": {
"per_input_token": 1.23,
"per_output_token": 2.12
}
}'
```
This allows you to track costs accurately for models that don't have predefined pricing in the LangDB system.
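For reference, the same request can be prepared from Python with only the standard library. This sketch mirrors the curl call above and stops short of sending it; the helper name `custom_price_request` and the project ID value are placeholders introduced here.

```python
import json
import urllib.request


def custom_price_request(project_id, token, prices):
    """Build (but do not send) the POST request shown in the curl example."""
    url = f"https://api.us-east-1.langdb.ai/projects/{project_id}/custom_prices"
    return urllib.request.Request(
        url,
        data=json.dumps(prices).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


request = custom_price_request(
    "my-project-id",      # placeholder Project ID
    "YOUR_SECRET_TOKEN",  # placeholder token, as in the curl example
    {
        "bedrock/twelvelabs.pegasus-1-2-v1:0": {
            "per_input_token": 1.23,
            "per_output_token": 2.12,
        }
    },
)
# urllib.request.urlopen(request) would send it; omitted so the sketch runs offline.
```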
See also: [Set custom prices for imported models](../api-reference/set-custom-prices-for-imported-models)
## Setup Instructions
## Step 1: **Go to Project Settings**
## Step 2: **Go to Providers Tab in Project Settings**
## Step 3: **Choose the Provider to be Configured**
In this example, we configure Azure OpenAI.
Once the Provider has been configured, you can use Azure Models.
---
## Setting Up SSO with Google Workspace
This guide will help you configure Single Sign-On (SSO) for your organization using Google Workspace. When enabled, users with your organization's email domain can sign in seamlessly using their Google accounts and will automatically be added to your organization on LangDB.
## Step 1: Navigate to User Management
From your LangDB dashboard, click on "Manage Users" in the left sidebar navigation menu.
## Step 2: Access SSO Configuration
On the User Management page, locate the "Enable SSO" checkbox in the top right corner, positioned below the red "& Invite" button. This setting controls whether SSO is enabled for your organization.
## Step 3: Configure SSO Settings
Enable SSO authentication here. Once saved, all users with your organization's email domain will be able to sign in using Google SSO and will automatically be added to your organization.
## How It Works
Once SSO is configured:
1. **Automatic User Addition**: When someone with your organization's email domain (e.g., @langdb.ai) signs in using Google, they will automatically be added to your LangDB organization.
2. **Seamless Authentication**: Users can sign in using their Google Workspace credentials without needing separate LangDB accounts.
3. **Domain Verification**: The system automatically detects and uses your organization's domain for user identification.
### Important Notes
* **Domain Detection**: Your organization domain is automatically detected from your email address and cannot be changed.
* **User Management**: All users with your domain will be able to sign in automatically once SSO is enabled.
* **Security**: SSO provides a secure and streamlined authentication experience for your organization.
---
## Starting Trial
## Step 1: Visit the Billing Page
You can open the billing page either by clicking on your tier or by visiting [https://app.langdb.ai/settings/billing](https://app.langdb.ai/settings/billing).
## Step 2: Choose Professional Plan
## Step 3: Enter Details and Start the trial
---
## Analytics
You can monitor API usage with key insights.
After integrating LangDB into your project, the Analytics Dashboard becomes your central hub for understanding usage.
## **Metrics**
LangDB’s Analytics Dashboard is segmented into several key panels:
### **Cost:**
* Tracks your total cost consumption across all integrated models.
* Enables you to compare costs by provider/model/tags, helping you identify the most cost-effective options for your use cases.
### **Time:**
* Displays the average duration of requests in milliseconds.
* Useful for benchmarking response times and optimizing performance for latency-sensitive applications.
### **Number of Requests:**
* Shows the total number of API calls made.
* Helps you analyze usage patterns and allocate resources effectively.
### **Average Time to First Token (TTFT)**
* Indicates the average time taken to receive the first token from the API response.
* This metric is critical for understanding initial latency.
### **Tokens Per Second (TPS)**
* Measures the throughput of token generation.
* High TPS is indicative of efficient processing.
### **Time Per Output Token (TPOT)**
* Tracks the average time spent per output token.
* Helps in identifying and troubleshooting bottlenecks in model output.
### Error Rate
* Displays the **percentage of failed requests** over total requests.
* Helps monitor **system stability and reliability**.
### Error Request Count
* Tracks the **total number of failed API requests**.
* Useful for debugging and **troubleshooting failures** effectively.
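The sketch below shows how metrics of this kind could be derived from raw request records. The exact formulas LangDB uses are not stated here, so the definitions in the code (for example, TPS as output tokens divided by total request duration) are assumptions, and the sample records are made up.

```python
from statistics import mean

# Made-up request records: timings in milliseconds.
requests = [
    {"duration_ms": 800.0, "ttft_ms": 200.0, "output_tokens": 40, "ok": True},
    {"duration_ms": 1200.0, "ttft_ms": 300.0, "output_tokens": 60, "ok": True},
    {"duration_ms": 500.0, "ttft_ms": 0.0, "output_tokens": 0, "ok": False},
]

succeeded = [r for r in requests if r["ok"]]
error_request_count = len(requests) - len(succeeded)

# Percentage of failed requests over total requests.
error_rate = 100.0 * error_request_count / len(requests)
# Average time to first token, over successful requests.
avg_ttft_ms = mean(r["ttft_ms"] for r in succeeded)
# Assumed definition: output tokens per second of request duration.
tps = mean(r["output_tokens"] / (r["duration_ms"] / 1000.0) for r in succeeded)
# Assumed definition: seconds of request duration per output token.
tpot_s = mean((r["duration_ms"] / 1000.0) / r["output_tokens"] for r in succeeded)

print(error_request_count, round(error_rate, 1), avg_ttft_ms, tps, tpot_s)
```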
## Analytics APIs
### [Fetch analytics data](../api-reference/fetch-analytics-data)
Provides a detailed timeseries view of API usage metrics. Users can filter data by time range and group it by provider, model, or tags to analyze trends over different periods.
```bash
# groupBy: provider/tag/model
curl --location 'https://api.us-east-1.langdb.ai/analytics' \
--header 'x-project-id: langDBProjectID' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer langDBAPIKey' \
--data '{"start_time_us": , "end_time_us": , "groupBy": ["provider"]}'
```
Example response:
```json
{
"timeseries": [
{
"hour": "2025-01-23 04:00:00",
"total_cost": 0.0006719999999999999,
"total_requests": 2,
"avg_duration": 814.4,
"duration": 814.4,
"duration_p99": 1125.4,
"duration_p95": 1100.0,
"duration_p90": 1068.3,
"duration_p50": 814.4,
"total_duration": 1628.778,
"total_input_tokens": 72,
"total_output_tokens": 38,
"error_rate": 0.0,
"error_request_count": 0,
"avg_ttft": 814.4,
"ttft": 814.4,
"ttft_p99": 1125.4,
"ttft_p95": 1100.0,
"ttft_p90": 1068.3,
"ttft_p50": 814.4,
"tps": 67.54,
"tps_p99": 110.03,
"tps_p95": 107.55,
"tps_p90": 104.45,
"tps_p50": 79.63,
"tpot": 0.04,
"tpot_p99": 0.06,
"tpot_p95": 0.06,
"tpot_p90": 0.06,
"tpot_p50": 0.04,
"tag_tuple": [
"openai"
]
}
]
}
```
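
The request body needs a start and end timestamp. An example response later in this section shows `start_time_us` values such as `1737576094363076`, which look like microseconds since the Unix epoch, so the sketch below assumes that unit. The helper names (`to_epoch_us`, `build_analytics_request`) are hypothetical; the URL, headers, and `groupBy` field are taken from the curl command above. The request is only built here, not sent.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone

ANALYTICS_URL = "https://api.us-east-1.langdb.ai/analytics"
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_us(moment: datetime) -> int:
    """Convert a timezone-aware datetime to whole microseconds since the epoch."""
    return (moment - EPOCH) // timedelta(microseconds=1)


def build_analytics_request(start, end, group_by, api_key, project_id):
    """Build (but do not send) a POST request for the analytics endpoint."""
    payload = {
        "start_time_us": to_epoch_us(start),  # assumed unit: microseconds
        "end_time_us": to_epoch_us(end),
        "groupBy": group_by,                  # e.g. ["provider"], ["model"], ["tag"]
    }
    return urllib.request.Request(
        ANALYTICS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "x-project-id": project_id,
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


end = datetime(2025, 1, 23, 5, 0, tzinfo=timezone.utc)
req = build_analytics_request(
    end - timedelta(hours=1), end, ["provider"], "langDBAPIKey", "langDBProjectID"
)
body = json.loads(req.data)
print(body)
```

Passing `req` to `urllib.request.urlopen` would send it; swap in a real API key and project ID first.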
### [Fetch analytics summary](../api-reference/fetch-analytics-summary)
Provides aggregated usage metrics, allowing users to get a high-level overview of API consumption and error rates.
```bash
# groupby: provider/tag/model
curl --location 'https://api.us-east-1.langdb.ai/analytics/summary' \
--header 'x-project-id: langDBProjectID' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer langDBAPIKey' \
--data '{"start_time_us": , "end_time_us": , "groupBy": ["provider"]}'
```
Example response:
```json
{
"summary": {
"tag_tuple": [
"togetherai"
],
"total_cost": 0.0015163199999999998,
"total_requests": 8,
"total_duration": 5242.402,
"avg_duration": 655.3,
"duration": 655.3,
"duration_p99": 969.2,
"duration_p95": 962.5,
"duration_p90": 954.1,
"duration_p50": 624.3,
"total_input_tokens": 853,
"total_output_tokens": 200,
"avg_ttft": 655.3,
"ttft": 655.3,
"ttft_p99": 969.2,
"ttft_p95": 962.5,
"ttft_p90": 954.1,
"ttft_p50": 624.3,
"tps": 200.86,
"tps_p99": 336.04,
"tps_p95": 304.95,
"tps_p90": 266.08,
"tps_p50": 186.24,
"tpot": 0.03,
"tpot_p99": 0.04,
"tpot_p95": 0.04,
"tpot_p90": 0.04,
"tpot_p50": 0.03,
"error_rate": 0.0,
"error_request_count": 0
}
}
```
### Filtering By Users
As discussed in User Tracking, we can use filters to retrieve insights based on `id`, `name`, or `tags`.
**Available Filters:**
* **`user_id`**: Filter data for a specific user by their unique ID.
* **`user_name`**: Retrieve usage based on the user’s name.
* **`user_tags`**: Filter by tags associated with a user (e.g., "websearch", "support").
```bash
curl -L \
--request POST \
--url 'https://api.us-east-1.langdb.ai/analytics/summary' \
--header 'Authorization: Bearer langDBAPIKey' \
--header 'X-Project-Id: langDBProjectID' \
--header 'Content-Type: application/json' \
--data '{
"user_id": "123",
"user_name": "mrunmay",
"user_tags": ["websearch", "testings"]
}'
```
Example response:
```json
{
"summary": [
{
"total_cost": 0.00112698,
"total_requests": 4,
"total_duration": 31645.018,
"avg_duration": 7911.3,
"duration": 7911.3,
"duration_p99": 9819.3,
"duration_p95": 9809.0,
"duration_p90": 9796.1,
"duration_p50": 8193.2,
"total_input_tokens": 4429,
"total_output_tokens": 458,
"avg_ttft": 7911.3,
"ttft": 7911.3,
"ttft_p99": 9819.3,
"ttft_p95": 9809.0,
"ttft_p90": 9796.1,
"ttft_p50": 8193.2,
"tps": 154.43,
"tps_p99": 207.79,
"tps_p95": 206.1,
"tps_p90": 203.99,
"tps_p50": 160.85,
"tpot": 0.07,
"tpot_p99": 0.1,
"tpot_p95": 0.09,
"tpot_p90": 0.09,
"tpot_p50": 0.07,
"error_rate": 0.0,
"error_request_count": 0
}
],
"start_time_us": 1737576094363076,
"end_time_us": 1740168094363076
}
```
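
If you only want some of the filters, the request body can be assembled programmatically. This is a small sketch with a hypothetical `build_user_filter` helper; it assumes the endpoint accepts any subset of `user_id`, `user_name`, and `user_tags`, which this section does not state explicitly.

```python
import json
from typing import List, Optional


def build_user_filter(user_id: Optional[str] = None,
                      user_name: Optional[str] = None,
                      user_tags: Optional[List[str]] = None) -> dict:
    """Return a request body containing only the filters that were provided."""
    candidates = {
        "user_id": user_id,
        "user_name": user_name,
        "user_tags": user_tags,
    }
    # Drop filters that are None or empty so they are not sent at all
    return {key: value for key, value in candidates.items() if value}


only_tags = build_user_filter(user_tags=["websearch"])
print(json.dumps(only_tags))
```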
---
## Auto Router
Stop guessing which model to pick. The Auto Router picks the best one for you—whether you care about cost, speed, or accuracy.
## Why Use Auto Router?
* **Save Costs** - Automatically uses cheaper models for simple queries
* **Get Faster Responses** - Routes to the fastest model when speed matters
* **Guarantee Accuracy** - Picks the best model for critical tasks
* **Handle Scale** - No configuration hell, just works
### Quick Start
#### Using API
```json
{
"model": "router/auto",
"messages": [
{
"role": "user",
"content": "What's the capital of France?"
}
]
}
```
#### Using UI
You can also try Auto Router through the LangDB dashboard:
LangDB dashboard showing available Auto Router models and configuration options
> **Note:** The UI shows only a few router variations. For all available options and advanced configurations, use the API.
### Trace Example
Here's what happens behind the scenes when you use Auto Router:
That's it — no config needed. The router classifies the query and picks the best model automatically.
> If you already know the query type (e.g., Finance), skip auto-classification with `router/finance:accuracy`.
## Under the Hood
Behind the scenes, the Auto Router uses lightweight classifiers (NVIDIA for complexity, BART for topic) combined with LangDB's routing engine. These decisions are logged in traces so you can inspect why a query was sent to a specific model.
### How It Works
The Auto Router uses a two-stage classification process:
1. **Complexity Classification**: Uses NVIDIA's classification model to determine if a query is high or low complexity
2. **Topic Classification**: Uses Facebook's BART Large model to identify the query's topic from these categories:
* Academia
* Finance
* Marketing
* Maths
* Programming
* Science
* Vision
* Writing
Based on these classifications and your chosen optimization strategy, the router automatically selects the best model from your available options.
### Router Behavior
| Router Syntax | What happens |
| ----------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `router/auto` | Classifies complexity + topic. Low-complexity queries go to cheaper models; high-complexity queries go to stronger models. Then applies your optimization strategy. |
| `router/auto:<mode>` | Classifies topic only. Ignores complexity and always applies the chosen optimization (cost, accuracy, etc.) for that topic. |
| `router/<topic>:<mode>` | Skips classification. Directly routes to the specified topic with the chosen optimization mode. |
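
To make the `router/auto` row concrete, here is a toy sketch of a two-stage decision: classify complexity and topic, then choose a model tier. Everything in it is an illustrative assumption. The keyword and length heuristics merely stand in for the NVIDIA and BART classifiers, and the model names are hypothetical placeholders; this is not LangDB's routing engine.

```python
from typing import Tuple

TOPICS = ("academia", "finance", "marketing", "maths",
          "programming", "science", "vision", "writing")

# Hypothetical model tiers; real candidates come from your LangDB project.
CHEAP_MODEL = "example/small-model"
STRONG_MODEL = "example/large-model"


def classify_complexity(query: str) -> str:
    """Stand-in for the complexity classifier: long queries count as 'high'."""
    return "high" if len(query.split()) > 20 else "low"


def classify_topic(query: str) -> str:
    """Stand-in for the topic classifier: first topic name found in the text."""
    lowered = query.lower()
    for topic in TOPICS:
        if topic in lowered:
            return topic
    return "writing"  # arbitrary fallback for this sketch


def route(query: str) -> Tuple[str, str, str]:
    """Mimic `router/auto`: classify, then pick a tier by complexity."""
    complexity = classify_complexity(query)
    topic = classify_topic(query)
    model = STRONG_MODEL if complexity == "high" else CHEAP_MODEL
    return complexity, topic, model


print(route("What's the capital of France?"))
```

In the real router the optimization mode is applied after this step, and the outcome is recorded in the trace.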
## Optimization Modes
| Mode | What it does | Best for |
| ------------ | ------------------------------------------------------------------------ | ------------------------- |
| `balanced` | Intelligently distributes requests across models for optimal performance | General apps (default) |
| `accuracy` | Picks models with best benchmark scores | Research, compliance |
| `cost` | Routes to cheapest viable model | Support chatbots, FAQs |
| `latency` | Always picks the fastest | Real-time UIs, voice bots |
| `throughput` | Distributes across many models | High-volume pipelines |
## Case Study
Beating the Best Model
Save costs without losing quality. Auto Router delivers best-model accuracy at a fraction of the price.
Beating GPT-5
Auto Router delivers 83% satisfactory results at 35% lower cost than GPT-5. Real-world testing shows router optimization without quality compromise.
## Use Cases
### Cost Optimization
Perfect for FAQ bots, education apps, and high-volume content generation.
```json
{
"model": "router/auto:cost",
"messages": [
{
"role": "user",
"content": "What are your business hours?"
}
]
}
```
### Accuracy Optimization
Ideal for finance, medical, legal, and research applications.
```json
{
"model": "router/auto:accuracy",
"messages": [
{
"role": "user",
"content": "Analyze this financial risk assessment"
}
]
}
```
### Latency Optimization
Great for real-time assistants, voice bots, and interactive UIs.
```json
{
"model": "router/auto:latency",
"messages": [
{
"role": "user",
"content": "What's the weather like today?"
}
]
}
```
### Balanced (Load Balanced)
Intelligently distributes requests across available models for optimal performance. Works well for most business applications and integrations.
```json
{
"model": "router/auto",
"messages": [
{
"role": "user",
"content": "Help me write a product description"
}
]
}
```
## Direct Category Routing
If you already know your query belongs to a specific domain, you can skip classification and directly route to a topic with your chosen optimization mode.
```json
{
"model": "router/finance:accuracy",
"messages": [
{
"role": "user",
"content": "Analyze the risk factors in this financial derivative"
}
]
}
```
**Result:**
* Skips complexity + topic classification
* Directly applies accuracy optimization for the finance topic
* Routes to the highest-scoring finance-optimized model
**Available topic shortcuts:**
* `router/finance:<mode>`
* `router/writing:<mode>`
* `router/academia:<mode>`
* `router/programming:<mode>`
* `router/science:<mode>`
* `router/vision:<mode>`
* `router/marketing:<mode>`
* `router/maths:<mode>`
Where `<mode>` can be: `balanced`, `accuracy`, `cost`, `latency`, or `throughput`.
**Quick Decision Guide:**
* **Don't know the type?** → Use `router/auto`
* **Know the type?** → Jump straight with `router/<topic>:<mode>`
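
The router names used in this section follow a small grammar, which can be captured in a helper that validates the topic and mode before building a request body. This is a sketch only: `router_model` and `chat_body` are hypothetical names, and requiring a mode whenever a topic is given is an assumption drawn from the examples here.

```python
from typing import Optional

TOPICS = {"finance", "writing", "academia", "programming",
          "science", "vision", "marketing", "maths"}
MODES = {"balanced", "accuracy", "cost", "latency", "throughput"}


def router_model(topic: Optional[str] = None, mode: Optional[str] = None) -> str:
    """Build a router model string such as `router/finance:accuracy`."""
    if mode is not None and mode not in MODES:
        raise ValueError(f"unknown optimization mode: {mode}")
    if topic is None:
        # No topic: let the router classify the query itself
        return "router/auto" if mode is None else f"router/auto:{mode}"
    if topic not in TOPICS:
        raise ValueError(f"unknown topic: {topic}")
    if mode is None:
        # Assumption: topic shortcuts always carry an explicit mode
        raise ValueError("a topic shortcut needs an explicit mode")
    return f"router/{topic}:{mode}"


def chat_body(prompt: str, topic: Optional[str] = None,
              mode: Optional[str] = None) -> dict:
    """Return a chat completion request body that targets the router."""
    return {
        "model": router_model(topic, mode),
        "messages": [{"role": "user", "content": prompt}],
    }


print(chat_body("Analyze the risk factors in this financial derivative",
                topic="finance", mode="accuracy"))
```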
## Advanced Configuration
### Topic-Specific Routing
```json
{
"model": "router/auto",
"router": {
"topic_routing": {
"finance": "cost",
"writing": "latency",
"technical": "accuracy"
}
},
"messages": [
{
"role": "user",
"content": "Calculate the net present value of this investment"
}
]
}
```
## Best Practices
1. **Choose the Right Mode** - Match optimization to your use case
2. **Monitor Performance** - Use LangDB's analytics to track routing decisions
3. **Combine with Fallbacks** - Add fallback models for high availability
4. **Test Different Modes** - Experiment to find the best fit
## Integration with Other Features
The Auto Router works seamlessly with:
* **Guardrails** - Apply content filtering before routing
* **MCP Servers** - Access external tools and data sources
* **Response Caching** - Cache responses for frequently asked questions
* **Analytics** - Track routing decisions and performance metrics
---
## Beating GPT-5
Everyone assumes GPT-5 is untouchable — the safest, most accurate choice for every task. But our latest experiments tell a different story. When we put LangDB's **Auto Router** head-to-head against GPT-5, the results surprised us.
## The Setup
We ran **100 real-world prompts** across four categories: Finance, Writing, Science/Math, and Coding. One group always used GPT-5. The other let Auto Router decide the right model.
At first glance, you’d expect GPT-5 to dominate — and in strict A/B judging, it often did. But once we layered in a second check — asking an independent validator whether the Router’s answers were _satisfactory_ (correct, useful, and complete) — the picture flipped.
## What We Found
* **Costs Less:** Router cut spend by **35%** compared to GPT-5 ($1.04 vs $1.58).
* **Good Enough Most of the Time:** Router's answers were judged satisfactory in **83% of cases**.
* **Practical Wins:** When you combine Router wins, ties, and “GPT-5 wins but Router still satisfactory,” the Router came out ahead in **86/100 tasks**.
* **Safe:** There were **zero catastrophic failures** — Router never produced unusable output.
## Breaking Down Quality
On strict comparisons, GPT-5 outscored Router in 65 cases. Router directly won 10, with 25 ties. But here’s the catch: in the majority of those “GPT-5 wins,” the Router’s answer was still perfectly fine.
Think about defining a finance term, writing a short code snippet, or solving a straightforward math problem. GPT-5 might give a longer, more polished answer, but Router’s output was clear, correct, and usable — and it cost a fraction of the price.
The validator helped us separate “better” from “good enough.” And for most workloads, _good enough at lower cost_ is exactly what you want.
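
The headline numbers can be cross-checked with a few lines of arithmetic. The sketch below uses only the figures quoted in this write-up; the variable names are illustrative.

```python
GPT5_COST, ROUTER_COST = 1.58, 1.04        # USD, as quoted in the findings
GPT5_WINS, ROUTER_WINS, TIES = 65, 10, 25  # strict A/B judging
ROUTER_AHEAD = 86                          # "practical wins" out of 100 tasks

# Relative saving of the Router against always using GPT-5
savings_pct = 100 * (GPT5_COST - ROUTER_COST) / GPT5_COST

# GPT-5 wins in which the Router answer was still judged satisfactory
satisfactory_losses = ROUTER_AHEAD - ROUTER_WINS - TIES

print(f"savings: {savings_pct:.1f}%")
print(f"GPT-5 wins with a satisfactory Router answer: "
      f"{satisfactory_losses} of {GPT5_WINS}")
```

From the rounded dollar amounts the saving works out to about 34%, close to the quoted 35%; the small gap presumably comes from rounding in the published costs. The second figure shows that 51 of the 65 GPT-5 wins still had a satisfactory Router answer.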
## Where Router Shines (and Struggles)
* **Finance:** Router was flawless here, delivering **satisfactory answers for every single prompt**.
* **Coding:** Router handled structured coding tasks well — effective in 30 out of 32 cases.
* **Science/Math:** Router held its own, though GPT-5 still had the edge on trickier reasoning.
* **Writing:** This was the weakest area for Router. GPT-5 consistently produced richer, more polished prose. Still, Router’s outputs were acceptable two-thirds of the time.
***
### Why This Matters
The key takeaway isn’t that Router is “better than GPT-5” in raw accuracy. It’s that **Router is better for your budget without compromising real-world quality**. By knowing when a smaller model is good enough, you save money while still keeping GPT-5 in reserve for the hardest tasks.
In practice, that means:
* **Finance and Coding workloads** → Route automatically and trust the savings.
* **Open-ended creative writing** → Let Router escalate to GPT-5 when needed.
* **Everywhere else** → Expect huge cost reductions without a hit to user experience.
***
## Try It Yourself
Using the Router doesn’t require any special configuration:
```json
{
"model": "router/auto",
"messages": [
{
"role": "user",
"content": "Define liquidity in finance in one sentence."
}
]
}
```
Just point to `router/auto`. LangDB takes care of routing — so you get the right balance of cost and quality, automatically.
---
## Beating the Best Model
Most developers assume that using the _best_ model is the safest bet for every query.\
But in practice, that often means **paying more than you need to** — especially when cheaper models can handle simpler queries just as well.
LangDB’s Auto Router shows you don’t always need the “best” model — just the _right_ model for the job.
## The Question We Asked
When building AI applications, you face a constant trade-off: **performance vs. cost**. Do you always use the most powerful (and expensive) model to guarantee quality? Or do you risk cheaper alternatives that might fall short on complex tasks?
We wanted to find out: _Can smart routing beat the "always use the best model" strategy?_
## Our Experiment
We designed a head-to-head comparison using **100 real-world queries** across four domains: Finance, Writing, Science/Math, and Coding. Each query was tested against two strategies:
* **Auto Router** → Analyzed query complexity and topic, then selected the most cost-effective model that could handle the task
* **Router:Accuracy** → Always defaulted to the highest-performing model (the "best model" approach)
**What made this test realistic:**
* **Diverse complexity:** 70 low-complexity queries (simple conversions, definitions) and 30 high-complexity queries (complex analysis, multi-step reasoning)
* **Real-world domains:** Finance calculations, professional writing, scientific explanations, and coding problems
* **Impartial judging:** Used GPT-5-mini as an objective judge to compare response quality
**Sample of what we tested:**
* **Finance:** "A company has revenue of $200M and expenses of $150M. What is its profit?"
* **Writing:** "Write a one-line professional email subject requesting a meeting"
* **Science/Math:** "Convert 100 cm into meters"
* **Coding:** "Explain what a variable is in programming in one sentence"
## Results
| Metric | Auto Router | Router:Accuracy |
| ------------------- | ------------------ | --------------- |
| **Total Cost** | **$0.95** | $1.64 |
| **Wins** | **65%** | 0% |
| **Ties** | 35% | 35% |
| **Losses** | 0% | 0% |
| **Accuracy Parity** | 100% (wins + ties) | 100% |
### What Wins & Ties Mean
* **Win** → Auto Router chose a **cheaper model**, and the output was _equal or better_ than the best model.
* **Tie** → Auto Router escalated to the **best model itself**, because the query was complex enough to require it.
* **Loss** → Didn’t happen. Auto Router never underperformed compared to always using the best model.
> In other words: **Auto Router matched or beat the best model strategy 100% of the time — while cutting costs by \~42%.**
## Category Breakdown
| Category | Count | Router Wins | Ties (Used Best Model) |
| -------------- | ----- | ----------- | ---------------------- |
| Finance | 25 | 23 | 2 |
| Writing | 24 | 18 | 6 |
| Science & Math | 19 | 14 | 5 |
| Coding | 32 | 10 | 22 |
* In **Finance** and **Writing**, Auto Router confidently used cheaper models most of the time.
* In **Coding**, Auto Router often escalated to the best model — proving it knows when not to compromise.
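
The results and category tables can be checked against each other with a short script. This sketch uses only the numbers reported above; the names are illustrative.

```python
BREAKDOWN = {  # category: (router wins, ties where the best model was used)
    "Finance": (23, 2),
    "Writing": (18, 6),
    "Science & Math": (14, 5),
    "Coding": (10, 22),
}
ROUTER_COST, ACCURACY_COST = 0.95, 1.64  # USD, from the results table

total_wins = sum(wins for wins, _ in BREAKDOWN.values())
total_ties = sum(ties for _, ties in BREAKDOWN.values())
savings_pct = 100 * (ACCURACY_COST - ROUTER_COST) / ACCURACY_COST

print(f"wins={total_wins}, ties={total_ties}, savings={savings_pct:.1f}%")
for name, (wins, ties) in BREAKDOWN.items():
    share = 100 * wins / (wins + ties)
    print(f"{name}: cheaper model used for {share:.0f}% of queries")
```

The per-category shares make the pattern explicit: the cheaper model handled roughly nine in ten Finance queries but under a third of Coding queries.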
## The Methodology Behind the Magic
**How Auto Router Works:** Auto Router doesn't just pick models randomly. It uses a sophisticated classification system that:
1. **Analyzes query complexity** — Is this a simple fact lookup or a complex reasoning task?
2. **Identifies the domain** — Finance, writing, coding, or science/math?
3. **Matches to optimal model** — Selects the most cost-effective model that can handle the specific complexity level
**The "Always Best" Approach:** Router:Accuracy takes the conservative route — always selecting the highest-performing model regardless of query complexity. It's like using a Formula 1 car for grocery shopping.
**Fair Comparison:** We used GPT-5-mini as an impartial judge to evaluate response quality across both strategies. The judge compared answers based on correctness, usefulness, and completeness without knowing which routing strategy was used.
## What This Means for Developers
**The Real-World Impact:**
* **Cost optimization without compromise** — Save 42% on API costs while maintaining quality
* **Intelligent escalation** — Complex queries automatically get the best models
* **No manual tuning** — The router handles the complexity analysis for you
## Try It Yourself
Using Auto Router is simple — just point to `router/auto`:
```json
{
"model": "router/auto",
"messages": [
{
"role": "user",
"content": "A company has revenue of $200M and expenses of $150M. What is its profit?"
}
]
}
```
Auto Router will automatically select the most cost-effective model that can handle your query complexity.
## The Bottom Line
* **Save Money** → Auto Router avoids overpaying on simple queries
* **Stay Accurate** → For complex cases, it automatically picks the strongest model
* **Smarter Than "Always Best"** → Matches or beats the best-model-only approach at a fraction of the cost
## Takeaway
You don't need to pick the "best" model every time.
With Auto Router:
* **Simple queries → cheaper models** save you money
* **Complex queries → stronger models** keep accuracy intact
* **Overall → 100% accuracy parity at 42% lower cost**
That's the power of LangDB Auto Router.
---
## Cost Control
LangDB enables **cost tracking, project budgeting, and cost groups** to help manage AI usage efficiently.
## **Cost Groups (Business Tier & Above)**
* **Available in Business & Enterprise tiers** under **User Management**.
* Organize users into **cost groups** to track and allocate spending.
* Cost groups help in **budgeting but are independent of user roles**.
## **Project-Level Spending Limits**
* **Set daily, monthly, and total spending limits** per project.
* Enforce **per-user limits** to prevent excessive usage.
* Available in **Project Settings → Cost Control**.
## **Cost Group-Based Role Management**
* **Admins and Billing users** can define **spending limits for cost groups**.
* Set **daily, monthly, and total budgets** per group.
* Useful for **controlling team-based expenses** independently of project limits.
---
## Evaluation
You can evaluate the model interactions recorded in LangDB.
### Data Collection
To evaluate model interactions, we extract message data from LangDB. This involves:
* Fetching all messages from conversation threads using the LangDB API.
* Exporting the data into a structured format such as a DataFrame (df) or CSV file.
```python
import os

from pylangdb.client import LangDb

# Credentials are read from the environment
client = LangDb(
    api_key=os.getenv("LANGDB_API_KEY"),
    project_id=os.getenv("LANGDB_PROJECT_ID"),
)
thread_ids = [...,...,...]  # LangDB thread IDs
df = client.create_evaluation_df(thread_ids)
```
### Cost Calculation
Once the data is collected, we can compute:
* **Total cost**: Sum of the cost of all interactions.
* **Average cost**: Mean cost per message.
```python
print(f"Total cost across all threads: ${df['thread_total_cost'].sum():.4f}")
thread_costs = df.groupby('thread_id')['thread_total_cost'].sum()
avg_cost = df['thread_total_cost'].sum() / len(df)
print(f"\nAverage cost per message: ${avg_cost:.4f}")
```
### Custom Evaluations
Beyond cost analysis, the message data supports deeper analysis, such as topic distribution and trends.
```python
# Analyze topic distribution
topics = analyzer.get_topic_distribution(thread_ids)
print("\nTopic Distribution Results:")
print(topics)
```
Example Output:
```json
{
"topic_distribution": {
"Programming Languages": 5,
"Python Concepts": 6,
"Web Development": 2,
"Error Handling": 1,
"Testing": 1,
"Optimization": 1
},
"total_messages": 10
}
```
**For more evaluations, check out the full** [**notebook**](https://github.com/langdb/langdb-samples/tree/main/examples/evaluation)**!**
---
## Guardrails
LangDB allows developers to enforce specific constraints and checks on their LLM calls, ensuring safety, compliance, and quality control.
Guardrails currently support request validation and logging, ensuring structured oversight of LLM interactions.
Guardrail Templates on LangDB
These guardrails include:
* **Content Moderation**: Detects and filters harmful or inappropriate content (e.g., toxicity detection, sentiment analysis).
* **Security Checks**: Identifies and mitigates security risks (e.g., PII detection, prompt injection detection).
* **Compliance Enforcement**: Ensures adherence to company policies and factual accuracy (e.g., policy adherence, factual accuracy).
* **Response Validation**: Validates response format and structure (e.g., word count, JSON schema, regex patterns).
Guardrails can be configured via the UI or API, providing flexibility for different use cases.
## Guardrail Behaviour
When a guardrail blocks an input or output, the system returns a structured error response. Below are some example responses for different scenarios:
### **Example 1: Input Rejected by Guard**
```json
{
"id": "",
"object": "chat.completion",
"created": 0,
"model": "",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Input rejected by guard",
"tool_calls": null,
"refusal": null,
"tool_call_id": null
},
"finish_reason": "rejected"
}
],
"usage": {
"prompt_tokens": 0,
"completion_tokens": 0,
"total_tokens": 0,
"cost": 0.0
}
}
```
### **Example 2: Output Rejected by Guard**
```json
{
"id": "5ef4d8b1-f700-46ca-8439-b537f58f7dc6",
"object": "chat.completion",
"created": 1741865840,
"model": "gpt-4o-mini",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Output rejected by guard",
"tool_calls": null,
"refusal": null,
"tool_call_id": null
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 21,
"completion_tokens": 40,
"total_tokens": 61,
"cost": 0.000032579999999999996
}
}
```
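
Because a blocked call still comes back in the chat completion shape, client code has to inspect the body to notice it. The following sketch checks for the two message strings and the `rejected` finish reason shown in the examples above. The `guard_rejection` helper is hypothetical, and treating these exact strings as stable markers is an assumption.

```python
from typing import Optional

# Message contents taken from the two example responses above
REJECTION_MESSAGES = {
    "Input rejected by guard": "input",
    "Output rejected by guard": "output",
}


def guard_rejection(response: dict) -> Optional[str]:
    """Return 'input' or 'output' if a guardrail blocked the call, else None."""
    for choice in response.get("choices", []):
        content = (choice.get("message") or {}).get("content")
        if isinstance(content, str) and content in REJECTION_MESSAGES:
            return REJECTION_MESSAGES[content]
        # Assumption from Example 1: this finish reason marks an input rejection
        if choice.get("finish_reason") == "rejected":
            return "input"
    return None


blocked = {
    "choices": [{
        "index": 0,
        "message": {"role": "assistant", "content": "Output rejected by guard"},
        "finish_reason": "stop",
    }]
}
print(guard_rejection(blocked))
```

Note that an output rejection still reports token usage and cost, since the model was called before the guard ran.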
## Limitations
It is important to note that **guardrails cannot be applied to streaming outputs**.
## Guardrail Templates
LangDB provides prebuilt templates to enforce various constraints on LLM responses. These templates cover areas such as content moderation, security, compliance, and validation.
The following table provides quick access to each guardrail template:
| Guardrail | Description |
| ------------------------------------------------------------------------------------------------ | ---------------------------------------------------------------- |
| [Toxicity Detection](guardrails.md#toxicity-detection-content-toxicity) | Detects and filters toxic or harmful content. |
| [JSON Schema Validator](guardrails.md#json-schema-validator-validation-json-schema) | Validates responses against a user-defined JSON schema. |
| [Competitor Mention Check](guardrails.md#competitor-mention-check-content-competitor-mentions) | Detects mentions of competitor names or products. |
| [PII Detection](guardrails.md#pii-detection-security-pii-detection) | Identifies personally identifiable information in responses. |
| [Prompt Injection Detection](guardrails.md#prompt-injection-detection-security-prompt-injection) | Detects attempts to manipulate the AI through prompt injections. |
| [Company Policy Compliance](guardrails.md#company-policy-compliance-compliance-company-policy) | Ensures responses align with company policies. |
| [Regex Pattern Validator](guardrails.md#regex-pattern-validator-validation-regex-pattern) | Validates responses against specified regex patterns. |
| [Word Count Validator](guardrails.md#word-count-validator-validation-word-count) | Ensures responses meet specified word count requirements. |
| [Sentiment Analysis](guardrails.md#sentiment-analysis-content-sentiment-analysis) | Evaluates sentiment to ensure appropriate tone. |
| [Language Validator](guardrails.md#language-validator-content-language-validation) | Checks if responses are in allowed languages. |
| [Topic Adherence](guardrails.md#topic-adherence-content-topic-adherence) | Ensures responses stay on specified topics. |
| [Factual Accuracy](guardrails.md#factual-accuracy-content-factual-accuracy) | Validates that responses contain factually accurate information. |
### **Toxicity Detection (`content-toxicity`)**
Detects and filters out toxic, harmful, or inappropriate content.
| Parameter | Type | Description | Defaults |
|-----------|------|-------------|----------|
| `threshold` | `number` | Confidence threshold for toxicity detection. | Required |
| `categories` | `array` | Categories of toxicity to detect. | `["hate", "harassment", "violence", "self-harm", "sexual", "profanity"]` |
| `evaluation_criteria` | `array` | Criteria used for toxicity evaluation. | `["Hate speech", "Harassment", "Violence", "Self-harm", "Sexual content", "Profanity"]` |
### **JSON Schema Validator (`validation-json-schema`)**
Validates responses against a user-defined JSON schema.
| Parameter | Type | Description | Defaults |
| --------- | -------- | --------------------------------------------------------------------- | -------- |
| `schema` | `object` | Custom JSON schema to validate against (replace with your own schema) | Required |
### **Competitor Mention Check (`content-competitor-mentions`)**
Detects mentions of competitor names or products in LLM responses.
| Parameter | Type | Description | Defaults |
| ---------------- | --------- | ----------------------------------------- | -------------------------- |
| `competitors` | `array` | List of competitor names. | `["company1", "company2"]` |
| `match_partial` | `boolean` | Whether to match partial names. | `true` |
| `case_sensitive` | `boolean` | Whether matching should be case sensitive. | `false` |
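
To illustrate how the three parameters interact, here is a local sketch of a competitor check. It is not LangDB's implementation: the `mentions_competitor` function is hypothetical, and reading `match_partial` as "substring matches are allowed" (versus whole-word matches only) is an assumption.

```python
import re
from typing import List


def mentions_competitor(text: str, competitors: List[str],
                        match_partial: bool = True,
                        case_sensitive: bool = False) -> List[str]:
    """Return the competitor names found in `text`."""
    flags = 0 if case_sensitive else re.IGNORECASE
    found = []
    for name in competitors:
        pattern = re.escape(name)
        if not match_partial:
            pattern = rf"\b{pattern}\b"  # whole-word matches only
        if re.search(pattern, text, flags):
            found.append(name)
    return found


reply = "We beat Company1Cloud on price."
print(mentions_competitor(reply, ["company1", "company2"]))
```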
### **PII Detection (`security-pii-detection`)**
Detects personally identifiable information (PII) in responses.
| Parameter | Type | Description | Defaults |
| ----------- | --------- | ------------------------------- | ------------------------------------------ |
| `pii_types` | `array` | Types of PII to detect. | `["email", "phone", "ssn", "credit_card"]` |
| `redact` | `boolean` | Whether to redact detected PII. | `false` |
### **Prompt Injection Detection (`security-prompt-injection`)**
Identifies prompt injection attacks attempting to manipulate the AI.
| Parameter | Type | Description | Defaults |
| --------------------- | -------- | ------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `threshold` | `number` | Confidence threshold for injection detection. | Required |
| `detection_patterns` | `array` | Common patterns used in prompt injection attacks. | `["Ignore previous instructions", "Forget your training", "Tell me your prompt"]` |
| `evaluation_criteria` | `array` | Criteria used for detection. | `["Attempts to override system instructions", "Attempts to extract system prompt information", "Attempts to make the AI operate outside its intended purpose"]` |
### **Company Policy Compliance (`compliance-company-policy`)**
Ensures that responses align with predefined company policies.
| Parameter | Type | Description | Defaults |
| ----------------- | -------- | ---------------------------------------- | ---------------------------- |
| `embedding_model` | `string` | Model used for text embedding. | `text-embedding-ada-002` |
| `threshold` | `number` | Similarity threshold for compliance. | Required |
| `dataset` | `object` | Example dataset for compliance checking. | Contains predefined examples |
### **Regex Pattern Validator (`validation-regex-pattern`)**
Validates responses against specific regex patterns.
| Parameter | Type | Description | Defaults |
| ------------ | -------- | ----------------------------------------------------- | -------------------------- |
| `patterns` | `array` | List of regex patterns. | `["^[A-Za-z0-9\s.,!?]+$"]` |
| `match_type` | `string` | Whether all, any, or none of the patterns must match. | `"all"` |
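
The `match_type` semantics can be sketched locally with Python's `re` module. This illustrates the all/any/none logic rather than LangDB's implementation: `validate_patterns` is a hypothetical name, and using `re.search` (instead of a full match) is an assumption.

```python
import re
from typing import List

DEFAULT_PATTERNS = [r"^[A-Za-z0-9\s.,!?]+$"]  # default from the table above


def validate_patterns(text: str, patterns: List[str],
                      match_type: str = "all") -> bool:
    """Check `text` against `patterns` under the given match type."""
    hits = [re.search(pattern, text) is not None for pattern in patterns]
    if match_type == "all":
        return all(hits)
    if match_type == "any":
        return any(hits)
    if match_type == "none":
        return not any(hits)
    raise ValueError(f"unsupported match_type: {match_type}")


print(validate_patterns("Hello, world!", DEFAULT_PATTERNS))
print(validate_patterns("Price: $5", DEFAULT_PATTERNS))
```

The default pattern only allows letters, digits, whitespace, and basic punctuation, so a response containing `:` or `$` fails under `"all"`.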
### **Word Count Validator (`validation-word-count`)**
Ensures responses meet specified word count requirements.
| Parameter | Type | Description | Defaults |
| -------------- | -------- | ----------------------------------------------------- | -------- |
| `min_words` | `number` | Minimum number of words allowed. | `10` |
| `max_words` | `number` | Maximum number of words allowed. | `500` |
| `count_method` | `string` | Method for word counting. | `split` |
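
A local equivalent of this check is straightforward. The sketch below assumes the `split` count method means splitting on whitespace, as Python's `str.split()` does; `within_word_limits` is a hypothetical name and the defaults mirror the table above.

```python
def within_word_limits(text: str, min_words: int = 10,
                       max_words: int = 500) -> bool:
    """Return True if the whitespace-separated word count is within bounds."""
    count = len(text.split())
    return min_words <= count <= max_words


print(within_word_limits("Too short to pass."))
print(within_word_limits(" ".join(["word"] * 10)))
```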
### **Sentiment Analysis (`content-sentiment-analysis`)**
Evaluates the sentiment of responses to ensure appropriate tone.
| Parameter | Type | Description | Defaults |
| -------------------- | -------- | --------------------------------------------- | ------------------------- |
| `allowed_sentiments` | `array` | Allowed sentiment categories. | `["positive", "neutral"]` |
| `threshold` | `number` | Confidence threshold for sentiment detection. | `0.7` |
### **Language Validator (`content-language-validation`)**
Checks if responses are in allowed languages.
| Parameter | Type | Description | Defaults |
| ------------------- | -------- | -------------------------------------------- | ------------- |
| `allowed_languages` | `array` | List of allowed languages. | `["english"]` |
| `threshold` | `number` | Confidence threshold for language detection. | `0.9` |
### **Topic Adherence (`content-topic-adherence`)**
Ensures responses stay on specified topics.
| Parameter | Type | Description | Defaults |
| ------------------ | -------- | ----------------------------------------- | ------------------------------------------------- |
| `allowed_topics` | `array` | List of allowed topics. | `["Product information", "Technical assistance"]` |
| `forbidden_topics` | `array` | List of forbidden topics. | `["politics", "religion"]` |
| `threshold` | `number` | Confidence threshold for topic detection. | `0.7` |
### **Factual Accuracy (`content-factual-accuracy`)**
Validates that responses contain factually accurate information.
| Parameter | Type | Description | Defaults |
| --------------------- | -------- | ----------------------------------------------- | ------------------------------------------------------------------ |
| `reference_facts` | `array` | List of reference facts. | `[]` |
| `threshold` | `number` | Confidence threshold for factuality assessment. | `0.8` |
| `evaluation_criteria` | `array` | Criteria used to assess factual accuracy. | `["Contains verifiable information", "Avoids speculative claims"]` |
---
## MCP Support
LangDB simplifies how you work with MCP (Model Context Protocol) servers — whether you want to use a **built-in** [**Virtual MCP**](../concepts/virtual-mcp-servers/) or connect to an **external MCP server**.
Browse publicly-available MCP servers on LangDB
**Model Context Protocol (MCP)** is an **open standard** that enables AI models to **seamlessly communicate** with external systems. It allows models to **dynamically process contextual data**, ensuring **efficient, adaptive, and scalable** interactions. MCP simplifies **request orchestration** across distributed AI systems, enhancing **interoperability and context-awareness**.
With **native tool integrations**, MCP connects AI models to **APIs, databases, local files, automation tools, and remote services** through a standardized protocol. Developers can effortlessly integrate MCP with **IDEs, business workflows, and cloud platforms**, while retaining the flexibility to **switch between LLM providers**. This enables the creation of **intelligent, multi-modal workflows** where AI securely interacts with real-world data and tools.
For more details, visit the [Model Context Protocol official page](https://modelcontextprotocol.io/introduction) and explore [Anthropic MCP documentation](https://docs.anthropic.com/en/docs/build-with-claude/mcp).
## Using Virtual MCPs
### Using API
LangDB allows you to create [**Virtual MCP Servers**](../concepts/virtual-mcp-servers/) directly from the dashboard.\
You can instantly select and bundle tools like database queries, search APIs, or automation tasks into a single MCP URL — no external setup needed.
Here's an example of how you can use a Virtual MCP Server in your project:
```python
from openai import OpenAI
from uuid import uuid4
client = OpenAI(
base_url="https://api.us-east-1.langdb.ai/LangDBProjectID/v1",
api_key="xxxx",
default_headers={"x-thread-id": str(uuid4())},
)
mcpServerUrl = "Virtual MCP Server URL"
response = client.chat.completions.create(
model="openai/gpt-4.1",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "What are the databases available"}
],
extra_body={
"mcp_servers": [
{
"server_url": mcpServerUrl,
"type": "sse"
}
]
}
)
```
```typescript
import { OpenAI } from 'openai';
import { v4 as uuid4 } from 'uuid';

const client = new OpenAI({
  baseURL: "https://api.us-east-1.langdb.ai/LangDBProjectID/v1",
  apiKey: "xxxx",
  defaultHeaders: {
    "x-thread-id": uuid4()
  }
});
const mcpServerUrl = 'Virtual MCP URL';

async function getAssistantReply() {
  const { choices } = await client.chat.completions.create({
    model: "openai/gpt-4.1-nano",
    messages: [
      { role: "system", content: "You are a helpful assistant." },
      { role: "user", content: "what are the databases on clickhouse?" }
    ],
    // @ts-expect-error mcp_servers is a LangDB extension
    mcp_servers: [
      { server_url: mcpServerUrl, type: 'sse' }
    ]
  });
  console.log('Assistant:', choices[0].message.content);
}

getAssistantReply().catch(console.error);
```
Check out the [Virtual MCP](../concepts/virtual-mcp-servers/) and [Guides](https://app.gitbook.com/s/jwXvc8GvLP6kUvOASaLP/using-mcps) sections for use cases.
### Using MCP Clients
You can instantly connect LangDB’s Virtual MCP servers to editors like **Cursor**, **Claude**, or **Windsurf**.
Run this in your terminal to set up MCP in Cursor:
```bash
npx @langdb/mcp setup --client cursor
```
You can now call tools directly in your editor, with full tracing on LangDB.
## Connecting to External MCP Servers
If you already have an MCP server hosted externally — like Smithery’s Exa MCP — you can plug it straight into LangDB with zero extra setup.
Just pass your external MCP server URL in `extra_body` when you make a chat completion request. For example, with Smithery:
```python
extra_body = {
"mcp_servers": [
{
"server_url": "wss://your-mcp-server.com/ws?config=your_encoded_config",
"type": "ws"
}
]
}
```
For a **complete example** of how to use external MCP, refer to the [Smithery guide](https://app.gitbook.com/s/0xzoP8YRDIkACawdU2ht/smithery).
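As a minimal sketch, the `extra_body` payload above can be assembled by a small helper. `build_mcp_extra_body` is a hypothetical name, not part of any LangDB SDK. It infers the transport from the URL scheme, which is an assumption based only on the `ws` and `sse` types shown on this page.

```python
from urllib.parse import urlparse


def build_mcp_extra_body(server_urls):
    """Build the `extra_body` dict for one or more MCP server URLs.

    Hypothetical helper: the transport type is inferred from the URL scheme
    (ws/wss -> "ws", anything else -> "sse"). That mapping is an assumption
    based on the two types used in this page.
    """
    servers = []
    for url in server_urls:
        scheme = urlparse(url).scheme
        transport = "ws" if scheme in ("ws", "wss") else "sse"
        servers.append({"server_url": url, "type": transport})
    return {"mcp_servers": servers}


extra_body = build_mcp_extra_body(
    ["wss://your-mcp-server.com/ws?config=your_encoded_config"]
)
```

The resulting dict can be passed as `extra_body=extra_body` to `client.chat.completions.create(...)`, as in the Virtual MCP example earlier on this page.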
---
## Model Access Control
Restrict which AI models are available for specific projects. Only **Admins** can configure these restrictions; all other roles are bound by the settings.
## How It Works
* **Admin-only configuration**: Only Admins can set which models are allowed per project
* **API enforcement**: Restricted models return access denied errors
* **Team-wide**: All project members are bound by the same restrictions
* **Universal**: Works across all API endpoints and integrations
## Setup (Admin Only)
1. **Project Settings** → **Model**
2. **Select allowed models** from the list
3. **Save configuration**
Test with an API call to verify restrictions are working.
## Common Use Cases
* **Cost control**: Restrict expensive models in dev environments
* **Production stability**: Only allow tested models in production
* **Compliance**: Meet regulatory requirements by limiting model access
## Troubleshooting
**"Model not available" errors:**
* Check if the model is in the project's allowed list
* Verify model restrictions are enabled
* Confirm you're using the correct model identifier
**Can't modify restrictions:**
* Only **Admin** role can configure restrictions
---
## Project Access Control
Select which users in your organization can access specific projects. Only **Admins** can configure project access; other roles cannot modify these settings.
## How It Works
* **Admin-only configuration**: Only Admins can enable/disable user access per project
* **User-level control**: Individual users can be granted or revoked project access
* **Role preservation**: Users keep their organization roles but may be restricted from certain projects
* **API enforcement**: Users without project access cannot make API calls to restricted projects
## Setup (Admin Only)
1. **Project Settings** → **Users** → **User Access Configuration**
2. **Search and select users** to grant project access
3. **Toggle individual users** on/off for the project
4. **Use "All Users"** toggle to quickly enable/disable everyone
5. **Save configuration**
## User States
* **Enabled**: User can access the project and make API calls
* **Disabled**: User cannot access the project (blocked from API calls)
* **All Users toggle**: Bulk enable/disable all organization users for the project
## Common Use Cases
* **Sensitive projects**: Restrict access to confidential or regulated projects
* **Client work**: Limit project access to specific team members working with particular clients
* **Development stages**: Control access to production vs development projects
* **Cost management**: Prevent unauthorized usage by limiting project access
## Troubleshooting
### **"Access denied" errors:**
* Check if the user is enabled for the specific project
* Verify the user exists in the organization
* Confirm the project access configuration is saved
### **Can't modify project access:**
* Only **Admin** role can configure project access
* Ensure you're in the correct project settings
---
## Prompt Caching
To save on inference costs, you can leverage prompt caching on supported providers and models. When a provider supports it, LangDB makes a best-effort attempt to route subsequent requests to the same provider so they can use the warm cache.
Most providers automatically enable prompt caching for large prompts, but some, like Anthropic, require you to enable it on a per-message basis.
## How Caching Works
### **Automatic Caching**
Providers like OpenAI, Grok, DeepSeek, and (soon) Google Gemini enable caching by default once your prompt exceeds a certain length (e.g. 1024 tokens).
* **Activation**: No change needed. Any prompt over the length threshold is written to cache.
* **Best Practice**: Put your static content (system prompts, RAG context, long instructions) _first_ in the message so it can be reused.
* **Pricing**:
* **Cache Write**: Mostly free or heavily discounted.
* **Cache Read**: Deep discounts vs. fresh inference.
### **Manual Caching**
Anthropic’s Claude family requires you to mark which parts of the message are cacheable by adding a `cache_control` object. You can also set a TTL to control how long the block stays in cache.
* **Activation**: You must wrap static blocks in a `content` array and give them a `cache_control` entry.
* **TTL**: Use `{"ttl": "5m"}` or `{"ttl": "1h"}` to control expiration (default 5 minutes).
* **Best For**: Huge documents, long backstories, or repeated system instructions.
* **Pricing**:
* **Cache Write**: 1.25× the normal per-token rate
* **Cache Read**: 0.1× (10%) of the normal per-token rate
* **Limitations**: Ephemeral (expires after TTL), limited number of blocks.
On a request that writes to the cache, the trace shows “Prompt Caching: 99.9% Write” together with a small cost increase (~25%).
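To make the multipliers concrete, here is a rough back-of-the-envelope sketch using the Anthropic rates listed above (1.25× write, 0.1× read). It counts cost in "token-equivalents" for the cached prefix only, and it assumes every request after the first hits the cache before the TTL expires. `prefix_cost` is an illustrative helper, not a LangDB API.

```python
WRITE_MULTIPLIER = 1.25  # cache write, relative to the normal input rate
READ_MULTIPLIER = 0.10   # cache read, relative to the normal input rate


def prefix_cost(prefix_tokens, requests, cached=True):
    """Cost of a shared prompt prefix over `requests` calls, in token-equivalents."""
    if requests <= 0:
        return 0.0
    if not cached:
        return float(prefix_tokens * requests)
    # The first call writes the cache; the remaining calls read from it.
    write = prefix_tokens * WRITE_MULTIPLIER
    reads = prefix_tokens * READ_MULTIPLIER * (requests - 1)
    return write + reads


for n in (1, 2, 5):
    print(n, prefix_cost(10_000, n), prefix_cost(10_000, n, cached=False))
```

Under these assumptions a single request pays 25% more for the prefix, and caching is already cheaper than the uncached path from the second request onward.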
## **Caching Example (Anthropic)**
Here is an example of caching a large document. This can be done in either the `system` or `user` message.
```json
{
"model": "anthropic/claude-3.5-sonnet",
"messages": [
{
"role": "system",
"content": [
{
"type": "text",
"text": "You are a helpful assistant that analyzes legal documents. The following is a terms of service document:"
},
{
"type": "text",
"text": "HUGE DOCUMENT TEXT...",
"cache_control": {
"type": "ephemeral",
"ttl": "1h"
}
}
]
},
{
"role": "user",
"content": [
{
"type": "text",
"text": "Summarize the key points about data privacy."
}
]
}
]
}
```
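The same `messages` structure can be built in Python. This is only a sketch: `cached_document_messages` is a hypothetical helper that constructs the list in the shape shown above, and it restricts `ttl` to the two values mentioned on this page.

```python
def cached_document_messages(instructions, document, question, ttl="5m"):
    """Return chat messages with `document` marked as a cacheable block."""
    if ttl not in ("5m", "1h"):
        raise ValueError("ttl must be '5m' or '1h'")
    return [
        {
            "role": "system",
            "content": [
                # Short instructions stay uncached.
                {"type": "text", "text": instructions},
                # The large static block carries the cache_control marker.
                {
                    "type": "text",
                    "text": document,
                    "cache_control": {"type": "ephemeral", "ttl": ttl},
                },
            ],
        },
        {"role": "user", "content": [{"type": "text", "text": question}]},
    ]
```

The returned list can be passed as `messages=` to `client.chat.completions.create(...)` with an Anthropic model such as `anthropic/claude-3.5-sonnet`.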
***
## Provider Support Matrix
| Provider         | Auto-cache? | Manual flag?          |    TTL    | Write cost |   Read cost   |
| ---------------- | :---------: | :-------------------: | :-------: | :--------: | :-----------: |
| OpenAI           |     ✅      |          ❌           |    N/A    |  standard  | 0.25× or 0.5× |
| Grok             |     ✅      |          ❌           |    N/A    |  standard  |     0.25×     |
| DeepSeek         |     ✅      |          ❌           |    N/A    |  standard  |     0.25×     |
| Anthropic Claude |     ❌      | `cache_control` + TTL | 5 m / 1 h |   1.25×    |     0.1×      |
***
> For the most up-to-date information on a specific model or provider's caching policy, pricing, and limitations, please refer to the model page on LangDB.
---
## Provider Routing
Stop worrying about which provider to pick. With Provider Routing, you can call a model by name, and LangDB will automatically select the right provider for you.
## Why Use Provider Routing?
* **One Name, Many Providers** – Call a model like `deepseek-v3.1` and LangDB picks from DeepSeek official, Parasail, DeepInfra, Fireworks AI, and more.
* **Optimize by Mode** – Choose whether you want lowest cost, fastest latency, highest accuracy, or simply balanced routing.
***
### Quick Start
```json
{
"model": "deepseek-v3.1",
"messages": [
{
"role": "user",
"content": "Explain reinforcement learning in simple terms."
}
]
}
```
That’s it — LangDB will resolve `deepseek-v3.1` across multiple providers, and by default use **balanced** mode.
***
## Optimization Modes
When you specify only a model name, LangDB chooses the provider according to your selected mode.
| Mode | What it does | Best for |
| -------------- | --------------------------------------------------------------------- | ------------------------- |
| **balanced** | Distributes requests across providers for optimal overall performance | General apps (default) |
| **accuracy** | Routes to the provider with the best benchmark score | Research, compliance |
| **cost** | Picks the cheapest provider by input/output token price | Support chatbots, FAQs |
| **latency** | Always selects the lowest latency provider | Real-time UIs, voice bots |
| **throughput** | Spreads requests across all providers to maximize concurrency | High-volume pipelines |
***
## Examples
### Balanced (default)
```json
{
"model": "deepseek-v3.1",
"messages": [{ "role": "user", "content": "Summarize this article." }]
}
```
LangDB chooses the provider dynamically, balancing cost, latency, and accuracy.
***
### Cost Optimization
```json
{
"model": "deepseek-v3.1:cost",
"messages": [{ "role": "user", "content": "Write a short FAQ response." }]
}
```
LangDB picks the **cheapest provider** for `deepseek-v3.1` based on input/output token prices (e.g. Parasail, Fireworks AI, or DeepInfra if they’re lower than DeepSeek official).
***
### Accuracy Optimization
```json
{
"model": "deepseek-v3.1:accuracy",
"messages": [{ "role": "user", "content": "Solve this math word problem." }]
}
```
Routes to the provider with the **highest benchmark score** for `deepseek-v3.1`.
***
### Latency Optimization
```json
{
"model": "deepseek-v3.1:latency",
"messages": [{ "role": "user", "content": "Respond quickly for a live chat." }]
}
```
Always picks the provider with the **fastest response times**.
***
### Throughput Optimization
```json
{
"model": "deepseek-v3.1:throughput",
"messages": [{ "role": "user", "content": "Translate this dataset." }]
}
```
Distributes requests across **all available providers** for `deepseek-v3.1` to maximize scale.
***
## Explicit Provider Pinning
If you want full control, you can always specify the provider explicitly:
```json
{
"model": "parasail/deepseek-v3.1",
"messages": [{ "role": "user", "content": "Generate a poem." }]
}
```
This bypasses provider routing and always uses the given provider.
***
## Summary
* Use `model` without provider → LangDB does **provider routing**.
* Add `:mode` suffix → pick between balanced, accuracy, cost, latency, or throughput.
* Use `provider/model` → pin a specific provider directly.
Provider Routing makes it easy to scale across multiple vendors without rewriting your code.
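The three naming rules in the summary can be captured in a small helper. This is a sketch with a hypothetical function name (`model_id`). Rejecting a mode together with a pinned provider is an assumption, based on the statement that pinning bypasses provider routing.

```python
ROUTING_MODES = {"balanced", "accuracy", "cost", "latency", "throughput"}


def model_id(name, mode=None, provider=None):
    """Compose the `model` string for provider routing or provider pinning."""
    if provider is not None:
        if mode is not None:
            # Assumption: a pinned provider bypasses routing, so a mode is meaningless.
            raise ValueError("a pinned provider bypasses routing; omit mode")
        return f"{provider}/{name}"
    if mode is None:
        return name  # LangDB applies balanced mode by default
    if mode not in ROUTING_MODES:
        raise ValueError(f"unknown mode: {mode}")
    return f"{name}:{mode}"
```

For example, `model_id("deepseek-v3.1", mode="cost")` yields the string used in the Cost Optimization example, and `model_id("deepseek-v3.1", provider="parasail")` yields the pinned form.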
---
## Publishing MCP Servers
LangDB already offers a set of managed MCP servers you can use out-of-the-box, but you can also publish your own custom MCP server by linking a GitHub repo or configuring it manually. Once deployed, your custom MCP server can also be added to a [**Virtual MCP Server**](../concepts/virtual-mcp-servers/) in LangDB, allowing you to group multiple tools together under one interface.
## GitHub Auto Setup
Paste your GitHub repo URL, and LangDB will attempt to generate the MCP server configuration automatically.
1. **Open the "Publish MCP Server" dialog**
2. **Enter the GitHub Repository URL**
3. LangDB inspects the repo (e.g., `package.json`) and pre-fills:
* `command`
* `args`
* `env` structure
You can edit or override these values before publishing.
## Manual Configuration
If you prefer full control, enable **Manual configuration** and provide the required fields:
### General Info
| Field | Required | Description |
| ----------- | -------- | ------------------------------------- |
| Name | ✅ | Unique identifier for your MCP server |
| Logo URL | ✅ | PNG/SVG URL (128×128 preferred) |
| Description | ⬜ | What your MCP server does |
| Visibility | ✅ | Public or Private |
| Categories | ✅ | Select or define categories |
### Configuration JSON
```json
{
"command": "npx",
"args": [
"-y",
"your-package-name"
],
"env": {
"API_KEY": "{{api_key}}"
}
}
```
:::info
You can use either `npx` or `uvx` as the command depending on your MCP setup.
:::
* Wrap variable names using `{{...}}`
* These placeholders must match environment variables defined below
### Environment Variables
Define each required variable:
| Name      | Type   | Description                     | Required |
| --------- | ------ | ------------------------------- | -------- |
| `api_key` | String | API key passed to your MCP tool | false    |
## Deploying Your MCP Server
1. Make sure all required fields are completed, including configuration and environment variables.
2. Resolve any validation warnings highlighted in the interface.
3. Click **Publish** to deploy.
Once published, your MCP server will be accessible via LangDB—privately within your project or publicly if selected—and ready to power tool calls in any compatible LLM setup.
You can also add this newly deployed MCP server into a **Virtual MCP Server** in LangDB to combine it with other tools and provide a unified interface to your LLMs.
***
## Configuration Notes
* You can insert variables in both `args` and `env` blocks using `{{...}}`
* Automatic GitHub mode can be used as a starting point and then switched to manual for advanced edits
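Because every `{{...}}` placeholder must match a declared environment variable, it can help to check a configuration locally before publishing. The following is a minimal sketch of such a check. It is not LangDB's own validation, and `undefined_placeholders` is a hypothetical helper name.

```python
import re

# Matches {{name}} with optional whitespace inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def undefined_placeholders(config, declared_variables):
    """Return placeholder names used in `args`/`env` but not declared."""
    values = list(config.get("args", [])) + list(config.get("env", {}).values())
    used = set()
    for value in values:
        used.update(PLACEHOLDER.findall(str(value)))
    return sorted(used - set(declared_variables))


config = {
    "command": "npx",
    "args": ["-y", "your-package-name"],
    "env": {"API_KEY": "{{api_key}}"},
}
print(undefined_placeholders(config, ["api_key"]))  # → []
```

An empty list means every placeholder has a matching variable; any names returned still need to be defined under Environment Variables.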
---
## Response Caching
Response caching is designed for **faster response times**, reduced compute cost, and consistent outputs when handling repeated or identical prompts. Perfect for dashboards, agents, and endpoints with predictable queries.
## Benefits
* Faster responses for identical requests (cache hit)
* Reduced model/token usage for repeated inputs
* Consistent outputs for the same input and parameters
## Using Response Caching
### Through Virtual Model
1. Toggle **Response Caching** ON.
2. Select the cache type:
* **Exact match** (default): Matches prompt.
* _(Distance-based matching is coming soon.)_
3. Set **Cache expiration time** in seconds (default: `1200`).
Once enabled, identical requests will reuse the cached output as long as it hasn’t expired.
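Conceptually, exact-match caching with an expiration time behaves like the in-memory sketch below. This is an illustration only, not LangDB's implementation. In particular, which request fields make up the cache key is an assumption here (model plus messages).

```python
import hashlib
import json
import time


class ExactMatchCache:
    """Illustrative in-memory cache; not LangDB's implementation."""

    def __init__(self, expiration_time=1200, clock=time.monotonic):
        self.expiration_time = expiration_time  # seconds
        self._clock = clock
        self._entries = {}

    @staticmethod
    def _key(model, messages):
        # Assumption: the key is derived from the model and the exact messages.
        payload = json.dumps({"model": model, "messages": messages}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get(self, model, messages):
        entry = self._entries.get(self._key(model, messages))
        if entry is None:
            return None
        stored_at, response = entry
        if self._clock() - stored_at >= self.expiration_time:
            return None  # expired: treat as a miss
        return response

    def put(self, model, messages, response):
        self._entries[self._key(model, messages)] = (self._clock(), response)
```

Any change to the prompt produces a different key and therefore a miss, which is why exact matching suits endpoints with predictable, repeated queries.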
### Through API Calls
You can use caching on a per-request basis by including a `cache` field in your API body:
```json
{
"model": "openai/gpt-4.1",
"messages": [
{"role": "user", "content": "Summarize the news today"}
],
"cache": {
"type": "exact",
"expiration_time": 1200
}
}
```
* `type`: Currently only `exact` is supported.
* `expiration_time`: Time in seconds (e.g., 1200 for 20 minutes).
**If caching is enabled in both the virtual model and the request, the API payload takes priority.**
### **Pricing**
* Cache hits are billed at **0.1× the standard token price** (90% cheaper than a normal model call).
## Cache Hits
* When a response is served from cache, it is clearly marked as **Cache: HIT** in traces.
* You’ll also see:
* **Status:** `200`
* **Trace ID** and **Thread ID** for debugging
* **Start time / Finish time**: Notice how the **duration** is typically `<0.01s` for cache hits.
* **Cost:** Cache hits are billed at a much lower rate (shown here as `$0.000027`).
* The “Cache” field is displayed prominently (green “HIT” label).
Response caching in LangDB is a practical way to improve latency, reduce compute costs, and ensure consistent outputs for repeated queries. Use the UI or API to configure caching, monitor cache hits in traces and dashboard, and take advantage of reduced pricing for cached responses.
For most projects with stable or repeated inputs, enabling caching is a straightforward optimization that delivers immediate benefits.
---
## Routing
LangDB AI Gateway optimizes LLM selection based on cost, speed, and availability. This guide covers the dynamic routing strategies available in the system, including fallback, script-based, optimized, percentage-based, and latency-based routing, so you can tailor model selection to your application's needs.
## Understanding Targets
Before diving into routing strategies, it's essential to understand **targets** in LangDB AI Gateway. A **target** refers to a specific model or endpoint to which requests can be directed. Each target represents a potential processing unit within the routing logic, enabling optimal performance and reliability.
```json
{
"model": "router/dynamic",
"router": {
"type": "percentage",
"targets_percentages": [
40,
60
],
"targets": [
{
"model": "openai/gpt-4.1",
"mcp_servers": [
{
"slug": "mymcp_zoyhbp3u",
"name": "mymcp",
"type": "sse",
"server_url": "https://api.staging.langdb.ai/mymcp_zoyhbp3u"
}
],
"extra": {
"guards": [
"openai_moderation_y6ln88g4"
]
}
},
{
"model": "anthropic/claude-3.7-sonnet",
"mcp_servers": [
{
"slug": "mymcp_zoyhbp3u",
"name": "mymcp",
"type": "sse",
"server_url": "https://api.staging.langdb.ai/mymcp_zoyhbp3u"
}
],
"extra": {
"guards": [
"openai_moderation_y6ln88g4"
]
},
"temperature": 0.6,
"messages": [
{
"content": "You are a helpful assistant",
"id": "02cb4630-b01a-42d9-a226-94968865fbe0",
"role": "system"
}
]
}
]
}
}
```
**Target Parameters**
Each target in LangDB is essentially a self-contained configuration, similar to a virtual model. A target can include:
* **Model** – The identifier for the base model to use (e.g. `openai/gpt-4o`)
* **Prompt** – Optional system and user messages to steer the model
* [**MCP Servers**](../../concepts/virtual-mcp-servers/) – Support for [Virtual MCP Servers](../../concepts/virtual-mcp-servers/)
* [**Guardrails**](../guardrails.md) – Validations, Moderations.
* **Response Format** – `text`, `json_object`, or `json_schema`
* **Custom Parameters** – Tuning controls like:
* `temperature`
* `max_tokens`
* `top_p`
* `frequency_penalty`
* `presence_penalty`
## Routing Strategies
LangDB AI Gateway supports multiple routing strategies that can be combined and customized to meet your specific needs:
| Routing Strategy | Description |
| ------------------------------------------------------- | --------------------------------------------------------------------------- |
| [Fallback Routing](#fallback-routing)                   | Sequentially routes requests through multiple models in case of failure or unavailability. |
| [Optimized Routing](#optimized-routing) | Selects the best model based on real-time performance metrics. |
| [Percentage-Based Routing](#percentage-based-routing) | Distributes traffic between multiple models using predefined weightings. |
| [Latency-Based Routing](#latency-based-routing) | Chooses the model with the lowest response time for real-time applications. |
| [Nested Routing](#nested-routing) | Combines multiple routing strategies for flexible traffic management. |
### Fallback Routing
Fallback routing allows sequential attempts to different model targets in case of failure or unavailability. It ensures robustness by cascading through a list of models based on predefined logic.
```json
{
"model": "router/dynamic",
"messages": [
{ "role": "system", "content": "You are a helpful assistant." },
{ "role": "user", "content": "What is the formula of a square plot?" }
],
"router": {
"router": "router",
"type": "fallback",
"targets": [
{ "model": "openai/gpt-4o-mini", "temperature": 0.9, "max_tokens": 500, "top_p": 0.9 },
{ "model": "deepseek/deepseek-chat", "frequency_penalty": 1, "presence_penalty": 0.6 }
]
},
"stream": false
}
```
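The gateway performs this cascade server-side. As a conceptual sketch of the same idea, the loop below tries each target in order and returns the first successful response. `route_with_fallback` and the `call_model` callback are hypothetical names, and treating every exception as a reason to fall back is an assumption.

```python
def route_with_fallback(targets, call_model):
    """Try each target in order; return (target, response) from the first success."""
    errors = []
    for target in targets:
        try:
            return target, call_model(target)
        except Exception as exc:  # any failure moves on to the next target
            errors.append((target["model"], exc))
    raise RuntimeError(f"all targets failed: {errors}")
```

Here `call_model` stands in for whatever function sends the request with a given target's model and parameters.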
### Optimized Routing
Optimized routing automatically selects the best model based on real-time performance metrics such as latency, response time, and cost-efficiency.
```json
{
"model": "router/dynamic",
"router": {
"name": "fastest",
"type": "optimized",
"metric": "ttft",
"targets": [
{ "model": "gpt-3.5-turbo", "temperature": 0.8, "max_tokens": 400, "frequency_penalty": 0.5 },
{ "model": "gpt-4o-mini", "temperature": 0.9, "max_tokens": 500, "top_p": 0.9 }
]
}
}
```
Here, the request is routed to the model with the lowest Time-to-First-Token (TTFT) among gpt-3.5-turbo and gpt-4o-mini.
**Metrics:**
* **Requests** – Total number of requests sent to the model.
* **InputTokens** – Number of tokens provided as input to the model.
* **OutputTokens** – Number of tokens generated by the model in response.
* **TotalTokens** – Combined count of input and output tokens.
* **RequestsDuration** – Total duration taken to process requests.
* **Ttft (Time-to-First-Token) (Default)** – Time taken by the model to generate its first token after receiving a request.
* **LlmUsage** – The total computational cost of using the model, often used for cost-based routing.
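Selecting by a metric amounts to taking the minimum over the targets' observed values. The sketch below illustrates that idea. The shape of `metrics` (model name mapped to a dict of metric values) and the lowercase `ttft` key are assumptions made for this example, not a documented gateway structure.

```python
def pick_optimized_target(targets, metrics, metric="ttft"):
    """Pick the target whose observed `metric` is lowest.

    `metrics` maps model name -> {metric name: value}; that shape is an
    assumption for this sketch. Targets without data sort last.
    """
    def observed(target):
        return metrics.get(target["model"], {}).get(metric, float("inf"))

    return min(targets, key=observed)
```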
### Percentage-Based Routing
Percentage-based routing distributes requests between models according to predefined weightings, allowing load balancing, A/B testing, or controlled experimentation with different configurations. Each model can have distinct parameters while sharing the request load.
```json
{
"model": "router/dynamic",
"router": {
"name": "dynamic",
"type": "percentage",
"targets": [
{ "model": "openai/gpt-4o-mini", "temperature": 0.9, "max_tokens": 500, "top_p": 0.9 },
{ "model": "openai/gpt-4o-mini", "temperature": 0.8, "max_tokens": 400, "frequency_penalty": 1 }
],
"targets_percentages": [ 70, 30 ]
}
}
```
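A weighted random choice is one straightforward way to picture how `targets_percentages` splits traffic. The sketch below is illustrative rather than the gateway's actual algorithm, and the requirement that the percentages sum to 100 is an assumption.

```python
import random


def pick_by_percentage(targets, targets_percentages, rng=random):
    """Pick one target at random, weighted by `targets_percentages`."""
    if len(targets) != len(targets_percentages):
        raise ValueError("need exactly one percentage per target")
    if sum(targets_percentages) != 100:
        # Assumption for this sketch: weights are expressed as percentages of 100.
        raise ValueError("percentages must sum to 100")
    return rng.choices(targets, weights=targets_percentages, k=1)[0]
```

Over many requests, roughly 70% would land on the first target and 30% on the second when called with `[70, 30]`.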
### Latency-Based Routing
Latency-based routing selects the model with the lowest response time, ensuring minimal delay for real-time applications like chatbots and interactive AI systems.
```json
{
"model": "router/dynamic",
"router": {
"name": "fastest_latency",
"type": "latency",
"targets": [
{ "model": "openai/gpt-4o-mini", "temperature": 0.9, "max_tokens": 500, "top_p": 0.9 },
{ "model": "deepseek/deepseek-chat", "frequency_penalty": 1, "presence_penalty": 0.6 },
{ "model": "gemini/gemini-2.0-flash-exp", "temperature": 0.8, "max_tokens": 400, "frequency_penalty": 0.5 }
]
}
}
```
### Nested Routing
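LangDB AI allows nesting of routing strategies: a target can itself be a router. The example below uses a fallback router whose first target is a script-based router and whose second is an optimized router, with a plain model as the last resort. This flexibility helps refine model selection based on dynamic business needs.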
```json
{
"model": "router/dynamic",
"messages": [
{ "role": "system", "content": "You are a helpful assistant." },
{ "role": "user", "content": "What is the formula of a square plot?" }
],
"router": {
"type": "fallback",
"targets": [
{
"model": "router/dynamic",
"router": {
"name": "cheapest_script_execution",
"type": "script",
"script": "const route = ({ models }) => models.filter(m => m.inference_provider.provider === 'bedrock' && m.type === 'completions').sort((a, b) => a.price.per_input_token - b.price.per_input_token)[0]?.model;"
}
},
{
"model": "router/dynamic",
"router": {
"name": "fastest",
"type": "optimized",
"metric": "ttft",
"targets": [
{ "model": "gpt-3.5-turbo", "temperature": 0.8, "max_tokens": 400, "frequency_penalty": 0.5 },
{ "model": "gpt-4o-mini", "temperature": 0.9, "max_tokens": 500, "top_p": 0.9 }
]
}
},
{ "model": "deepseek/deepseek-chat", "temperature": 0.7, "max_tokens": 300, "frequency_penalty": 1 }
]
},
"stream": false
}
```
---
## Script Routing
Along with the UI and JSON, you can also define router logic in JavaScript. This allows for dynamic and flexible routing decisions based on available models, request parameters, and system metrics. The scripting router executes JavaScript code in a secure runtime environment and determines the appropriate model to use for processing the request.
### Script-Based Routing
LangDB AI allows executing custom JavaScript scripts to determine the best model dynamically. The script runs at request time and evaluates multiple parameters, including pricing, latency, and model availability.
```json
{
"model": "router/dynamic",
"router": {
"name": "cheapest_script_execution",
"type": "script",
"script": "const route = ({ body, headers, models, metrics }) => { const cheapest_bedrock_model = models.filter(m => m.inference_provider.provider === 'bedrock' && m.type === 'completions').sort((a, b) => a.price.per_input_token - b.price.per_input_token)[0]; return { model: cheapest_bedrock_model.model, temperature: 0.7, max_tokens: 300, top_p: 0.95 }; };"
}
}
```
## How It Works
A script-based routing configuration requires defining a JavaScript function `route`. This function takes in:
* `request`: The input request object
* `headers`: HTTP headers
* `models`: The available models
* `metrics`: System metrics
The function should return an object specifying the selected model and parameters for processing the request.
```javascript
const route = ({ request, headers, models, metrics }) => {
  // Cheapest completions model served through Bedrock, by input-token price.
  const cheapest_bedrock_model = models
    .filter(m => m.inference_provider.provider === 'bedrock' && m.type === 'completions')
    .sort((a, b) => a.price.per_input_token - b.price.per_input_token)[0];
  return {
    model: cheapest_bedrock_model.model,
    temperature: 0.7,
    max_tokens: 300,
    top_p: 0.95
  };
};
```
## Components
### Models
The `models` array contains objects with the following fields:
* `model`: The name of the model
* `model_provider`: The provider of the model (e.g., OpenAI, Anthropic)
* `inference_provider`: Information about the inference provider
* `price`: Pricing details per token, including:
* `per_input_token`: Cost per input token
* `per_output_token`: Cost per output token
* `valid_from`: Date when pricing becomes valid
* `input_formats` & `output_formats`: Supported input and output formats
* `capabilities`: Model capabilities
* `type`: Model type (e.g., `completions`, `embeddings`, `image_generation`)
* `limits`: Model-specific limitations
* `description`: A description of the model
* `parameters`: Additional model parameters
#### Example
```json
{
"model": "gpt-4",
"model_provider": "OpenAI",
"inference_provider": { "provider": "bedrock" },
"price": {
"per_input_token": 0.00002,
"per_output_token": 0.00004,
"valid_from": "2024-01-01"
},
"input_formats": ["text"],
"output_formats": ["text"],
"capabilities": ["tools"],
"type": "completions",
"limits": { "max_context_size": 4096 },
"description": "OpenAI's GPT-4 model",
"parameters": { "temperature": 0.7 }
}
```
### Metrics
The `metrics` object provides system performance statistics that can influence routing decisions. Available fields include:
* `requests`: Total number of requests handled
* `input_tokens`: Number of input tokens processed
* `output_tokens`: Number of output tokens generated
* `total_tokens`: Total tokens used (input + output)
* `latency`: Response latency in milliseconds
* `ttft`: Time to first token (TTFT)
* `llm_usage`: Usage metrics
* `tps`: Transactions per second
* `error_rate`: Error rate percentage
```json
{
"requests": 1023,
"input_tokens": 45000,
"output_tokens": 90000,
"total_tokens": 135000,
"latency": 120,
"ttft": 30,
"llm_usage": 0.8,
"tps": 2.5,
"error_rate": 0.01
}
```
## Writing Custom Routing Scripts
Custom routing strategies can be implemented, such as selecting the fastest model based on latency:
```javascript
const route = ({ request, headers, models, metrics }) => {
  // Consider only completions models that have a recorded latency.
  const fastest_model = models
    .filter(m => m.type === 'completions' && metrics[m.model]?.latency !== undefined)
    .sort((a, b) => metrics[a.model].latency - metrics[b.model].latency)[0];
  return {
    model: fastest_model.model,
    temperature: 0.5,
    max_tokens: 500,
    top_p: 0.9
  };
};
```
Or selecting a model with the lowest error rate:
```javascript
const route = ({ request, headers, models, metrics }) => {
let reliable_model = models
.filter(m => metrics[m.model]?.error_rate !== undefined)
.sort((a, b) => metrics[a.model]?.error_rate - metrics[b.model]?.error_rate)[0];
return {
model: reliable_model.model,
temperature: 0.6,
max_tokens: 400,
top_p: 0.8
};
};
```
---
## Tracing
LangDB Gateway provides detailed tracing to monitor, debug, and optimize LLM workflows.
Below is an example of a trace visualization from the dashboard, showcasing a detailed breakdown of the request stages:

In this example trace you’ll find:
* **Overview Metrics**
* **Cost**: Total spend for this request (e.g. $0.034).
* **Tokens**: Input (5,774) vs. output (1,395).
* **Duration**: Total end-to-end latency (29.52 s).
* **Timeline Breakdown**\
A parallel-track timeline showing each step—from moderation and relevance scoring to model inference and final reply.
* **Model Invocations**\
Every call to `gpt-4o-mini`, `gpt-4o`, etc., is plotted with precise start times and durations.
* **Agent Hand-offs**\
Transitions between your agents (e.g. search → booking → reply) are highlighted with custom labels like `transfer_to_reply_agent`.
* **Tool Integrations**\
External tools (e.g. `booking_tool`, `travel_tool`, `python_repl_tool`) appear inline with their execution times—so you can spot slow or failed runs immediately.
* **Guardrails**\
Rules like **Min Word Count** and **Travel Relevance** enforce domain-specific constraints and appear in the trace.
With this level of visibility you can quickly pinpoint bottlenecks, understand cost drivers, and ensure your multi-agent pipelines run smoothly.
---
## Usage
**Monitoring** complements tracing by providing aggregate insights into the usage of LLM workflows.
## Limits
LangDB enforces limits to ensure fair usage and cost management while allowing users to configure these limits as needed. Limits are categorized into:
1. **Daily Limits:** Maximum usage per day, e.g., $10 in the Starter Tier.
2. **Monthly Limits:** Total usage allowed in a month, e.g., $100.
3. **Total Limits:** Cumulative limit over the project’s duration, e.g., $500.
### Best Practices
* Monitor usage regularly to avoid overages.
* Plan limits based on project needs and anticipated workloads.
* Upgrade tiers if usage consistently approaches limits.
Setting limits not only helps you stay within budget but also provides the flexibility to scale your usage as needed, ensuring your projects run smoothly and efficiently.
## Usage APIs
### [Get total usage](../api-reference/get-total-usage)
Retrieves the total usage statistics for your project for a timeframe.
```bash
curl --location 'https://api.us-east-1.langdb.ai/usage/total' \
--header 'x-project-id: langdbProjectID' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer langDBAPIKey' \
--data '{"start_time_us": 1693062345678000,
"end_time_us": 1695092345678000}'
```
Example Response:
```json
{
"total": {
"total_input_tokens": 4181386,
"total_output_tokens": 206547,
"total_cost": 11.890438685999994
},
"period_start": 1737504000000000,
"period_end": 1740131013885000
}
```
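The `_us` suffix and the 16-digit `period_start` / `period_end` values in the response suggest that timestamps are expressed in microseconds since the Unix epoch. That reading is an assumption, not something this page states. Under it, a request body can be built from Python datetimes as sketched below; `to_micros` and `usage_window` are hypothetical helper names.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_micros(moment):
    """Convert a timezone-aware datetime to integer microseconds since the epoch."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    # Integer division of timedeltas avoids floating-point rounding.
    return (moment - EPOCH) // timedelta(microseconds=1)


def usage_window(start, end):
    """Build the JSON body for a usage request covering [start, end]."""
    return {"start_time_us": to_micros(start), "end_time_us": to_micros(end)}
```

For instance, midnight UTC on 2025-01-22 converts to `1737504000000000`, which matches the `period_start` value in the example response above.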
### [Get usage by model](../api-reference/get-usage-by-model)
Fetches timeseries usage statistics per model, allowing users to analyze the distribution of LLM usage.
```bash
curl --location 'https://api.us-east-1.langdb.ai/usage/models' \
--header 'x-project-id: langdbProjectID' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer langDBAPIKey' \
--data '{"start_time_us": 1693062345678000, "end_time_us": 1695092345678000,
"min_unit": "hour"}'
```
Example Response:
```json
{
"models": [
{
"hour": "2025-02-14 08:00:00",
"provider": "openai",
"model_name": "gpt-4o-mini",
"total_input_tokens": 13408,
"total_output_tokens": 2169,
"total_cost": 0.0039751199999999995
},
{
"hour": "2025-02-13 08:00:00",
"provider": "openai",
"model_name": "gpt-4o-mini",
"total_input_tokens": 55612,
"total_output_tokens": 786,
"total_cost": 0.01057608
}
],
"period_start": 1737504000000000,
"period_end": 1740130915098000
}
```
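Because the response is a time series, per-model totals have to be summed across buckets on the client side. A minimal sketch, assuming the response shape shown above; `totals_by_model` is a hypothetical helper name.

```python
from collections import defaultdict


def totals_by_model(response: dict) -> dict:
    """Sum tokens and cost per `provider/model_name` across all time buckets."""
    totals = defaultdict(lambda: {"input_tokens": 0, "output_tokens": 0, "cost": 0.0})
    for row in response.get("models", []):
        key = f'{row["provider"]}/{row["model_name"]}'
        totals[key]["input_tokens"] += row["total_input_tokens"]
        totals[key]["output_tokens"] += row["total_output_tokens"]
        totals[key]["cost"] += row["total_cost"]
    return dict(totals)
```

Sorting the result by `cost` gives a quick view of which models drive spend.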
### Filtering By Users
As discussed in User Tracking, we can use filters to retrieve insights based on `id`, `name`, or `tags`.
**Available Filters:**
* **`user_id`**: Filter data for a specific user by their unique ID.
* **`user_name`**: Retrieve usage based on the user’s name.
* **`user_tags`**: Filter by tags associated with a user (e.g., "websearch", "support").
```bash
curl -L \
--request POST \
--url 'https://api.us-east-1.langdb.ai/usage/models' \
--header 'Authorization: Bearer langDBAPIKey' \
--header 'X-Project-Id: langDBProjectID' \
--header 'Content-Type: application/json' \
--data '{
"user_id": "123",
"user_name": "mrunmay",
"user_tags": ["websearch", "testings"]
}'
```
Example response:
```json
{
"models": [
{
"day": "2025-02-21 10:00:00",
"provider": "openai",
"model_name": "gpt-4o-mini",
"total_input_tokens": 1112,
"total_output_tokens": 130,
"total_cost": 0.00029376
},
{
"day": "2025-02-21 14:00:00",
"provider": "openai",
"model_name": "gpt-4o-mini",
"total_input_tokens": 3317,
"total_output_tokens": 328,
"total_cost": 0.00083322
}
],
"period_start": 1737556513673410,
"period_end": 1740148513673410
}
```
---
## User Roles
LangDB provides role-based access control to manage users efficiently within an organization. There are three primary roles: **Admin, Developer, and Billing**.
Each role has specific permissions and responsibilities, ensuring a structured and secure environment for managing teams.
## **Admin**
Admins have the highest level of control within LangDB. They can:
* Invite and manage users
* Assign and modify roles for team members
* Manage cost groups and usage tracking
* Access billing details and payment settings
* Configure organizational settings
* Configure project model access restrictions
* Configure project user access restrictions
**Best for:** Organization owners, team leads, or IT administrators managing team access and billing.
## **Developer**
Developers focus on working with APIs and integrating LLMs. They have the following permissions:
* Access and use LangDB APIs
* Deploy and test applications using LangDB’s AI Gateway
* View and monitor API usage and performance
**Best for:** Software developers, data scientists, and AI engineers working on LLM integrations.
## **Billing**
Billing users have access to financial and cost-related features. Their permissions include:
* Managing top-ups and subscriptions
* Monitoring usage costs and optimizing expenses
**Best for:** Finance teams, accounting personnel, and cost management administrators.
***
## Role Management
Admins can assign roles to users when inviting them to the organization. Role changes can also be made later through the user management panel.
### **Key Points:**
* Users can have multiple roles (e.g., both Developer and Billing).
* Only Admins can assign or update roles.
* Billing users cannot modify API access but can track and manage costs.
* **Role Management is only available in Professional, Business, and Enterprise tiers.**
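One way to reason about multiple roles is as a union of permission sets. The sketch below is purely illustrative: the permission names are hypothetical labels derived from the role descriptions above, and this is not how LangDB implements access control.

```python
from enum import Enum


class Role(Enum):
    ADMIN = "admin"
    DEVELOPER = "developer"
    BILLING = "billing"


# Hypothetical permission labels, one per capability listed for each role above.
PERMISSIONS = {
    Role.ADMIN: {
        "manage_users", "assign_roles", "manage_cost_groups",
        "access_billing", "configure_organization", "configure_project_access",
    },
    Role.DEVELOPER: {"use_api", "deploy_apps", "view_usage"},
    Role.BILLING: {"manage_subscriptions", "monitor_costs"},
}


def effective_permissions(roles) -> set:
    """A user with several roles gets the union of each role's permissions."""
    granted = set()
    for role in roles:
        granted |= PERMISSIONS[role]
    return granted


def can(roles, permission: str) -> bool:
    """Check whether any of the user's roles grants `permission`."""
    return permission in effective_permissions(roles)
```

Under this model, a user holding both Developer and Billing can call the API and track costs, while a Billing-only user cannot call the API.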
---
## Quick Start
The **LangDB** **AI Gateway** allows you to connect with multiple Large Language Models (LLMs) instantly, without any setup.
## Hosted
The hosted version of LangDB provides a full-featured and managed AI gateway that offers instant access to 250+ LLMs with enterprise-ready features.
## Step 1: Account Creation
Sign up on [LangDB](https://app.langdb.ai/) to start using the Hosted Gateway.
## Step 2: Make your First Request
Test a chat window with two different models to see dynamic routing in action.
## Step 3: Check Out the Samples Section for Template Code
Use ready-made templates to integrate LangDB into your project effortlessly.
## Step 4: Analytics Section
Monitor usage, costs, and performance insights through the LangDB analytics dashboard.
## Open Source
A self-hosted option for organizations that require complete control over their AI infrastructure. LangDB AI Gateway is available as an open-source repository that you can run locally. See the [Self-Hosted documentation](/self-hosted) for more details.
---
## User Tracking
LangDB AI enables user tracking to collect analytics and monitor usage patterns efficiently. By associating metadata with requests, developers can analyze interactions, optimize performance, and enhance user experience.
## Example: Chatbot Analytics with User Tracking
For a chatbot service handling multiple users, tracking enables:
* **Recognizing returning users**: Maintain conversation continuity.
* **Tracking usage trends**: Identify common queries to improve responses.
* **User segmentation**: Categorize users using tags (e.g., "websearch", "support").
* **Analytics**: Identify heavy users and allocate resources efficiently.
```bash
curl 'https://api.us-east-1.langdb.ai/v1/chat/completions' \
-H 'authorization: Bearer LangDBApiKey' \
-H 'Content-Type: application/json' \
-d '{
"model": "openai/gpt-4o-mini",
"stream": true,
"messages": [
{
"role": "user",
"content": "Def bubbleSort()"
}
],
"extra": {
"user": {
"id": "7",
"name": "mrunmay",
"tags": ["coding", "software"]
}
}
}'
```
### User Tracking Fields
* **`extra.user.id`**: Unique user identifier.
* **`extra.user.name`**: User alias.
* **`extra.user.tags`**: Custom tags to classify users (e.g., "coding", "software").
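With the OpenAI Python SDK, additional request-body fields can be passed through the `extra_body` argument. The sketch below assumes the gateway accepts the same `extra.user` object as the curl request above; `build_user_extra` and `tracked_completion` are hypothetical helper names.

```python
from typing import Optional, Sequence


def build_user_extra(user_id: str, name: Optional[str] = None,
                     tags: Sequence[str] = ()) -> dict:
    """Build the `extra.user` object, omitting fields that were not provided."""
    user = {"id": str(user_id)}
    if name:
        user["name"] = name
    if tags:
        user["tags"] = list(tags)
    return {"extra": {"user": user}}


def tracked_completion(client, prompt: str, **user_fields):
    """Send a chat completion tagged with user metadata.

    `client` is expected to be an `openai.OpenAI` instance configured with the
    LangDB base URL and API key; this function is not called at import time.
    """
    return client.chat.completions.create(
        model="openai/gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        extra_body=build_user_extra(**user_fields),
    )
```

For example, `tracked_completion(client, "Hello", user_id="7", name="mrunmay", tags=["coding", "software"])` would send the same user metadata as the curl request.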
## Fetching User Analytics & Usage Data
Once users are tracked, analytics and usage APIs can be used to retrieve insights based on `id`, `name`, or `tags`.
Check out the [Usage](../features/usage.md#usage-apis) and [Analytics](../features/analytics.md#analytics-apis) sections for more details.
Example:
```bash
curl -L \
--request POST \
--url 'https://api.us-east-1.langdb.ai/analytics/summary' \
--header 'Authorization: Bearer langDBAPIKey' \
--header 'X-Project-Id: langDBProjectID' \
--header 'Content-Type: application/json' \
--data '{
"user_id": "7",
"user_name": "mrunmay",
"user_tags": ["software", "code"]
}'
```
Example response:
```json
{
"summary": [
{
"total_cost": 0.00030366,
"total_requests": 1,
"total_duration": 6240.888,
"avg_duration": 6240.9,
"duration": 6240.9,
"duration_p99": 6240.9,
"duration_p95": 6240.9,
"duration_p90": 6240.9,
"duration_p50": 6240.9,
"total_input_tokens": 1139,
"total_output_tokens": 137,
"avg_ttft": 6240.9,
"ttft": 6240.9,
"ttft_p99": 6240.9,
"ttft_p95": 6240.9,
"ttft_p90": 6240.9,
"ttft_p50": 6240.9,
"tps": 204.46,
"tps_p99": 204.46,
"tps_p95": 204.46,
"tps_p90": 204.46,
"tps_p50": 204.46,
"tpot": 0.05,
"tpot_p99": 0.05,
"tpot_p95": 0.05,
"tpot_p90": 0.05,
"tpot_p50": 0.05,
"error_rate": 0.0,
"error_request_count": 0
}
],
"start_time_us": 1737547895565066,
"end_time_us": 1740139895565066
}
```
---
## Using Parameters
LangDB AI Gateway supports the standard LLM request parameters, such as temperature, max\_tokens, stop sequences, and logit\_bias.
## API Usage:
```python
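import os
from openai import OpenAI

# Point the OpenAI client at the LangDB gateway
client = OpenAI(
    base_url="https://api.us-east-1.langdb.ai",
    api_key=os.getenv("LANGDB_API_KEY"),
    default_headers={"x-project-id": os.getenv("LANGDB_PROJECT_ID")},
)

response = client.chat.completions.create(
    model="gpt-4o",  # Change Model
    messages=[
        {"role": "user", "content": "What are the earnings of Apple in 2022?"},
    ],
    temperature=0.7,  # temperature parameter
    max_tokens=150,  # max_tokens parameter
    stream=True,  # stream parameter
)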
```
```typescript
const response = await client.chat.completions.create({
model: 'gpt-4o-mini',
messages,
temperature: 0.7, // temperature parameter
max_tokens: 150, // max_tokens parameter
logit_bias: { '50256': -100 }, // logit_bias parameter
stream: true, // stream parameter
});
```
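When `stream` is enabled, the OpenAI SDK returns an iterator of chunks rather than a single response object. A minimal sketch for assembling the streamed text, assuming the chunk shape used by the OpenAI Python SDK (`choices[0].delta.content`, which may be `None`); `collect_stream_text` and `fake_chunk` are hypothetical names, and `fake_chunk` exists only so the helper can be exercised offline.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_stream_text(chunks: Iterable) -> str:
    """Concatenate the text deltas from a streamed chat completion."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:  # some chunks may carry no choices at all
            continue
        piece = chunk.choices[0].delta.content
        if piece:  # skip None or empty deltas
            parts.append(piece)
    return "".join(parts)


def fake_chunk(text):
    """Stand-in for a streamed chunk, used only to try the helper without a network call."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])
```

With a real client, the `response` returned by a streaming `chat.completions.create` call can be passed directly to `collect_stream_text`.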
## UI
You can also use the UI to test various parameters and get code snippets.
### Playground
Use the Playground to tweak parameters in real time via the [Virtual Model config](../concepts/virtual-models/) and send test requests instantly.
### Samples
Explore ready-made code snippets complete with preconfigured parameters—copy, paste, and customize to fit your needs.
---
## Working with Agent Frameworks
LangDB integrates seamlessly with a variety of agent libraries to provide out-of-the-box tracing, observability, and cost insights. By simply initializing the LangDB client adapter for your agent framework, LangDB **monkey‑patches** the underlying client to inject tracing hooks—no further code changes required.
### Prerequisites
* **LangDB Core** installed:
```
pip install 'pylangdb'
```
* **Optional feature flags** (for framework-specific tracing):
```
pip install 'pylangdb[<framework>]'
# e.g. pylangdb[adk], pylangdb[openai]
```
* **Environment Variables** set:
```bash
export LANGDB_API_KEY="xxxxx"
export LANGDB_PROJECT_ID="xxxxx"
```
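Since tracing depends on these variables, it can help to fail fast when one is missing. A small sketch, assuming only the two variable names listed above; `missing_env` is a hypothetical helper, not part of `pylangdb`.

```python
import os

REQUIRED_VARS = ("LANGDB_API_KEY", "LANGDB_PROJECT_ID")


def missing_env(names=REQUIRED_VARS, environ=None) -> list:
    """Return the required variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]
```

Calling `missing_env()` at startup and raising an error when the list is non-empty surfaces configuration problems before any agent code runs.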
## Quick Start
Import and initialize once, before creating or running any agents:
```python
# Replace `adk` with the module for your framework (e.g. agno, crewai, langchain, openai)
from pylangdb.adk import init
# Monkey‑patch the client for tracing
init()
# ...then your existing agent setup...
```
> **Monkey‑patching note:** The `init()` call wraps key client methods at runtime to capture telemetry. Ensure it runs as early as possible.
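To illustrate why ordering matters, here is a generic sketch of runtime method wrapping. It is not `pylangdb`'s implementation: `FakeClient`, `patch_for_tracing`, and `TRACE_LOG` are hypothetical names used only to show the technique.

```python
import functools
import time


class FakeClient:
    """Stand-in for an LLM client; not a real library class."""

    def complete(self, prompt: str) -> str:
        return prompt.upper()


TRACE_LOG = []


def patch_for_tracing(cls, method_name: str) -> None:
    """Replace `cls.<method_name>` with a wrapper that records each call."""
    original = getattr(cls, method_name)

    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        started = time.perf_counter()
        try:
            return original(self, *args, **kwargs)
        finally:
            # Record the call even if the original method raised.
            TRACE_LOG.append({
                "method": method_name,
                "seconds": time.perf_counter() - started,
            })

    setattr(cls, method_name, wrapper)


# A reference taken before patching still points at the unwrapped function,
# so calls made through it are invisible to the tracer.
early_reference = FakeClient.complete
patch_for_tracing(FakeClient, "complete")
```

Calls made through `FakeClient().complete(...)` are recorded, while calls through `early_reference` bypass the wrapper. The same effect is why a tracing `init()` should run before other modules import or bind the client methods.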
**GitHub Repo:** [https://github.com/langdb/pylangdb](https://github.com/langdb/pylangdb)
### Example: Google ADK
```bash
pip install 'pylangdb[adk]'
```
```python
from pylangdb.adk import init
init()
from google.adk.agents import Agent
# (rest of your Google ADK agent code)
```
This is an example of a complete end-to-end trace using Google ADK and LangDB.
LangDB’s ADK adapter captures request/response metadata, token usage, and latency metrics automatically. During initialization it discovers and wraps all agents and sub‑agents in subfolders, linking their sessions for full end‑to‑end tracing across your workflow.
## Supported Frameworks
* Google ADK
* OpenAI Agents SDK
* LangGraph
* Agno
* CrewAI
## Further Documentation
For full documentation, including client capabilities, configuration, and detailed examples, check out the [Python SDK documentation](../../python-sdk/getting-started.md) and the [GitHub repository](https://github.com/langdb/pylangdb).
---
## Working with Agno
LangDB’s Agno integration provides end-to-end tracing for your Agno agent pipelines.
Check out: https://app.langdb.ai/sharing/threads/8a44dccc-c679-4fc3-9555-a07de103d637
## Installation
Install the LangDB client with the Agno feature flag:
```bash
pip install 'pylangdb[agno]'
```
## Quick Start
### Export Environment Variables
Set your LangDB credentials:
```bash
export LANGDB_API_KEY=""
export LANGDB_PROJECT_ID=""
```
### Initialize Tracing
Import and call `init()` **before** configuring your Agno code:
```python
from pylangdb.agno import init
# Initialise LangDB
init()
```
### Configure your Agno code
```python
import os

from pylangdb.agno import init
init()

from agno.agent import Agent
from agno.tools.duckduckgo import DuckDuckGoTools
from agno.models.langdb import LangDB

# Configure LangDB-backed model
langdb_model = LangDB(
    id="openai/gpt-4",
    api_key=os.getenv("LANGDB_API_KEY"),
    project_id=os.getenv("LANGDB_PROJECT_ID"),
)

# Create and run your agent
agent = Agent(
    name="Web Agent",
    role="Search the web for information",
    model=langdb_model,
    tools=[DuckDuckGoTools()],
    instructions="Answer questions using web search",
)
response = agent.run("What is LangDB?")
print(response)
```
All Agno interactions, from invocation through tool calls to final output, are traced with LangDB.
## Complete Agno Example
Here is a full example based on a web-search multi-agent team built with Agno.
### Example code
> Check out the full sample on GitHub: [https://github.com/langdb/langdb-samples/tree/main/examples/agno/agno-basic](https://github.com/langdb/langdb-samples/tree/main/examples/agno/agno-basic)
### Setup Environment
```
pip install agno 'pylangdb[agno]' duckduckgo-search
```
### Export Environment Variables
```bash
export LANGDB_API_KEY=""
export LANGDB_PROJECT_ID=""
```
### main.py
```python
import os
from textwrap import dedent

# Initialize LangDB tracing and import model
from pylangdb.agno import init
init()

from agno.models.langdb import LangDB

# Import Agno agent components
from agno.agent import Agent
from agno.tools.duckduckgo import DuckDuckGoTools


# Function to create a LangDB model with selectable model name
def create_langdb_model(model_name="openai/gpt-4.1"):
    return LangDB(
        id=model_name,
        api_key=os.getenv("LANGDB_API_KEY"),
        project_id=os.getenv("LANGDB_PROJECT_ID"),
    )


web_agent = Agent(
    name="Web Agent",
    role="Search the web for comprehensive information and current data",
    model=create_langdb_model("openai/gpt-4.1"),
    tools=[DuckDuckGoTools()],
    instructions="Always use web search tools to find current and accurate information. Search for multiple aspects of the topic to gather comprehensive data.",
    show_tool_calls=True,
    markdown=True,
)

writer_agent = Agent(
    name="Writer Agent",
    role="Write comprehensive article on the provided topic",
    model=create_langdb_model("anthropic/claude-3.7-sonnet"),
    instructions="Use outlines to write articles",
    show_tool_calls=True,
    markdown=True,
)

agent_team = Agent(
    name="Research Team",
    team=[web_agent, writer_agent],
    model=create_langdb_model("gemini/gemini-2.0-flash"),
    instructions=dedent("""\
        You are the coordinator of a research team with two specialists:
        1. Web Agent: Has DuckDuckGo search tools and must be used for ALL research tasks
        2. Writer Agent: Specializes in creating comprehensive articles
        WORKFLOW:
        1. ALWAYS delegate research tasks to the Web Agent first
        2. The Web Agent MUST use web search tools to gather current information
        3. Then delegate writing tasks to the Writer Agent using the research findings
        4. Ensure comprehensive coverage of the topic through multiple searches
        IMPORTANT: Never attempt to answer without first having the Web Agent conduct searches.
    """),
    show_tool_calls=True,
    markdown=True,
)

agent_team.print_response(
    "I need a comprehensive article about the Eiffel Tower. "
    "Please have the Web Agent search for current information about its history, architectural significance, and cultural impact. "
    "Then have the Writer Agent create a detailed article based on the research findings.",
    stream=True
)
```
### Running your Agent
Navigate to the directory that contains `main.py` and run:
```bash
python main.py
```
### Traces on LangDB
When you run queries against your agent, LangDB automatically captures detailed traces of all agent interactions.
## Next Steps: Advanced Agno Integration
This guide covered the basics of integrating LangDB with Agno using a Web Search agent example. For more complex scenarios and advanced use cases, check out the resources in the [Guides section](https://app.gitbook.com/s/jwXvc8GvLP6kUvOASaLP/).
---
## Working with CrewAI
LangDB makes it effortless to trace CrewAI workflows end-to-end. With a single `init()` call, all agent interactions, task executions, and LLM calls are captured.
Check out: https://app.langdb.ai/sharing/threads/3becbfed-a1be-ae84-ea3c-4942867a3e22
## Installation
Install the LangDB client with the CrewAI feature flag:
```bash
pip install 'pylangdb[crewai]'
```
## Quick Start
### Export Environment Variables
Set your LangDB credentials:
```bash
export LANGDB_API_KEY=""
export LANGDB_PROJECT_ID=""
export LANGDB_API_BASE_URL='https://api.us-east-1.langdb.ai'
```
### Initialize Tracing
Import and call `init()` **before** configuring your CrewAI code:
```python
from pylangdb.crewai import init
# Initialise LangDB
init()
```
### Configure your CrewAI code
```python
import os
from crewai import Agent, Task, Crew, LLM

# Configure LLM with LangDB headers
llm = LLM(
    model="openai/gpt-4o",  # Use LiteLLM-style model names
    api_key=os.getenv("LANGDB_API_KEY"),
    base_url=os.getenv("LANGDB_API_BASE_URL"),
    extra_headers={"x-project-id": os.getenv("LANGDB_PROJECT_ID")},
)

# Define agents and tasks as usual
researcher = Agent(
    role="researcher",
    goal="Research topic thoroughly",
    backstory="You are an expert researcher",
    llm=llm,
    verbose=True,
)
task = Task(
    description="Research the given topic",
    expected_output="A short summary of the key findings",  # CrewAI tasks require an expected output
    agent=researcher,
)
crew = Crew(agents=[researcher], tasks=[task])

# Kick off the workflow
result = crew.kickoff()
print(result)
```
All CrewAI calls—agent initialization, task execution, and model responses—are automatically linked.
## Complete CrewAI example
Here is a full example based on CrewAI report writing agent.
### Example code
> Check out the full sample on GitHub: [https://github.com/langdb/langdb-samples/tree/main/examples/crewai/crewai-tracing](https://github.com/langdb/langdb-samples/tree/main/examples/crewai/crewai-tracing)
### Setup Environment
```bash
pip install crewai 'pylangdb[crewai]' crewai_tools setuptools python-dotenv
```
### Export Environment Variables
You also need an API key from [Serper.dev](https://serper.dev/), which `SerperDevTool` reads from the `SERPER_API_KEY` environment variable.
```bash
export LANGDB_API_KEY=""
export LANGDB_PROJECT_ID=""
export LANGDB_API_BASE_URL='https://api.us-east-1.langdb.ai'
export SERPER_API_KEY=""
```
### main.py
```python
#!/usr/bin/env python3
import os
import sys

from dotenv import load_dotenv

# Load .env first so init() can see the LangDB credentials
load_dotenv()

from pylangdb.crewai import init
init()

from crewai import Agent, Task, Crew, Process, LLM
from crewai_tools import SerperDevTool


def create_llm(model):
    return LLM(
        model=model,
        api_key=os.environ.get("LANGDB_API_KEY"),
        base_url=os.environ.get("LANGDB_API_BASE_URL"),
        extra_headers={"x-project-id": os.environ.get("LANGDB_PROJECT_ID")}
    )


class ResearchPlanningCrew:
    def __init__(self, topic: str):
        # The topic is interpolated into the task descriptions below
        self.topic = topic

    def researcher(self) -> Agent:
        return Agent(
            role="Research Specialist",
            goal="Research topics thoroughly",
            backstory="Expert researcher with skills in finding information",
            tools=[SerperDevTool()],
            llm=create_llm("openai/gpt-4o"),
            verbose=True
        )

    def planner(self) -> Agent:
        return Agent(
            role="Strategic Planner",
            goal="Create actionable plans based on research",
            backstory="Strategic planner who breaks down complex challenges",
            reasoning=True,
            max_reasoning_attempts=3,
            # Leading "openai/" is the LiteLLM provider prefix; the remainder is the LangDB model name
            llm=create_llm("openai/anthropic/claude-3.7-sonnet"),
            verbose=True
        )

    def research_task(self) -> Task:
        return Task(
            description=f"Research {self.topic} thoroughly and compile information",
            agent=self.researcher(),
            expected_output="Comprehensive research report"
        )

    def planning_task(self, research_task: Task) -> Task:
        return Task(
            description=f"Create a strategic plan for {self.topic} based on research",
            agent=self.planner(),
            expected_output="Strategic execution plan with phases and goals",
            context=[research_task]
        )

    def crew(self) -> Crew:
        # Build each task once so the planning context is the same task the crew runs
        research = self.research_task()
        planning = self.planning_task(research)
        return Crew(
            agents=[research.agent, planning.agent],
            tasks=[research, planning],
            verbose=True,
            process=Process.sequential
        )


def main():
    topic = sys.argv[1] if len(sys.argv) > 1 else "Artificial Intelligence in Healthcare"
    result = ResearchPlanningCrew(topic).crew().kickoff()
    print(result)


if __name__ == "__main__":
    main()
```
### Running your Agent
Navigate to the directory that contains `main.py` and run:
```bash
python main.py
```
### Traces on LangDB
When you run queries against your agent, LangDB automatically captures detailed traces of all agent interactions.
## Next Steps: Advanced CrewAI Integration
This guide covered the basics of integrating LangDB with CrewAI using a Research and Planning agent example. For more complex scenarios and advanced use cases, check out the resources in the [Guides section](https://app.gitbook.com/s/jwXvc8GvLP6kUvOASaLP/).
---
## Working with Google ADK
LangDB’s Google ADK integration provides end-to-end tracing for your ADK agent pipelines.
Check out: https://app.langdb.ai/sharing/threads/8425e068-77de-4f41-8aa9-d1111fc7d2b7
## Installation
Enable end-to-end tracing for your Google ADK agents by installing the `pylangdb` client with the ADK feature flag:
```bash
pip install 'pylangdb[adk]'
```
## Quick Start
Set your environment variables **before** running the script:
```bash
export LANGDB_API_KEY=""
export LANGDB_PROJECT_ID=""
```
Initialize LangDB **before** creating or running any ADK agents:
```python
from pylangdb.adk import init
# Initialise LangDB
init()
# Then proceed with your normal ADK setup:
from google.adk.agents import Agent
# ...define and run agents...
```
Once initialized, LangDB automatically discovers all agents and sub-agents (including those in nested folders), wraps their key methods at runtime, and links sessions for full end-to-end tracing across your workflow.
## Complete Google ADK Python Example
Here's a full example of a Google ADK agent implementation that you can instrument with LangDB. This sample is based on the official [Google ADK Quickstart](https://google.github.io/adk-docs/get-started/quickstart/).
### Example code
> Check out the full sample on GitHub: [https://github.com/langdb/langdb-samples/tree/main/examples/google-adk/multi-tool-agent](https://github.com/langdb/langdb-samples/tree/main/examples/google-adk/multi-tool-agent)
### Setup Environment
```bash
pip install google-adk litellm 'pylangdb[adk]'
```
### Project Structure
Create the following project structure:
```
parent_folder/
└── multi_tool_agent/
    ├── __init__.py
    ├── agent.py
    └── .env
```
### `__init__.py`
Create an `__init__.py` file in the `multi_tool_agent` folder:
```python
from . import agent
```
### .env
Create a `.env` file for your secrets:
```bash
LANGDB_API_KEY=""
LANGDB_PROJECT_ID=""
```
### agent.py
Create an `agent.py` file with the following code:
```python
# First initialize LangDB before defining any agents
from pylangdb.adk import init
init()

import datetime
from zoneinfo import ZoneInfo
from google.adk.agents import Agent


def get_weather(city: str) -> dict:
    """Return a canned weather report; only New York is supported."""
    if city.lower() != "new york":
        return {"status": "error", "error_message": f"Weather information for '{city}' is not available."}
    return {"status": "success", "report": "The weather in New York is sunny with a temperature of 25 degrees Celsius (77 degrees Fahrenheit)."}


def get_current_time(city: str) -> dict:
    """Return the current time; only New York is supported."""
    if city.lower() != "new york":
        return {"status": "error", "error_message": f"Sorry, I don't have timezone information for {city}."}
    tz = ZoneInfo("America/New_York")
    now = datetime.datetime.now(tz)
    return {"status": "success", "report": f'The current time in {city} is {now.strftime("%Y-%m-%d %H:%M:%S %Z%z")}'}


root_agent = Agent(
    name="weather_time_agent",
    model="gemini-2.0-flash",
    description="Agent to answer questions about the time and weather in a city.",
    instruction="You are a helpful agent who can answer user questions about the time and weather in a city.",
    tools=[get_weather, get_current_time],
)
```
### Running Your Agent
Navigate to the parent directory of your agent project and run the following command:
```bash
adk web
```
Open the URL provided (usually http://localhost:8000) in your browser and select "multi\_tool\_agent" from the dropdown menu.
Once your agent is running, try an example query to test its functionality:
```
What's the weather in New York?
```
This query triggers the agent to call the functions defined above and respond according to the agent workflow.
### Traces on LangDB
When you run queries against your ADK agent, LangDB automatically captures detailed traces of all agent interactions.
## Next Steps: Advanced Google ADK Integration
This guide covered the basics of integrating LangDB with Google ADK using a simple weather and time agent example. For more complex scenarios and advanced use cases, check out the resources in the [Guides section](https://app.gitbook.com/s/jwXvc8GvLP6kUvOASaLP/).
---
## Working with LangGraph
LangDB provides seamless tracing and observability for LangChain-based applications.
Check out: https://app.langdb.ai/sharing/threads/bfced28f-5966-4491-867e-fd3875fe3282
## Installation
Install the LangDB client with LangChain support:
```bash
pip install 'pylangdb[langchain]'
```
## Quick Start
### Export Environment Variables
```bash
export LANGDB_API_KEY=""
export LANGDB_PROJECT_ID=""
export LANGDB_API_BASE_URL='https://api.us-east-1.langdb.ai'
```
### Initialize LangDB
Import and call `init()` **before** configuring your LangChain/LangGraph code:
```python
from pylangdb.langchain import init
# Initialise LangDB
init()
```
### Define your Agent
```python
# Your existing LangChain code works with proper configuration
# (requires the langchain_openai package)
import os
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage

api_base = os.getenv("LANGDB_API_BASE_URL", "https://api.us-east-1.langdb.ai")
api_key = os.getenv("LANGDB_API_KEY")
project_id = os.getenv("LANGDB_PROJECT_ID")

# Default headers for API requests
default_headers: dict[str, str] = {
    "x-project-id": project_id
}

# Initialize OpenAI LLM with LangDB configuration
llm = ChatOpenAI(
    model_name="gpt-4o",
    temperature=0.3,
    openai_api_base=api_base,
    openai_api_key=api_key,
    default_headers=default_headers,
)
result = llm.invoke([HumanMessage(content="Hello, LangDB!")])
```
Once LangDB is initialized, all calls to `llm`, intermediate steps, tool executions, and nested chains are automatically traced and linked under a single session.
## Complete LangGraph Agent Example
Here is a full LangGraph example based on a ReAct agent that uses LangDB tracing.
### Example code
> Check out the full sample on GitHub: [https://github.com/langdb/langdb-samples/tree/main/examples/langchain/langgraph-tracing](https://github.com/langdb/langdb-samples/tree/main/examples/langchain/langgraph-tracing)
### Setup Environment
Install the libraries using pip
```bash
pip install langgraph 'pylangdb[langchain]' langchain_openai geopy
```
### Export Environment Variables
```bash
export LANGDB_API_KEY=""
export LANGDB_PROJECT_ID=""
export LANGDB_API_BASE_URL='https://api.us-east-1.langdb.ai'
```
### main.py
```python
# Initialize LangDB tracing
from pylangdb.langchain import init
init()

import os
from typing import Annotated, Sequence, TypedDict
from datetime import datetime

# Import required libraries
from langchain_core.messages import BaseMessage, HumanMessage, AIMessage, ToolMessage
from langchain_core.tools import tool
from langgraph.graph.message import add_messages
from langgraph.prebuilt import ToolNode
from langgraph.graph import StateGraph, END
from langchain_openai import ChatOpenAI
from geopy.geocoders import Nominatim
from pydantic import BaseModel, Field
import requests


# Initialize the model
def create_model():
    """Create and return the ChatOpenAI model."""
    api_base = os.getenv("LANGDB_API_BASE_URL")
    api_key = os.getenv("LANGDB_API_KEY")
    project_id = os.getenv("LANGDB_PROJECT_ID")
    default_headers = {
        "x-project-id": project_id,
    }
    llm = ChatOpenAI(
        model_name='openai/gpt-4o',  # Choose any model from LangDB
        temperature=0.3,
        openai_api_base=api_base,
        openai_api_key=api_key,
        default_headers=default_headers
    )
    return llm


# Define the agent state
class AgentState(TypedDict):
    """The state of the agent."""
    messages: Annotated[Sequence[BaseMessage], add_messages]
    number_of_steps: int


# Define the weather tool
class SearchInput(BaseModel):
    location: str = Field(description="The city and state, e.g., San Francisco")
    date: str = Field(description="The forecasting date in format YYYY-MM-DD")


@tool("get_weather_forecast", args_schema=SearchInput, return_direct=True)
def get_weather_forecast(location: str, date: str) -> dict:
    """
    Retrieves the weather using Open-Meteo API for a given location (city) and a date (yyyy-mm-dd).
    Returns a dictionary with the time and temperature for each hour.
    """
    geolocator = Nominatim(user_agent="weather-app")
    location = geolocator.geocode(location)
    if not location:
        return {"error": "Location not found"}
    try:
        response = requests.get(
            f"https://api.open-meteo.com/v1/forecast?"
            f"latitude={location.latitude}&"
            f"longitude={location.longitude}&"
            "hourly=temperature_2m&"
            f"start_date={date}&end_date={date}",
            timeout=10
        )
        response.raise_for_status()
        data = response.json()
        return {
            time: f"{temp}°C"
            for time, temp in zip(
                data["hourly"]["time"],
                data["hourly"]["temperature_2m"]
            )
        }
    except Exception as e:
        return {"error": f"Failed to fetch weather data: {str(e)}"}


# Define the nodes
def call_model(state: AgentState) -> dict:
    """Call the model with the current state and return the response."""
    # bind_tools returns a new model object, so keep the result
    model = create_model().bind_tools([get_weather_forecast])
    messages = state["messages"]
    response = model.invoke(messages)
    return {"messages": [response], "number_of_steps": state["number_of_steps"] + 1}


def route_to_tool(state: AgentState) -> str:
    """Determine the next step based on the model's response."""
    messages = state["messages"]
    last_message = messages[-1]
    if hasattr(last_message, 'tool_calls') and last_message.tool_calls:
        return "call_tool"
    return END


# Create the graph
def create_agent():
    """Create and return the LangGraph agent."""
    workflow = StateGraph(AgentState)
    workflow.add_node("call_model", call_model)
    workflow.add_node("call_tool", ToolNode([get_weather_forecast]))
    workflow.set_entry_point("call_model")
    workflow.add_conditional_edges(
        "call_model",
        route_to_tool,
        {
            "call_tool": "call_tool",
            END: END
        }
    )
    workflow.add_edge("call_tool", "call_model")
    return workflow.compile()


def main():
    agent = create_agent()
    query = f"What's the weather in Paris today? Today is {datetime.now().strftime('%Y-%m-%d')}."
    initial_state = {
        "messages": [HumanMessage(content=query)],
        "number_of_steps": 0
    }
    print(f"Query: {query}")
    print("\nRunning agent...\n")
    for output in agent.stream(initial_state):
        for key, value in output.items():
            if key == "__end__":
                continue
            print(f"\n--- {key.upper()} ---")
            if key == "messages":
                for msg in value:
                    if hasattr(msg, 'content'):
                        print(f"{msg.type}: {msg.content}")
                    if hasattr(msg, 'tool_calls') and msg.tool_calls:
                        print(f"Tool Calls: {msg.tool_calls}")
            else:
                print(value)


if __name__ == "__main__":
    main()
```
### Running your Agent
Navigate to the directory that contains `main.py` and run:
```bash
python main.py
```
#### Output
```bash
--- CALL_MODEL ---
{'messages': [AIMessage(content="The weather in Paris on July 1, 2025, is as follows:\n\n- 00:00: 28.1°C\n- 01:00: 27.0°C\n- 02:00: 26.3°C\n- 03:00: 25.7°C\n- 04:00: 25.1°C\n- 05:00: 24.9°C\n- 06:00: 25.8°C\n- 07:00: 27.6°C\n- 08:00: 29.6°C\n- 09:00: 31.7°C\n- 10:00: 33.7°C\n- 11:00: 35.1°C\n- 12:00: 36.3°C\n- 13:00: 37.3°C\n- 14:00: 38.6°C\n- 15:00: 37.9°C\n- 16:00: 38.1°C\n- 17:00: 37.8°C\n- 18:00: 37.3°C\n- 19:00: 35.3°C\n- 20:00: 33.2°C\n- 21:00: 30.8°C\n- 22:00: 28.7°C\n- 23:00: 27.3°C\n\nIt looks like it's going to be a hot day in Paris!", additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 319, 'prompt_tokens': 585, 'total_tokens': 904, 'completion_tokens_details': None, 'prompt_tokens_details': None, 'cost': 0.005582999999999999}, 'model_name': 'gpt-4o', 'system_fingerprint': None, 'id': '3bbde343-79e3-4d8f-bd97-b07179ee92c0', 'service_tier': None, 'finish_reason': 'stop', 'logprobs': None}, id='run--4fd3896d-1fbd-4c91-9c21-bd6cf3d2949e-0', usage_metadata={'input_tokens': 585, 'output_tokens': 319, 'total_tokens': 904, 'input_token_details': {}, 'output_token_details': {}})], 'number_of_steps': 2}
```
### Traces on LangDB
When you run queries against your agent, LangDB automatically captures detailed traces of all agent interactions.
## Next Steps: Advanced LangGraph Integration
This guide covered the basics of integrating LangDB with LangGraph using a ReAct agent example. For more complex scenarios and advanced use cases, check out the resources in the [Guides section](https://app.gitbook.com/s/jwXvc8GvLP6kUvOASaLP/).
---
## Working with OpenAI Agents SDK
LangDB helps you add full tracing and observability to your OpenAI Agents SDK workflows—without changing your core logic. With a one-line initialization, LangDB captures model calls, tool invocations, and intermediate steps, giving you a complete view of how your agent operates.
Check out this example thread: https://app.langdb.ai/sharing/threads/53b87631-de7f-431a-a049-48556f899b4d
## Installation
Enable end-to-end tracing for your OpenAI Agents SDK agents by installing the `pylangdb` client with the `openai` feature flag:
```bash
pip install 'pylangdb[openai]'
```
## Quick Start
### Export Environment Variables
Set your LangDB credentials:
```bash
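export LANGDB_API_KEY=""
export LANGDB_PROJECT_ID=""
# Read by the client configuration below; this is the default gateway URL
export LANGDB_API_BASE_URL="https://api.us-east-1.langdb.ai"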
```
### Initialize Tracing
Import and call `init()` **before** configuring your OpenAI client:
```python
from pylangdb.openai import init
# Initialise LangDB
init()
```
### Configure OpenAI Client and Agent Runner
```python
import asyncio
import os
import uuid

# Agent SDK imports
from agents import (
    Agent,
    Runner,
    set_default_openai_client,
    RunConfig,
    ModelProvider,
    Model,
    OpenAIChatCompletionsModel,
)
from openai import AsyncOpenAI

# Configure the OpenAI client with LangDB headers
client = AsyncOpenAI(
    api_key=os.environ["LANGDB_API_KEY"],
    base_url=os.environ["LANGDB_API_BASE_URL"],
    default_headers={"x-project-id": os.environ["LANGDB_PROJECT_ID"]},
)
set_default_openai_client(client)

# Create a custom model provider for advanced routing
class CustomModelProvider(ModelProvider):
    def get_model(self, model_name: str | None) -> Model:
        return OpenAIChatCompletionsModel(model=model_name, openai_client=client)

agent = Agent(
    name="Math Tutor",
    instructions="You are a helpful assistant",
    model="openai/gpt-4.1",  # Choose any model available on LangDB
)

# Register your custom model provider to route model calls through LangDB
CUSTOM_MODEL_PROVIDER = CustomModelProvider()

# Assign a unique group_id to link all steps in this session trace
group_id = str(uuid.uuid4())

async def main():
    response = await Runner.run(
        agent,
        input="Hello, world!",
        run_config=RunConfig(
            model_provider=CUSTOM_MODEL_PROVIDER,  # Inject custom model provider
            group_id=group_id,  # Link all steps to the same trace
        ),
    )
    print(response.final_output)

# `await` is only valid inside a coroutine, so run the call with asyncio
asyncio.run(main())
```
Once executed, LangDB links all steps—model calls, intermediate tool usage, and runner orchestration—into a single session trace.
## Complete OpenAI Agents SDK Example
Here is a full example, based on the [OpenAI Agents SDK Quickstart](https://openai.github.io/openai-agents-python/quickstart/), that uses LangDB tracing.
### Example code
> Check out the full sample on GitHub: [https://github.com/langdb/langdb-samples/tree/main/examples/openai/openai-agents-tracing](https://github.com/langdb/langdb-samples/tree/main/examples/openai/openai-agents-tracing)
### Setup Environment
```bash
pip install openai-agents 'pylangdb[openai]'
```
### Export Environment Variables
```bash
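export LANGDB_API_KEY=""
export LANGDB_PROJECT_ID=""
# Read by main.py; this is the default gateway URL
export LANGDB_API_BASE_URL="https://api.us-east-1.langdb.ai"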
```
### main.py
```python
# Initialize LangDB tracing
from pylangdb.openai import init
init()
# Agent SDK imports
from agents import (
Agent,
Runner,
set_default_openai_client,
set_default_openai_key,
set_default_openai_api,
RunConfig,
ModelProvider,
Model,
OpenAIChatCompletionsModel
)
from openai import AsyncOpenAI
import os
import uuid
import asyncio
# Configure the OpenAI client with LangDB headers
client = AsyncOpenAI(api_key=os.environ["LANGDB_API_KEY"],
base_url=os.environ["LANGDB_API_BASE_URL"],
default_headers={"x-project-id": os.environ["LANGDB_PROJECT_ID"]})
# Set the configured client as default with tracing enabled
set_default_openai_client(client, use_for_tracing=True)
set_default_openai_api(api="chat_completions")
# set_default_openai_key(os.environ["LANGDB_API_KEY"])
# Create a custom model provider for advanced routing
class CustomModelProvider(ModelProvider):
    def get_model(self, model_name: str | None) -> Model:
        return OpenAIChatCompletionsModel(model=model_name, openai_client=client)
# Register your custom model provider to route model calls through LangDB
CUSTOM_MODEL_PROVIDER = CustomModelProvider()
math_tutor_agent = Agent(
name="Math Tutor",
handoff_description="Specialist agent for math questions",
instructions="You provide help with math problems. Explain your reasoning at each step and include examples",
model="anthropic/claude-3.7-sonnet"
)
history_tutor_agent = Agent(
name="History Tutor",
handoff_description="Specialist agent for historical questions",
instructions="You provide assistance with historical queries. Explain important events and context clearly.",
model="gemini/gemini-2.0-flash" # Choose any model available on LangDB
)
triage_agent = Agent(
name="Triage Agent",
instructions="You determine which agent to use based on the user's homework question",
handoffs=[history_tutor_agent, math_tutor_agent],
model="openai/gpt-4o-mini" # Choose any model available on LangDB
)
# Assign a unique group_id to link all steps in this session trace
group_id = str(uuid.uuid4())
# Define async function to run the agent
async def run_agent():
    response = await Runner.run(
        triage_agent,
        input="who was the first president of the united states?",
        run_config=RunConfig(
            model_provider=CUSTOM_MODEL_PROVIDER,  # Inject custom model provider
            group_id=group_id,  # Link all steps to the same trace
        ),
    )
    print(response.final_output)
# Run the async function with asyncio
asyncio.run(run_agent())
```
### Running Your Agent
Navigate to the parent directory of your agent project and run the following command:
```bash
python main.py
```
#### Output:
```
The first president of the United States was **George Washington**.
Here's some important context:
* **The American Revolution (1775-1783):** Washington was the commander-in-chief of the Continental Army during the Revolutionary War. His leadership was crucial in securing American independence from Great Britain.
* **The Articles of Confederation (1781-1789):** After the war, the United States was governed by the Articles of Confederation. This system proved to be weak and ineffective, leading to calls for a stronger national government.
* **The Constitutional Convention (1787):** Delegates from the states met in Philadelphia to revise the Articles of Confederation. Instead, they drafted a new Constitution that created a more powerful federal government. Washington presided over the convention, lending his prestige and influence to the process.
* **The Constitution and the Presidency:** The Constitution established the office of the President of the United States.
* **Election of 1789:** George Washington was unanimously elected as the first president by the Electoral College in 1789. There were no opposing candidates. This reflected the immense respect and trust the nation had in him.
* **First Term (1789-1793):** Washington established many precedents for the presidency, including the formation of a cabinet, the practice of delivering an annual address to Congress, and the idea of serving only two terms. He focused on establishing a stable national government, paying off the national debt, and maintaining neutrality in foreign affairs.
* **Second Term (1793-1797):** Washington faced challenges such as the Whiskey Rebellion and growing partisan divisions. He decided to retire after two terms, setting another crucial precedent for peaceful transitions of power.
* **Significance:** Washington's leadership and integrity were essential in establishing the legitimacy and credibility of the new government. He is often considered the "Father of His Country" for his pivotal role in the founding of the United States.
```
### Traces on LangDB
When you run queries against your agent, LangDB automatically captures detailed traces of all agent interactions.
## Next Steps: Advanced OpenAI Agents SDK Integration
This guide covered the basics of integrating LangDB with the OpenAI Agents SDK using history and math tutor agents as an example. For more complex scenarios and advanced use cases, check out our comprehensive resources in the [Guides section](https://app.gitbook.com/s/jwXvc8GvLP6kUvOASaLP/).
---
## Working with API
You can use LangDB as a drop-in replacement for the OpenAI APIs, which makes it easy to integrate into existing workflows and libraries such as the OpenAI client SDK.
You can choose from any of the [supported models](https://app.langdb.ai/models).
```python
from openai import OpenAI
langdb_project_id = "xxxxx" # LangDB Project ID
client = OpenAI(
base_url=f"https://api.us-east-1.langdb.ai/{langdb_project_id}/v1",
api_key="xxxxx" , # LangDB token
)
response = client.chat.completions.create(
model="anthropic/claude-sonnet-4", # Change Model
messages=[
{"role": "system", "content": "You are a helpful assistant"},
{"role": "user", "content": "What are the earnings of Apple in 2022?"},
],
)
print("Assistant:", response.choices[0].message.content)
```
```typescript
import { OpenAI } from 'openai';
const langdbProjectId = 'xxxx'; // LangDB Project ID
const client = new OpenAI({
baseURL: `https://api.us-east-1.langdb.ai/${langdbProjectId}/v1`,
apiKey: 'xxxx', // Your LangDB token
});
const messages = [
{
role: 'system',
content: 'You are a helpful assistant.'
},
{
role: 'user',
content: 'What are the earnings of Apple in 2022?'
}
];
async function getAssistantReply() {
const { choices } = await client.chat.completions.create({
model: 'gpt-4o-mini',
messages: messages
});
console.log('Assistant:', choices[0].message.content);
}
getAssistantReply();
```
```bash
curl "https://api.us-east-1.langdb.ai/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $LANGDB_API_KEY" \
-H "X-Project-Id: $LANGDB_PROJECT_ID" \
-d '{
"model": "gpt-4o",
"messages": [
{
"role": "user",
"content": "Write a haiku about recursion in programming."
}
],
"temperature": 0.8
}'
```
After sending your request, you can see the traces on the dashboard.
[Check out the API reference here.](../api-reference/ai-gateway-api)
---
## Working with Headers
The LangDB API supports a set of **HTTP headers** that add tracing, observability, and organization to your API requests.
These headers structure interactions with multiple LLMs by providing **tracing, request tracking, and session continuity**, making it easier to monitor and analyze API usage.
## Thread ID (x-thread-id)
**Usage:** Groups multiple related requests under the same conversation.
* Useful for tracking interactions over a single user session.
* Helps maintain context across multiple messages.
## Thread Title (x-thread-title)
**Usage:** Assigns a custom, human-readable title to a thread.
* This title is displayed in the LangDB UI, making it easier to identify and search for specific conversations.
## Public Thread (x-thread-public)
**Usage:** Makes a thread publicly accessible via a shareable link.
* Set the value to `1` or `true` to enable public sharing.
* The public URL will be: `https://app.langdb.ai/sharing/threads/{thread_id}`
* The `x-thread-title`, if set, will be displayed on the public thread page.
Check [Threads](../concepts/thread.md) for more details.
## Run ID (x-run-id)
**Usage:** Tracks a unique workflow execution in LangDB, such as a model call or tool invocation.
* Enables precise tracking and debugging.
* Each Run is independent for better observability.
Check [Run](../concepts/run.md) for more details.
## Label (x-label)
**Usage:** Adds a custom tag or label to an LLM model call for easier categorization.
* Helps with tracing multiple agents.
Check [Label](../concepts/label.md) for more details.
## Project ID (x-project-id)
**Usage:** Identifies the project under which the request is being made.
* Helps in cost tracking, monitoring, and organizing API calls within a specific project.
* Can be set in headers or directly in the API base URL `https://api.us-east-1.langdb.ai/${langdbProjectId}/v1`
---
## Working with MCPs
Instantly connect to managed MCP servers — skip the setup and start using fully managed MCPs with built-in authentication, seamless scalability, and full tracing. This guide gives you a quick walkthrough of how to get started with MCPs.
## Quick Example
In this example, we'll create a [**Virtual MCP Server**](../concepts/virtual-mcp-servers/) by combining Slack and Gmail MCPs — and then connect it to an MCP Client like **Cursor** for instant access inside your chats.

### Steps:
1. **Select Slack and Gmail** from MCP Servers in the Virtual MCP Section.
2. **Generate** a Virtual MCP URL automatically.
3. **Install** the MCP into Cursor with a single command.
Example install command:
```bash
npx @langdb/mcp setup slack_gmail_virtual https://api.langdb.ai/mcp/xxxxx --client cursor
```
## What Happens Under the Hood?
* Authentication is handled (via OAuth or API Key)
* Full tracing and observability are available (inputs, outputs, errors, latencies)
* MCP tools are treated just like normal function calls inside LangDB
## Next Steps:
* MCP Servers listed on LangDB: [https://app.langdb.ai/mcp-servers](https://app.langdb.ai/mcp-servers)
* Explore [MCP use cases](https://app.gitbook.com/o/BdRlRaljD2qHr7uIBrM0/s/jwXvc8GvLP6kUvOASaLP/).
---
## Working with Multiple Agents
LangDB automatically visualizes how agents interact, providing a clear view of workflows, hierarchies, and usage patterns by adding [`run`](https://docs.langdb.ai/ai-gateway/concepts/run) and [`thread`](https://docs.langdb.ai/ai-gateway/concepts/thread) headers.
This allows developers to track interactions between agents seamlessly, ensuring clear visibility into workflows and dependencies.
## What is a Multi-Agent System?
A multi-agent system consists of independent agents collaborating to solve complex tasks. Agents handle various roles such as user interaction, data processing, and workflow orchestration. LangDB streamlines tracking these interactions for better efficiency and transparency.
## Why Track Workflows?
Tracking ensures:
* **Clear Execution Flow**: Understand how agents interact.
* **Performance Optimization**: Identify bottlenecks.
* **Reliability & Accountability**: Improve transparency.
LangDB supports two main concepts.
* [**Run**](../concepts/run.md): A complete end-to-end interaction between agents, grouped for easy tracking.
* [**Thread**](../concepts/thread.md): Aggregate multiple Runs into a single thread for a unified chat experience.
Example
Using the same **Run ID** and **Thread ID** across multiple agents keeps their calls linked, maintaining context across interactions and providing a complete view of the workflow.
```python
from openai import OpenAI
from uuid import uuid4

langdb_project_id = "xxxxx"  # LangDB Project ID
api_key = "xxxxx"  # Replace with your LangDB token

client = OpenAI(
    base_url=f"https://api.us-east-1.langdb.ai/{langdb_project_id}/v1",  # LangDB API base URL
    api_key=api_key,
)

# Share one thread ID and one run ID across both agents
thread_id = str(uuid4())
run_id = str(uuid4())

# Agent 1 handles the user message
response1 = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "developer", "content": "You are a helpful assistant."},
              {"role": "user", "content": "Hello!"}],
    extra_headers={"x-thread-id": thread_id, "x-run-id": run_id},
)
# Agent 2 processes the response
response2 = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "developer", "content": "Processing user input."},
              {"role": "user", "content": response1.choices[0].message.content}],
    extra_headers={"x-thread-id": thread_id, "x-run-id": run_id},
)
```
Check out the full multi-agent tracing example [here](https://github.com/langdb/langdb-samples/blob/main/examples/langchain/langchain-multi-agent/langchain-multi-agent.ipynb).
---
## Introduction to AI Gateway
## **What is an AI Gateway?**
An AI gateway is a middleware that acts as a unified access point to multiple LLMs, optimizing, securing, and managing AI traffic. It simplifies integration with different AI providers while enabling cost control, observability, and performance benchmarking. With an AI gateway, businesses can seamlessly switch between models, monitor usage, and optimize costs.
LangDB provides OpenAI-compatible APIs to connect with multiple Large Language Models (LLMs) by changing just two lines of code.
Govern, Secure, and Optimize all of your AI Traffic with Cost Control, Optimisation and Full Observability.
**What AI Gateway Offers Out of the Box**
LangDB provides OpenAI-compatible APIs, enabling developers to connect with multiple LLMs by changing just two lines of code. With LangDB, you can:
* **Provide access to all major LLMs**\
Ensure seamless integration with leading large language models to maximize flexibility and power.
* **No framework code required**\
Enable plug-and-play functionality using any framework like Langchain, Vercel AI SDK, CrewAI, etc., for easy adoption.
* **Plug & Play Tracing & Cost Optimization**\
Simplify implementation of tracing and cost optimization features, ensuring streamlined operations.
* **Automatic routing based on cost, quality, and other variables**\
Dynamically route requests to the most suitable LLM based on predefined parameters.
* **Benchmark and provide insights**\
Deliver insights into the best-performing models for specific tasks, such as coding or reasoning, to enhance decision-making.
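To make the "two lines of code" claim concrete, here is a minimal sketch of the two settings that change when an existing OpenAI client is pointed at LangDB. The function `langdb_client_kwargs` is a hypothetical helper written for this illustration, and the credentials are placeholders; the URL pattern follows the base URL used in the "Working with API" examples.

```python
def langdb_client_kwargs(
    project_id: str,
    api_key: str,
    host: str = "api.us-east-1.langdb.ai",
) -> dict[str, str]:
    """Return the two constructor arguments that point an OpenAI client at LangDB."""
    return {
        "base_url": f"https://{host}/{project_id}/v1",  # change 1: gateway URL
        "api_key": api_key,  # change 2: LangDB token instead of a provider key
    }


kwargs = langdb_client_kwargs("my-project", "my-langdb-token")  # placeholder values
print(kwargs["base_url"])
# With the openai package installed: client = OpenAI(**kwargs)
```

Everything else in the calling code (messages, parameters, response handling) stays in the OpenAI API format.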
**Quick Start with LangDB**
LangDB offers both managed and self-hosted versions for organisations to manage AI traffic. Choose between the **Hosted Gateway** for ease of use or the **Open-Source Gateway** for full control.
Quick Start
A full featured and managed AI gateway that provides instant access to 250+ LLMs with enterprise ready features.
Self Hosted
A self-hosted option for organizations that require complete control over their AI infrastructure.
### Roadmap
* **Prompt Caching & Optimization** _(In Progress)_\
Introduce caching mechanisms to optimize prompt usage and reduce redundant costs.
* **GuardRails** _(In Progress)_\
Implement safeguards to enhance reliability and accuracy in AI outputs.
* **Leaderboard of models per category**\
Create a comparative leaderboard to highlight model performance across categories.
* **Ready-to-use evaluations for non-data scientists**\
Provide accessible evaluation tools for users without a data science background.
* **Readily fine-tunable data based on usage**\
Offer pre-configured datasets tailored for fine-tuning, enabling customized improvements with ease.
---
## Getting Started
### Key Features
LangDB exposes **two complementary capabilities**:
1. **Chat Completions Client** – Call LLMs using the `LangDb` Python client. This works as a drop-in replacement for `openai.ChatCompletion` while adding automatic usage, cost and latency reporting.
2. **Agent Tracing** – Instrument your existing AI framework (ADK, LangChain, CrewAI, etc.) with a single `init()` call. All calls are routed through the LangDB collector and enriched with framework-specific metadata, which is visible on the LangDB dashboard.
***
## Quick Start (Chat Completions)
```bash
pip install pylangdb[client]
```
```python
from pylangdb.client import LangDb
# Initialize LangDB client
client = LangDb(api_key="your_api_key", project_id="your_project_id")
# Simple chat completion
resp = client.chat.completions.create(
model="openai/gpt-4o-mini",
messages=[{"role": "user", "content": "Hello!"}]
)
print(resp.choices[0].message.content)
```
***
## Agent Tracing Quick Start
```bash
# Install the package with Google ADK support
pip install pylangdb[adk]
```
```python
# Import and initialize LangDB tracing
# First initialize LangDB before defining any agents
from pylangdb.adk import init
init()
import datetime
from zoneinfo import ZoneInfo
from google.adk.agents import Agent
def get_weather(city: str) -> dict:
    if city.lower() != "new york":
        return {"status": "error", "error_message": f"Weather information for '{city}' is not available."}
    return {"status": "success", "report": "The weather in New York is sunny with a temperature of 25 degrees Celsius (77 degrees Fahrenheit)."}
def get_current_time(city: str) -> dict:
    if city.lower() != "new york":
        return {"status": "error", "error_message": f"Sorry, I don't have timezone information for {city}."}
    tz = ZoneInfo("America/New_York")
    now = datetime.datetime.now(tz)
    return {"status": "success", "report": f'The current time in {city} is {now.strftime("%Y-%m-%d %H:%M:%S %Z%z")}'}
root_agent = Agent(
name="weather_time_agent",
model="gemini-2.0-flash",
description=("Agent to answer questions about the time and weather in a city." ),
instruction=("You are a helpful agent who can answer user questions about the time and weather in a city."),
tools=[get_weather, get_current_time],
)
```
> **Note:** Always initialize LangDB **before** importing any framework-specific classes to ensure proper instrumentation.
### Supported Frameworks (Tracing)
| Framework | Installation | Import Pattern | Key Features |
| ---------- | --------------------------------- | ------------------------------------- | --------------------------------------------- |
| Google ADK | `pip install pylangdb[adk]` | `from pylangdb.adk import init` | Automatic sub-agent discovery |
| OpenAI | `pip install pylangdb[openai]` | `from pylangdb.openai import init` | Custom model provider support and Run Tracing |
| LangChain | `pip install pylangdb[langchain]` | `from pylangdb.langchain import init` | Automatic chain tracing |
| CrewAI | `pip install pylangdb[crewai]` | `from pylangdb.crewai import init` | Multi-agent crew tracing |
| Agno | `pip install pylangdb[agno]` | `from pylangdb.agno import init` | Tool usage tracing, model interactions |
## How It Works
LangDB uses intelligent monkey patching to instrument your AI frameworks at runtime:
#### Google ADK
* Patches `Agent.__init__` to inject callbacks
* Tracks agent hierarchies and tool usage
* Maintains thread context across invocations
#### OpenAI
* Intercepts HTTP requests via `AsyncOpenAI.post`
* Propagates trace context via headers
* Correlates spans across agent interactions
#### LangChain
* Modifies `httpx.Client.send` for request tracing
* Automatically tracks chains and agents
* Injects trace headers into all requests
#### CrewAI
* Intercepts `litellm.completion` for LLM calls
* Tracks crew members and task delegation
* Propagates context through LiteLLM headers
#### Agno
* Patches `LangDB.invoke` and client parameters
* Traces workflows and model interactions
* Maintains consistent session context
## Installation
```bash
# For client library functionality (chat completions, analytics, etc.)
pip install pylangdb[client]
# For framework tracing - install specific framework extras
pip install pylangdb[adk] # Google ADK tracing
pip install pylangdb[openai] # OpenAI agents tracing
pip install pylangdb[langchain] # LangChain tracing
pip install pylangdb[crewai] # CrewAI tracing
pip install pylangdb[agno] # Agno tracing
```
## Configuration
Set your credentials (or pass them directly to the `init()` function):
```bash
export LANGDB_API_KEY="your-api-key"
export LANGDB_PROJECT_ID="your-project-id"
```
## Client Usage (Chat Completions)
### Initialize LangDb Client
```python
from pylangdb import LangDb
# Initialize with API key and project ID
client = LangDb(api_key="your_api_key", project_id="your_project_id")
```
### Chat Completions
```python
messages = [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Say hello!"}
]
response = client.completion(
model="gemini-1.5-pro-latest",
messages=messages,
temperature=0.7,
max_tokens=100
)
```
### Thread Operations
**Get Messages**
Retrieve messages from a specific thread:
```python
messages = client.get_messages(thread_id="your_thread_id")
# Access message details
for message in messages:
    print(f"Type: {message.type}")
    print(f"Content: {message.content}")
    if message.tool_calls:
        for tool_call in message.tool_calls:
            print(f"Tool: {tool_call.function.name}")
```
**Get Thread Cost**
Get cost and token usage information for a thread:
```python
usage = client.get_usage(thread_id="your_thread_id")
print(f"Total cost: ${usage.total_cost:.4f}")
print(f"Input tokens: {usage.total_input_tokens}")
print(f"Output tokens: {usage.total_output_tokens}")
```
### Analytics
Get analytics data for specific tags:
```python
# Get raw analytics data
analytics = client.get_analytics(
tags="model1,model2",
start_time_us=None, # Optional: defaults to 24 hours ago
end_time_us=None # Optional: defaults to current time
)
# Get analytics as a pandas DataFrame
df = client.get_analytics_dataframe(
tags="model1,model2",
start_time_us=None,
end_time_us=None
)
```
#### Evaluate Multiple Threads
```python
df = client.create_evaluation_df(thread_ids=["thread1", "thread2"])
print(df.head())
```
#### List Available Models
```python
models = client.list_models()
print(models)
```
## Framework-Specific Examples (Tracing)
### Google ADK
```python
from pylangdb.adk import init
# Monkey-patch the client for tracing
init()
# Import your agents after initializing tracing
from google.adk.agents import Agent
from travel_concierge.sub_agents.booking.agent import booking_agent
from travel_concierge.sub_agents.in_trip.agent import in_trip_agent
from travel_concierge.sub_agents.inspiration.agent import inspiration_agent
from travel_concierge.sub_agents.planning.agent import planning_agent
from travel_concierge.sub_agents.post_trip.agent import post_trip_agent
from travel_concierge.sub_agents.pre_trip.agent import pre_trip_agent
from travel_concierge.tools.memory import _load_precreated_itinerary
root_agent = Agent(
model="openai/gpt-4.1",
name="root_agent",
description="A Travel Concierge using the services of multiple sub-agents",
instruction="Instruct the travel concierge to plan a trip for the user.",
sub_agents=[
inspiration_agent,
planning_agent,
booking_agent,
pre_trip_agent,
in_trip_agent,
post_trip_agent,
],
before_agent_callback=_load_precreated_itinerary,
)
```
### OpenAI
```python
import asyncio
import uuid
import os
# Import LangDB tracing
from pylangdb.openai import init
# Initialize tracing
init()
# Import agent components
from agents import (
Agent,
Runner,
set_default_openai_client,
RunConfig,
ModelProvider,
Model,
OpenAIChatCompletionsModel
)
# Configure OpenAI client with environment variables
from openai import AsyncOpenAI
client = AsyncOpenAI(
api_key=os.environ.get("LANGDB_API_KEY"),
base_url=os.environ.get("LANGDB_API_BASE_URL"),
default_headers={
"x-project-id": os.environ.get("LANGDB_PROJECT_ID")
}
)
set_default_openai_client(client)
# Create a custom model provider
class CustomModelProvider(ModelProvider):
    def get_model(self, model_name: str | None) -> Model:
        return OpenAIChatCompletionsModel(model=model_name, openai_client=client)
CUSTOM_MODEL_PROVIDER = CustomModelProvider()
agent = Agent(
    name="Math Tutor",
    model="gpt-4.1",
    instructions="You are a math tutor who can help students with their math homework.",
)
group_id = str(uuid.uuid4())
# Use the model provider with a unique group_id for tracing
async def run_agent():
    response = await Runner.run(
        agent,
        input="Hello World",
        run_config=RunConfig(
            model_provider=CUSTOM_MODEL_PROVIDER,  # Inject custom model provider
            group_id=group_id,  # Link all steps to the same trace
        ),
    )
    print(response.final_output)
# Run the async function with asyncio
asyncio.run(run_agent())
```
### LangChain
```python
import os
from pylangdb.langchain import init
init()
# Get environment variables for configuration
api_base = os.getenv("LANGDB_API_BASE_URL")
api_key = os.getenv("LANGDB_API_KEY")
if not api_key:
    raise ValueError("Please set the LANGDB_API_KEY environment variable")
project_id = os.getenv("LANGDB_PROJECT_ID")
# Default headers for API requests
default_headers: dict[str, str] = {
    "x-project-id": project_id
}
# Your existing LangChain code works with proper configuration
from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage
# Initialize OpenAI LLM with proper configuration
llm = ChatOpenAI(
model_name="gpt-4",
temperature=0.3,
openai_api_base=api_base,
openai_api_key=api_key,
default_headers=default_headers,
)
result = llm.invoke([HumanMessage(content="Hello, LangChain!")])
```
### CrewAI
```python
import os
from crewai import Agent, Task, Crew, LLM
from dotenv import load_dotenv
load_dotenv()
# Import and initialize LangDB tracing
from pylangdb.crewai import init
# Initialize tracing before importing or creating any agents
init()
# Initialize API credentials
api_key = os.environ.get("LANGDB_API_KEY")
api_base = os.environ.get("LANGDB_API_BASE_URL")
project_id = os.environ.get("LANGDB_PROJECT_ID")
# Create LLM with proper headers
llm = LLM(
model="gpt-4",
api_key=api_key,
base_url=api_base,
extra_headers={
"x-project-id": project_id
}
)
# Create and use your CrewAI components as usual
# They will be automatically traced by LangDB
researcher = Agent(
role="researcher",
goal="Research the topic thoroughly",
backstory="You are an expert researcher",
llm=llm,
verbose=True
)
task = Task(
description="Research the given topic",
agent=researcher
)
crew = Crew(agents=[researcher], tasks=[task])
result = crew.kickoff()
```
### Agno
```python
import os
from agno.agent import Agent
from agno.tools.duckduckgo import DuckDuckGoTools
# Import and initialize LangDB tracing
from pylangdb.agno import init
init()
# Import LangDB model after initializing tracing
from agno.models.langdb import LangDB
# Create agent with LangDB model
agent = Agent(
name="Web Agent",
role="Search the web for information",
model=LangDB(
id="openai/gpt-4",
base_url=os.getenv("LANGDB_API_BASE_URL") + '/' + os.getenv("LANGDB_PROJECT_ID") + '/v1',
api_key=os.getenv("LANGDB_API_KEY"),
project_id=os.getenv("LANGDB_PROJECT_ID"),
),
tools=[DuckDuckGoTools()],
instructions="Answer questions using web search",
show_tool_calls=True,
markdown=True,
)
# Use the agent
response = agent.run("What is LangDB?")
```
### Advanced Configuration
#### Environment Variables
| Variable | Description | Default |
| -------------------------- | --------------------------------- | -------------------------------------- |
| `LANGDB_API_KEY` | Your LangDB API key | Required |
| `LANGDB_PROJECT_ID` | Your LangDB project ID | Required |
| `LANGDB_API_BASE_URL` | LangDB API base URL | `https://api.us-east-1.langdb.ai` |
| `LANGDB_TRACING_BASE_URL` | Tracing collector endpoint | `https://api.us-east-1.langdb.ai:4317` |
| `LANGDB_TRACING` | Enable/disable tracing | `true` |
| `LANGDB_TRACING_EXPORTERS` | Comma-separated list of exporters | `otlp`, `console` |
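The following sketch shows one way an application might resolve these variables, applying the defaults from the table and failing early when a required value is missing. The `LangDBSettings` dataclass and `load_settings` function are hypothetical and not part of pylangdb. The sketch also assumes that `LANGDB_TRACING` holds a `true`/`false` string, as the default in the table suggests.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class LangDBSettings:
    api_key: str
    project_id: str
    api_base_url: str
    tracing_base_url: str
    tracing_enabled: bool
    exporters: tuple[str, ...]


def load_settings(env: Mapping[str, str] = os.environ) -> LangDBSettings:
    """Read LangDB settings from `env`, using the documented defaults."""
    missing = [k for k in ("LANGDB_API_KEY", "LANGDB_PROJECT_ID") if not env.get(k)]
    if missing:
        raise ValueError(f"Missing required variables: {', '.join(missing)}")
    exporters = env.get("LANGDB_TRACING_EXPORTERS", "otlp,console")
    return LangDBSettings(
        api_key=env["LANGDB_API_KEY"],
        project_id=env["LANGDB_PROJECT_ID"],
        api_base_url=env.get("LANGDB_API_BASE_URL", "https://api.us-east-1.langdb.ai"),
        tracing_base_url=env.get(
            "LANGDB_TRACING_BASE_URL", "https://api.us-east-1.langdb.ai:4317"
        ),
        # Assumption: any value other than "true" (case-insensitive) disables tracing.
        tracing_enabled=env.get("LANGDB_TRACING", "true").strip().lower() == "true",
        exporters=tuple(e.strip() for e in exporters.split(",") if e.strip()),
    )


# Placeholder credentials; pass no argument to read the real environment.
settings = load_settings({"LANGDB_API_KEY": "key", "LANGDB_PROJECT_ID": "proj"})
print(settings.api_base_url, settings.exporters)
```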
#### Custom Configuration
All `init()` functions accept the same optional parameters:
```python
from pylangdb.openai import init
init(
collector_endpoint='https://api.us-east-1.langdb.ai:4317',
api_key="langdb-api-key",
project_id="langdb-project-id"
)
```
### Technical Details
#### Session and Thread Management
* **Thread ID**: Maintains consistent session identifiers across agent calls
* **Run ID**: Unique identifier for each execution trace
* **Invocation Tracking**: Tracks the sequence of agent invocations
* **State Persistence**: Maintains context across callbacks and sub-agent interactions
#### Distributed Tracing
* **OpenTelemetry Integration**: Uses OpenTelemetry for standardized tracing
* **Attribute Propagation**: Automatically propagates LangDB-specific attributes
* **Span Correlation**: Links related spans across different agents and frameworks
* **Custom Exporters**: Supports multiple export formats (OTLP, Console)
### API Reference
#### Initialization Functions
Each framework has a simple `init()` function that handles all necessary setup:
* `pylangdb.adk.init()`: Patches Google ADK Agent class with LangDB callbacks
* `pylangdb.openai.init()`: Initializes OpenAI agents tracing
* `pylangdb.langchain.init()`: Initializes LangChain tracing
* `pylangdb.crewai.init()`: Initializes CrewAI tracing
* `pylangdb.agno.init()`: Initializes Agno tracing
All init functions accept optional parameters for custom configuration (`collector_endpoint`, `api_key`, `project_id`).
## Troubleshooting
#### Common Issues
1. **Missing API Key**: Ensure `LANGDB_API_KEY` and `LANGDB_PROJECT_ID` are set
2. **Tracing Not Working**: Check that initialization functions are called before creating agents
3. **Network Issues**: Verify collector endpoint is accessible
4. **Framework Conflicts**: Initialize LangDB integration before other instrumentation
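For the network check in item 3, a quick TCP probe can help separate connectivity problems from configuration problems. The sketch below uses only the standard library; `collector_host_port` and `is_reachable` are hypothetical helpers, and the default port 4317 is taken from the collector endpoint listed in the environment variable table.

```python
import socket
from urllib.parse import urlparse


def collector_host_port(endpoint: str, default_port: int = 4317) -> tuple[str, int]:
    """Split a collector endpoint URL into (host, port)."""
    parsed = urlparse(endpoint)
    if not parsed.hostname:
        raise ValueError(f"Cannot parse host from endpoint: {endpoint!r}")
    return parsed.hostname, parsed.port or default_port


def is_reachable(endpoint: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the collector can be opened."""
    host, port = collector_host_port(endpoint)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


print(collector_host_port("https://api.us-east-1.langdb.ai:4317"))
# On a machine with network access:
# is_reachable("https://api.us-east-1.langdb.ai:4317")
```

A successful probe only shows that the port accepts connections; it does not validate TLS, credentials, or the project ID.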
---
## Architecture Overview
This page describes the core architecture of the LangDB AI Gateway, a unified platform for interfacing with a wide variety of Large Language Models (LLMs) and building agentic applications with enterprise-grade observability, cost control, scalability, and MCP support.
### Core Components
| Component | Purpose / Description | Enterprise Features / Notes |
| ------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------- |
| **AI Gateway** | Unified interface to 300+ LLMs using the OpenAI API format. Built-in observability and tracing. Free and open-source version available at langdb/ai-gateway. | Multi-tenancy, advanced cost control, and rate limiting. Contact LangDB for access. |
| **Metadata Store (PostgreSQL)** | Stores metadata related to API usage, configurations, and more. | For scalable/multi-tenant deployments, use managed PostgreSQL (e.g., AWS RDS, GCP Cloud SQL). |
| **Cache Store (Redis)** | Implements rolling cost control and rate limiting for API usage. | Enterprise version supports Redis integration for cost control and rate limiting. |
| **Observability & Analytics Store (ClickHouse)** | Provides observability by storing and analyzing traces/logs. Supports OpenTelemetry. | For large-scale deployments, use ClickHouse Cloud. Traces stored in `langdb.traces` table. |
**Note:**
* **Metadata Store:** Powered by **PostgreSQL** (consider AWS RDS, GCP Cloud SQL for enterprise)
* **Cache Store:** Powered by **Redis** (enterprise only)
* **Observability & Analytics Store:** Powered by **ClickHouse** (consider ClickHouse Cloud for scale)
### Environment Overview
LangDB provisions a dedicated environment for each tenant. This environment is isolated per tenant and is set up in a separate AWS account or GCP project, managed by LangDB. Customers connect securely to their provisioned environment from their own VPCs, ensuring strong network isolation and security.
LangDB itself operates a thin, shared public cloud environment (the "control plane") that is primarily responsible for:
* Provisioning new tenant environments
* Managing access control and user/tenant provisioning
* Handling external federated account connections (e.g., SSO)
* Hosting the LangDB Dashboard frontend application for configuration, monitoring, and management
All operational workloads, data storage, and LLM/MCP execution occur within the tenant-specific environment. The shared LangDB cloud is not involved in data processing or LLM execution, but only in provisioning, access management, and dashboard hosting.
#### Customer Environment
* Integrates with customer identity providers (Active Directory, SAML, SSO).
* Users (AI Apps, Agents, Administrators, Developers) interact with LangDB via secure endpoints.
#### LangDB Dashboard
* Centralized dashboard for configuration, monitoring, and management.
* Handles user and tenant provisioning, access control, and external federated account connections.
* All provisioning and access is centrally managed via LangDB Cloud and Dashboard.
#### Tenant Environment (Execution Layer)
* Each tenant (enterprise deployment) is provisioned in a dedicated AWS account or GCP project.
* Communication between tenant environment and LangDB is secured and managed.
* Provisioning is automated via Terraform.
***
### Store Descriptions
#### Metadata Store (PostgreSQL)
Stores all configuration and metadata required for operation, including:
* Virtual models
* Virtual MCP servers
* Projects
* Guardrails
* Routers
#### Redis (Cache Store)
Used for fast, in-memory operations related to:
* Rate limiting & cost control
* LLM usage tracking
* MCP usage tracking
#### ClickHouse (Analytics & Observability Store)
Stores analytics and observability data:
* Traces (API calls, LLM invocations, etc.)
* Metrics (performance, usage, etc.)
***
### User and Tenant Provisioning
* User and tenant provisioning is centrally controlled via LangDB Cloud and Dashboard.
* External federated accounts (e.g., enterprise SSO) can be connected to LangDB Cloud for seamless access management.
***
### Data Retention
* Data retention policies mainly apply to observability data (traces, metrics) stored in ClickHouse.
* Retention is enforced per subscription tier; traces are automatically cleared after the retention period expires.
***
### MCP Server Deployment
* MCP servers are deployed in a serverless fashion using AWS Lambda or Google Cloud Run for scalability and cost efficiency.
---
## Routing Engine (Enterprise Only)
LangDB's Routing Engine enables organizations to control how user requests are handled by AI models, optimizing for cost, performance, compliance, and user experience. By defining routing rules in JSON, businesses can automate decision-making, ensure reliability, and maximize value from their AI investments.
## What is Routing and What are its Components?
Routing is the process of directing an incoming request to the most appropriate AI model based on a set of rules. The LangDB router is composed of several key components that work together to execute this logic:
* **Routes**: These are the core building blocks of your router. A router is essentially a list of routes that are evaluated in order. The first route whose conditions are met is executed.
* **Conditions**: The logic that determines whether a route should be triggered. Conditions can evaluate request data, user metadata, or results from pre-request hooks.
* **Targets**: The destination for a request if a route's conditions are met. This is typically one or more AI models.
* **Interceptors (Guardrails & Rate Limiters)**: These are pre-request hooks that can inspect, modify, or enrich a request before routing rules are evaluated. Their results can be used in conditions.
* **Message Mapper**: A component used to block a request or modify the final response, often for handling errors like rate limits.
## Example Use Cases
| Enterprise Use Case | Business Goal | Key Variables & Metrics | Routing Logic Summary |
| ------------------------------- | ----------------------------------------------------------------------- | ----------------------------------------------- | ---------------------------------------------------------------------------------------- |
| **SLA-Driven Tiering** | Guarantee premium performance for high-value customers. | `extra.user.tier`, `ttft` | Route `extra.user.tier: "premium"` to models with the lowest `ttft`. |
| **Geographic Compliance** | Ensure data sovereignty and meet regulatory requirements (e.g., GDPR). | `metadata.region`, `extra.user.tags` | If `metadata.region: "EU"`, route to models for users with `tags: ["GDPR"]`. |
| **Intelligent Cost Management** | Reduce operational expenses for internal or low-priority tasks. | `metadata.group_name`, `price` | If `metadata.group_name: "internal"`, sort available models by `"sort_by": "price"`. |
| **Content-Aware Routing** | Improve accuracy by using specialized models for specific topics. | `pre_request.semantic_guardrail.result.topic` | If `topic: "finance"`, route to a finance-tuned model. |
| **Brand Safety Enforcement** | Prevent brand damage by blocking or redirecting inappropriate content. | `pre_request.toxicity_guardrail.result.passed` | If `passed: false`, block the request or route to a safe-reply model. |
For more detailed examples, see the pages below:
Quick Examples
Quick, focused routing patterns you can copy and adapt.
Building an Enterprise Routing Configuration
End-to-end example of a multi-layer enterprise routing setup with tiering, cost and fallbacks.
Routing with Interceptors and Compliance
Example showing rate limiting, semantic guardrails, GDPR routing, and error handling.
## Anatomy of a Routing Request
A routing request is a standard chat completion request with two key additions:
1. The `model` must be set to `"router/dynamic"`.
2. A `router` object containing your routing logic must be included in the request body.
Here’s how the various components fit into a complete request. The example below shows a simple configuration with two routes: one for premium users and a fallback for everyone else.
```json
{
"model": "router/dynamic",
"messages": [
{
"role": "user",
"content": "Our production API is down, I need help now!"
}
],
"extra": {
"user": { "tier": "premium" }
},
"router": {
"type": "conditional",
"routes": [
{
"name": "premium_support_fast_track",
"conditions": {
"all": [
{ "extra.user.tier": { "$eq": "premium" } }
]
},
"targets": {
"$any": ["anthropic/claude-4-opus", "openai/gpt-o3"],
"sort_by": "ttft",
"sort_order": "min"
}
},
{
"name": "default_fallback",
"conditions": { "all": [] },
"targets": "openai/gpt-4o-mini"
}
]
}
}
```
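From Python, the same request can be assembled with the OpenAI client used elsewhere in this documentation. This is a sketch under assumptions: `build_routing_kwargs` and `send` are hypothetical helpers, and passing `router` and `extra` through the client's `extra_body` argument is one way to place extra fields in the request body, not a documented LangDB recipe.

```python
import os


def build_routing_kwargs(messages, routes, user=None):
    """Assemble keyword arguments for chat.completions.create()."""
    extra_body = {"router": {"type": "conditional", "routes": routes}}
    if user is not None:
        extra_body["extra"] = {"user": user}
    return {"model": "router/dynamic", "messages": messages, "extra_body": extra_body}


def send(kwargs):
    """Send the request via the OpenAI client (requires the `openai` package)."""
    from openai import OpenAI  # imported lazily so the builder works without it

    client = OpenAI(
        base_url="https://api.us-east-1.langdb.ai",
        api_key=os.environ["LANGDB_API_KEY"],
    )
    return client.chat.completions.create(
        extra_headers={"x-project-id": os.environ["LANGDB_PROJECT_ID"]},
        **kwargs,
    )


routes = [
    {
        "name": "premium_support_fast_track",
        "conditions": {"all": [{"extra.user.tier": {"$eq": "premium"}}]},
        "targets": {
            "$any": ["anthropic/claude-4-opus", "openai/gpt-o3"],
            "sort_by": "ttft",
            "sort_order": "min",
        },
    },
    {
        "name": "default_fallback",
        "conditions": {"all": []},
        "targets": "openai/gpt-4o-mini",
    },
]
kwargs = build_routing_kwargs(
    messages=[{"role": "user", "content": "Our production API is down, I need help now!"}],
    routes=routes,
    user={"tier": "premium"},
)
```

Keeping the builder separate from the network call makes the routing configuration easy to inspect or unit-test before anything is sent.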
---
## Routing Components Explained
This section breaks down each of the major components of the `router` object.
### Routes
The `routes` property contains an array of route objects. These are evaluated sequentially from top to bottom, and the first route whose `conditions` are met will be executed. Every route has a `name` and `conditions`, plus either `targets` or, for routes that block a request, a `message_mapper`.
### Conditions
The `conditions` block defines *when* a route should be activated. It uses a flexible JSON-based syntax.
* **Logical Operators**: You can combine multiple conditions using `all` (AND) or `any` (OR).
* **Comparison Operators**: Conditions use operators like `$eq` (equal), `$neq` (not equal), `$in` (in array), `$lt` (less than), `$gt` (greater than) to evaluate variables.
* **Lazy Evaluation of Guardrails**: It's important to note that guardrails are evaluated **lazily**. A guardrail interceptor will only be executed if the router encounters a condition that requires its result (`pre_request.{guardrail_name}.*`). This prevents unnecessary latency.
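To make the condition syntax concrete, here is an illustrative evaluator for the logical and comparison operators listed above. It is a sketch for reasoning about configurations, not LangDB's implementation; in particular, treating a missing variable as a failed comparison is an assumption.

```python
import operator

OPERATORS = {
    "$eq": operator.eq,
    "$neq": operator.ne,
    "$lt": operator.lt,
    "$gt": operator.gt,
    "$in": lambda value, options: value in options,
}


def lookup(context, path):
    """Resolve a dotted path such as 'extra.user.tier'; None if absent."""
    current = context
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def matches(conditions, context):
    """Evaluate a conditions block against a request context."""
    if "all" in conditions:
        return all(matches(c, context) for c in conditions["all"])
    if "any" in conditions:
        return any(matches(c, context) for c in conditions["any"])
    # Leaf form: {"variable.path": {"$op": expected}}
    for path, tests in conditions.items():
        actual = lookup(context, path)
        for op, expected in tests.items():
            # Assumption: a missing variable never satisfies a comparison.
            if actual is None or not OPERATORS[op](actual, expected):
                return False
    return True


context = {"extra": {"user": {"tier": "premium"}}, "metadata": {"region": "EU"}}
is_premium = matches({"all": [{"extra.user.tier": {"$eq": "premium"}}]}, context)
catch_all = matches({"all": []}, context)
```

Note that an empty `all` list evaluates to true, which is why `"conditions": { "all": [] }` works as a catch-all fallback.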
### Targets
The `targets` block defines *what* happens when a route is matched. It specifies one or more models to which the request can be sent.
#### Specifying Models
You can specify models in your `targets` list in several ways, giving you flexibility in how you define your candidate pool.
* **Exact Name with Provider**: `openai/gpt-4o`
* This is the most specific and recommended method. It uniquely identifies a single model from a single provider.
* **Provider Wildcard**: `openai/*`
* This selects all available models from a specific provider (e.g., all models from OpenAI). This is useful for creating provider-level routing rules or fallbacks.
* **Model Name Only**: `claude-3-opus`
* This selects all models with that name from *any* available provider. For example, if both Anthropic and another provider offered `claude-3-opus`, both would be added to the candidate pool. This is particularly useful when you want to use sorting to find the best provider for a specific model based on real-time metrics like `price` or `ttft`. Use with caution, as it can be ambiguous if providers have different capabilities for the same model name.
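The three forms can be pictured as patterns matched against a catalog of `provider/model` names. The sketch below is illustrative only: `expand_target` is a hypothetical helper, and the catalog (including the `bedrock/claude-3-opus` entry) is invented to show the name-only case.

```python
def expand_target(pattern, catalog):
    """Return the catalog entries selected by one target pattern."""
    if "/" in pattern:
        provider, model = pattern.split("/", 1)
        if model == "*":
            # Provider wildcard: every model offered by that provider.
            return [m for m in catalog if m.split("/", 1)[0] == provider]
        # Exact name with provider: at most one entry.
        return [m for m in catalog if m == pattern]
    # Model name only: the same model name from any provider.
    return [m for m in catalog if m.split("/", 1)[1] == pattern]


# Hypothetical catalog for illustration.
CATALOG = [
    "openai/gpt-4o",
    "openai/gpt-4o-mini",
    "anthropic/claude-3-opus",
    "bedrock/claude-3-opus",
]

exact = expand_target("openai/gpt-4o", CATALOG)
wildcard = expand_target("openai/*", CATALOG)
name_only = expand_target("claude-3-opus", CATALOG)
```

Here `name_only` contains two entries, which is the situation where sorting by `price` or `ttft` decides between providers.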
#### Filtering Models
Before selecting a model, you can filter the list of potential targets in `$any` using the `filter` property. This is useful for ensuring models meet certain real-time performance criteria.
```json
"targets": {
"$any": ["anthropic/claude-4-opus", "openai/gpt-o3"],
"filter": {
"error_rate": { "$lt": 0.02 }
}
}
```
This example ensures that only models with an error rate below 2% are considered.
#### Sorting Models
After filtering, the router can sort the remaining candidate models to find the best one based on a specific metric.
* `sort_by`: The metric to use for sorting. Common values are `price`, `ttft` (time to first token), and `error_rate`.
* `sort_order`: The direction to sort, either `min` (for lowest cost/latency) or `max`.
```json
"targets": {
"$any": ["mistral/mistral-large-latest", "anthropic/claude-4-sonnet"],
"sort_by": "price",
"sort_order": "min"
}
```
This example selects the cheapest model from the pool.
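Filtering and sorting compose into a two-step selection. The sketch below walks through that pipeline with made-up metric values; `pick_model` is a hypothetical helper, and returning the first candidate when no `sort_by` is given is an assumption rather than documented behavior.

```python
import operator

COMPARATORS = {"$lt": operator.lt, "$gt": operator.gt, "$eq": operator.eq}


def pick_model(targets, metrics):
    """Filter the `$any` pool, then pick the min or max by `sort_by`."""
    candidates = list(targets["$any"])
    for metric, tests in targets.get("filter", {}).items():
        for op, bound in tests.items():
            candidates = [
                m for m in candidates if COMPARATORS[op](metrics[m][metric], bound)
            ]
    if not candidates:
        return None  # every model was filtered out
    sort_by = targets.get("sort_by")
    if sort_by is None:
        return candidates[0]  # assumption: keep the listed order
    chooser = min if targets.get("sort_order", "min") == "min" else max
    return chooser(candidates, key=lambda m: metrics[m][sort_by])


# Made-up metrics: ttft in milliseconds, error_rate as a fraction, price in arbitrary units.
METRICS = {
    "anthropic/claude-4-opus": {"ttft": 820, "error_rate": 0.010, "price": 15.0},
    "openai/gpt-o3": {"ttft": 640, "error_rate": 0.035, "price": 10.0},
    "gemini/gemini-2.5-pro": {"ttft": 700, "error_rate": 0.015, "price": 2.5},
}

targets = {
    "$any": list(METRICS),
    "filter": {"error_rate": {"$lt": 0.02}},
    "sort_by": "ttft",
    "sort_order": "min",
}
picked = pick_model(targets, METRICS)
```

With these numbers the fastest model (`openai/gpt-o3`) is removed by the error-rate filter, so the fastest of the remaining candidates is picked instead.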
### Interceptors (Guardrails & Rate Limiters)
**Interceptors** are hooks that run *before* the main routing logic is evaluated. They are defined in the `pre_request` array.
* **Guardrails** enforce content and safety policies (e.g., checking for toxicity or classifying topics).
* **Rate Limiters** enforce usage quotas to prevent abuse.
The results of these interceptors are made available in the `pre_request` variable space for use in your conditions. For detailed configuration, see [Interceptors & Guardrails](interceptors-and-guardrails.md).
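As a mental model for a rate limiter interceptor, the sketch below counts requests per target in fixed time windows and exposes a `passed` flag, which is the field the routing conditions in this documentation read. The fixed-window algorithm and the in-memory dictionary are assumptions chosen for brevity; the gateway's actual limiter and its Redis storage may work differently.

```python
import time
from collections import defaultdict

PERIOD_SECONDS = {"hour": 3600, "day": 86400}


class FixedWindowRateLimiter:
    """In-memory stand-in for a rate limiter interceptor."""

    def __init__(self, limit, period, clock=time.time):
        self.limit = limit
        self.window = PERIOD_SECONDS[period]
        self.clock = clock
        # (target value, window index) -> requests seen in that window
        self.counts = defaultdict(int)

    def check(self, target_value):
        """Count one request and report whether it is within the quota."""
        window_index = int(self.clock() // self.window)
        key = (target_value, window_index)
        self.counts[key] += 1
        return {"passed": self.counts[key] <= self.limit}


# A frozen clock keeps the example deterministic.
limiter = FixedWindowRateLimiter(limit=2, period="hour", clock=lambda: 1_000_000)
results = [limiter.check("user-42")["passed"] for _ in range(3)]
other_user = limiter.check("user-7")["passed"]
```

The third request from `user-42` fails the check, while `user-7` has its own counter. A production limiter would also expire old windows instead of keeping them in memory.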
### Message Mapper
The **Message Mapper** is used to take direct control of the response. Its most common use case is to block a request that has failed an interceptor check and return a custom error message.
```json
{
"name": "rate_limit_exceeded_block",
"conditions": {
"pre_request.rate_limiter.passed": { "$eq": false }
},
"message_mapper": {
"modifier": "block",
"content": "You have exceeded your daily quota."
}
}
```
### Metadata and Variables
Effective routing relies on rich contextual information. LangDB provides two main sources of data for your conditions: `extra.user.*` data, which you pass in the request, and `metadata.*` data, which is automatically populated by LangDB. For a complete list, see [Variables & Functions](variables-and-functions.md).
## Components Summary
| Component | Purpose | Key Configuration |
| -------------- | -------------------------------------------------- | ------------------------------------------------------- |
| **Routes** | A list of rules evaluated in order. | `name`, `conditions`, `targets` |
| **Conditions** | The "if" statement for a route. | `all`, `any`, operators (`$eq`, `$lt`), variables |
| **Targets** | The destination model(s) for a route. | `$any`, `filter`, `sort_by`, `sort_order` |
| **Interceptors** | Pre-request hooks for validation or enrichment. | `pre_request` array, `type` (`guardrail`, `interceptor`) |
| **Message Mapper** | Blocks or modifies the final response. | `modifier: "block"`, `content` |
---
## Performance Impact
Different routing components have different performance characteristics.
* **Guardrails & Interceptors**:
* Simple checks like a **Rate Limiter** are very fast, typically involving a quick lookup in Redis.
* More complex guardrails, especially those that are "LLM-as-a-judge" (i.e., they make an LLM call to evaluate the prompt), will introduce significant latency to the request, equal to the duration of that LLM call.
* **Model Sorting**:
* Sorting models based on performance metrics (`ttft`, `error_rate`) or `price` requires fetching this data from a metrics store (like Redis). This is a very fast operation but does add a small, fixed overhead to each routed request.
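Because guardrails are evaluated lazily, an expensive guardrail adds latency only when a condition actually reads its result. The sketch below illustrates that idea with a small cache; `LazyPreRequest` and the two stand-in interceptors are hypothetical and not part of LangDB.

```python
class LazyPreRequest:
    """Runs each interceptor at most once, and only when its result is read."""

    def __init__(self, interceptors):
        self._interceptors = interceptors  # name -> callable(request) -> dict
        self._results = {}
        self.executed = []  # order in which interceptors actually ran

    def result(self, name, request):
        if name not in self._results:
            self.executed.append(name)
            self._results[name] = self._interceptors[name](request)
        return self._results[name]


def semantic_guardrail(request):
    # Stand-in for a slow LLM-as-a-judge call.
    return {"passed": "forbidden" not in request["content"]}


def rate_limiter(request):
    # Stand-in for a fast quota lookup.
    return {"passed": True}


pre = LazyPreRequest(
    {"semantic_guardrail": semantic_guardrail, "rate_limiter": rate_limiter}
)
request = {"content": "hello"}

# A route that only reads the rate limiter never triggers the guardrail.
pre.result("rate_limiter", request)
pre.result("rate_limiter", request)  # served from the cache
ran_so_far = list(pre.executed)
```

Ordering routes so that cheap conditions come first therefore lets many requests be routed without ever paying for the slow guardrail.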
---
## Tracing & Observability
Routing decisions are fully transparent and traceable within the LangDB ecosystem.
* **Routing Performance**: The performance of the routing logic itself is tracked, so you can monitor for any overhead.
* **Candidate & Picked Models**: For every request, the trace records:
* The initial pool of `candidate models` for a matched route.
* The list of models remaining after `filtering`.
* The final `picked model` after `sorting`.
* **OpenTelemetry**: All router metrics (decisions, latencies, error rates) are exported via OpenTelemetry, allowing you to integrate with your existing observability stack (e.g., DataDog, New Relic) for real-time analytics and alerting.
---
## Example: Building an Enterprise Routing Configuration
This example demonstrates a multi-layered routing strategy for a SaaS company that balances performance for premium users, cost for standard users, and flexibility for internal development.
**Goals:**
1. Provide the fastest possible responses for "premium" customers on support-related queries.
2. Minimize costs for "standard" tier users.
3. Allow the internal "development" team to test a new, experimental model without affecting customers.
**Complete Chat Completion Request with Enterprise Routing:**
```json
{
"model": "router/dynamic",
"messages": [
{
"role": "system",
"content": "You are an expert customer support agent for TechCorp SaaS platform. You have access to user account information and can help resolve billing, technical, and feature-related issues. Always provide helpful, accurate responses and escalate complex issues when needed."
},
{
"role": "user",
"content": "Hi, I'm experiencing critical downtime with our enterprise API integration. Our production system has been failing to authenticate for the past 30 minutes and we're losing revenue. This is urgent - can you help me troubleshoot the OAuth token refresh process immediately?"
}
],
"stream": true,
"extra": {
"user": {
"tier": "premium",
"request": {
"topic": "support"
}
}
},
"router": {
"type": "conditional",
"routes": [
{
"name": "premium_support_fast_track",
"conditions": {
"all": [
{
"extra.user.tier": {
"$eq": "premium"
}
},
{
"extra.user.request.topic": {
"$eq": "support"
}
}
]
},
"targets": {
"$any": [
"anthropic/claude-4-opus",
"openai/gpt-o3",
"gemini/gemini-2.5-pro",
"xai/grok-4"
],
"filter": {
"ttft": {
"$lt": 1000
}
},
"sort_by": "ttft",
"sort_order": "min"
}
},
{
"name": "standard_user_cost_optimized",
"conditions": {
"extra.user.tier": {
"$eq": "standard"
}
},
"targets": {
"$any": [
"mistral/mistral-large-latest",
"anthropic/claude-4-sonnet"
],
"sort_by": "price",
"sort_order": "min"
}
},
{
"name": "internal_dev_testing",
"conditions": {
"metadata.group_name": {
"$eq": "development"
}
},
"targets": "google/gemini-2.5-pro"
},
{
"name": "fallback_route",
"conditions": {
"all": []
},
"targets": "openai/gpt-4o-mini"
}
],
"max_retries": 2
}
}
```
#### **Configuration Breakdown:**
* **Request Structure**:
* Uses `"model": "router/dynamic"` to enable dynamic routing
* User information is passed via the `extra.user` object
* Router configuration is specified in the `router` field with `"type": "conditional"`
* **Route 1: `premium_support_fast_track`**
* **Conditions**: Applies when `extra.user.tier` equals `"premium"` AND `extra.user.request.topic` equals `"support"`
* **Targets**: Routes to high-performance models (`anthropic/claude-4-opus`, `openai/gpt-o3`, `gemini/gemini-2.5-pro`, `xai/grok-4`) with filtering for fast response times (ttft < 1000ms) and sorts by minimum time-to-first-token
* **Route 2: `standard_user_cost_optimized`**
* **Conditions**: Catches all requests from `"standard"` tier users via `extra.user.tier`
* **Targets**: Uses cost-effective models (`mistral/mistral-large-latest`, `anthropic/claude-4-sonnet`) and sorts by minimum price
* **Route 3: `internal_dev_testing`**
* **Conditions**: Applies to users in the `"development"` cost group via `metadata.group_name` (automatically set by LangDB based on the LangDB user's cost group assignment)
* **Targets**: Routes directly to `google/gemini-2.5-pro` for isolated testing
* **Route 4: `fallback_route`**
* **Conditions**: Empty conditions array (`"all": []`) catches all remaining requests
* **Targets**: Routes to `openai/gpt-4o-mini` as a reliable fallback option
**Key Features Demonstrated:**
* **Conditional Logic**: Uses `all` operators and comparison operators (`$eq`, `$lt`)
* **Target Selection**: Shows both single targets and pools with filtering/sorting
* **Request Context**: Leverages both `extra` user data and `metadata` for routing decisions
* **Cost Group Integration**: `metadata.group_name` is automatically populated by LangDB based on the LangDB user's cost group assignment
* **Retry Configuration**: Includes `max_retries: 2` for resilience
***
## Additional Example Scenarios
### Scenario 1: Standard User Cost-Optimized Request
```json
{
"model": "router/dynamic",
"messages": [
{
"role": "system",
"content": "You are a helpful assistant for analyzing business data and generating reports."
},
{
"role": "user",
"content": "Can you help me create a monthly sales report template with charts for our Q4 review? I need to include revenue trends, customer acquisition metrics, and regional performance comparisons."
}
],
"extra": {
"user": {
"tier": "standard"
}
},
"router": { /* same router config */ }
}
```
_This request will match `standard_user_cost_optimized` route and use cost-effective models._
### Scenario 2: Internal Development Testing
````json
{
"model": "router/dynamic",
"messages": [
{
"role": "system",
"content": "You are a code review assistant that helps identify potential bugs and suggests improvements."
},
{
"role": "user",
"content": "Please review this TypeScript function for potential issues:\n\n```typescript\nfunction processUserData(users: any[]) {\n return users.map(u => {\n return {\n id: u.id,\n name: u.firstName + ' ' + u.lastName,\n email: u.email?.toLowerCase()\n }\n })\n}\n```"
}
],
"router": { /* same router config */ }
// Note: metadata.group_name = "development" is automatically set by LangDB
// based on the user's cost group assignment - no need to specify manually
}
````
_This request will match `internal_dev_testing` route because the LangDB user belongs to the "development" cost group, automatically setting `metadata.group_name = "development"`._
### Scenario 3: Fallback Route
```json
{
"model": "router/dynamic",
"messages": [
{
"role": "user",
"content": "What's the weather like today?"
}
],
"extra": {
"user": {
"tier": "basic"
}
},
"router": { /* same router config */ }
}
```
_This request doesn't match any specific conditions and will use the `fallback_route` with GPT-4o-mini._
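The scenarios can be checked by hand with a small first-match simulation. This sketch reduces each route of the enterprise configuration to plain equality checks (the `filter` and sorting steps are left out) and is not how the gateway itself evaluates routes; the context dictionaries mirror the request fields shown in the scenarios.

```python
def lookup(context, path):
    """Resolve a dotted path such as 'extra.user.tier'; None if absent."""
    current = context
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


# Each route: (name, {variable path: required value}); {} always matches.
ROUTES = [
    ("premium_support_fast_track",
     {"extra.user.tier": "premium", "extra.user.request.topic": "support"}),
    ("standard_user_cost_optimized", {"extra.user.tier": "standard"}),
    ("internal_dev_testing", {"metadata.group_name": "development"}),
    ("fallback_route", {}),
]


def first_match(context):
    """Return the name of the first route whose requirements all hold."""
    for name, required in ROUTES:
        if all(lookup(context, path) == value for path, value in required.items()):
            return name
    return None


scenarios = {
    "main example": {"extra": {"user": {"tier": "premium", "request": {"topic": "support"}}}},
    "scenario 1": {"extra": {"user": {"tier": "standard"}}},
    "scenario 2": {"metadata": {"group_name": "development"}},
    "scenario 3": {"extra": {"user": {"tier": "basic"}}},
}
matched = {label: first_match(context) for label, context in scenarios.items()}
```

Because evaluation stops at the first match, route order matters: a premium user asking about a non-support topic would fall through to `fallback_route` in this configuration.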
---
## Example: Routing with Interceptors and Compliance
This example showcases a sophisticated routing configuration that uses pre-request interceptors to enforce usage quotas and guardrails, while handling region-specific compliance and prioritizing performance for premium users.
**Goals:**
1. Enforce a daily rate limit on all users to prevent abuse.
2. Check all requests for policy violations using a semantic guardrail.
3. Provide high-performance models for premium users in the EU, but only if they are reliable.
4. Ensure GDPR compliance by using a specialized model for requests with that requirement.
5. Provide a clear error message to users who have exceeded their quota.
**Complete Chat Completion Request with Interceptors and Compliance:**
```json
{
"model": "router/dynamic",
"messages": [
{
"role": "system",
"content": "You are a financial advisory assistant that provides investment recommendations and portfolio analysis for premium clients in the European market."
},
{
"role": "user",
"content": "I need urgent advice on my cryptocurrency portfolio diversification strategy. Given the current market volatility, should I reallocate my €2M investment across different digital assets? Please provide detailed analysis including tax implications under GDPR compliance requirements."
}
],
"stream": true,
"extra": {
"user": {
"tier": "premium"
}
},
"router": {
"type": "conditional",
"pre_request": [
{
"name": "rate_limiter",
"type": "interceptor",
"limit": 1000,
"period": "day",
"target": "user_id"
},
{
"name": "semantic_guardrail",
"type": "guardrail"
}
],
"routes": [
{
"name": "premium_eu_high_performance",
"conditions": {
"all": [
{
"extra.user.tier": {
"$eq": "premium"
}
},
{
"metadata.region": {
"$eq": "EU"
}
},
{
"pre_request.rate_limiter.passed": {
"$eq": true
}
}
]
},
"targets": {
"$any": [
"anthropic/claude-4-opus",
"openai/gpt-o3",
"gemini/gemini-2.5-pro"
],
"filter": {
"error_rate": {
"$lt": 0.02
}
},
"sort_by": "ttft",
"sort_order": "min"
}
},
{
"name": "gdpr_compliance_fallback",
"conditions": {
"metadata.tags.gdpr": {
"$eq": "true"
}
},
"targets": "eu-specialist/gdpr-compliant-model"
},
{
"name": "semantic_guardrail_violation_block",
"conditions": {
"pre_request.semantic_guardrail.passed": {
"$eq": false
}
},
"message_mapper": {
"modifier": "block",
"content": "Your request contains content that violates our usage policies. Please review your message and try again with appropriate content."
}
},
{
"name": "rate_limit_exceeded_block",
"conditions": {
"pre_request.rate_limiter.passed": {
"$eq": false
}
},
"message_mapper": {
"modifier": "block",
"content": "You have exceeded your daily quota of 1,000 requests. Please try again tomorrow or upgrade to enterprise tier for higher limits."
}
},
{
"name": "fallback_route",
"conditions": {
"all": [
{
"pre_request.rate_limiter.passed": {
"$eq": true
}
},
{
"pre_request.semantic_guardrail.passed": {
"$eq": true
}
}
]
},
"targets": "openai/gpt-4o-mini"
}
],
"max_retries": 2
}
}
```
#### **Configuration Breakdown:**
* **Request Structure**:
* Uses `"model": "router/dynamic"` to enable dynamic routing
* The user tier is passed via the `extra.user` object; region and compliance tags are read from `metadata`, which LangDB populates automatically
* Router configuration includes `pre_request` interceptors and conditional routing
* **Pre-Request Interceptors**: Before routing evaluation, two interceptors run:
* `rate_limiter`: Enforces 1,000 requests per day limit per user. Results are accessible via `pre_request.rate_limiter.passed`
* `semantic_guardrail`: Scans content for policy violations. Results are accessible via `pre_request.semantic_guardrail.passed` (custom guardrail implementation required)
* **Route 1: `premium_eu_high_performance`**
* **Conditions**: Requires premium tier (`extra.user.tier`), EU region (`metadata.region`), and a passed rate-limit check (`pre_request.rate_limiter.passed`)
* **Targets**: Routes to high-performance models with error rate filtering and minimum time-to-first-token sorting
* **Route 2: `gdpr_compliance_fallback`**
* **Conditions**: Checks for GDPR compliance requirement via `metadata.tags.gdpr` (automatically set by LangDB from x-tags header)
* **Targets**: Routes to specialized EU GDPR-compliant model regardless of user tier
* **Route 3: `semantic_guardrail_violation_block`**
* **Conditions**: Triggers when semantic guardrail fails (`pre_request.semantic_guardrail.passed = false`)
* **Action**: Uses `message_mapper` to block request and return policy violation error message
* **Route 4: `rate_limit_exceeded_block`**
* **Conditions**: Triggers when rate limiter fails (`pre_request.rate_limiter.passed = false`)
* **Action**: Uses `message_mapper` to block request and return informative error message with upgrade suggestion
* **Route 5: `fallback_route`**
* **Conditions**: Only applies to requests that pass both rate limiting and semantic guardrail checks
* **Targets**: Routes to `openai/gpt-4o-mini` as reliable fallback for valid requests that don't match other specific conditions
**Key Features Demonstrated:**
* **Pre-Request Processing**: Rate limiting and content guardrails before routing
* **Compliance Routing**: Automatic GDPR-compliant model selection
* **Error Handling**: Custom error messages for quota exceeded scenarios
* **Performance Optimization**: Provider health checks and fast response prioritization
* **Regional Awareness**: EU-specific routing based on `metadata.region` (automatically set by LangDB based on user location)
---
## Examples
This directory contains focused, practical examples of LangDB's routing capabilities. Each example demonstrates a specific routing pattern that you can adapt for your use case.
## Quick Examples
| Example | Description | Use Case |
|---------|-------------|----------|
| [User Tier Routing](user-tier-routing.md) | Route users to different models based on their subscription tier | SaaS applications with premium/standard tiers |
| [Rate Limiter Routing](rate-limiter-routing.md) | Enforce usage quotas and route accordingly | Prevent abuse, manage costs |
| [Topic Routing](topic-routing.md) | Route to specialized models based on content topic | Content-specific optimization |
| [Load Balancer Routing](load-balancer-routing.md) | Distribute requests across multiple models | High availability, performance |
| [Group Based Routing](group-based-routing.md) | Route based on LangDB cost groups for teams/departments | Team budgets, dev vs prod separation |
| [Region Based Routing](region-based-routing.md) | Route based on user's geographic region | Data residency, performance optimization |
## How to Use These Examples
1. **Copy the JSON configuration** from any example
2. **Modify the conditions and targets** to match your requirements
3. **Test with your specific use case**
4. **Combine multiple patterns** for complex routing scenarios
## Advanced Examples
For more complex, multi-layered routing configurations, see:
Building an Enterprise Routing Configuration
End-to-end example of a multi-layer enterprise routing setup with tiering, cost and fallbacks.
Routing with Interceptors and Compliance
Example showing rate limiting, semantic guardrails, GDPR routing, and error handling.
---
## Group Based Routing
Route requests to different models based on the user's LangDB cost group. This allows organizations to provide different model access levels for different teams or departments.
## Use Case
* Different model budgets per team
* Development vs production environment separation
* Cost control by department
* Team-specific model access
## Configuration
```json
{
"model": "router/dynamic",
"router": {
"type": "conditional",
"routes": [
{
"name": "development_group",
"conditions": {
"all": [
{
"metadata.group_name": {
"$eq": "development"
}
}
]
},
"targets": "openai/gpt-5"
},
{
"name": "testing_group",
"conditions": {
"all": [
{
"metadata.group_name": {
"$eq": "testing"
}
}
]
},
"targets": "openai/gpt-4o-mini"
},
{
"name": "production_group",
"conditions": {
"all": [
{
"metadata.group_name": {
"$eq": "production"
}
}
]
},
"targets": {
"$any": [
"gemini/gemini-2.5-pro",
"xai/grok-4"
],
"sort_by": "ttft",
"sort_order": "min"
}
},
{
"name": "executive_group",
"conditions": {
"all": [
{
"metadata.group_name": {
"$eq": "executive"
}
}
]
},
"targets": "anthropic/claude-sonnet-4"
},
{
"name": "fallback_group",
"conditions": {
"all": []
},
"targets": "openai/gpt-4.1-nano"
}
]
}
}
```
## How It Works
1. **Development Group**: Gets budget model (GPT-5) for experimentation
2. **Testing Group**: Gets mid-tier model (GPT-4o-mini) for testing workflows
3. **Production Group**: Gets high-performance models (Gemini-2.5-pro or Grok-4) sorted by speed
4. **Executive Group**: Gets premium model (Claude-Sonnet-4) for critical business use
5. **Fallback**: Requests from any other group are routed to GPT-4.1-nano
## Variables Used
* `metadata.group_name`: LangDB cost group name (automatically set by LangDB based on user's group assignment)
## Group Setup
LangDB cost groups are a built-in feature for organizing users and controlling spending. For detailed information about cost groups, see the [Cost Control documentation](https://app.gitbook.com/s/0peUeEHQMLSxYq8BpQGt/features/cost-control).
## Customization
* Add more cost groups for different teams
* Include multiple models per group with filtering
* Combine with user tier routing for more granular control
* Add cost-based sorting within groups
---
## Load Balancer Routing
Distribute requests across multiple models to ensure high availability, balance load, and optimize performance. Use real-time metrics to select the best available model.
## Use Case
- High availability requirements
- Load distribution across models
- Performance optimization
- Failover scenarios
## Configuration
```json
{
"model": "router/dynamic",
"router": {
"type": "conditional",
"routes": [
{
"name": "Balanced",
"targets": {
"$any": [
"openai/gpt-4.1-nano",
"gemini/gemini-2.0-flash",
"bedrock/llama3-2-3b-instruct-v1.0"
],
"sort_by": "requests",
"sort_order": "min"
}
}
]
}
}
```
## How It Works
1. **Model Pool**: Defines three models for load distribution (GPT-4.1-nano, Gemini-2.0-flash, Llama3-2-3b)
2. **Load Balancing**: Automatically selects the model with the least current load (requests)
3. **Automatic Distribution**: Requests are distributed across the available models based on their current usage
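The selection step can be sketched as a least-requests picker. How LangDB measures the `requests` metric is not specified here, so counting in-flight requests in a local dictionary is an assumption made for illustration, as is the `LeastRequestsBalancer` class itself.

```python
class LeastRequestsBalancer:
    """Picks the model that currently has the fewest in-flight requests."""

    def __init__(self, models):
        self.in_flight = {model: 0 for model in models}

    def acquire(self):
        """Choose the least-loaded model and count one more request on it."""
        model = min(self.in_flight, key=self.in_flight.get)
        self.in_flight[model] += 1
        return model

    def release(self, model):
        """Call when a request to `model` has finished."""
        self.in_flight[model] -= 1


balancer = LeastRequestsBalancer(
    [
        "openai/gpt-4.1-nano",
        "gemini/gemini-2.0-flash",
        "bedrock/llama3-2-3b-instruct-v1.0",
    ]
)
first_three = [balancer.acquire() for _ in range(3)]
balancer.release("gemini/gemini-2.0-flash")
next_pick = balancer.acquire()
```

Ties are broken by list order in this sketch, so the first three requests go to the three models in turn, and the model that finishes first receives the next request.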
## Variables Used
- `requests`: Current load metric (used for sorting)
## Customization
- Adjust health thresholds
- Add more models to the pool
- Use different sorting strategies (ttft, price, etc.)
- Implement weighted load balancing
- Add geographic considerations
---
## Message Mapping
Use message mappers to block requests or provide custom responses based on routing conditions. This is useful for error handling, rate limiting, and content moderation.
## Use Case
* Block inappropriate requests
* Provide custom error messages
* Handle rate limit exceeded scenarios
* Return maintenance messages
## Basic Configuration
```json
{
"model": "router/dynamic",
"router": {
"type": "conditional",
"pre_request": [
{
"name": "rate_limiter",
"type": "interceptor",
"limit": 10,
"period": "hour",
"target": "user_id"
}
],
"routes": [
{
"name": "rate_limit_exceeded",
"conditions": {
"all": [
{
"pre_request.rate_limiter.passed": {
"$eq": false
}
}
]
},
"message_mapper": {
"modifier": "block",
"content": "You have exceeded your hourly limit of 10 requests. Please try again later."
}
},
{
"name": "premium_user_only",
"conditions": {
"all": [
{
"extra.user.tier": {
"$ne": "premium"
}
}
]
},
"message_mapper": {
"modifier": "block",
"content": "This feature is only available to premium users. Please upgrade your account to access advanced AI models."
}
},
{
"name": "allowed_requests",
"conditions": {
"all": [
{
"pre_request.rate_limiter.passed": {
"$eq": true
}
},
{
"extra.user.tier": {
"$eq": "premium"
}
}
]
},
"targets": "openai/gpt-4o"
},
{
"name": "fallback_message",
"conditions": {
"all": []
},
"message_mapper": {
"modifier": "block",
"content": "Service temporarily unavailable. Please check your account status and try again."
}
}
]
}
}
```
### How It Works
1. **Rate Limit Block**: If user exceeds 10 requests per hour, block with helpful message
2. **Premium Only Block**: Block non-premium users with upgrade message
3. **Allowed Requests**: Route premium users within limits to GPT-4o
4. **Fallback Block**: Block all other requests with generic message
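The four steps can be read as a first-match decision. The sketch below is an illustration only: it assumes routes are evaluated top to bottom with the first match winning, and that a missing tier counts as "not premium". The `resolve` function is hypothetical, not a LangDB API.

```python
from typing import Optional


def resolve(rate_limit_passed: Optional[bool], tier: Optional[str]) -> tuple[str, str]:
    """Return (route_name, outcome), where outcome is "block" or a target model."""
    if rate_limit_passed is False:
        return ("rate_limit_exceeded", "block")
    if tier != "premium":
        return ("premium_user_only", "block")
    if rate_limit_passed is True and tier == "premium":
        return ("allowed_requests", "openai/gpt-4o")
    # Reached only when no earlier route matched, e.g. the limiter gave no result.
    return ("fallback_message", "block")


print(resolve(True, "premium"))
print(resolve(False, "premium"))
```

Because order matters, a premium user who is over the limit hits the rate limit block before the premium check is ever considered.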
## Message Mapper Properties
* **`modifier`**: Must be `"block"` to block the request
* **`content`**: The message to return to the user instead of an AI response
## Variables Used
* `pre_request.rate_limiter.passed`: Rate limiter result (from interceptor)
* `extra.user.tier`: User subscription tier (from request)
## Customization
* Add different messages for different user tiers
* Include helpful links or contact information in blocked messages
* Combine with guardrails for content-based blocking
* Add maintenance window messages
---
## Rate Limiter Routing
Enforce usage quotas and route requests accordingly. Block users who exceed their limits while providing clear error messages.
## Use Case
- Prevent API abuse
- Manage costs with usage quotas
- Provide clear feedback on rate limits
## Configuration
```json
{
"model": "router/dynamic",
"router": {
"type": "conditional",
"pre_request": [
{
"name": "daily_user_limit",
"type": "rate_limiter",
"limit": 2,
"period": "hour",
"entity": "user_id",
"target": "requests"
}
],
"routes": [
{
"conditions": {
"all": [
{
"pre_request.daily_user_limit.allowed": {
"$eq": true
}
}
]
},
"name": "main_requests",
"targets": {
"$any": [
"anthropic/*"
],
"sort_by": "price",
"sort_order": "min"
}
},
{
"name": "other_requests",
"targets": {
"$any": [
"openai/gpt-4.1-nano"
],
"sort_by": "price",
"sort_order": "min"
}
}
]
}
}
```
## How It Works
1. **Rate Limiter**: Checks if user has exceeded 2 requests per hour
2. **Main Requests**: Routes allowed requests to any Anthropic model, sorted by lowest price
3. **Other Requests**: Routes remaining requests to GPT-4.1-nano for cost optimization
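To make the `allowed` flag concrete, here is a minimal fixed-window counter in Python. The windowing strategy is an assumption (this document does not say how LangDB counts requests), and `FixedWindowLimiter` is a hypothetical class used only for illustration.

```python
from collections import defaultdict


class FixedWindowLimiter:
    """Allow at most `limit` requests per entity in each fixed time window."""

    def __init__(self, limit: int, period_seconds: int) -> None:
        self.limit = limit
        self.period_seconds = period_seconds
        self._counts = defaultdict(int)  # (entity, window index) -> requests seen

    def check(self, entity: str, now: float) -> bool:
        """Return True (and count the request) if the entity is within its limit."""
        key = (entity, int(now // self.period_seconds))
        if self._counts[key] >= self.limit:
            return False
        self._counts[key] += 1
        return True


# Same numbers as the configuration: 2 requests per hour per user.
limiter = FixedWindowLimiter(limit=2, period_seconds=3600)
results = [limiter.check("user-1", now=t) for t in (0, 10, 20)]
next_window = limiter.check("user-1", now=3600)
```

In the configuration above, a `False` result does not block the request; it only causes the first route to be skipped so the request falls through to `other_requests`.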
## Variables Used
- `pre_request.daily_user_limit.allowed`: Result of rate limit check (true if within limit)
## Customization
- Adjust limits (daily, weekly, monthly)
- Different limits per user tier
- Custom error messages
- Graceful degradation instead of blocking
---
## Region Based Routing
Route requests to different models based on the user's geographic location. This helps with data residency requirements, regulatory compliance, and performance optimization by selecting geographically closer providers.
## Use Case
* Data residency requirements
* Regional performance optimization
* Regulatory compliance (GDPR, data sovereignty)
* Provider availability by region
## Configuration
```json
{
"model": "router/dynamic",
"router": {
"type": "conditional",
"routes": [
{
"name": "european_users",
"conditions": {
"all": [
{
"metadata.region": {
"$eq": "Europe"
}
}
]
},
"targets": {
"$any": [
"openai/gpt-5",
"anthropic/claude-opus-4.1"
],
"sort_by": "ttft",
"sort_order": "min"
}
},
{
"name": "north_american_users",
"conditions": {
"all": [
{
"metadata.country": {
"$in": ["US", "CA", "MX"]
}
}
]
},
"targets": {
"$any": [
"openai/gpt-4.1",
"google/gemini-2.5-pro"
],
"sort_by": "ttft",
"sort_order": "min"
}
},
{
"name": "asian_users",
"conditions": {
"all": [
{
"metadata.region": {
"$eq": "Asia"
}
}
]
},
"targets": {
"$any": [
"deepseek/deepseek-chat",
"kimi-k2"
],
"sort_by": "ttft",
"sort_order": "min"
}
},
{
"name": "global_fallback",
"conditions": {
"all": []
},
"targets": "openai/gpt-4.1-mini"
}
]
}
}
```
### How It Works
1. **European Users** (using `metadata.region = "Europe"`): Get GPT-5 or Claude-Opus-4.1 models optimized for speed
2. **North American Users** (using specific country codes US, CA, MX): Get GPT-4.1 or Gemini-2.5-pro models optimized for speed
3. **Asian Users** (using `metadata.region = "Asia"`): Get DeepSeek-chat or Kimi-k2 models optimized for speed
4. **Global Fallback**: Any unmatched regions/countries get GPT-4.1-mini
## Variables Used
* `metadata.region`: User's geographic region (automatically detected by LangDB based on IP address)
* `metadata.country`: User's country code (automatically detected by LangDB based on IP address)
## Geographic Detection
LangDB automatically detects user location based on IP address geolocation and provides both:
* **Regions**: Broad geographic areas like `"Europe"`, `"Asia"`, `"North America"`, etc.
* **Countries**: ISO 3166-1 alpha-2 country codes like `"US"`, `"GB"`, `"DE"`, `"CN"`, `"JP"`, etc.
* No additional setup required - works out of the box
## Customization
* **Region-based**: Use `metadata.region` for broad geographic routing (`"Europe"`, `"Asia"`, `"Africa"`, etc.)
* **Country-specific**: Use `metadata.country` for precise country targeting (`"US"`, `"DE"`, `"JP"`, etc.)
* **Mixed approach**: Combine both region and country routing in the same configuration
* Include region-specific model providers (e.g., local AI providers)
* Add failover logic for regional outages
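When the number of regions grows, the route list can be generated instead of written by hand. The following Python sketch builds a configuration with the same shape as the one above from a region-to-models mapping. The `build_region_router` helper is hypothetical, and it assumes every regional route uses `ttft` sorting.

```python
import json


def build_region_router(region_pools: dict[str, list[str]], fallback_model: str) -> dict:
    """Build a conditional router with one route per region plus a fallback."""
    routes = []
    for region, models in region_pools.items():
        routes.append({
            "name": f"{region.lower().replace(' ', '_')}_users",
            "conditions": {"all": [{"metadata.region": {"$eq": region}}]},
            "targets": {"$any": list(models), "sort_by": "ttft", "sort_order": "min"},
        })
    # An empty "all" list matches every request, so this route must come last.
    routes.append({
        "name": "global_fallback",
        "conditions": {"all": []},
        "targets": fallback_model,
    })
    return {"model": "router/dynamic", "router": {"type": "conditional", "routes": routes}}


config = build_region_router(
    {
        "Europe": ["openai/gpt-5", "anthropic/claude-opus-4.1"],
        "Asia": ["deepseek/deepseek-chat", "kimi-k2"],
    },
    fallback_model="openai/gpt-4.1-mini",
)
print(json.dumps(config, indent=2))
```

Country-specific routes (using `metadata.country` with `$in`) can be appended to `routes` in the same way before the fallback entry.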
---
## Topic Routing
Route requests to specialized models based on the detected topic of the conversation. Use semantic analysis to match content with the most appropriate model.
## Use Case
- Content-specific optimization
- Specialized model selection
- Domain expertise routing
## Prerequisites
Before using this routing configuration, you need to create a **Topic Adherence Guardrail** in your LangDB dashboard:
1. Go to **Guardrails** section in your LangDB dashboard
2. Create a new guardrail with type "Topic Adherence"
3. Configure the topics you want to classify (e.g., FAQ, support, technical, etc.)
4. Copy the generated guardrail slug
5. Replace `{{topic_guard_slug}}` in the configuration below with your actual guardrail slug
## Configuration
```json
{
"model": "router/dynamic",
"router": {
"type": "conditional",
"pre_request": [
{
"name": "topic_guard",
"type": "guard",
"guard_id": "{{topic_guard_slug}}"
}
],
"routes": [
{
"conditions": {
"all": [
{
"pre_request.topic_guard.passed": {
"$eq": true
}
}
]
},
"name": "faq_route",
"targets": {
"$any": [
"openai/gpt-4.1",
"openai/gpt-4.1-nano",
"openai/gpt-4.1-mini"
],
"sort_by": "price",
"sort_order": "min"
}
},
{
"conditions": {},
"name": "fallback_route",
"targets": {
"$any": [
"gemini/*"
],
"sort_by": "ttft",
"sort_order": "min"
}
}
]
}
}
```
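Replacing the `{{topic_guard_slug}}` placeholder can be scripted so the raw template is never sent by mistake. This is a small sketch using plain string substitution; the slug value `my-topic-guard` and the `render_guard` helper are hypothetical.

```python
import json

# The pre_request entry from the configuration, kept as a text template.
GUARD_TEMPLATE = (
    '{"name": "topic_guard", "type": "guard", "guard_id": "{{topic_guard_slug}}"}'
)


def render_guard(template: str, slug: str) -> dict:
    """Substitute the guardrail slug and parse the result as JSON."""
    if not slug:
        raise ValueError("guardrail slug must not be empty")
    # json.dumps escapes quotes and backslashes; strip the surrounding quotes.
    escaped = json.dumps(slug)[1:-1]
    return json.loads(template.replace("{{topic_guard_slug}}", escaped))


guard = render_guard(GUARD_TEMPLATE, "my-topic-guard")
print(guard)
```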
## How It Works
1. **Topic Guard**: Uses a custom guard to classify the request content
2. **FAQ Route**: Routes topic-classified requests to cost-optimized OpenAI models (GPT-4.1 variants)
3. **Fallback Route**: Routes all other requests to any Gemini model, sorted by fastest response time
## Variables Used
- `pre_request.topic_guard.passed`: Topic guard classification result
## Customization
- Add more topic categories
- Include specialized models for different domains
- Adjust topic detection thresholds
- Combine with user tier routing
---
## User Tier Routing
Route users to different models based on their subscription tier. Premium users get access to high-performance models, while standard users use cost-effective alternatives.
## Use Case
* SaaS applications with premium/standard tiers
* Different SLA requirements per user tier
* Cost optimization based on user value
## Configuration
```json
{
"model": "router/dynamic",
"router": {
"type": "conditional",
"routes": [
{
"conditions": {
"all": [
{
"extra.user.tier": {
"$eq": "premium"
}
}
]
},
"name": "premium_user",
"targets": {
"$any": [
"openai/gpt-4.1-mini",
"xai/grok-4",
"anthropic/claude-sonnet-4"
],
"filter": {
"error_rate": {
"$lt": 0.01
}
},
"sort_by": "ttft",
"sort_order": "min"
}
},
{
"name": "basic_user",
"targets": "openai/gpt-4.1-nano"
}
]
}
}
```
## How It Works
1. **Premium Users**: Get access to high-performance models (GPT-4.1-mini, Grok-4, Claude-Sonnet-4) with error rate filtering (< 1%) and sorted by fastest response time
2. **Basic Users**: Use the default GPT-4.1-nano model for all requests
## Variables Used
* `extra.user.tier`: User subscription tier from request
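The sketch below shows one way a client might attach the tier to a request. It assumes that the variable path `extra.user.tier` mirrors the JSON body, meaning an `extra` object containing a `user` object. That body layout is an assumption, and `build_payload` is a hypothetical helper, so confirm the field names against the API reference before relying on them.

```python
import json


def build_payload(user_id: str, tier: str, prompt: str, router: dict) -> dict:
    """Assemble a chat completion body carrying user info for the router."""
    return {
        "model": "router/dynamic",
        "messages": [{"role": "user", "content": prompt}],
        "router": router,
        # Assumed layout: extra.user.* variables are read from this object.
        "extra": {"user": {"id": user_id, "tier": tier}},
    }


router = {
    "type": "conditional",
    "routes": [{"name": "basic_user", "targets": "openai/gpt-4.1-nano"}],
}
payload = build_payload("u-12345", "premium", "Hello!", router)
print(json.dumps(payload, indent=2))
```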
## Customization
* Add more tiers (e.g., "enterprise", "basic")
* Include different model pools for each tier
* Add performance requirements per tier
---
## Interceptors & Guardrails
### Interceptors and Guardrails
Interceptors are custom logic that can run before or after a request is routed, allowing you to enrich, validate, or transform requests and responses. Guardrails are a common type of interceptor used to enforce policies.
| Type | Purpose | Business Value | Example Configuration |
| -------------- | -------------------------- | ---------------------------------------------------- | ------------------------------------------ |
| `pre_request` | Analyze or enrich request | Classify topic, check for risk, personalize | `rate_limiter`, `semantic_guardrail`, `toxicity_guardrail` |
| `post_request` | Analyze or modify response | Moderate output, add fallback, redact sensitive info | `fallback_response`, `content_moderator` |
### Pre-Request Interceptor Configuration
Pre-request interceptors are defined in the `routing.pre_request` array and run before routing evaluation:
```json
{
"routing": {
"pre_request": [
{
"name": "rate_limiter",
"type": "interceptor",
"limit": 1000,
"period": "day",
"target": "user_id"
},
{
"name": "semantic_guardrail",
"type": "guardrail"
}
]
}
}
```
### Interceptor Result Variables
When an interceptor runs, it injects its results into the routing context using the `pre_request.{interceptor_name}` namespace:
| Result Variable Path | Description | Example Value | Business Use |
| ----------------------------------------- | ------------------------------------ | ------------- | -------------------------------------- |
| `pre_request.{interceptor_name}.passed` | Whether the interceptor check passed | `true` | Conditional routing based on checks |
| `pre_request.rate_limiter.passed` | Rate limit check result | `false` | Enforce usage quotas and prevent abuse |
| `pre_request.semantic_guardrail.passed` | Content policy check result | `true` | Block policy violations |
| `pre_request.toxicity_guardrail.passed` | Toxicity check result | `false` | Block harmful content |
| `pre_request.{guardrail}.result.topic` | Detected topic from semantic analysis | `"billing"` | Route to topic-specialized models |
| `pre_request.{guardrail}.result.score` | Confidence or toxicity score | `0.8` | Threshold-based routing decisions |
| `pre_request.{guardrail}.result.*` | Any custom result from interceptor | Various | Custom business logic |
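
The dotted paths in this table can be read as lookups into a nested result object. A minimal Python sketch, assuming the routing context behaves like nested dictionaries (the `lookup` helper and the sample context are hypothetical):

```python
from typing import Any


def lookup(context: dict, path: str, default: Any = None) -> Any:
    """Resolve a dotted path such as "pre_request.rate_limiter.passed"."""
    current: Any = context
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return default
        current = current[part]
    return current


# Hypothetical context after two interceptors have run.
context = {
    "pre_request": {
        "rate_limiter": {"passed": True},
        "semantic_guardrail": {
            "passed": True,
            "result": {"topic": "billing", "score": 0.8},
        },
    }
}
topic = lookup(context, "pre_request.semantic_guardrail.result.topic")
print(topic)
```

Returning a default for missing paths keeps a condition on an interceptor that did not run from raising an error; how LangDB itself treats missing variables is not specified here.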
### Common Interceptor Types
| Interceptor Type | Purpose | Configuration Fields | Result Fields |
| --------------------- | ------------------------------------ | --------------------------------------- | -------------------------------------- |
| `rate_limiter` | Enforce request quotas | `limit`, `period`, `target` | `passed` |
| `semantic_guardrail` | Content classification and filtering | Custom guardrail configuration | `passed`, `result.topic`, `result.*` |
| `toxicity_guardrail` | Detect harmful content | Custom guardrail configuration | `passed`, `result.score` |
| `custom_interceptor` | Business-specific logic | Varies by implementation | `passed`, `result.*` |
### Usage in Routing Conditions
Use interceptor results in your routing conditions:
```json
{
"conditions": {
"all": [
{ "pre_request.rate_limiter.passed": { "$eq": true } },
{ "pre_request.semantic_guardrail.passed": { "$eq": true } },
{ "pre_request.semantic_guardrail.result.topic": { "$eq": "support" } }
]
}
}
```
### Error Handling with Message Mappers
Block requests that fail interceptor checks:
```json
{
"name": "rate_limit_block",
"conditions": {
"pre_request.rate_limiter.passed": { "$eq": false }
},
"message_mapper": {
"modifier": "block",
"content": "Rate limit exceeded. Please try again later."
}
}
```
> **Note on Custom Guardrails**: Guardrails like `semantic_guardrail` and `toxicity_guardrail` are powerful examples of custom guardrails. Check out the [Guardrails](https://app.gitbook.com/s/0peUeEHQMLSxYq8BpQGt/features/guardrails) section for implementation details and configuration options.
---
## Variables & Functions
### Available Metrics
LangDB provides a rich set of real-time metrics for making dynamic, data-driven routing decisions. These metrics are aggregated at the provider level, giving you a live view of model performance.
| Metric Name | Description | Example Value | Business Value |
| --------------- | ------------------------------------------ | ------------- | ------------------------------------- |
| `ttft` | Time to First Token (ms) | 450 | Optimize for user-perceived speed |
| `tps` | Tokens Per Second (output\_tokens/latency) | 300 | Measure model generation speed |
| `error_rate` | Fraction of failed requests | 0.01 | Route around unreliable models |
| `latency` | Average end-to-end response time (ms) | 1100 | Route based on overall performance |
| `requests` | Total number of requests processed | 1500 | Monitor traffic and usage patterns |
| `cost` | Estimated request cost (USD) | 0.02 | Track and control spend in real time |
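
To show how these metrics relate to each other, the sketch below derives them from a handful of per-request samples. The aggregation rules are assumptions: failed requests are excluded from the timing averages, and `tps` is computed as output tokens divided by latency in seconds. The `Sample` type and the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    ttft_ms: float
    latency_ms: float
    output_tokens: int
    failed: bool


def aggregate(samples: list[Sample]) -> dict[str, float]:
    """Summarize samples into requests, error_rate, ttft, latency, and tps."""
    if not samples:
        raise ValueError("no samples to aggregate")
    ok = [s for s in samples if not s.failed]
    total_seconds = sum(s.latency_ms for s in ok) / 1000.0
    return {
        "requests": len(samples),
        "error_rate": (len(samples) - len(ok)) / len(samples),
        "ttft": sum(s.ttft_ms for s in ok) / len(ok) if ok else 0.0,
        "latency": sum(s.latency_ms for s in ok) / len(ok) if ok else 0.0,
        "tps": sum(s.output_tokens for s in ok) / total_seconds if total_seconds else 0.0,
    }


metrics = aggregate([
    Sample(ttft_ms=400, latency_ms=1000, output_tokens=300, failed=False),
    Sample(ttft_ms=500, latency_ms=2000, output_tokens=600, failed=False),
    Sample(ttft_ms=450, latency_ms=1500, output_tokens=450, failed=False),
    Sample(ttft_ms=0, latency_ms=0, output_tokens=0, failed=True),
])
print(metrics)
```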
### Available Variables
Variables provide contextual information from the incoming request and user metadata. Unlike metrics, they are not performance indicators but are essential for conditional logic.
### User Information
| Variable Path | Description | Example Value | Business Use |
| ---------------------------- | ------------------------------------ | ---------------------- | ----------------------------- |
| `extra.user.id` | User identifier from request | `"u-12345"` | Custom user identification |
| `extra.user.name` | User name from request | `"john.doe"` | Personalization |
| `extra.user.email` | User email from request | `"john@example.com"` | User-based routing |
| `extra.user.tier` | User tier from request | `"premium"` | SLA-based routing |
| `extra.user.tiers` | Array of user tiers from request | `["premium", "beta"]` | Multi-tier access control |
| `extra.user.*` | Any custom field in extra.user | Various | Custom business logic |
### Request Metadata
| Variable Path | Description | Example Value | Business Use |
| ---------------------------- | ------------------------------------ | ---------------------- | ----------------------------- |
| `metadata.ip` | IP address of the requester | `"203.0.113.42"` | Geo-fencing, fraud detection |
| `metadata.region` | Geographical region (based on IP) | `"Europe"` | Data residency, compliance |
| `metadata.country` | Country code (based on IP) | `"DE"` | Country-specific routing |
| `metadata.user_agent` | Client application | `"Google ADK/ CrewAI"` | Agentic library detection |
| `metadata.user_id` | LangDB user ID | `"langdb-u-12345"` | Internal user identification |
| `metadata.group_id` | LangDB cost group ID | `"grp-456"` | Cost group identification |
| `metadata.group_name` | LangDB cost group name | `"development"` | Cost group-based routing |
| `metadata.tags.{tag}` | LangDB user tags | `"premium"` | User classification |
### Pre-Request Interceptor Results
| Variable Path | Description | Example Value | Business Use |
| ------------------------------------------ | ------------------------------------ | ------------- | -------------------------------------- |
| `pre_request.{interceptor_name}.passed` | Whether interceptor check passed | `true` | Conditional routing based on checks |
| `pre_request.rate_limiter.passed` | Rate limit check result | `true` | Enforce usage quotas |
| `pre_request.semantic_guardrail.passed` | Content policy check result | `false` | Block policy violations |
| `pre_request.{guardrail}.result.topic` | Detected topic from guardrail | `"billing"` | Topic-based model routing |
| `pre_request.{guardrail}.result.*` | Any custom result from interceptor | Various | Custom business logic |
### Provider Performance Metrics (for Filtering)
| Variable Path | Description | Example Value | Business Use |
| --------------------- | --------------------------------- | ------------- | ---------------------------- |
| `provider.error_rate` | Provider's current error rate | `0.02` | Avoid unreliable providers |
| `model_capabilities` | Supported features (vision, code) | `["vision"]` | Feature-based routing |
| `model_tags` | Model classification tags | `["fast"]` | Tag-based filtering |
### Target Configuration
| Configuration Type | Description | Example Usage | When to Use |
| --------------------------- | ------------------------------------------------ | --------------------------------------- | -------------------------------- |
| **Simple List** | Direct array of model names | `["openai/gpt-4o", "anthropic/claude-4-sonnet"]` | Fixed model selection |
| **Provider Wildcards** | All models from specific provider | `["openai/*", "anthropic/*"]` | Provider-level routing |
| **Single Target** | Direct model specification | `"openai/gpt-4o"` | Fixed routing to one model |
| **$any Pool** | Pool of models with selection logic | `{ "$any": [...], "sort_by": "price" }` | Dynamic model selection |
### Model Target Formats
| Format | Description | Example | Use Case |
| --------------------------- | ------------------------------------------------ | --------------------------------------- | -------------------------------- |
| **Exact Model Name** | Specific model identifier | `"openai/gpt-4o"` | Direct routing |
| **Provider Wildcard** | All models from provider | `"openai/*"` | Provider-level failover |
### Sorting Options
| Sort Configuration | What It Does | When to Use | Example JSON Usage |
| --------------------------- | ------------------------------------------------ | --------------------------------------- | --------------------------------- |
| `"sort_by": "price", "sort_order": "min"` | Picks the cheapest model | Cost control, bulk/low-priority tasks | `"sort_by": "price", "sort_order": "min"` |
| `"sort_by": "price", "sort_order": "max"` | Picks the most expensive model | Premium users, highest quality needed | `"sort_by": "price", "sort_order": "max"` |
| `"sort_by": "ttft", "sort_order": "min"` | Picks the fastest model (time-to-first-token) | VIP, real-time, or user-facing tasks | `"sort_by": "ttft", "sort_order": "min"` |
| `"sort_by": "latency", "sort_order": "min"` | Picks the lowest latency model | Performance-critical applications | `"sort_by": "latency", "sort_order": "min"` |
| `"sort_by": "error_rate", "sort_order": "min"` | Picks the most reliable model | Mission-critical or regulated workflows | `"sort_by": "error_rate", "sort_order": "min"` |
| `"sort_by": "tps", "sort_order": "max"` | Picks the fastest token generation model | Bulk generation, high throughput | `"sort_by": "tps", "sort_order": "max"` |
| `"sort_by": "requests", "sort_order": "min"` | Picks the least loaded model | Load balancing | `"sort_by": "requests", "sort_order": "min"` |
### Complete Target Examples
#### Simple Array Targets
```json
{
"targets": ["openai/gpt-4o", "anthropic/claude-4-sonnet"]
}
```
#### Provider Wildcard Targets
```json
{
"targets": ["openai/*", "anthropic/*"]
}
```
#### Single Target
```json
{
"targets": "openai/gpt-4o"
}
```
#### Advanced $any Pool with Filtering and Sorting
```json
{
"targets": {
"$any": [
"openai/*",
"anthropic/claude-4-sonnet",
"mistral/mistral-large-latest"
],
"filter": {
"ttft": { "$lt": 1000 },
"provider.error_rate": { "$lt": 0.05 }
},
"sort_by": "price",
"sort_order": "min"
}
}
```
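The filter-then-sort behaviour of an `$any` pool can be sketched in a few lines of Python. This is an illustration under stated assumptions: filters are applied before sorting, wildcard entries such as `"openai/*"` are already expanded into concrete models, and the metric values are made up. `select_target` is a hypothetical helper.

```python
import operator
from typing import Optional

NUMERIC_OPS = {
    "$lt": operator.lt,
    "$lte": operator.le,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$eq": operator.eq,
}


def select_target(candidates: dict[str, dict], filters: dict,
                  sort_by: str, sort_order: str) -> Optional[str]:
    """Drop candidates that fail any filter, then pick the min or max by metric."""
    def passes(metrics: dict) -> bool:
        return all(
            NUMERIC_OPS[op](metrics[field], bound)
            for field, checks in filters.items()
            for op, bound in checks.items()
        )

    eligible = {name: m for name, m in candidates.items() if passes(m)}
    if not eligible:
        return None
    chooser = min if sort_order == "min" else max
    return chooser(eligible, key=lambda name: eligible[name][sort_by])


# Hypothetical live metrics for three concrete models.
candidates = {
    "openai/gpt-4o": {"ttft": 600, "provider.error_rate": 0.01, "price": 0.010},
    "anthropic/claude-4-sonnet": {"ttft": 1200, "provider.error_rate": 0.01, "price": 0.003},
    "mistral/mistral-large-latest": {"ttft": 800, "provider.error_rate": 0.02, "price": 0.004},
}
filters = {"ttft": {"$lt": 1000}, "provider.error_rate": {"$lt": 0.05}}
chosen = select_target(candidates, filters, sort_by="price", sort_order="min")
print(chosen)
```

Note that the cheapest model overall is excluded by the `ttft` filter, so the cheapest eligible model wins instead.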
### Filter Options
| Filter Type | Description | Operators Supported | Example Usage |
| --------------------------- | ------------------------------------------------ | -------------------------------------- | --------------------------------- |
| `filter: { ttft: {...} }` | Filter models by time-to-first-token | `$lt`, `$gt`, `$lte`, `$gte`, `$eq` | `"ttft": { "$lt": 1000 }` |
| `filter: { price: {...} }` | Filter models by cost | `$lt`, `$gt`, `$lte`, `$gte`, `$eq` | `"price": { "$lt": 0.02 }` |
| `filter: { error_rate: {...} }` | Filter models by reliability | `$lt`, `$gt`, `$lte`, `$gte`, `$eq` | `"error_rate": { "$lt": 0.05 }` |
| `filter: { provider.error_rate: {...} }` | Filter by provider reliability | `$lt`, `$gt`, `$lte`, `$gte`, `$eq` | `"provider.error_rate": { "$lt": 0.02 }` |
| `filter: { model_tags: {...} }` | Filter by model classification tags | `$contains`, `$in` | `"model_tags": { "$contains": "fast" }` |
### Comparison Operators
| Operator | Description | Example Usage | Use Case |
| ------------- | ------------------------------ | ------------------------------------ | -------------------------------- |
| `$eq` | Equals | `"tier": { "$eq": "premium" }` | Exact matching |
| `$ne` | Not equals | `"tier": { "$ne": "free" }` | Exclusion logic |
| `$lt` | Less than | `"ttft": { "$lt": 1000 }` | Performance thresholds |
| `$lte` | Less than or equal | `"price": { "$lte": 0.02 }` | Budget constraints |
| `$gt` | Greater than | `"tps": { "$gt": 100 }` | Minimum performance requirements |
| `$gte` | Greater than or equal | `"reliability": { "$gte": 0.95 }` | Quality standards |
| `$in` | In array | `"region": { "$in": ["US", "EU"] }` | Multiple value matching |
| `$contains` | Array contains value | `"tags": { "$contains": "GDPR" }` | Tag-based filtering |
> **Important Notes:**
> - **Targets**: Can be a simple array, single string, or `$any` object with filtering/sorting
> - **Wildcards**: Use `*` for provider-level routing (e.g., `"openai/*"` = all OpenAI models)
> - **Sorting**: Use `"sort_by"` and `"sort_order"` fields separately
> - **Operators**: Always use `$` prefix for comparison operators (`$eq`, `$lt`, etc.)
> - **Values**: Use lowercase for sort order (`"min"`, `"max"`)
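
The operator semantics above can be summarized with a small evaluator. This Python sketch handles only the `all` form and looks variables up by their full path in a flat dictionary. Treating a missing variable as a non-match for ordering operators is an assumption, and `matches` is a hypothetical helper, not LangDB code.

```python
import operator
from typing import Any, Callable


def _ordered(compare: Callable[[Any, Any], bool]) -> Callable[[Any, Any], bool]:
    """Wrap an ordering comparison so a missing (None) variable never matches."""
    return lambda actual, expected: actual is not None and compare(actual, expected)


OPERATORS: dict[str, Callable[[Any, Any], bool]] = {
    "$eq": operator.eq,
    "$ne": operator.ne,
    "$lt": _ordered(operator.lt),
    "$lte": _ordered(operator.le),
    "$gt": _ordered(operator.gt),
    "$gte": _ordered(operator.ge),
    "$in": lambda actual, expected: actual in expected,
    "$contains": lambda actual, expected: (
        isinstance(actual, (list, tuple, set)) and expected in actual
    ),
}


def matches(variables: dict[str, Any], conditions: dict) -> bool:
    """Return True when every clause in conditions["all"] holds."""
    for clause in conditions.get("all", []):
        for path, checks in clause.items():
            actual = variables.get(path)
            for op, expected in checks.items():
                if op not in OPERATORS:
                    raise ValueError(f"unsupported operator: {op}")
                if not OPERATORS[op](actual, expected):
                    return False
    return True


variables = {"extra.user.tier": "premium", "metadata.country": "DE", "ttft": 450}
conditions = {"all": [
    {"extra.user.tier": {"$eq": "premium"}},
    {"metadata.country": {"$in": ["DE", "FR"]}},
    {"ttft": {"$lt": 1000}},
]}
result = matches(variables, conditions)
print(result)
```

An empty `"all": []` list evaluates to true in this sketch, which is consistent with its use for catch-all fallback routes in the examples.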
---
## Deploying on AWS Cloud
## AWS Deployment
This section describes how AI Gateway and its supporting components are deployed on AWS, ensuring enterprise-grade scalability, observability, and security.
### Software Components
| Component | Purpose / Description | AWS Service | Scaling |
| ---------------------------------------------------------------- | ----------------------------------------------------------------------------------------------- | -------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------ |
| **LLM Gateway** | Unified interface to 300+ LLMs using the OpenAI API format. Built-in observability and tracing. | Amazon ECS (Elastic Container Service) | ECS Auto Scaling based on CPU/memory or custom CloudWatch metrics. |
| **Metadata Store (PostgreSQL)** | Stores metadata related to API usage, configurations, and more. | Amazon RDS (PostgreSQL) | Vertical scaling (instance size), Multi-AZ support. Read replicas can be configured for better read performance. |
| **Cache Store (Redis)** | Implements rolling cost control and rate limiting for API usage. | Amazon ElastiCache (Redis) | Scale by adding shards/replicas, Multi-AZ support. |
| **Observability & Analytics Store (ClickHouse)** | Provides observability by storing and analyzing traces/logs. Supports OpenTelemetry. | ClickHouse Cloud (external) | Scales independently; ensure sufficient network throughput for trace/log ingestion. |
| **Load Balancing** | Distributes incoming traffic to ECS tasks, enabling high availability and SSL termination. | Amazon ALB (Application Load Balancer) | Scales automatically to handle incoming traffic; supports multi-AZ deployments. |
### Architecture Overview
The LangDB service is deployed in AWS with a robust, scalable architecture designed for high availability, security, and performance. The system is built using AWS managed services to minimize operational overhead while maintaining full control over the application.
### Components
#### Networking and Entry Points
* **AWS Region**: All resources are deployed within a single AWS region for low-latency communication
* **VPC**: A dedicated Virtual Private Cloud isolates the application resources
* **ALB**: Application Load Balancer serves as the entry point, routing requests from `https://api.{region}.langdb.ai` to the appropriate services
#### Core Services
* **ECS Cluster**: Container orchestration for the LangDB service
  * Multiple LangDB service instances distributed across availability zones for redundancy
  * Auto-scaling capabilities based on load
  * Containerized deployment for consistency across environments
#### Data Storage
* **RDS**: Managed relational database service for persistent storage. Dedicated storage for metadata
* **ElastiCache (Redis) Cluster**: In-memory caching layer
  * Used for cache and cost control
  * Multiple nodes for high availability
* **Clickhouse Cloud**: Analytics database for high-performance data processing
  * Deployed in the same AWS region but outside the VPC
  * Managed service for analytical queries and data warehousing
#### Authentication & Security
* **Cognito**: User authentication and identity management
* **Lambda**: Serverless functions for authentication workflows
* **SES**: Simple Email Service for email communications related to authentication
#### Secrets Management
* **Secrets Vault**: AWS Secrets Manager for secure storage of:
  * Provider keys
  * Other sensitive credentials
### Data Flow
1. Client requests hit the ALB via `https://api.{region}.langdb.ai`
2. ALB routes requests to available LangDB service instances in the ECS cluster
3. LangDB services interact with:
   * RDS for persistent data
   * ElastiCache for caching, cost control and rate limiting
   * Metadata Storage for metadata operations
   * Clickhouse Cloud for analytics and data warehousing
4. Authentication is handled through Cognito, with Lambda functions for custom authentication flows
5. Sensitive information is securely retrieved from Secrets Manager as needed
### Security Considerations
* All components except Clickhouse Cloud are contained within a VPC for network isolation
* Secure connections to Clickhouse Cloud are established from within the VPC
* Authentication is managed through AWS Cognito
* Secrets are stored in AWS Secrets Manager
* All communication between services uses encryption in transit
### Operational Benefits
* **Scalability**: The architecture supports horizontal scaling of LangDB service instances
* **High Availability**: Multiple instances across availability zones
* **Managed Services**: Leveraging AWS managed services reduces operational overhead
### Deployment Process
LangDB infrastructure is deployed and managed using Terraform, providing infrastructure-as-code capabilities with the following benefits:
#### Terraform Architecture
* **Modular Structure**: The deployment code is organized into reusable Terraform modules that encapsulate specific infrastructure components (networking, compute, storage, etc.)
* **Environment-Specific Variables**: Using `.tfvars` files to manage environment-specific configurations (dev, staging, prod)
* **State Management**: Terraform state is stored remotely to enable collaboration and version control
#### Deployment Workflow
1. **Configuration Management**: Environment-specific variables are defined in `.tfvars` files
2. **Resource Provisioning**: Terraform creates and configures all AWS resources, including:
   * VPC and networking components
   * Fargate instances and container configurations
   * Postgres databases and Redis clusters
   * Authentication services and Lambda functions
   * Secrets Manager entries and access controls
3. **Dependency Management**: Terraform handles resource dependencies, ensuring proper creation order
#### Maintenance & Updates
Ongoing infrastructure maintenance is managed through Terraform:
* Scaling resources up/down based on demand
* Applying security patches and updates
* Modifying configurations for performance optimization
* Adding new resources or services as needed
---
## Deploying on GCP (Beta)
:::info
Coming Soon
:::
---
## Using Docker Compose
This guide walks through a simple deployment using Docker Compose. For scalable cloud-native deployments, see our [AWS](deploying-on-aws-cloud.md) and [GCP](deploying-on-gcp-beta.md) guides.
## docker-compose.yaml
```yaml
version: '3.8'
services:
  ai-gateway:
    # We will share a private image that contains the ai-gateway-enterprise edition.
    # For reference, check out our free image available in our GitHub repo.
    # https://github.com/langdb/ai-gateway
    image: /ai-gateway-enterprise:latest
    ports:
      - "8083:8083"
    extra_hosts:
      - "host.docker.internal:host-gateway"
    volumes:
      # ai-gateway.yaml is expected in the configuration folder.
      - config:/usr/langdb/
    container_name: "langdb-ai-gateway"
  clickhouse:
    image: clickhouse/clickhouse-server:latest
    ports:
      - "8123:8123"
      - "9000:9000"
    ulimits:
      nofile:
        soft: 262144
        hard: 262144
    extra_hosts:
      - "host.docker.internal:host-gateway"
    container_name: "langdb-clickhouse"
  postgres:
    image: postgres:latest
    container_name: langdb-cloud-enterprise-pg
    environment:
      POSTGRES_USER: langdb
      # Note: Include your postgres password as specified in ai-gateway.yaml
      POSTGRES_PASSWORD: XXXXX
      POSTGRES_DB: langdb_staging
      ALLOW_IP_RANGE: 0.0.0.0/0
    ports:
      - "5438:5432"
    command: postgres -c 'max_connections=1000'
    volumes:
      - postgres_data:/var/lib/postgresql/data
  redis:
    image: redis:latest
    restart: always
    ports:
      - "6379:6379"
    volumes:
      - redis_data:/root/redis
    environment:
      # Note: Include your redis password as specified in ai-gateway.yaml
      - REDIS_PASSWORD=XXXXX
      - REDIS_PORT=6379
      - REDIS_DATABASES=16
volumes:
  config:
  postgres_data:
  redis_data:
```
Refer to [ai-gateway.yaml](../running-locally/ai-gateway.yaml.md) for configuring ai-gateway.
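After `docker compose up -d`, a quick way to confirm the stack is listening is to probe the host ports published in the compose file. The Python sketch below uses only the standard library. It checks TCP reachability, not application health, and the `SERVICES` mapping simply restates the host ports from the file above.

```python
import socket

# Host ports published by the compose file above.
SERVICES = {
    "ai-gateway": 8083,
    "clickhouse-http": 8123,
    "clickhouse-native": 9000,
    "postgres": 5438,
    "redis": 6379,
}


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(host: str = "localhost") -> dict[str, bool]:
    """Probe every published port and report which ones accept connections."""
    return {name: port_open(host, port) for name, port in SERVICES.items()}
```

Calling `check_services()` returns a dictionary such as `{"ai-gateway": True, ...}`; any `False` entry points at a container that has not started or a port mapping that was changed.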
## Next Steps
* For cloud-scale deployment, see the [AWS guide](deploying-on-aws-cloud.md) and the [GCP guide](deploying-on-gcp-beta.md).
* For full observability and ClickHouse tracing, refer to [ClickHouse queries](../resources/clickhouse-queries.md).
---
## Using Kubernetes (Beta)
:::info
Work in Progress.
:::
This guide walks you through deploying the **LangDB AI Gateway** enterprise edition on Kubernetes. Refer to the individual database documentation for deploying and scaling Postgres, Redis, and ClickHouse clusters.
Check out the repository: [https://github.com/langdb/helm-chart](https://github.com/langdb/helm-chart)
## Deploy using Helm
### Clone the Repository
```sh
git clone git@github.com:langdb/helm-chart.git
cd helm-chart/helm/ai-gateway
```
The chart in this directory deploys:
* ai-gateway (using the default image)
* Connections to external Postgres, Redis, and ClickHouse instances
### Configure `values.yaml`
```yaml
env:
  CLICKHOUSE_HOST:
  REDIS_HOST:
  POSTGRES_HOST:
  POSTGRES_USER:
  POSTGRES_PASSWORD:
  POSTGRES_DB:
config: |
  # ai-gateway configuration
  http:
    host: "0.0.0.0"
    port: 8080
```
> This will automatically mount your config.yaml into the container at `/app/config.yaml` and the ai-gateway will use it on startup.
### Deploy Using Helm
Run the following command to install the AI Gateway:
```bash
helm install ai-gateway .
```
This deploys:
* **ai-gateway** using the default image.
* Connections to external **Postgres**, **Redis**, and **ClickHouse** instances.
### Accessing AI Gateway
By default, the service is exposed as a `ClusterIP`. To access it externally, you can port-forward:
```sh
kubectl port-forward svc/ai-gateway 8080:80
```
Then access the gateway at `http://localhost:8080`.
***
### Uninstall
To remove the deployment:
```sh
helm uninstall ai-gateway
```
***
Check out the full source repository here: [https://github.com/langdb/helm-chart](https://github.com/langdb/helm-chart)
---
## Enterprise Licensing Options
LangDB offers two enterprise licensing models to fit your organization's needs:
***
### Enterprise Managed
**Best for:** Development teams and startups running AI workloads in private VPCs who want a fully managed experience.
* **Deployment:** Entire infrastructure is deployed on GCP or AWS and fully managed by LangDB.
  * For AWS, an AWS account with shared access will be provisioned.
  * For GCP, a new project will be provisioned.
* **Infrastructure:** Fully managed.
***
### Enterprise Flexible
**Best for:** Enterprises running large-scale AI operations who want maximum flexibility and control.
* **Deployment:** LangDB provides a highly performant binary that can be deployed in your own infrastructure (on-prem, cloud, or hybrid).
  * x86\_64, aarch64 for Ubuntu
* **Infrastructure:** Bring your own.
***
**Solutioning Add-On:** Available at an hourly rate if needed.
For details or clarification, please [book a meeting](https://langdb.ai/) on our website.
---
## Clickhouse Queries
## Overview
ClickHouse is used for observability in LangDB. It provides high-performance analytics that let us track and analyze system behavior, performance metrics, and user activity across the platform.
## Table Schemas
## Traces Table
The following `CREATE` query defines the `traces` table, which stores distributed tracing information for the LangDB AI Gateway.
```sql
CREATE TABLE IF NOT EXISTS langdb.traces
(
trace_id UUID,
span_id UInt64,
parent_span_id UInt64,
operation_name LowCardinality(String),
kind String,
start_time_us UInt64,
finish_time_us UInt64,
finish_date Date,
attribute Map(String, String),
tenant_id Nullable(String),
project_id String,
thread_id String,
tags Map(String, String),
parent_trace_id Nullable(UUID),
run_id Nullable(UUID)
)
ENGINE = MergeTree
ORDER BY (finish_date, finish_time_us, trace_id)
SETTINGS index_granularity = 8192;
-- Add bloom filter index for thread_id
ALTER TABLE langdb.traces ADD INDEX idx_thread_id thread_id TYPE bloom_filter GRANULARITY 4;
-- Add composite index for tenant_id, project_id, and operation_name
ALTER TABLE langdb.traces ADD INDEX idx_tenant_projec
```
## Common Filters
* The `thread_id` field, with its dedicated bloom filter index, allows efficient filtering of traces by execution thread.
* The `run_id` field enables filtering and grouping traces by specific execution runs.
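As a minimal sketch of these filters, the hypothetical helper below builds a parameterized query against the `langdb.traces` columns shown in the schema. The function name and the `%(name)s` placeholder style are assumptions for illustration; use whatever parameter syntax your ClickHouse client expects.
```python
from typing import Dict, Optional, Tuple


def build_trace_query(
    thread_id: Optional[str] = None,
    run_id: Optional[str] = None,
    limit: int = 100,
) -> Tuple[str, Dict[str, object]]:
    """Return (sql, params) for filtering langdb.traces by thread and/or run."""
    conditions = []
    params: Dict[str, object] = {}
    if thread_id is not None:
        # thread_id is covered by the bloom filter index idx_thread_id
        conditions.append("thread_id = %(thread_id)s")
        params["thread_id"] = thread_id
    if run_id is not None:
        conditions.append("run_id = %(run_id)s")
        params["run_id"] = run_id
    where = f"WHERE {' AND '.join(conditions)} " if conditions else ""
    sql = (
        "SELECT trace_id, span_id, operation_name, start_time_us, finish_time_us "
        "FROM langdb.traces "
        f"{where}"
        "ORDER BY start_time_us "
        "LIMIT %(limit)s"
    )
    params["limit"] = limit
    return sql, params
```
Passing values as parameters rather than formatting them into the SQL string keeps the query safe when the identifiers come from user input.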
---
## Configuring Data Retention
## Overview
This document outlines LangDB's data retention strategy for tracing information stored in ClickHouse. The strategy uses materialized views to manage retention periods according to user subscription tier. Data eviction is implemented using ClickHouse's TTL (Time-To-Live) mechanisms and background processes:
* **TTL Definitions**: Each table includes TTL expressions that specify when data should expire based on timestamp fields
* **Background Merge Process**: ClickHouse automatically runs background processes that merge data parts and remove expired data during these merge operations
* **Resource-Efficient**: The eviction process runs asynchronously during low-load periods, minimizing impact on query performance
## Tracing Data Architecture
LangDB uses a robust system for storing and analyzing trace data:
* **Primary Storage**: All trace data is initially stored in the `langdb.traces` table in ClickHouse
* **Materialized Views**: Tier-specific materialized views filter and retain data based on user subscription levels
* **Retention Policies**: Automated TTL (Time-To-Live) mechanisms enforce retention periods
## Implementation using Materialized Views
### Tier-Specific Materialized Views
**Professional Tier View**
```sql
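-- Create the target table first; the materialized view writes into it.
-- ORDER BY and TTL must reference columns that exist in the table
-- (the traces schema in this documentation uses finish_date / finish_time_us).
CREATE TABLE langdb.traces_professional (
    /* Same structure as base table */
) ENGINE = MergeTree()
ORDER BY (timestamp, user_id)
TTL timestamp + toIntervalDay(30);
-- Copy every row inserted into the base table into the 30-day table
CREATE MATERIALIZED VIEW langdb.traces_professional_mv
TO langdb.traces_professional
AS SELECT *
FROM langdb.traces;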
```
**Enterprise Tier View**
```sql
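-- Create the target table first; the materialized view writes into it.
-- ORDER BY and TTL must reference columns that exist in the table
-- (the traces schema in this documentation uses finish_date / finish_time_us).
CREATE TABLE langdb.traces_enterprise (
    /* Same structure as base table */
) ENGINE = MergeTree()
ORDER BY (timestamp, user_id)
TTL timestamp + toIntervalDay(90);
-- Copy every row inserted into the base table into the 90-day table
CREATE MATERIALIZED VIEW langdb.traces_enterprise_mv
TO langdb.traces_enterprise
AS SELECT *
FROM langdb.traces;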
```
## Data Access Flow
1. New trace data is inserted into the base `langdb.traces` table
2. Materialized views automatically filter and copy relevant data to tier-specific tables
3. TTL mechanisms automatically remove data older than the specified retention period
4. Data access APIs query the appropriate table based on the user's subscription tier
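Step 4 of this flow can be sketched as a small lookup. The snippet below is illustrative only: the tier names and retention periods come from the views above, while the function names and the `is_expired` helper are assumptions, not part of LangDB's API.
```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Retention periods taken from the TTL clauses of the tier-specific tables
RETENTION_DAYS = {"professional": 30, "enterprise": 90}


def table_for_tier(tier: str) -> str:
    """Return the tier-specific table a data access API would query."""
    if tier not in RETENTION_DAYS:
        raise ValueError(f"unknown subscription tier: {tier!r}")
    return f"langdb.traces_{tier}"


def is_expired(recorded_at: datetime, tier: str, now: Optional[datetime] = None) -> bool:
    """Return True once a row is past its TTL and eligible for eviction."""
    now = now or datetime.now(timezone.utc)
    return recorded_at + timedelta(days=RETENTION_DAYS[tier]) <= now
```
Because ClickHouse removes expired rows during background merges, a row that `is_expired` reports as expired may still be physically present for a while.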
## Benefits of This Approach
* **Efficiency**: Only store data for the period necessary based on customer tier
* **Performance**: Queries run against smaller, tier-specific tables rather than the entire dataset
* **Compliance**: Clear retention boundaries help with regulatory compliance
* **Cost-Effective**: Optimizes storage costs by aligning retention with customer value
## Backup and Disaster Recovery
While the retention strategy focuses on operational access to trace data, a separate backup strategy ensures data can be recovered in case of system failures:
* Daily snapshots of ClickHouse data
* Backup retention aligned with the longest tier retention period (365 days)
* Geo-redundant storage of backups
## Monitoring and Management
The retention system includes:
* Monitoring dashboards for data volume by tier
* Alerts for unexpected growth or retention failures
* Regular audits to ensure compliance with retention policies
## Future Enhancements
* Implementation of custom retention periods for specific enterprise customers
* Cold storage options for extended archival needs
* Advanced sampling techniques to retain representative trace data beyond standard periods
---
## Multi Tenancy
This document outlines the multitenancy implementation in LangDB, explaining how data isolation is maintained across different tenants.
### Overview
LangDB implements a robust multitenancy model that ensures complete isolation of tenant data while maintaining efficient resource utilization. This approach is implemented across different data storage systems used in the platform.
### Clickhouse (Observability)
Clickhouse is used for analytics and observability in LangDB. The multitenancy implementation in Clickhouse includes:
#### Custom Role and User for Every Tenant
* Each tenant in LangDB has a dedicated Clickhouse user and role
* These custom roles enforce access permissions specific to the tenant's data
* Authentication and authorization are managed at the tenant level
* Prevents cross-tenant data access even at the database level
#### Row Policy Based Tenant Isolation
* All read operations in Clickhouse are governed by row policies
* Row policies filter data based on the `tenant_name` column
* When a tenant's credentials are used for database access, the row policy automatically restricts results to only that tenant's data
* This provides a zero-trust isolation model where the application doesn't need to include tenant filters
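To make the row-policy idea concrete, here is a hedged sketch that generates a ClickHouse `CREATE ROW POLICY` statement for one tenant. The policy and role naming scheme (`<tenant>_policy`, `<tenant>_role`) and the helper itself are hypothetical; only the `tenant_name` column and the `langdb.traces` table come from this documentation.
```python
import re

# Restrict tenant names to plain identifiers so they are safe to embed in DDL
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def row_policy_sql(
    tenant: str,
    table: str = "langdb.traces",
    column: str = "tenant_name",
) -> str:
    """Build a SELECT row policy limiting a tenant's role to its own rows."""
    if not _IDENTIFIER.match(tenant):
        raise ValueError(f"invalid tenant name: {tenant!r}")
    policy = f"{tenant}_policy"  # hypothetical naming scheme
    role = f"{tenant}_role"      # hypothetical naming scheme
    return (
        f"CREATE ROW POLICY IF NOT EXISTS {policy} ON {table} "
        f"FOR SELECT USING {column} = '{tenant}' TO {role}"
    )
```
With a policy like this in place, a query issued under the tenant's role returns only matching rows even if the application sends no tenant filter.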
#### Controlled Insert Operations
* All inserts into Clickhouse tables automatically populate the tenant column
* The tenant column is populated based on the authenticated user context
* Direct inserts by tenants are not allowed, preventing potential data integrity issues
* Insert operations are performed via service accounts with appropriate tenant context
### Postgres (Metadata)
Postgres is used as the primary metadata storage in LangDB. The multitenancy implementation in Postgres includes:
#### Application-Level Tenant Isolation
* Tenant isolation is implemented at the application logic level
* All database queries include tenant-specific filters
* Application code ensures that queries only return records belonging to the authenticated tenant
* Modifications are restricted to only the tenant's own data through application context
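The pattern can be illustrated with a small, self-contained sketch. It uses an in-memory SQLite database as a stand-in for Postgres, and the `projects` table, class name, and methods are assumptions made for the example rather than LangDB's actual schema or code.
```python
import sqlite3
from typing import List


class TenantScopedStore:
    """Data access wrapper that adds the tenant filter to every statement."""

    def __init__(self, conn: sqlite3.Connection, tenant_id: str) -> None:
        if not tenant_id:
            raise ValueError("tenant context is required")
        self._conn = conn
        self._tenant_id = tenant_id

    def list_projects(self) -> List[str]:
        rows = self._conn.execute(
            "SELECT name FROM projects WHERE tenant_id = ? ORDER BY name",
            (self._tenant_id,),
        ).fetchall()
        return [row[0] for row in rows]

    def rename_project(self, old: str, new: str) -> int:
        # The tenant filter in the WHERE clause limits updates to own rows
        cursor = self._conn.execute(
            "UPDATE projects SET name = ? WHERE tenant_id = ? AND name = ?",
            (new, self._tenant_id, old),
        )
        return cursor.rowcount


def demo_connection() -> sqlite3.Connection:
    """Create a throwaway database with rows for two tenants."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE projects (tenant_id TEXT NOT NULL, name TEXT NOT NULL)")
    conn.executemany(
        "INSERT INTO projects VALUES (?, ?)",
        [("acme", "search"), ("acme", "billing"), ("globex", "search")],
    )
    return conn
```
Routing every query through a wrapper like this means a missing tenant filter becomes a construction error rather than a silent cross-tenant read.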
#### Metadata Security Measures
* Tenant identifier is a required column in all tenant-specific tables
* All database operations include tenant context validation
* Application middleware enforces tenant context for every database operation
### Implementation Across Environments
This multitenancy model is consistently implemented across LangDB's AWS and GCP deployments, ensuring that tenant data remains securely isolated regardless of the cloud provider.
---
## Working with Models
## Managing and Adding New Models
The platform provides a flexible management API that allows you to publish and manage machine learning models at both the tenant and project levels. This enables organizations to control access and visibility of models, supporting both public and private use cases.
Model Listing on LangDB
#### Model Types
* **Public Models:**
  * Can be added without specifying a project ID.
  * Accessible to all users on this deployment.
  * **In enterprise deployments, public models are added monthly.**
  * Specific models can be requested by contacting our support team.
* **Private Models:**
  * Require a `project_id` and a `provider_id` for a known provider with a pre-configured secret.
  * Access is restricted to the specified project and provider.
#### Model Parameters
When publishing a model, you can specify:
* **Request/Response Mapping:**
  * By default, models are expected to be OpenAI-compatible.
  * You can also specify custom request/response processors using dynamic scripts (see 'Coming Soon' below).
* **Model Parameters Schema:**
  * A JSON schema describing the parameters that can be sent with requests to the model.
### Management API to Publish New Models
The management API allows you to register new models with the platform. Below is an example of how to use the API to publish a new model.
#### Sample cURL Request
```bash
curl -X POST https://api.xxx.langdb.ai/admin/models \
  -H "Authorization: Bearer " \
  -H "X-Admin-Key: " \
  -H "Content-Type: application/json" \
  -d '{
    "model_name": "my-model",
    "description": "Description of the LLM Model",
    "provider_id": "123e4567-e89b-12d3-a456-426614174000",
    "project_id": "",
    "public": false,
    "request_response_mapping": "openai-compatible",
    "model_type": "completions",
    "input_token_price": "0.00001",
    "output_token_price": "0.00003",
    "context_size": 128000,
    "capabilities": ["tools"],
    "input_types": ["text", "image"],
    "output_types": ["text", "image"],
    "tags": [],
    "owner_name": "openai",
    "priority": 0,
    "model_name_in_provider": "my-model-v1.2",
    "parameters": {
      "top_k": {
        "min": 0,
        "step": 1,
        "type": "int",
        "default": 0,
        "required": false,
        "description": "Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens."
      },
      "top_p": {
        "max": 1,
        "min": 0,
        "step": 0.05,
        "type": "float",
        "default": 1,
        "required": false,
        "description": "An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both."
      }
    }
  }'
```
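The `parameters` entries in the request above describe each tunable with `type`, `min`, `max`, `default`, and `required`. The sketch below shows one plausible way a client could check a value against such an entry before sending it. The interpretation of these fields (inclusive bounds, default used when no value is given) is an assumption, and `resolve_parameter` is a hypothetical helper, not part of the gateway.
```python
from typing import Any, Dict

# Schema entries copied from the sample request (descriptions omitted)
TOP_K = {"min": 0, "step": 1, "type": "int", "default": 0, "required": False}
TOP_P = {"max": 1, "min": 0, "step": 0.05, "type": "float", "default": 1, "required": False}


def resolve_parameter(schema: Dict[str, Any], value: Any = None) -> Any:
    """Return the value to send, falling back to the default when omitted."""
    if value is None:
        if schema.get("required", False):
            raise ValueError("parameter is required")
        return schema.get("default")
    expected = schema.get("type")
    # bool is a subclass of int in Python, so reject it explicitly
    if expected == "int" and (isinstance(value, bool) or not isinstance(value, int)):
        raise TypeError("expected an integer")
    if expected == "float" and (isinstance(value, bool) or not isinstance(value, (int, float))):
        raise TypeError("expected a number")
    if "min" in schema and value < schema["min"]:
        raise ValueError(f"value must be >= {schema['min']}")
    if "max" in schema and value > schema["max"]:
        raise ValueError(f"value must be <= {schema['max']}")
    return value
```
For example, `resolve_parameter(TOP_P)` falls back to the schema default, while `resolve_parameter(TOP_P, 1.5)` raises a `ValueError`.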
#### API properties
| Field | Type | Description |
| -------------------------- | ------------------------------- | ------------------------------------------------------------------------------------------------- |
| `model_name` | `String` | The display name of the model |
| `description` | `String` | A detailed description of the model's capabilities and use cases |
| `provider_info_id` | `UUID` | The UUID of the provider that offers this model |
| `project_id` | `UUID` | Which project this model belongs to |
| `public` | `Boolean` | Whether the model is publicly discoverable or private |
| `request_response_mapping` | `String` | `"openai-compatible"` or a custom mapping script |
| `model_type` | `String` | The type of model (e.g., `"completions"`, `"image"`, `"embedding"`) |
| `owner_name` | `String` | The name of the model's owner or creator |
| `priority` | `i32` | Priority level for the model in listings (higher numbers indicate higher priority) |
| `input_token_price` | `Nullable float` | Price per input token |
| `output_token_price` | `Nullable float` | Price per output token |
| `context_size` | `Nullable u32` | Maximum context window size in tokens |
| `capabilities` | `String[]` | List of model capabilities (e.g., `"tools"`) |
| `input_types` | `String[]` | Supported input formats (e.g., `"text"`, `"image"`, `"audio"`) |
| `output_types` | `String[]` | Supported output formats (e.g., `"text"`, `"image"`, `"audio"`) |
| `tags` | `String[]` | Classification tags for the model |
| `type_prices` | `Map` | JSON string containing prices for different usage types (used for image generation model pricing) |
| `mp_price` | `Nullable float` | Price by megapixel (used for image generation model pricing) |
| `model_name_in_provider` | `String` | The model's identifier in the provider's system |
| `parameters` | `Map>` | Additional configuration parameters as JSON |
Check out the full API specification: [POST /admin/models](https://app.gitbook.com/s/0peUeEHQMLSxYq8BpQGt/api-reference#post-admin-models)
#### Usage of API
* Replace the empty `Authorization` bearer token, `X-Admin-Key`, and `project_id` values in the sample request with your actual values.
* Set `public` to `true` for public models (omit `project_id` and `provider_id`), or `false` for private models.
* The `parameters_schema` field allows you to define the expected parameters for your model.
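The public/private rule above can be captured in a small payload builder. This is a sketch under stated assumptions: the helper name is hypothetical, the field names follow the sample cURL request (which uses `provider_id`), and only a few of the documented fields are shown.
```python
from typing import Any, Dict, Optional


def build_model_payload(
    model_name: str,
    model_name_in_provider: str,
    public: bool,
    project_id: Optional[str] = None,
    provider_id: Optional[str] = None,
    **extra: Any,
) -> Dict[str, Any]:
    """Assemble a request body for POST /admin/models."""
    payload: Dict[str, Any] = {
        "model_name": model_name,
        "model_name_in_provider": model_name_in_provider,
        "public": public,
        "request_response_mapping": "openai-compatible",
    }
    if not public:
        # Private models must be tied to a project and a known provider
        if not project_id or not provider_id:
            raise ValueError("private models require project_id and provider_id")
        payload["project_id"] = project_id
        payload["provider_id"] = provider_id
    # Public models omit project_id and provider_id entirely
    payload.update(extra)
    return payload
```
The resulting dictionary can be serialized with `json.dumps` and sent with any HTTP client, using the same headers as the cURL sample.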
Setting Parameters for a sample request on LangDB
### Dynamic Request/Response Mapping (Coming Soon)
The platform will soon support dynamic request/response mapping using custom scripts. This feature will allow you to define how requests are transformed before being sent to the model, and how responses are processed before being returned to the client. This will enable support for a wide variety of model APIs and custom workflows.
Stay tuned!
---
## Running Locally
## Dependencies
* **ClickHouse** (for request tracing & analytics)
* **PostgreSQL** (for metadata and user management)
* **Redis** (for caching and rate‑limiting)
## Launch Options
You can self-host the enterprise version in one of two ways.
### Using binary
Supported Platforms:
* x86\_64
* aarch64
```bash
ai-gateway-enteprise serve -c ai-gateway.yaml
```
### Using docker
```bash
docker run -it \
-p 8080:8080 \
/ai-gateway-enteprise serve \
-c ai-gateway.yaml
```
## Make Your First Request
Test the gateway with a simple chat completion:
```bash
# Chat completion with GPT-4o mini
curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o-mini",
"messages": [{"role": "user", "content": "What is the capital of France?"}]
}'
# Or try Claude
curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "claude-3-opus",
"messages": [
{"role": "user", "content": "What is the capital of France?"}
]
}'
```
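The same request can be made from Python with only the standard library. This is a minimal sketch that assumes the gateway is listening on `localhost:8080` as in the cURL examples and that responses follow the OpenAI chat completion shape; the helper names are illustrative.
```python
import json
import urllib.request


def build_chat_request(
    prompt: str,
    model: str = "gpt-4o-mini",
    base_url: str = "http://localhost:8080",
) -> urllib.request.Request:
    """Build (but do not send) a chat completion request for the gateway."""
    body = json.dumps(
        {"model": model, "messages": [{"role": "user", "content": prompt}]}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request: urllib.request.Request, timeout: float = 30.0) -> str:
    """Send the request and return the first choice's message content."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.loads(response.read())
    # Assumes an OpenAI-compatible response body
    return payload["choices"][0]["message"]["content"]
```
With the gateway running, `send_chat_request(build_chat_request("What is the capital of France?"))` returns the model's reply as a string.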
## Using MCP Servers
Invoke an MCP server alongside your request:
```bash
curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o-mini",
"messages": [{"role": "user", "content": "Ping the server using the tool and return the response"}],
"mcp_servers": [{"server_url": "http://localhost:3004"}]
}'
```
## Next Steps
Refer to [https://docs.langdb.ai/](https://docs.langdb.ai/) to learn about the available features.
---
## ai-gateway.yaml
`ai-gateway.yaml` is the primary way to configure secrets and specific features of AI Gateway.
## Sample ai-gateway.yaml
```yaml
# Refer to the ai-gateway.yaml for
# advanced configurations available
# ....
# Configuration for storage
database_config:
  url: "postgres://langdb:XXXX@localhost:5438/postgres"
redis_config:
  url: "redis://localhost:6379"
langdb_clickhouse_config:
  url: localhost:8123
rest_api:
  port: 8083
  host: 0.0.0.0
  cors: true
## Configure storage location
# storage_config: !Local "file:///"
langdb_cloud_ui:
  url: http://localhost:3000
```
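When wiring the gateway to locally running services, it can help to check that the hosts and ports in these URLs match your containers. The snippet below is a small, hypothetical helper (not part of LangDB) that splits the connection strings from the sample above using the standard library.
```python
from typing import Dict, Optional, Union
from urllib.parse import urlsplit


def describe_endpoint(url: str) -> Dict[str, Optional[Union[str, int]]]:
    """Split a connection URL into scheme, host, port, and database name."""
    if "://" not in url:
        # Bare "host:port" values (as in the ClickHouse entry) need a netloc marker
        url = "//" + url
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme or None,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/") or None,
    }
```
For instance, the Postgres URL in the sample resolves to port `5438`, which is the host port published by the Docker Compose example in the deployment guide.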
---
## Tenant & User Provisioning
Tenant provisioning happens through the LangDB dashboard, where you can register a company and upgrade to an Enterprise License.
* Reach out to our support staff to configure your tenant environment on AWS or GCP.
* For self-hosted enterprise versions, you will be asked to provide discovery URLs to be registered with the LangDB Control Environment.
### Tenancy
* The main LangDB Cloud is multi-tenant, with shared infrastructure for all tenants.
* Enterprise deployments are provisioned per tenant, with dedicated infrastructure and network isolation.
* Provisioning an individual tenant involves setting up an entire AWS account or GCP project per tenant, managed via Terraform, which then communicates securely with LangDB Cloud.
### User Setup
Two types of user modes are supported.
### Direct User Setup
* Users can sign up directly and invite additional users to their tenant.
* Easiest to set up.
* You can restrict signups to specific email domains for security.
### Federated User Setup (SSO / SAML / OpenID)
* Reach out to us for linking your federated account to your tenant.
* Users that are part of the directory can register on the sub-domain.
* Currently roles are managed through LangDB dashboard.
* Dynamic Role Mapping feature is in **active development.**
---
## Getting Started with Agents
Building Report Writing Agent Using CrewAI
Building Travel Agent with OpenAI
Building Web Search Agent with Google-ADK
Building Travel Concierge with Google ADK
Building a Reasoning Finance Team with Agno
Building Complex Data Extraction with LangGraph
---
## Building a Reasoning Finance Team with Agno
This guide demonstrates how to construct a sophisticated, multi-agent financial analysis team using LangDB. The team collaborates to deliver in-depth insights on publicly traded companies by combining web research and financial data analysis.
Check out: https://app.langdb.ai/sharing/threads/73c91c58-eab7-4c6b-afe1-5ab6324f1ada
## Code
* LangDB Samples: [https://github.com/langdb/langdb-samples/tree/main/examples/agno/reasoning-finance-team](https://github.com/langdb/langdb-samples/tree/main/examples/agno/reasoning-finance-team)
## Overview
The system is composed of two specialist agents orchestrated by a coordinating team:
1. **Web Search Agent**: Responsible for gathering the latest news and market sentiment from the internet.
2. **Finance Agent**: Equipped with YFinanceTools to fetch and analyze quantitative stock data, including pricing, fundamentals, and analyst recommendations.
3. **Reasoning Finance Team**: A coordinator that directs the two agents, synthesizes their findings, and produces a final, comprehensive report.
LangDB provides the backbone for this system, enabling seamless model access, tool integration, and full observability into each agent's actions and the team's collaborative process.
## Installation
```bash
pip install "pylangdb[agno]" python-dotenv yfinance
```
## Environment Variables
Create a `.env` file or export the following environment variables:
```bash
export LANGDB_API_KEY=""
export LANGDB_PROJECT_ID=""
```
## Code Walkthrough
### Initialize LangDB
Start by initializing LangDB to enable automatic tracing and model routing. This should be done before importing any other components from the `agno` library.
```python
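import os
from dotenv import load_dotenv
from pylangdb.agno import init

# Load LANGDB_API_KEY and LANGDB_PROJECT_ID from .env before initializing
load_dotenv()

# Initialize LangDB before importing anything from agno
init()

from agno.agent import Agent
# ... other imports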
### Define the Web Search Agent
The `web_agent` is responsible for searching the web. Instead of hard-coding a search tool, we assign it a **LangDB Virtual Model**. This decouples the agent's logic from the specific tools it uses. The virtual model is configured in the LangDB UI to provide search capabilities, as explained in the configuration section below.
```python
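from agno.models.langdb import LangDB

web_agent = Agent(
    name="Web Search Agent",
    role="Search the web for the information",
    # Virtual model configured in the LangDB UI with a search tool attached
    model=LangDB(id="langdb/search_agent_xmf4v5jk"),
    instructions="Always include sources"
)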
```
### Define the Finance Agent
This agent is equipped with `YFinanceTools` to access a wide range of financial data. It is powered by Grok-4 and has specific instructions to format its output professionally.
```python
from agno.tools.yfinance import YFinanceTools

finance_agent = Agent(
    name="Finance AI Agent",
    role="Analyse the given stock",
    model=LangDB(id="xai/grok-4"),
    # Enable the Yahoo Finance data this agent is allowed to fetch
    tools=[YFinanceTools(
        stock_price=True,
        stock_fundamentals=True,
        analyst_recommendations=True,
        company_info=True,
        company_news=True
    )],
    instructions=[
        "Use tables to display stock prices, fundamentals (P/E, Market Cap), and recommendations.",
        "Clearly state the company name and ticker symbol.",
        "Focus on delivering actionable financial insights."
    ]
)
```
### Define the Coordinating Team
The `ReasoningFinanceTeam` orchestrates the two agents. It operates in `coordinate` mode, allowing it to delegate tasks, synthesize information, and ensure the final output meets the specified success criteria.
```python
from agno.team import Team

reasoning_finance_team = Team(
    name="Reasoning Finance Team",
    mode="coordinate",  # the team leader delegates to members and merges results
    model=LangDB(id="xai/grok-4"),
    members=[web_agent, finance_agent],
    instructions=[
        "Collaborate to provide comprehensive financial and investment insights",
        "Consider both fundamental analysis and market sentiment",
        "Use tables and charts to display data clearly and professionally",
        "Present findings in a structured, easy-to-follow format",
        "Only output the final consolidated analysis, not individual agent responses"
    ],
    markdown=True,
    show_members_responses=True,
    success_criteria="The team has provided a complete financial analysis with data, visualizations, risk assessment, and actionable investment recommendations supported by quantitative analysis and market research."
)
```
### Configuring Virtual Models and Tools
To empower the `web_agent` with live web search capabilities without hard-coding tools, we configure a **Virtual Model** in LangDB. This model is backed by a **Virtual MCP Server** that provides the actual search functionality.
**Create a Virtual MCP Server**
First, create a dedicated MCP server for the search tool.
1. In the LangDB UI, navigate to **Projects → MCP Servers**.
2. Click **+ New Virtual MCP Server** and configure it:
   * **Name**: `web-search-mcp`
   * **Underlying MCP**: Select **Tavily Search**.
   * **Note**: The Tavily MCP requires an API key. Ensure you have added your `TAVILY_API_KEY` to your LangDB account secrets for the tool to function.
**Create and Configure the Virtual Model**
Next, create a virtual model and attach the MCP you just made.
1. Navigate to **Models → + New Virtual Model**.
2. Give it a name (e.g., `search-agent`).
3. In the **Tools** section, click **+ Attach MCP Server** and select the `web-search-mcp` you created.
4. Save the model and copy its identifier (e.g., `langdb/search-agent_xxxxxx`).
5. Use this identifier as the `model` in your `web_agent` definition.
### Running the Team
To run the team, simply call the `print_response` method with a detailed prompt. The team will autonomously delegate tasks to the appropriate agents and generate a consolidated response.
```python
reasoning_finance_team.print_response(
"""Compare the tech sector giants (AAPL, GOOGL, MSFT) performance:\n 1. Get financial data for all three companies\n 2. Analyze recent news affecting the tech sector\n 3. Calculate comparative metrics and correlations\n 4. Recommend portfolio allocation weights"""
)
```
## Full Trace
Every execution is captured in LangDB, providing a complete trace of the Agno team's operations. This includes the initial prompt, each agent's individual contributions, the tools they used, and the final synthesized output.
> Check out [https://app.langdb.ai/sharing/threads/73c91c58-eab7-4c6b-afe1-5ab6324f1ada](https://app.langdb.ai/sharing/threads/73c91c58-eab7-4c6b-afe1-5ab6324f1ada) for the full conversation.
## References
* [LangDB Virtual Models](https://docs.langdb.ai/concepts/virtual-models)
* [Agno Documentation](https://docs.agno.com/)
---
## Building Complex Data Extraction with LangGraph
This guide shows how to build a sophisticated LangGraph agent for extracting structured information from meeting transcripts using LangDB. Leverage LangDB's AI gateway to create multi-stage workflows with confidence scoring, validation loops, and comprehensive tracing.
Check out: https://app.langdb.ai/sharing/threads/efe57123-9c81-4428-ada1-8d2e55bce4e8
## Code
* LangDB Samples: [https://github.com/langdb/langdb-samples/tree/main/examples/langchain/langchain-data-extraction](https://github.com/langdb/langdb-samples/tree/main/examples/langchain/langchain-data-extraction)
## Overview
The Complex Data Extraction agent processes meeting transcripts through a multi-stage workflow with validation, refinement, and synthesis phases.
### Data Extraction Architecture
The system implements these specialized processing stages:
1. **Preprocessing Node**: Analyzes transcript structure and determines complexity
2. **Initial Extraction Node**: Performs data extraction with confidence scoring
3. **Validation Node**: Validates extraction quality and provides feedback
4. **Refinement Node**: Refines extraction based on validation feedback
5. **Synthesis Node**: Produces final comprehensive summary
6. **Fallback Node**: Provides simplified extraction if complex workflow fails
### Key Benefits
With LangDB, this multi-stage extraction system gains:
* **End-to-End Tracing**: Complete visibility into processing stages and decision points
* **Confidence Scoring**: Built-in quality assessment for each extraction section
* **Iterative Refinement**: Multiple validation loops with feedback-driven improvements
* **Modular Architecture**: Clean separation of concerns across nodes and tools
* **Robust Error Handling**: Fallback mechanisms ensure reliable processing
* **Centralized Configuration**: All LLM calls routed through LangDB's AI gateway
### Installation
```bash
pip install "pylangdb[langchain]" langchain langgraph langchain_openai pydantic python-dotenv
```
### Environment Variables
Create a `.env` file in your project root with the following variables:
```bash
# Required for AI platform integration
LANGDB_API_KEY="your_api_key_here"
LANGDB_PROJECT_ID="your_project_id_here"
LANGDB_API_BASE_URL="https://api.us-east-1.langdb.ai"
```
### Project Structure
```
langchain-data-extraction/
├── .env
├── README.md
├── requirements.txt
├── main.py # Main execution script
├── agent.py # Agent construction and workflow
├── models.py # Data models and state definitions
├── nodes.py # Node functions and routing logic
├── tools.py # Tool definitions
└── transcript.py # Sample transcript data
```
## How the Integration Works
### Seamless LangGraph Integration
The key to enhancing LangGraph with LangDB is directing all LLM calls through a centralized AI gateway:
```python
# main.py
from pylangdb.langchain import init
# Initialize LangDB tracing BEFORE importing any LangGraph modules
init()
```
By calling `init()` before any LangGraph imports, the integration:
1. Patches LangGraph's underlying model calling mechanisms
2. Routes all LLM requests through LangDB's API
3. Attaches tracing metadata to each request
4. Captures all node transitions and tool calls
This provides comprehensive observability into complex multi-stage workflows.
### Virtual Model References
Instead of hardcoding model names, we reference LangDB virtual models:
```python
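# nodes.py
import os
from langchain_openai import ChatOpenAI
# Tool functions are assumed to be defined in tools.py (see the project structure)
from tools import (
    analyze_transcript_structure,
    extract_with_confidence,
    validate_extraction,
    refine_extraction,
)

def create_model():
    """Create and return the ChatOpenAI model with tools bound."""
    api_base = os.getenv("LANGDB_API_BASE_URL")
    api_key = os.getenv("LANGDB_API_KEY")
    project_id = os.getenv("LANGDB_PROJECT_ID")
    # LangDB identifies the project through this header
    default_headers = {"x-project-id": project_id}
    llm = ChatOpenAI(
        model_name='openai/gpt-4o',
        temperature=0.2,
        openai_api_base=f"{api_base}/v1",
        openai_api_key=api_key,
        default_headers=default_headers
    )
    return llm.bind_tools([
        analyze_transcript_structure,
        extract_with_confidence,
        validate_extraction,
        refine_extraction
    ])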
```
The `model_name='openai/gpt-4o'` parameter can be replaced with a **LangDB Virtual Model** reference that includes:
* A specific underlying LLM
* Attached tools and MCPs
* Guardrails for input/output validation
* Custom handling and retry logic
This approach offloads complexity from the application code to LangDB AI gateway.
### Modular State Management
The system uses TypedDict for type-safe state management:
```python
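# models.py
from typing import Annotated, Any, Dict, List, Sequence, TypedDict
from langchain_core.messages import BaseMessage
from langgraph.graph.message import add_messages

# ComplexityLevel and ExtractionPhase are enums assumed to be defined
# elsewhere in models.py; they are not shown in this excerpt.
class ComplexAgentState(TypedDict):
    """Extended state for complex extraction workflow"""
    messages: Annotated[Sequence[BaseMessage], add_messages]
    transcript: str
    complexity_level: ComplexityLevel
    extraction_attempts: int
    max_attempts: int
    confidence_scores: Dict[str, float]
    validation_feedback: List[str]
    extraction_data: Dict[str, Any]
    current_phase: ExtractionPhase
    requires_refinement: bool
    processing_complete: bool
    error_count: int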
```
This state structure enables type safety, observability, debugging, and extensibility.
### Advanced Workflow Patterns
The agent implements sophisticated workflow patterns:
```python
# agent.py
from langgraph.graph import StateGraph
from langgraph.prebuilt import ToolNode
# ComplexAgentState, the node functions, the routing function, and the tools
# are assumed to come from models.py, nodes.py, and tools.py.

def create_complex_agent():
    """Create and return the complex LangGraph agent"""
    workflow = StateGraph(ComplexAgentState)
    # Add nodes
    workflow.add_node("preprocessing", preprocessing_node)
    workflow.add_node("initial_extraction", initial_extraction_node)
    workflow.add_node("validation", validation_node)
    workflow.add_node("refinement", refinement_node)
    workflow.add_node("synthesis", synthesis_node)
    workflow.add_node("fallback", fallback_node)
    workflow.add_node("tool_node", ToolNode([
        analyze_transcript_structure,
        extract_with_confidence,
        validate_extraction,
        refine_extraction
    ]))
    # Set entry point
    workflow.set_entry_point("preprocessing")
    # Add conditional edges for smart routing
    workflow.add_conditional_edges(
        "preprocessing",
        route_after_preprocessing,
        {
            "tool_node": "tool_node",
            "initial_extraction": "initial_extraction"
        }
    )
    # Edges for the remaining nodes are not shown in this excerpt
    return workflow.compile()
```
**Key Benefits:**
* **Conditional Routing**: Smart routing based on validation results
* **Tool Integration**: Seamless tool calls with automatic routing
* **Error Recovery**: Fallback mechanisms for robust processing
* **Observability**: Every decision point is traced in LangDB
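To illustrate conditional routing, the sketch below shows what a routing function for the step after validation might look like. It is hypothetical: the function name, the `0.7` confidence threshold, the error limit, and the choice to synthesize a best-effort result once attempts are exhausted are all assumptions, and the sample repository's routing logic may differ. It reads only state fields that appear in `ComplexAgentState`.
```python
from typing import Dict, TypedDict


class RoutingState(TypedDict):
    """Subset of ComplexAgentState that the routing decision reads."""
    confidence_scores: Dict[str, float]
    extraction_attempts: int
    max_attempts: int
    error_count: int


def route_after_validation(
    state: RoutingState,
    min_confidence: float = 0.7,  # assumed threshold
    max_errors: int = 3,          # assumed error budget
) -> str:
    """Return the name of the next node to run."""
    if state["error_count"] >= max_errors:
        return "fallback"
    scores = state["confidence_scores"]
    lowest = min(scores.values()) if scores else 0.0
    if lowest >= min_confidence:
        return "synthesis"
    if state["extraction_attempts"] < state["max_attempts"]:
        return "refinement"
    # Attempts exhausted: synthesize a best-effort summary
    return "synthesis"
```
A function like this would be passed to `add_conditional_edges` together with a mapping from the returned names to node names, in the same way `route_after_preprocessing` is used above.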
## Configuring Virtual Models and Tools
This approach separates tool configuration from code, moving it to a web interface where it can be managed without deployments.
### **Creating Virtual MCP Servers**
Virtual MCP servers act as API gateways to external tools and services:
1. In the LangDB UI, navigate to **Projects → MCP Servers**.
2. Click **+ New Virtual MCP Server** and create the necessary MCPs:
   * **Transcript Analysis MCP**: For preprocessing and structure analysis
   * **Data Extraction MCP**: For structured information extraction
   * **Validation MCP**: For quality assessment and feedback
   * **Refinement MCP**: For iterative improvement
### **Attaching MCPs to Virtual Models**
Virtual models connect your agent code to the right tools automatically:
1. Navigate to **Models → + New Virtual Model**.
2. For the Preprocessing Node:
   * Name: `transcript_preprocessing`
   * Base Model: `openai/gpt-4o`
   * Attach the **Transcript Analysis MCP**
   * Add guardrails for transcript processing
3. For the Extraction Node:
   * Name: `data_extraction`
   * Base Model: `openai/gpt-4o`
   * Attach the **Data Extraction MCP**
   * Add custom response templates for structured output
4. For the Validation Node:
   * Name: `extraction_validation`
   * Base Model: `openai/gpt-4o`
   * Attach the **Validation MCP**
   * Add quality assessment rules
### **Key Benefits:**
* **Separation of Concerns**: Code handles workflow orchestration while LangDB handles tools and models
* **Dynamic Updates**: Change tools without redeploying your application
* **Security**: API keys stored securely in LangDB, not in application code
* **Monitoring**: Track usage patterns and error rates in one place
### **Run the Agent**
```bash
python main.py
```
The agent will process the sample transcript and provide detailed output showing each processing phase, confidence scores, and the final synthesized summary.
## Sample Output
Here are key snippets from running the complex data extraction agent:
**Agent Startup:**
```bash
uv run langchain-data-extraction/main.py
=== COMPLEX EXTRACTION AGENT STARTING ===
Transcript length: 7296 characters
Estimated complexity: ComplexityLevel.COMPLEX
```
**Preprocessing Phase:**
```
=== PREPROCESSING PHASE ===
Transcript analysis complete. Complexity: ComplexityLevel.COMPLEX, Participants: 41, Topics: 15
```
**Initial Extraction:**
```
**Meeting Summary: Q1 2024 Mobile App Redesign Project Kickoff**
**Participants and Their Roles:**
- Sarah Chen: Project Manager
- Marcus Rodriguez: Lead Developer
- Dr. Kim Patel: UX Research Director
- Jennifer Wu: Product Owner
- David Thompson: QA Manager
- Lisa Chang: Marketing Director
- Alex Kumar: DevOps Engineer
```
**Validation Feedback:**
```
**Validation Feedback on Meeting Summary Extraction:**
**Completeness of Information Extracted:**
- The summary captures the main points of the meeting, including participant roles, key decisions, action items, conflicts, and resolutions.
- It includes the meeting's purpose, duration, and date, which are essential for context.
```
**Final Comprehensive Summary:**
```
**Comprehensive Summary: Q1 2024 Mobile App Redesign Project Kickoff**
**Meeting Overview:**
- **Date:** January 15, 2024
- **Duration:** 2 hours 30 minutes
- **Purpose:** Initiate the mobile app redesign project aimed at improving user engagement by 40% and reducing bounce rate by 25%.
**Key Decisions:**
1. **Phased Launch Approach:** Soft launch on March 31st to selected users, full release on April 15th.
2. **Technical Architecture:** Adopt Redux Toolkit for state management and React Query for data fetching.
**Action Items:**
1. **CI/CD Pipeline Setup:** Alex Kumar to establish by January 29th (High Priority).
2. **Accessibility Testing Checklist:** Dr. Kim Patel to create by January 22nd (Medium Priority).
3. **Contractor Hiring:** Marcus Rodriguez to hire React Native contractor by January 20th (High Priority).
**Risk Assessment and Mitigation:**
- **Timeline and Migration Complexity:** Concerns over the tight timeline and React Native migration. Mitigation includes hiring a contractor and adjusting the release schedule.
- **Team Capacity:** Addressed by hiring additional resources due to a senior developer's medical leave.
```
This output demonstrates the agent's ability to:
1. **Process Complex Transcripts**: Handle large transcripts (7,296 characters) with multiple participants and topics
2. **Multi-Stage Processing**: Execute preprocessing, extraction, validation, and synthesis phases
3. **Comprehensive Extraction**: Extract detailed information including participants, decisions, action items, conflicts, risks, and follow-up meetings
4. **Structured Output**: Produce well-organized, comprehensive summaries with clear sections
5. **Quality Validation**: Include validation feedback to ensure extraction quality
6. **Detailed Analysis**: Provide insights into project goals, technical decisions, and risk mitigation strategies
The agent successfully transforms a raw meeting transcript into a structured, actionable summary that captures all critical information for project stakeholders.
## Full Tracing with LangDB
The true power of the LangDB integration becomes apparent in the comprehensive tracing capabilities. While basic LangGraph provides conversation logging, LangDB captures every aspect of the complex workflow:
End-to-end tracing in LangDB shows all workflow stages and tool calls
> Checkout: [Data Extraction Agent Thread Using LangGraph](https://app.langdb.ai/sharing/threads/efe57123-9c81-4428-ada1-8d2e55bce4e8)
In the LangDB trace view, you can see:
1. **Node Transitions**: Exact flow between preprocessing → extraction → validation → synthesis
2. **Tool Calls**: Every tool invocation with inputs and outputs
3. **Confidence Scores**: Quality assessment for each extraction section
4. **State Changes**: Complete state evolution throughout the workflow
5. **Performance Metrics**: Token usage and timing for each LLM call
## Advanced Features
### Confidence Scoring System
The agent assigns each extracted section a heuristic confidence score and averages the scores to decide whether the extraction is complete:
```python
# tools.py
def extract_with_confidence(...) -> Dict[str, Any]:
    """Extract structured data with confidence scoring"""
    # Calculate confidence scores for different sections
    confidence_scores = {
        "participants": 0.95 if len(participants) > 0 else 0.3,
        "decisions": 0.9 if len(decisions) > 0 else 0.5,
        "action_items": 0.85 if len(action_items) > 0 else 0.4,
        "conflicts": 0.8 if len(conflicts) > 0 else 0.7,
        "phases": 0.9 if len(phases) > 0 else 0.6,
        "insights": 0.8 if len(key_insights) > 0 else 0.5
    }
    overall_confidence = sum(confidence_scores.values()) / len(confidence_scores)
    return {
        "extraction_data": extraction_data,
        "confidence_scores": confidence_scores,
        "overall_confidence": overall_confidence,
        "extraction_complete": overall_confidence > 0.7
    }
```
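To make the arithmetic concrete, here is a minimal, self-contained sketch of the same scoring rule. The helper name `score_sections`, the `SECTION_SCORES` table, and the plain-dict input are assumptions made for illustration; the sample's real tool takes different arguments.

```python
from typing import Any, Dict, List

# (score when the section is non-empty, score when it is empty),
# copied from the excerpt above.
SECTION_SCORES = {
    "participants": (0.95, 0.3),
    "decisions": (0.9, 0.5),
    "action_items": (0.85, 0.4),
    "conflicts": (0.8, 0.7),
    "phases": (0.9, 0.6),
    "insights": (0.8, 0.5),
}


def score_sections(extraction: Dict[str, List[Any]]) -> Dict[str, Any]:
    """Hypothetical helper: score each section, then average the scores."""
    scores = {
        name: (found if extraction.get(name) else missing)
        for name, (found, missing) in SECTION_SCORES.items()
    }
    overall = sum(scores.values()) / len(scores)
    return {
        "confidence_scores": scores,
        "overall_confidence": overall,
        "extraction_complete": overall > 0.7,
    }


# Every section populated vs. nothing extracted at all
full = score_sections({name: ["x"] for name in SECTION_SCORES})
empty = score_sections({})
print(round(full["overall_confidence"], 3), full["extraction_complete"])
print(round(empty["overall_confidence"], 3), empty["extraction_complete"])
```

With every section populated the mean is 5.2 / 6 ≈ 0.867, which clears the 0.7 threshold. With every section empty it is 3.0 / 6 = 0.5, so the extraction is reported as incomplete.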
### Conditional Routing Logic
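After validation, the router either sends the run back through the tool node or moves on to synthesis, with a cap on the number of extraction attempts: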
```python
# nodes.py
def route_after_validation(state: ComplexAgentState) -> str:
    """Route based on validation results"""
    if state["extraction_attempts"] >= state["max_attempts"]:
        return "synthesis"  # Skip to synthesis if too many attempts
    # Check for validation results in messages
    last_message = state["messages"][-1]
    if hasattr(last_message, 'tool_calls') and last_message.tool_calls:
        return "tool_node"
    # Default routing logic
    return "synthesis"
```
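The branching can be exercised without LangGraph. The sketch below repeats the same checks on a plain dict state; `StubMessage` is a hypothetical stand-in for a LangChain message and only carries the `tool_calls` attribute the router looks at.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class StubMessage:
    """Hypothetical stand-in for a LangChain message."""
    content: str = ""
    tool_calls: List[Dict[str, Any]] = field(default_factory=list)


def route_after_validation(state: Dict[str, Any]) -> str:
    """Same branching as the excerpt above, on a plain dict state."""
    if state["extraction_attempts"] >= state["max_attempts"]:
        return "synthesis"  # attempt budget exhausted
    last_message = state["messages"][-1]
    if getattr(last_message, "tool_calls", None):
        return "tool_node"  # the model asked for a tool
    return "synthesis"


wants_tool = StubMessage(tool_calls=[{"name": "refine_extraction", "args": {}}])
plain = StubMessage(content="Validation passed")

print(route_after_validation(
    {"extraction_attempts": 1, "max_attempts": 3, "messages": [wants_tool]}))
print(route_after_validation(
    {"extraction_attempts": 1, "max_attempts": 3, "messages": [plain]}))
print(route_after_validation(
    {"extraction_attempts": 3, "max_attempts": 3, "messages": [wants_tool]}))
```

Because the attempt cap is checked first, a pending tool call is ignored once the budget is spent, which is what keeps the validation loop from running indefinitely.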
### Fallback Mechanisms
If the complex extraction fails, a fallback node reruns the extraction with a simplified prompt:
```python
# nodes.py
def fallback_node(state: ComplexAgentState) -> Dict[str, Any]:
    """Fallback to simplified extraction if complex extraction fails"""
    print("\n=== FALLBACK PHASE ===")
    fallback_prompt = f"""
    Perform a simplified extraction from this transcript:
    {state['transcript']}
    Focus on basic information: participants, main topics, and key outcomes.
    Use a simpler structure if the complex extraction failed.
    """
    # Simplified processing logic
    response = llm.invoke([HumanMessage(content=fallback_prompt)])
    return {
        "messages": [response],
        "processing_complete": True,
        "current_phase": ExtractionPhase.FALLBACK
    }
```
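In the sample, the fallback node is reached through graph edges. Outside a graph, the same idea can be expressed as a plain try/except wrapper. The sketch below swaps in that simpler technique; `run_with_fallback`, the two step functions, and the `last_error` key are all hypothetical names used only for illustration.

```python
from typing import Any, Callable, Dict

State = Dict[str, Any]


def run_with_fallback(
    primary: Callable[[State], State],
    fallback: Callable[[State], State],
    state: State,
) -> State:
    """Hypothetical wrapper: try the primary step, fall back on any error."""
    try:
        return primary(state)
    except Exception as exc:  # broad on purpose: any failure triggers the fallback
        return fallback({**state, "last_error": str(exc)})


def complex_extraction(state: State) -> State:
    # Simulates a failure in the multi-stage extraction
    raise ValueError("could not parse structured output")


def simple_extraction(state: State) -> State:
    # Simulates the simplified extraction path
    return {**state, "summary": "participants, topics, outcomes",
            "current_phase": "fallback"}


result = run_with_fallback(complex_extraction, simple_extraction,
                           {"transcript": "..."})
print(result["current_phase"], "-", result["last_error"])
```

Recording the original error in the state keeps the failure visible downstream even though the run itself completes.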
## Conclusion: Benefits of LangDB Integration
By enhancing LangGraph with LangDB integration, we've achieved several significant improvements:
1. **Comprehensive Observability**: Full tracing of complex multi-stage workflows
2. **Modular Architecture**: Clean separation of concerns across nodes and tools
3. **Quality Assurance**: Built-in confidence scoring and validation loops
4. **Robust Error Handling**: Fallback mechanisms ensure reliable processing
5. **Dynamic Configuration**: Change tools and models without code changes
6. **Performance Monitoring**: Track token usage and timing for optimization
This approach demonstrates how LangDB's AI gateway can strengthen a LangGraph application with better tracing, quality control, reliability, and maintainability.
## References
* [LangGraph Documentation](https://langchain-ai.github.io/langgraph/)
* [LangDB Virtual MCP Servers](https://docs.langdb.ai/concepts/virtual-mcp-servers)
* [LangDB Virtual Models](https://docs.langdb.ai/concepts/virtual-models)
* [LangChain Data Extraction Sample](https://github.com/langdb/langdb-samples/tree/main/examples/langchain/langchain-data-extraction)
---
## Building Report Writing Agent Using CrewAI
Build a powerful multi-agent report generation workflow with CrewAI and LangDB. This guide walks through the full setup: from configuring your agents to sharing a public execution trace.
Checkout: https://app.langdb.ai/sharing/threads/3becbfed-a1be-ae84-ea3c-4942867a3e22
## Code
* LangDB Samples: [https://github.com/langdb/langdb-samples/tree/main/examples/crewai/report-writing-agent](https://github.com/langdb/langdb-samples/tree/main/examples/crewai/report-writing-agent)
## Goal
Create a report-writing AI system where:
* A **Researcher Agent** gathers up-to-date information using web tools like Tavily Search.
* An **Analyst Agent** processes and synthesizes the findings.
* A **Report Writer Agent** generates a clean, markdown-formatted report.
LangDB enables seamless model routing, tracing, and observability across this pipeline, including full visibility into MCP tool calls like Tavily Search used by the Researcher Agent.
### Installation
```bash
pip install crewai 'pylangdb[crewai]' python-dotenv
```
## Project Structure
```
report-writing-agent/
├── configs
│   ├── agents.yaml
│   └── tasks.yaml
├── main.py
└── README.mdc
```
## Quick Start Example
The code snippets below illustrate the key components of the three-agent reporting workflow.
### Export Environment Variables
```bash
export LANGDB_API_KEY=""
export LANGDB_PROJECT_ID=""
export LANGDB_API_BASE_URL='https://api.us-east-1.langdb.ai'
```
### Initialize Tracing
The first and most important step is to initialize `pylangdb` tracing **before** any other CrewAI code runs. This call instruments the environment to automatically capture all agent and tool activity.
```python
from pylangdb.crewai import init
from dotenv import load_dotenv
# Load environment variables and initialize tracing
load_dotenv()
init()
```
## Configure the LLM
Next, define a helper function that instantiates LLMs pointed at the LangDB gateway, with the project ID sent as a header. This ensures all model calls are routed through LangDB and captured in the trace.
```python
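from crewai import LLM
import os


def create_llm(model):
    # Route the model call through the LangDB gateway and tag it with the project ID
    return LLM(
        model=model,
        api_key=os.environ.get("LANGDB_API_KEY"),
        base_url=os.environ.get("LANGDB_API_BASE_URL"),  # same name as the exported variable
        extra_headers={
            "x-project-id": os.environ.get("LANGDB_PROJECT_ID")
        }
    )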
```
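Because `os.environ.get` returns `None` for an unset variable, a misspelled variable name silently produces an LLM with no base URL. A small check before building any LLM can surface that early. This is a minimal sketch: `load_langdb_settings` and `REQUIRED_VARS` are hypothetical names, not part of `pylangdb` or CrewAI.

```python
import os
from typing import Dict, Mapping

# The three variables exported in the environment setup step
REQUIRED_VARS = ("LANGDB_API_KEY", "LANGDB_PROJECT_ID", "LANGDB_API_BASE_URL")


def load_langdb_settings(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Hypothetical helper: return the LangDB settings or raise if any are unset."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: env[name] for name in REQUIRED_VARS}


# Example with an explicit mapping instead of the real environment
settings = load_langdb_settings({
    "LANGDB_API_KEY": "example-key",
    "LANGDB_PROJECT_ID": "example-project",
    "LANGDB_API_BASE_URL": "https://api.us-east-1.langdb.ai",
})
print(settings["LANGDB_API_BASE_URL"])
```

Calling such a helper once at startup, right after `load_dotenv()`, turns a confusing connection error into an explicit message that names the missing variable.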
## Define the Crew
Next, use the `@CrewBase` decorator to define the agents and tasks, loading configurations from external YAML files.
```python
from crewai import Agent, Task, Crew, Process
from crewai.project import CrewBase, agent, crew, task


@CrewBase
class ReportGenerationCrew():
    """Report Generation crew"""
    agents_config = "configs/agents.yaml"
    tasks_config = "configs/tasks.yaml"

    @agent
    def researcher(self) -> Agent:
        return Agent(
            config=self.agents_config['researcher'],
            llm=create_llm("openai/langdb/reportresearcher_9wzgx5n5")  # LangDB Virtual Model with MCP
        )

    @agent
    def analyst(self) -> Agent:
        return Agent(
            config=self.agents_config['analyst'],
            llm=create_llm("openai/anthropic/claude-3.7-sonnet")
        )

    # ... report_writer agent definition follows the same pattern ...
    # ... the @task methods used below are omitted from this excerpt ...

    @crew
    def crew(self) -> Crew:
        return Crew(
            agents=[self.researcher(), self.analyst(), self.report_writer()],
            tasks=[self.research_task(), self.analysis_task(), self.report_writing_task()],
            process=Process.sequential
        )
```
## Run the Workflow
Finally, instantiate the crew and `kickoff()` the process with a specific topic.
```python
def generate_report(topic):
    crew_instance = ReportGenerationCrew()
    # ... update task descriptions with the topic ...
    result = crew_instance.crew().kickoff()
    return result


if __name__ == "__main__":
    generate_report("The Impact of AI on Social Media Marketing in 2024")
```
Running the script will generate a full trace in LangDB that includes every model call, tool invocation, and inter-agent message.
### Configs
#### agents.yaml
```yaml
researcher:
  role: "Domain Researcher"
  goal: "Gather up-to-date information on the given topic"
  backstory: "Experienced researcher with expertise in finding credible sources."
```
#### tasks.yaml
```yaml
research_task:
  description: >
    Research the given topic thoroughly using web search tools.
    Gather current information, statistics, trends, and key insights.
    Focus on finding credible sources and recent developments.
  expected_output: >
    A comprehensive research summary with key findings, statistics,
    and relevant information about the topic. Include source citations.
```
Full source code—including additional agents, tasks, and YAML configs—is available in the repository: [https://github.com/langdb/langdb-samples/tree/main/examples/crewai/report-writing-agent](https://github.com/langdb/langdb-samples/tree/main/examples/crewai/report-writing-agent).
## Configuring MCP and Models
To enable the Researcher Agent to retrieve fresh, real‑time information and ensure every search query is recorded for auditing and debugging, we configure a Virtual MCP Server and attach it to a Virtual Model. This setup provides:
* **Live Web Search**: Integrate external search capabilities directly into your agent.
* **Traceability**: All MCP tool calls (search queries, parameters, responses) are logged in LangDB for observability and troubleshooting.
* **Consistency**: Using a dedicated MCP Server ensures uniform search behavior across runs.
### Steps To Create a Virtual MCP
1. In LangDB UI, navigate to **Projects → MCP Servers**.
2. Click **+ New Virtual MCP Server**:
* **Name**: `web-search-mcp`
* **Underlying MCP**: Tavily Search MCP
* **Requires API Key**: Make sure Tavily API Key is configured in your environment to authenticate this operation.
3. Navigate to **Models → + New Virtual Model**:
* **Name**: `report-researcher`
* **Base Model**: GPT-4.1 or similar
* **Attach**: `web-search-mcp` as the search tool
4. Copy the model identifier (e.g. `openai/langdb/report-researcher`) and use it in the Researcher agent.
LangDB will log all MCP calls for traceability.
## Custom Model Usage
You can use any model available on LangDB. When specifying a model, ensure it follows the LiteLLM naming convention for provider-specific models (e.g., `openai/gpt-4o`, `anthropic/claude-3-sonnet-20240229`). To customize, simply update the `create_llm()` calls with your preferred model identifiers:
```python
tool_llm = create_llm("openai/langdb/report-researcher")
analysis_llm = create_llm("openai/gpt-4o")
writer_llm = create_llm("openai/google/gemini-2.5-pro")
```
Ensure the model string matches a valid LangDB or OpenAI namespace. All routing, tracing, and MCP integrations remain identical regardless of the model.
> When you create a new Virtual Model in LangDB, it will generate a unique model name (for example, `openai/langdb/report-researcher@v1`).\
> **Be sure to replace** the example model name in your `main.py` and in your agent config files with the actual model name generated for your project.
## Running the Agent
Execute the workflow by passing a topic:
```bash
python main.py "The Impact of AI on Social Media Marketing in 2024"
```
The CLI will prompt for a topic if none is provided.
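One way to get this behavior is sketched below: take the topic from the first command-line argument and fall back to an interactive prompt. `resolve_topic` is a hypothetical helper, and the sample's actual `main.py` may handle this differently.

```python
from typing import Callable, List


def resolve_topic(argv: List[str], prompt: Callable[[str], str] = input) -> str:
    """Hypothetical helper: take the topic from argv, else ask for it."""
    if len(argv) > 1 and argv[1].strip():
        return argv[1].strip()
    topic = prompt("Enter a report topic: ").strip()
    if not topic:
        raise SystemExit("A topic is required.")
    return topic


# Demo with a stubbed prompt so the example runs without user input
print(resolve_topic(["main.py", "The Impact of AI on Social Media Marketing in 2024"]))
print(resolve_topic(["main.py"], prompt=lambda _: "AI in healthcare"))
```

In a real script the call would be `resolve_topic(sys.argv)`, with the result passed to `generate_report`.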
## Conclusion
Below is a real, shareable example of a generated report and full execution trace using this pipeline:
> Checkout: [Report Writing Agent Thread using CrewAI](https://app.langdb.ai/sharing/threads/3becbfed-a1be-ae84-ea3c-4942867a3e22)
### References
* [LangDB Virtual MCP Servers](https://docs.langdb.ai/concepts/virtual-mcp-servers)
* [LangDB Virtual Models](https://docs.langdb.ai/concepts/virtual-models)
* [CrewAI Documentation](https://docs.crewai.com/)
---
## Building Travel Agent with OpenAI Agents SDK
Check Out: https://app.langdb.ai/sharing/threads/43cfa16f-042e-44ca-ad21-06f52afeca39
## Code
* LangDB Samples: [https://github.com/langdb/langdb-samples/tree/main/examples/openai/travel-agent](https://github.com/langdb/langdb-samples/tree/main/examples/openai/travel-agent)
This guide illustrates how to build a multi-agent travel query workflow using the OpenAI Agents SDK, augmented by LangDB for guardrails, virtual MCP servers (tool integration), and model routing.
OpenAI introduced the **Agents SDK**, a lightweight, Python-first toolkit for building agentic AI apps. It’s built around three primitives:
* **Agents**: LLMs paired with tools and instructions to complete tasks autonomously.
* **Handoffs**: Let agents delegate tasks to other agents.
* **Guardrails**: Validate inputs/outputs to keep workflows safe and reliable.
## Overview
The workflow combines the **OpenAI Agents SDK** with LangDB for tracing, tool integration, and model routing.
We will create a 4-agent pipeline:
1. **Query Router Agent**: Routes user queries to the appropriate specialist agent.
2. **Booking Specialist**: Manages booking-related requests.
3. **Travel Recommendation Specialist**: Provides destination recommendations with web search support.
4. **Reply Agent**: Formats the final output for the user.
## Installation
```bash
pip install openai-agents 'pylangdb[openai]' python-dotenv
```
## Environment Variables
```bash
export LANGDB_API_KEY=""
export LANGDB_PROJECT_ID=""
export LANGDB_API_BASE_URL="https://api.us-east-1.langdb.ai"
```
## Code Walkthrough
The snippets below break down how to configure the OpenAI Agents SDK with LangDB for end-to-end tracing and custom model routing.
### Initialize LangDB Tracing
First, initialize `pylangdb` tracing. This must be the first step to ensure all subsequent SDK operations are captured.
```python
import os
import asyncio
import uuid
from dotenv import load_dotenv
from pylangdb.openai import init
# Load environment variables and initialize tracing
load_dotenv()
init()
```
### Configure the OpenAI Client & Model Provider
Next, configure the `AsyncOpenAI` client to send all requests through the LangDB gateway. We then create a `CustomModelProvider` to ensure the Agents SDK uses this client for all model calls.
```python
from agents import (
    Agent, Runner, set_default_openai_client, RunConfig,
    ModelProvider, Model, OpenAIChatCompletionsModel
)
from openai import AsyncOpenAI

# Configure the client with LangDB headers
client = AsyncOpenAI(
    api_key=os.environ["LANGDB_API_KEY"],
    base_url=os.environ["LANGDB_API_BASE_URL"],
    default_headers={"x-project-id": os.environ["LANGDB_PROJECT_ID"]}
)

# Set the configured client as default for tracing
set_default_openai_client(client, use_for_tracing=True)


# Helper that builds a model bound to the LangDB client
def get_model(model_name):
    return OpenAIChatCompletionsModel(model=model_name, openai_client=client)


# Custom model provider that routes all model calls through LangDB
class CustomModelProvider(ModelProvider):
    def get_model(self, model_name) -> Model:
        return get_model(model_name)


CUSTOM_MODEL_PROVIDER = CustomModelProvider()
```
### Define the Agents
Now, define the specialist agents and the router agent that orchestrates them. The `model` parameter can be any model available in LangDB, including the virtual models we configure in the next section.
```python
# Define specialist agents
booking_agent = Agent(
    name="Booking Specialist",
    instructions="You are a booking specialist. You help customers with their booking and reservation questions.",
    model=get_model("openai/gpt-4o-mini")
)

travel_recommendation_agent = Agent(
    name="Travel Recommendation Specialist",
    instructions="You are a travel recommendation specialist. You help customers find ideal destinations and travel plans.",
    model=get_model("langdb/recc_8ac7wclb")  # A virtual model with search tools attached
)

reply_agent = Agent(
    name="Reply Agent",
    instructions="You reply to the user's query and make it more informal by adding emojis.",
    model=get_model("langdb/reply_idzqgtrm")  # A virtual model for formatting
)

# Define the orchestrator agent
query_router_agent = Agent(
    name="Query Router",
    instructions="You determine which specialist to use based on the user's query, then hand off to the reply agent.",
    model=get_model("langdb/router_c77w2sji"),  # A virtual model for routing
    handoffs=[reply_agent],
    tools=[
        booking_agent.as_tool(
            tool_name="booking_tool",
            tool_description="Use for questions about flight bookings or reservations.",
        ),
        travel_recommendation_agent.as_tool(
            tool_name="travel_tool",
            tool_description="Use for travel destination recommendations or planning.",
        )
    ]
)
```
### Run the Workflow
Finally, use the `Runner` to execute the workflow. We inject our `CustomModelProvider` and a `group_id` into the `RunConfig` to ensure all steps are routed through LangDB and linked in the same trace.
```python
async def run_travel_agent(query: str):
    group_id = str(uuid.uuid4())  # Links all steps in this session
    response = await Runner.run(
        query_router_agent,
        input=query,
        run_config=RunConfig(
            model_provider=CUSTOM_MODEL_PROVIDER,
            group_id=group_id
        )
    )
    print(response.final_output)


if __name__ == "__main__":
    asyncio.run(run_travel_agent("I want to book a flight to Paris."))
```
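The `group_id` only links steps if the same value is reused. The sketch below shows one ID shared across several turns of a conversation; `fake_run` is a hypothetical stand-in for `Runner.run` so that the example runs without the SDK or network access.

```python
import asyncio
import uuid
from typing import Dict, List


async def fake_run(query: str, group_id: str) -> Dict[str, str]:
    """Hypothetical stand-in for Runner.run; it only echoes its inputs."""
    await asyncio.sleep(0)
    return {"group_id": group_id, "output": f"handled: {query}"}


async def run_session(queries: List[str]) -> List[Dict[str, str]]:
    group_id = str(uuid.uuid4())  # one ID for the whole conversation
    return [await fake_run(q, group_id) for q in queries]


results = asyncio.run(run_session(["Flights to Paris?", "And hotels nearby?"]))
# Every turn carries the same group_id, so there is exactly one distinct value
print(len({r["group_id"] for r in results}))
```

Generating the ID once per session, rather than once per call, is the design choice that matters here.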
### Configuring MCPs, Guardrails, and Models
To empower agents with tools like web search or to enforce specific behaviors with guardrails, you use LangDB Virtual Models. This allows you to attach functionality directly to a model identifier without changing your agent code.
1. In the LangDB UI, navigate to **Models → + New Virtual Model**.
2. Create virtual models for your agents (e.g., `travel-recommender`, `query-router`).
3. Attach tools and guardrails as needed:
* **For the `travel_recommendation_agent`**: Attach an MCP Server (like Tavily Search) to give it live web search capabilities.
* **For the `query_router_agent`**: Attach guardrails to validate incoming requests. For example:
* **Topic Adherence**: Ensure the query is travel-related.
* **OpenAI Moderation**: Block harmful or disallowed content.
* **Minimum Word Count**: Reject overly short or vague queries.
* **For the `reply_agent`**: Attach a **Language Validator** guardrail to ensure the final output is in the expected language.
4. Use the virtual model's identifier (e.g., `langdb/travel-recommender`) as the `model` string in your `Agent` definition.
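LangDB guardrails are configured in the UI and evaluated by the gateway, so no guardrail code lives in the application. Purely to illustrate what a check such as **Minimum Word Count** decides, here is a local sketch. The function, the `GuardrailResult` type, and the threshold of three words are assumptions, not LangDB's implementation.

```python
from typing import NamedTuple


class GuardrailResult(NamedTuple):
    passed: bool
    reason: str


def min_word_count(query: str, minimum: int = 3) -> GuardrailResult:
    """Hypothetical check: reject queries with fewer than `minimum` words."""
    words = query.split()
    if len(words) < minimum:
        return GuardrailResult(
            False, f"query has {len(words)} word(s); at least {minimum} required")
    return GuardrailResult(True, "ok")


print(min_word_count("Paris?"))
print(min_word_count("I want to book a flight to Paris."))
```

A rejected request would stop at the gateway before reaching the router agent's model, which is why these checks are attached to the `query_router_agent` virtual model.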
## Full Trace
After setting up the virtual models and running the query like:
```bash
uv run app.py "I'm planning a trip to Japan in April. What are the must-see cherry blossom spots?"
```
The run produces a full trace in LangDB, which you can inspect here:
> Checkout: [Travel Agent Thread using OpenAI Agents SDK](https://app.langdb.ai/sharing/threads/43cfa16f-042e-44ca-ad21-06f52afeca39)
## References
* [OpenAI Agents SDK](https://openai.github.io/openai-agents-python/)
* [LangDB Virtual MCP Servers](https://docs.langdb.ai/concepts/virtual-mcp-servers)
* [LangDB Guardrails](https://docs.langdb.ai/features/guardrails)
* [LangDB Virtual Models](https://docs.langdb.ai/concepts/virtual-models)
---
## Building Travel Concierge with Google-ADK
This guide demonstrates how to enhance [Google's ADK Travel Concierge](https://github.com/google/adk-samples/tree/main/python/agents/travel-concierge) sample by integrating it with LangDB. By leveraging LangDB's AI gateway, you can offload complex tool logic to virtual MCPs, dynamically select models, and gain full end-to-end observability into your agent's operations. This approach simplifies the agent's code, improves security by centralizing API key management, and provides powerful tracing capabilities.
Checkout: https://app.langdb.ai/sharing/threads/8425e068-77de-4f41-8aa9-d1111fc7d2b7
## Code
* LangDB Samples: [https://github.com/langdb/langdb-samples/tree/main/examples/google-adk/travel-concierge](https://github.com/langdb/langdb-samples/tree/main/examples/google-adk/travel-concierge)
## Overview
The Travel Concierge is a sophisticated agent system built with Google ADK that handles the complete travel lifecycle through specialized sub-agents.
### Travel Concierge Architecture
The system consists of these specialized sub-agents:
1. **Main Agent**: Orchestrates the workflow and routes user queries to specialized sub-agents.
2. **Inspiration Agent**: Helps users discover travel destinations based on preferences.
3. **Planning Agent**: Creates detailed itineraries and travel plans.
4. **Booking Agent**: Assists with booking flights, hotels, and activities.
5. **Pre-Trip Agent**: Provides preparation guidance before departure.
6. **In-Trip Agent**: Offers real-time assistance during travel.
7. **Post-Trip Agent**: Collects feedback and helps with post-travel tasks.
### Key Benefits
With LangDB, this multi-agent system gains significant capabilities:
* **Seamless Integration**: All LLM calls flow through a centralized AI gateway
* **Tool Abstraction**: Virtual MCPs simplify integration with external services
* **End-to-End Tracing**: Complete visibility into agent interactions and tool usage
* **Flexible Model Selection**: Choose any model without code changes
* **Centralized Configuration**: Tools, guardrails, and monitoring managed in one place
### Installation
```bash
pip install google-adk "pylangdb[adk]" python-dotenv
```
### Environment Variables
Create a `.env` file in your project root with the following variables:
```bash
# Required for AI platform integration
LANGDB_API_KEY="your_api_key_here"
LANGDB_PROJECT_ID="your_project_id_here"
```
### Project Structure
```
travel-concierge/
├── .env
├── README.md
└── travel_concierge/
    ├── agent.py              # Main orchestrator agent
    ├── __init__.py
    ├── prompt.py
    ├── profiles/             # User profile templates
    ├── shared_libraries/     # Common utilities
    ├── sub_agents/           # Specialized travel agents
    │   ├── booking/
    │   ├── inspiration/
    │   ├── in_trip/
    │   ├── planning/
    │   ├── post_trip/
    │   └── pre_trip/
    └── tools/                # Shared tools
```
## How the Integration Works
### Seamless LLM Integration
The key to enhancing Google's ADK sample is directing all LLM calls through a centralized AI gateway. This is achieved through the Python SDK:
```python
# travel_concierge/__init__.py
from pylangdb.adk import init
# Initialize LangDB tracing BEFORE importing any Google ADK modules
# This sends all LLM API calls through LangDB's gateway
init()
```
By calling `init()` before any Google ADK imports, the integration:
1. Patches the ADK's underlying model calling mechanisms
2. Routes all LLM requests through LangDB's API
3. Attaches tracing metadata to each request
4. Captures all agent-to-agent handoffs and tool calls
This is a significant architectural improvement over the original sample, which requires manual instrumentation to achieve observability.
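Why the call order matters can be shown with a generic monkeypatching example. This is an illustration of the general technique under stated assumptions, not a description of `pylangdb` internals: `fake_lib`, `call_model`, and `instrument` are hypothetical names.

```python
import types

# Build a tiny stand-in "library" module with one function
fake_lib = types.ModuleType("fake_lib")


def call_model(prompt: str) -> str:
    return f"direct:{prompt}"


fake_lib.call_model = call_model

# Code that binds the function BEFORE instrumentation keeps the original,
# much like `from fake_lib import call_model` executed too early
early_binding = fake_lib.call_model


def instrument(module: types.ModuleType) -> None:
    """Hypothetical init(): wrap the module's function to add tracing."""
    original = module.call_model

    def traced(prompt: str) -> str:
        return "traced:" + original(prompt)

    module.call_model = traced


instrument(fake_lib)

# Code that binds AFTER instrumentation sees the wrapped function
late_binding = fake_lib.call_model

print(early_binding("hi"))  # direct:hi
print(late_binding("hi"))   # traced:direct:hi
```

References captured before the patch bypass the wrapper, which is the usual reason instrumentation libraries ask to be initialized before the instrumented modules are imported.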
### Virtual Model References
Instead of hardcoding model names or building complex tool integration, we reference LangDB virtual models in the agent definitions:
```python
# travel_concierge/agent.py
from google.adk.agents import Agent
from .prompt import MAIN_AGENT_PROMPT
from .sub_agents.inspiration import inspiration_agent
from .sub_agents.planning import planning_agent
from .sub_agents.booking import booking_agent
from .sub_agents.pre_trip import pre_trip_agent
from .sub_agents.in_trip import in_trip_agent
from .sub_agents.post_trip import post_trip_agent

# Main orchestrator agent
travel_concierge = Agent(
    model="langdb/travel_concierge_main",  # LangDB Virtual Model
    name="travel_concierge",
    description="A comprehensive travel assistant that helps with all phases of travel planning and execution.",
    instruction=MAIN_AGENT_PROMPT,
    sub_agents=[
        inspiration_agent,
        planning_agent,
        booking_agent,
        pre_trip_agent,
        in_trip_agent,
        post_trip_agent
    ]
)

# This is the entry point for the ADK
root_agent = travel_concierge
```
Notice the `model="langdb/travel_concierge_main"` parameter. This doesn't point directly to a specific LLM but to a **LangDB Virtual Model** that can have:
* A specific underlying LLM (like Google's Gemini Pro)
* Attached tools and MCPs
* Guardrails for input/output validation
* Custom handling and retry logic
This approach offloads significant complexity from the application code to the LangDB AI gateway.
### Simplified Sub-Agent Implementation
The `inspiration_agent` uses a virtual model that connects to a Google Maps MCP. The agent code itself only needs to reference the tools, not implement them:
```python
# travel_concierge/sub_agents/inspiration/agent.py
from google.adk.agents import Agent
from google.adk.tools.agent_tool import AgentTool
from . import prompt

# place_agent and poi_agent are defined elsewhere in the sample (omitted here)
inspiration_agent = Agent(
    model="langdb/inspiration_agent_z73m3wmd",  # Virtual Model
    name="inspiration_agent",
    description="A travel inspiration agent who inspire users, and discover their next vacations; Provide information about places, activities, interests,",
    instruction=prompt.INSPIRATION_AGENT_INSTR,
    # No local map_tool: Google Maps lookups come from the MCP attached to the virtual model
    tools=[
        AgentTool(agent=place_agent),
        AgentTool(agent=poi_agent),
    ],
)
```
**Key Benefit:** The agent code is significantly cleaner. All tool configuration is offloaded to the central platform, allowing you to:
1. Change search providers without code changes
2. Add guardrails for travel queries
3. Deploy updates to tools without redeploying your application
### Tool Definitions vs. Virtual MCPs
**Original Approach (Tool Implementation)**
In the original Google ADK sample, you would need to implement a complex tool to interact with the Google Places API directly:
```python
# travel_concierge/tools/map.py (Original Approach Snippet)
import os
import requests
from google.adk.tools import ToolContext


class PlacesService:
    """Wrapper to Places API."""

    def find_place_from_text(self, query: str):
        """Fetches place details using a text query."""
        # ... complex request and parsing logic ...


# Module-level service instance (it reads its API key from the environment)
places_service = PlacesService()


# The tool function itself
def map_tool(key: str, tool_context: ToolContext):
    # ... logic to inspect state and call PlacesService ...
    pois = tool_context.state[key]["places"]
    for poi in pois:
        location = poi["place_name"] + ", " + poi["address"]
        result = places_service.find_place_from_text(location)
        # ... update state with results ...
    return {"places": pois}
```
This approach requires:
* Managing API keys in your environment
* Implementing error handling and retries
* Deploying code changes for API updates
* Writing custom integration code for each tool
**Enhanced Approach (Virtual MCPs)**
With the Virtual MCP approach, you offload the entire tool implementation to the central platform. This means:
1. No tool implementation code needed in your application.
2. API keys are managed securely in the platform.
3. You get built-in error handling and retries.
4. Tools can be updated or switched without code changes.
## Configuring Virtual Models and MCPs
This approach separates tool configuration from code, moving it to a web interface where it can be managed and updated without deployments.
### **Creating Virtual MCP Servers**
Virtual MCP servers act as API gateways to external tools and services:
1. In the LangDB UI, navigate to **Projects → MCP Servers**.
2. Click **+ New Virtual MCP Server** and create the necessary MCPs. Each MCP is then connected to a virtual model, which is used by a specific agent:
* **Inspiration Agent** uses a virtual model connected to a **Google Maps** MCP.
* **Planning Agent** uses a virtual model connected to an **Airbnb** MCP.
* **Search Agent** uses a virtual model connected to a **Tavily Search** MCP.
### **Attaching MCPs to Virtual Models**
Virtual models connect your agent code to the right tools automatically:
1. Navigate to **Models → + New Virtual Model**.
2. For the Inspiration Agent:
* Name: `travel_inspiration_agent`
* Base Model: `anthropic/claude-sonnet-4`
* Attach the **Google Maps MCP** from step 1
* Add optional guardrails (e.g., content filters for travel advice)
3. For the Google Search agent in shared tools:
* Name: `google_search_agent`
* Base Model: `gemini/gemini-2.5-pro-preview`
* Attach the **Tavily Search MCP** from step 1
* Add custom response templates for place recommendations
**Key Benefits:**
* **Separation of Concerns**: Code handles agent orchestration while LangDB handles tools and models
* **Dynamic Updates**: Change tools without redeploying your application
* **Security**: API keys stored securely in LangDB, not in application code
* **Monitoring**: Track usage patterns and error rates in one place
### **Run the Agent**
```bash
adk web
```
Navigate to `http://localhost:8000`, select `travel_concierge`, and start your travel planning conversation.
### Example Conversation with Traced Tool Calls
Here is an example conversation that shows how the `inspiration_agent` uses a virtual model connected to a Google Maps MCP to provide travel suggestions.
**User Input:**
```
Inspire me about Maldives
```
**Agent Execution Flow:**
1. The root `travel_concierge` agent receives the request and transfers control to the `inspiration_agent`.
* **Tool Call**: `transfer_to_agent(agent_name="inspiration_agent")`
2. The `inspiration_agent` is activated. Its virtual model is configured to use the Google Maps MCP.
* **Tool Call**: `poi_agent(request="Maldives")`
3. The Google Maps MCP is invoked, which searches for points of interest in the Maldives and returns a structured list of places with details, ratings, and map links.
**Final Agent Response:**
```
Of course, the Maldives is a stunning choice! Here are a few points of interest to get you started on your dream vacation:
1. North Malé Atoll
Highlights: A stunning coral island known for its powdery white sand, turquoise waters, vibrant marine life, and luxury resorts. Perfect for snorkeling, diving, and relaxation.
Rating: 4.7
Map: https://www.google.com/maps/search/?api=1&query=4.416667,73.5&query_place_id=ChIJs_UZm7KJPzsRyXC6lVstCGY
2. Ari Atoll
Highlights: Renowned for its crystal-clear waters and diverse marine life, including whale sharks, manta rays, and colorful reefs, making it a diver’s paradise.
Rating: 4.4
Map: https://www.google.com/maps/search/?api=1&query=3.8833333,72.8333333&query_place_id=ChIJc2fMBIzdQDsRml476HWWT5g
3. Malé City
Highlights: The bustling capital city featuring the 17th-century Hukuru Miskiy mosque, Maldives Islamic Centre, vibrant markets, and local eateries. Great for experiencing Maldivian culture.
Rating: 4.3
Map: https://www.google.com/maps/search/?api=1&query=4.1752709,73.5091172&query_place_id=ChIJVUvxlVV-PzsRV7p9OeoztoE
Would you like to dive deeper into any of these destinations? Or, if you're feeling inspired and ready to start planning, I can hand you over to our planning_agent to help with itineraries, flights, and hotels.
```
## Full Tracing with LangDB
The true power of the LangDB integration becomes apparent in the comprehensive tracing capabilities. While the original Google ADK sample provides basic conversation logging, LangDB captures every aspect of the agent system's operation:
End-to-end tracing in LangDB shows all agent interactions and tool calls
You can check out the entire conversation here:
> Check out: [Travel Concierge Thread using Google ADK](https://app.langdb.ai/sharing/threads/8425e068-77de-4f41-8aa9-d1111fc7d2b7)
In the LangDB trace view, you can see:
1. The exact flow paths between agents
2. Every tool call made through virtual MCPs
3. Complete inputs and outputs of each model invocation
4. Performance metrics and token usage
5. Errors and retries (if any occurred)
## Conclusion: Benefits of LangDB Integration
By enhancing Google's ADK Travel Concierge sample with LangDB integration, we've achieved several significant improvements:
1. **Architectural Simplification**: Cleaner code with less boilerplate by offloading tool logic to LangDB
2. **Dynamic Tool Configuration**: Change tools and models without code changes
3. **Enhanced Security**: API keys managed centrally in LangDB
4. **Complete Observability**: End-to-end tracing of all agent interactions
5. **Improved Development Workflow**: Update agent capabilities without redeployment
This approach demonstrates how LangDB's AI gateway can enhance agent frameworks by providing a layer of abstraction that simplifies development while adding powerful capabilities.
### References
* [Google ADK Documentation](https://google.github.io/adk-docs/)
* [LangDB Virtual MCP Servers](https://docs.langdb.ai/concepts/virtual-mcp-servers)
* [LangDB Virtual Models](https://docs.langdb.ai/concepts/virtual-models)
* [End-to-End Tracing on Google ADK with LangDB](https://blog.langdb.ai/discover-end-to-end-tracing-on-google-adk-with-langdb)
* [Google ADK Travel Concierge Sample](https://github.com/google/adk-samples/tree/main/python/agents/travel-concierge)
---
## Building Web Search Agent with Google-ADK
This guide walks you through setting up a powerful WebSearch agent that leverages Google ADK for orchestration and LangDB for LLM access, tracing, and flexible routing.
Check out the shared thread: https://app.langdb.ai/sharing/threads/b6ddc154-33a0-403f-948c-9a559d93445a
## Code
* LangDB Samples: [https://github.com/langdb/langdb-samples/tree/main/examples/google-adk/web-search-agent](https://github.com/langdb/langdb-samples/tree/main/examples/google-adk/web-search-agent)
## Overview
The final agent will use a `SequentialAgent` to orchestrate two sub-agents:
1. **Critic Agent**: Receives a user's query, searches the web for information using a tool, and provides an initial analysis with source references.
2. **Reviser Agent**: Takes the critic's output, refines the content, and synthesizes a final, polished answer.
LangDB automatically captures the entire workflow, giving you full visibility into the handoff between agents and the tools they use.
## Installation
```bash
pip install google-adk "pylangdb[adk]" python-dotenv
```
### Environment Variables
```bash
export LANGDB_API_KEY=""
export LANGDB_PROJECT_ID=""
```
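Since both variables are exported empty above, a small startup check can catch a missing value before the agent runs. The following is a minimal sketch, not part of `pylangdb`; the helper name `load_langdb_settings` and the `MissingSettingError` class are hypothetical.

```python
import os


class MissingSettingError(RuntimeError):
    """Raised when a required LangDB environment variable is unset or empty."""


def load_langdb_settings(env=None):
    """Return the two LangDB settings from the environment as a dict."""
    env = os.environ if env is None else env
    settings = {}
    for name in ("LANGDB_API_KEY", "LANGDB_PROJECT_ID"):
        value = env.get(name, "").strip()
        if not value:
            raise MissingSettingError(f"{name} is not set")
        settings[name] = value
    return settings
```

Calling `load_langdb_settings()` at the top of your entry point fails fast with a clear message instead of surfacing later as an authentication error.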
### Project Structure
```
web-search-agent/
├── web-search/
│   ├── agent.py            # Root SequentialAgent
│   ├── __init__.py
│   └── sub_agents/
│       ├── __init__.py
│       ├── critic/
│       │   ├── agent.py    # Uses Virtual Model
│       │   ├── __init__.py
│       │   └── prompt.py   # Contains the critic's instruction prompt
│       └── reviser/
│           ├── agent.py
│           ├── __init__.py
│           └── prompt.py   # Contains the reviser's instruction prompt
└── pyproject.toml
```
## Code Walkthrough
### Initialize LangDB Tracing
The most important step is to call `pylangdb.adk.init()` **before** any Google ADK modules are imported. This instruments the environment for automatic tracing.
```python
# web-search/agent.py
from pylangdb.adk import init
init()
from google.adk.agents import SequentialAgent
from .sub_agents.critic import critic_agent
from .sub_agents.reviser import reviser_agent
# ...
```
### Define the Critic Agent with a Virtual Model
The `critic_agent` is responsible for the web search. Instead of hard-coding a tool, we assign it a **LangDB Virtual Model**. This virtual model has a Tavily Search MCP attached, giving the agent search capabilities without changing its code.
```python
# web-search/sub_agents/critic/agent.py
from google.adk.agents import Agent

from . import prompt  # prompt.py sits next to this file (see project structure)
# ... other imports for the callback, including the _render_reference definition ...

critic_agent = Agent(
    # This virtual model has a Tavily Search MCP attached in the LangDB UI
    model="langdb/critic_agent_37r6urik",
    name="critic_agent",
    instruction=prompt.CRITIC_PROMPT,
    after_model_callback=_render_reference,  # Formats search results
)
```
### Define the Sequential Agent
The root `agent.py` defines a `SequentialAgent` that orchestrates the workflow, passing the user's query first to the `critic_agent` and then its output to the `reviser_agent`.
```python
# web-search/agent.py
# ... imports and init() call ...
llm_auditor = SequentialAgent(
    name='web_search_agent',
    description=(
        'A 2-step web search agent that first searches and analyzes web content,'
        ' then refines and synthesizes the information to provide comprehensive'
        ' answers to user queries.'
    ),
    sub_agents=[critic_agent, reviser_agent],
)
# This is the entry point for the ADK
root_agent = llm_auditor
```
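To make the critic-then-reviser handoff concrete, here is a plain-Python sketch of the sequential pattern itself. It is not how ADK's `SequentialAgent` is implemented (ADK's mechanism for passing state between sub-agents is not shown in this guide); the `critic` and `reviser` functions are hypothetical stand-ins for the real agents.

```python
from typing import Callable, List

Step = Callable[[str], str]


def run_sequential(steps: List[Step], user_query: str) -> str:
    """Run each step in order, feeding every step the previous step's output."""
    current = user_query
    for step in steps:
        current = step(current)
    return current


# Hypothetical stand-ins for the critic and reviser agents.
def critic(query: str) -> str:
    return f"[draft with sources] {query}"


def reviser(draft: str) -> str:
    return draft.replace("[draft with sources]", "[final answer]")


result = run_sequential([critic, reviser], "What is MCP?")
print(result)
```

The order of the list is the only thing that determines the workflow, which mirrors the role of `sub_agents=[critic_agent, reviser_agent]` in the definition above.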
### Configuring MCPs and Virtual Models
To empower the `critic_agent` with live web search, we first create a **Virtual MCP Server** for the search tool and then attach it to a **Virtual Model**.
#### 1. Create a Virtual MCP Server
First, create a dedicated MCP server for the search tool.
1. In the LangDB UI, navigate to **Projects → MCP Servers**.
2. Click **+ New Virtual MCP Server** and configure it:
* **Name**: `web-search-mcp`
* **Underlying MCP**: Select **Tavily Search**.
* **Note**: The Tavily MCP requires an API key. Ensure you have added your `TAVILY_API_KEY` to your LangDB account secrets for the tool to function.
#### 2. Create and Configure the Virtual Model
Next, create a virtual model and attach the MCP you just made.
1. Navigate to **Models → + New Virtual Model**.
2. Give it a name (e.g., `critic-agent`).
3. In the **Tools** section, click **+ Attach MCP Server** and select the `web-search-mcp` you created.
4. Save the model and copy its identifier (e.g., `langdb/critic-agent_xxxxxx`).
5. Use this identifier as the `model` in your `critic_agent` definition.
### Running the Agent
With your `pyproject.toml` configured for the ADK, you can run the agent locally:
```bash
adk web
```
Navigate to `http://localhost:8000`, select `web-search`, and enter a query.
### Full Trace
Every run is captured in LangDB, showing the full sequence from the initial query to the final revised answer, including the tool calls made by the critic agent.
You can check out the entire conversation history, along with its trace, in the shared thread:
> Checkout: [Google ADK Agent Thread](https://app.langdb.ai/sharing/threads/b6ddc154-33a0-403f-948c-9a559d93445a)
## References
* [Google ADK Documentation](https://google.github.io/adk-docs/)
* [LangDB Virtual MCP Servers](https://docs.langdb.ai/concepts/virtual-mcp-servers)
* [LangDB Virtual Models](https://docs.langdb.ai/concepts/virtual-models)
* [Tavily Search MCP](https://app.langdb.ai/mcp-servers/tavily-mcp-4024f9c3-3d20-48d2-92da-4c7e9910e5f9)
---
## Bring Gemini, Claude, DeepSeek to Agents SDK
OpenAI recently introduced the **Agents SDK**, a lightweight, Python-first toolkit for building agentic AI apps. It’s built around three primitives:
* **Agents**: LLMs paired with tools and instructions to complete tasks autonomously.
* **Handoffs**: Let agents delegate tasks to other agents.
* **Guardrails**: Validate inputs/outputs to keep workflows safe and reliable.
## **TL;DR**
OpenAI’s Agents SDK is great for building tool-using agents with handoffs. But what if you want to use Gemini's latest reasoning model or take advantage of DeepSeek’s 90 percent lower cost?
With LangDB, you can run the same agent logic across more than 350 models, including Claude, Gemini, DeepSeek, and Grok, without changing your code. It works out of the box with frameworks like CrewAI and LangChain, and provides full trace visibility into every agent, model, and tool call.
## Building a Multi-Agent Customer Support System
Let’s create a production-grade AI agent system using the **OpenAI Agents SDK**, with **LangDB** providing multi-model flexibility and runtime control.
We’ll build four specialized agents, each powered by a different model to demonstrate LangDB's multi-model capabilities:
* **OrderAgent** (claude-3.7-sonnet): checks order status
* **FAQAgent** (gemini-2.5-pro): answers common customer questions
* **ComplaintAgent** (grok-2): handles complaints with empathy
* **ReservationAgent** (gpt-4o): manages table bookings
Then, we use a **Classifier Agent** to route user queries to the appropriate sub-agent using built-in handoff.
### Overview
Each leaf agent is powered by a tool — a simple Python function decorated with `@function_tool`.
### **Example: Tool for Order Status**
```python
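from agents import function_tool


@function_tool
def check_order_status(order_id: str) -> str:
    """Look up the status of an order by its ID."""
    # Hard-coded sample data standing in for a real order system
    order_statuses = {
        "12345": "Being prepared.",
        "67890": "Dispatched.",
        "11121": "Still processing.",
    }
    return order_statuses.get(order_id, "Order ID not found.")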
```
### Defining the agent:
```python
order_agent = Agent(
    name="OrderAgent",
    model="anthropic/claude-3.7-sonnet",
    instructions="Help customers with their order status.",
    tools=[check_order_status],
)
```
Each of the other agents follows the same structure, varying only in tools, instructions, and model selection.
### Routing User Queries with Handoffs
```python
classifier_agent = Agent(
    name="User Interface Agent",
    model="openai/gpt-4o-mini",
    instructions="You are a restaurant customer support agent. Handoff to the appropriate agent based on the user query.",
    handoffs=[order_agent, faq_agent, complaint_agent, reservation_agent],
)
```
This agent functions as a controller, deciding which specialized agent should handle the user's request.
## Running Multi-Model Agents with LangDB
To run the agents through LangDB and switch between different providers, first configure the OpenAI-compatible client like this:
```python
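from uuid import uuid4

from agents import Agent, set_default_openai_client
from openai import AsyncOpenAI

langDBProjectID = "your-project-id"  # Placeholder: replace with your LangDB project ID

client = AsyncOpenAI(
    api_key="langdbAPIKey",  # Placeholder: replace with your LangDB API key
    base_url=f"https://api.us-east-1.langdb.ai/{langDBProjectID}/v1",
    default_headers={"x-thread-id": str(uuid4()), "x-run-id": str(uuid4())},
)
# Use this client for all agent model calls
set_default_openai_client(client, use_for_tracing=False)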
```
LangDB allows you to test your agents using models such as **GPT**, **Claude**, **Gemini**, or **Grok**, while keeping the agent logic unchanged.
To switch models, simply change the configuration:
```python
faq_agent = Agent(
    name="FAQAgent",
    model="gemini/gemini-2.5-pro-exp-03-25",  # or claude-3.7-sonnet, gpt-4o, ...
    instructions="Answer common customer questions about hours, menu, and location.",
    # ... remaining arguments (tools, etc.) stay unchanged
)
```
This approach enables you to evaluate multiple providers and optimize for quality, cost, or latency.
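One way to make such comparisons repeatable is to keep the agent-to-model mapping in one place and allow overrides from the environment. This is a sketch under assumptions: the `<AGENT>_MODEL` variable naming is a hypothetical convention, and the identifiers without a provider prefix are copied from the agent list in this guide and may need a prefix in practice.

```python
import os

# Model identifiers as they appear in this guide.
DEFAULT_MODELS = {
    "OrderAgent": "anthropic/claude-3.7-sonnet",
    "FAQAgent": "gemini/gemini-2.5-pro-exp-03-25",
    "ComplaintAgent": "grok-2",
    "ReservationAgent": "gpt-4o",
}


def model_for(agent_name, env=None):
    """Return the model for an agent, letting an env var override the default.

    The variable name (for example FAQAGENT_MODEL) is a hypothetical convention.
    """
    env = os.environ if env is None else env
    override = env.get(f"{agent_name.upper()}_MODEL", "").strip()
    return override or DEFAULT_MODELS[agent_name]
```

An agent definition can then use `model=model_for("FAQAgent")`, so switching providers for an experiment becomes a matter of setting one environment variable.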
## Observing the Agents Execution
LangDB provides a unified trace view that helps developers inspect agent flows across different models. Each span in the trace shows the model used, any tools called, and the time taken at each step.
Each row in the timeline corresponds to a different model and agent interaction.
In the example above, you can see:
* Multiple agents running on different models in a single request
* Visual breakdown of agent handoffs and tool calls
* Timeline and cost details for each model involved
This trace graph shows a single user query routed through multiple agents, each using a different model. It captures the classifier decision, tool calls, and model usage in one place, making it easy to inspect and debug the full interaction flow.
The visibility helps in debugging behavior, verifying tool usage, and understanding model performance across complex workflows.
## Why This Matters
In real-world applications, different models can excel depending on the type of task or interaction style required:
* Some are better suited for fast, low-latency tasks
* Others handle nuanced, empathetic, or creative responses well
* Certain models are optimized for summarization or structured formatting
* Others provide strong performance for general-purpose conversations
LangDB lets you assign the most suitable model to each agent, giving you task-specific control while maintaining a unified development experience.
## Conclusion
The OpenAI Agents SDK provides a clean way to define agent workflows. Paired with LangDB, it becomes possible to run the same agent setup across multiple model providers without changing your application code.
LangDB gives you visibility into agent execution through trace views and lets you switch between over 350 supported models using a consistent interface. This makes it easier to compare performance, debug behavior, and adapt to evolving requirements.
## Try It Yourself
To explore this setup, check out the sample project in the repository. It contains the necessary code to run agents locally, modify model configurations, and observe how requests are routed across different models using LangDB.
Visit [LangDB](https://langdb.ai/) and sign up to get $10 in credit.
View the sample code: [GitHub Repository](https://github.com/langdb/langdb-samples/tree/main/examples/openai/customer-support).
***
### References
* [OpenAI Agents SDK Documentation](https://openai.github.io/openai-agents-python/)
* [LangDB Documentation](https://docs.langdb.ai/)
* [LangDB Model List](https://app.langdb.ai/models)
* [LangDB Samples Repository](https://github.com/langdb/langdb-samples)
---
## Configure Fallback Routing with LangDB
Ensure your AI applications stay online even during traffic spikes or model outages by configuring **Fallback Routing**. This guide walks you through setting up fallback routers using LangDB's routing feature.
## What is Fallback Routing?
Fallback Routing allows LangDB to automatically switch to a backup model when your preferred model is slow, down, or overloaded. This helps you:
* Avoid downtime
* Improve reliability
* Scale applications without manual intervention
## Example: Basic Fallback Routing
Let’s say you want to use **DeepSeek-Reasoner**, but switch to **GPT-4o** if it becomes unavailable.
You can configure this in the LangDB UI, or set it up programmatically with the following configuration:
```json
{
  "model": "router/dynamic",
  "router": {
    "name": "fallback-router",
    "type": "fallback",
    "targets": [
      { "model": "deepseek-reasoner", "temperature": 0.7, "max_tokens": 400 },
      { "model": "gpt-4o", "temperature": 0.8, "max_tokens": 500 }
    ]
  }
}
```
### Behavior
* First, it tries **deepseek-reasoner**
* If that fails, it automatically falls back to **GPT-4o**
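If you assemble the configuration in Python, a small builder keeps the router structure in one place. This is a sketch: `fallback_router` is a hypothetical helper, and sending the router definition in the same body as the `messages` list is an assumption, so check the routing documentation for the exact request shape.

```python
import json


def fallback_router(name, targets):
    """Build a request body fragment for a fallback router over `targets`."""
    if not targets:
        raise ValueError("a fallback router needs at least one target")
    return {
        "model": "router/dynamic",
        "router": {"name": name, "type": "fallback", "targets": list(targets)},
    }


body = fallback_router(
    "fallback-router",
    [
        {"model": "deepseek-reasoner", "temperature": 0.7, "max_tokens": 400},
        {"model": "gpt-4o", "temperature": 0.8, "max_tokens": 500},
    ],
)
# Assumption: the conversation travels alongside the router definition.
body["messages"] = [{"role": "user", "content": "Hello!"}]
print(json.dumps(body, indent=2))
```

The order of `targets` is what encodes the priority: the first entry is tried first and later entries act as backups.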
## Example: Fallback with Load-Balancing
In the previous example, we implemented a simple fallback mechanism. However, a more robust solution would be to **distribute queries across multiple providers of DeepSeek-R1** while maintaining a fallback to GPT-4o if both providers fail. This method helps balance traffic efficiently while ensuring uninterrupted AI services.
Here’s how you can configure **Fallback Routing with Percentage-Based Load Balancing**:
```json
{
  "model": "router/dynamic",
  "router": {
    "name": "fallback-percentage-router",
    "type": "fallback",
    "targets": [
      {
        "model": "router/dynamic",
        "router": {
          "name": "percentage-balanced",
          "type": "percentage",
          "model_a": [
            { "model": "fireworksai/deepseek-r1", "temperature": 0.7, "max_tokens": 400 },
            0.5
          ],
          "model_b": [
            { "model": "deepseek/deepseek-reasoner", "temperature": 0.7, "max_tokens": 400 },
            0.5
          ]
        }
      },
      { "model": "gpt-4o", "temperature": 0.8, "max_tokens": 500 }
    ]
  }
}
```
### How This Works:
* **Primary Route:** The system distributes requests **evenly (50/50)** between two providers of DeepSeek-R1 to balance the load.
* **Fallback Route:** If both DeepSeek-R1 providers are unavailable or fail, all requests are automatically rerouted to **GPT-4o**, ensuring continuous service.
This approach provides **load balancing and reliable fallback protection**, making it ideal for AI applications facing high demand and occasional model unavailability.
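To reason about how the nested configuration resolves, the following local simulation walks the same structure. It is only an illustration: LangDB performs routing server-side, and this sketch assumes that a failed pick from the percentage router falls straight through to the next fallback target (whether the gateway retries the other provider first is not stated here). `resolve`, `ModelUnavailable`, and `flaky_call` are hypothetical names.

```python
import random


class ModelUnavailable(Exception):
    """Raised by the stand-in call function when a target cannot serve a request."""


def resolve(target, call_model, rng=random):
    """Return the response for `target`, which is a plain model or a nested router."""
    router = target.get("router")
    if router is None:
        return call_model(target["model"])
    if router["type"] == "fallback":
        last_error = None
        for candidate in router["targets"]:
            try:
                return resolve(candidate, call_model, rng)
            except ModelUnavailable as exc:
                last_error = exc  # try the next target in order
        raise ModelUnavailable(f"all targets of {router['name']} failed") from last_error
    if router["type"] == "percentage":
        # Each model_* entry is a [target, weight] pair, as in the JSON above.
        pairs = [value for key, value in router.items() if key.startswith("model_")]
        targets = [pair[0] for pair in pairs]
        weights = [pair[1] for pair in pairs]
        chosen = rng.choices(targets, weights=weights, k=1)[0]
        return resolve(chosen, call_model, rng)
    raise ValueError(f"unknown router type: {router['type']}")


CONFIG = {
    "model": "router/dynamic",
    "router": {
        "name": "fallback-percentage-router",
        "type": "fallback",
        "targets": [
            {
                "model": "router/dynamic",
                "router": {
                    "name": "percentage-balanced",
                    "type": "percentage",
                    "model_a": [{"model": "fireworksai/deepseek-r1"}, 0.5],
                    "model_b": [{"model": "deepseek/deepseek-reasoner"}, 0.5],
                },
            },
            {"model": "gpt-4o"},
        ],
    },
}


def flaky_call(model):
    """Stand-in for a provider call in which every DeepSeek target is down."""
    if "deepseek" in model:
        raise ModelUnavailable(model)
    return f"answered by {model}"


print(resolve(CONFIG, flaky_call))
```

With both DeepSeek providers failing, the request ends up at `gpt-4o`; with a healthy call function, it is served by one of the two DeepSeek targets chosen by weight.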
In more complex scenarios, you can configure a **multi-level fallback system** with percentage-based distribution. This approach allows requests to be routed dynamically based on pricing, performance, or reliability, ensuring efficiency while preventing downtime.
Check out [Routing Strategies](https://docs.langdb.ai/features/routing#routing-strategies) for more routing strategies.
By leveraging **dynamic routing**, you can:
* **Prevent downtime** by automatically switching to backup models.
* **Optimize performance and cost** with smart load balancing.
* **Ensure scalability** without manual intervention.
With LangDB’s flexible and powerful routing capabilities, **you can build AI applications that are not only intelligent but also robust and fail-safe**.
---
## Connecting LLMs to the Web with Real-Time Search Tools
Traditional LLMs rely on static training data, making them prone to outdated responses, hallucinations, and missing critical context. LangDB's **built-in Search tool** solves this by fetching real-time data, improving accuracy and contextual relevance.
## The Challenge: Stale or Incomplete Knowledge
* **Static Corpus**: Most LLMs are trained on large datasets, but that training is typically a snapshot in time. Once trained, the model doesn’t automatically update its knowledge.
* **Inaccurate or Outdated Information**: Without a method to query current data, an LLM may provide answers that were correct at the time of training but are no longer valid.
* **Limited Context**: Even if the model has relevant data, it might not surface the best context without a guided search mechanism.
## Introducing LangDB Search Tool
LangDB’s built-in Search tool addresses these challenges by allowing real-time querying of databases, documents, or external sources:
* **On-Demand Queries**: Instead of relying solely on the LLM’s training data, the Search tool can fetch the latest information at query time.
* **Integrated with LangDB**: The search functionality is built into LangDB, so developers can use it without additional overhead.
* **API-Ready**: LangDB’s search tool can be accessed via API too.
## Search vs No-Search
The comparison below contrasts LangDB’s search tool with relying on static model knowledge. With **search enabled**, the model pulls real-time, up-to-date information; the **same query without search** leads to more generic and potentially outdated responses.
| **Feature** | **No Search** | **With LangDB Search** |
| ---------------------- | ---------------------------------------- | ------------------------------------------------ |
| **Data Freshness** | Static, based on training corpus | Dynamic, fetches real-time information |
| **Accuracy** | Prone to outdated or incorrect responses | Pulls from latest sources, improving reliability |
| **Context Depth** | Limited by internal model memory | Integrates external sources for better insights |
| **Hallucination Risk** | Higher | Lower, as responses are backed by retrieved data |
## Using Search through API
LangDB’s search tool can be integrated via API to fetch real-time data. The following call enables real-time search for an LLM request:
```bash
curl 'https://api.us-east-1.langdb.ai/{LangDB_ProjectID}/v1/chat/completions' \
-H 'authorization: Bearer LangDBAPIKey' \
-H 'Content-Type: application/json' \
-d '{
"model": "openai/gpt-4o-mini",
"mcp_servers": [{ "name": "websearch", "type": "in-memory"}],
"messages": [
{
"role": "user",
"content": "what are the latest models by anthropic and what are its standout features?"
}
]
}'
```
This allows the LLM to enhance responses with live data, ensuring greater accuracy and relevance.
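The same request can be prepared from Python using only the standard library. This sketch mirrors the curl call above; the helper names `build_search_request` and `ask_with_search` are hypothetical, and the response is returned as parsed JSON without assuming its exact shape.

```python
import json
import urllib.request

BASE_URL = "https://api.us-east-1.langdb.ai"


def build_search_request(project_id, api_key, question, model="openai/gpt-4o-mini"):
    """Build (but do not send) the chat completion request with web search enabled."""
    payload = {
        "model": model,
        "mcp_servers": [{"name": "websearch", "type": "in-memory"}],
        "messages": [{"role": "user", "content": question}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/{project_id}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def ask_with_search(project_id, api_key, question):
    """Send the request and return the parsed JSON response."""
    request = build_search_request(project_id, api_key, question)
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)
```

Separating request construction from sending makes the payload easy to inspect or log before any network call is made.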
## Conclusion
LangDB’s built-in Search tool eliminates the limitations of static LLMs by integrating **real-time web search**, ensuring your AI retrieves the most relevant, up-to-date, and accurate information. Whether you're building **chatbots, research tools, or automation systems**, dynamic search enhances responses with verifiable data, reducing hallucinations and improving decision-making.
---
## Tracing Multiple Agents
When working with AI-powered workflows, efficiency, scalability, and cost control are critical. In many scenarios, a single-agent architecture fails to meet all operational requirements, leading developers to adopt multi-agent workflows.
LangDB simplifies this by offering seamless tracing, analytics, and optimization tools for multi-agent workflows.
**TL;DR:** Combining Gemini and GPT-4o gives the lowest total cost, though it is slightly slower than the Gemini-only setup.
| **Tag** | **Total Cost** | **Total Requests** | **Avg Duration (ms)** | **Total Duration (ms)** | **Total Input Tokens** | **Total Output Tokens** | **Avg TTFT (ms)** | **TPS** | **TPOT** |
| ----------- | -------------- | ------------------ | --------------------- | ----------------------- | ---------------------- | ----------------------- | ----------------- | ------- | -------- |
| \[gemini] | 0.037731 | 4 | 2390.0 | 9559.802 | 11197 | 345 | 2390.0 | 1207.35 | 0.03 |
| \[gpt-4o] | 0.057021 | 4 | 4574.9 | 18299.686 | 17235 | 443 | 4574.9 | 966.03 | 0.04 |
| \[combined] | 0.034887 | 4 | 2765.5 | 11061.997 | 10009 | 405 | 2765.5 | 941.42 | 0.03 |
Let’s break down a typical **multi-agent workflow**:
## **Example: Generating a Chart for a given question**
* **User Request**: _"Generate a chart of average temperature in Alaska over the past decade."_
* **Agent Roles**:
1. **Researcher Agent**: Gathers raw temperature data by calling a search tool.
2. **Chart Generator Agent**: Processes the data and creates a visualization using a code execution tool.
To evaluate the efficiency of multi-agent workflows, we ran three different setups in LangDB. These scenarios illustrate how using the same model across agents compares to combining models.
## Scenarios We Tested
We tested three scenarios:
1. **Gemini only**: Both the Researcher Agent and the Chart Generator Agent used Gemini, focusing on speed and cost.
2. **GPT-4o only**: Both agents used GPT-4o, prioritizing accuracy but incurring higher costs.
3. **Combined**: The Researcher Agent used Gemini for data gathering, while the Chart Generator Agent used GPT-4o for visualization, balancing speed, cost, and accuracy.
## Tracing and Analytics with LangDB
After running these scenarios, we used LangDB's tracing capabilities to monitor task routing, response times, and token usage. The tracing interface allowed us to see exactly how tasks were distributed between agents and measure the efficiency of each workflow.
Using LangDB's **analytics**, we evaluated the performance data from these runs to generate the table presented earlier. This included:
* **Cost Analysis**: Tracking how much each agent contributed to the overall expense.
* **Time Analysis**: Measuring the average response time across different workflows.
* **Number of Requests**: Recording how many requests each run processed.
* **Average Time to First Token (TTFT)**: Capturing the time taken for the first token to appear.
* **Tokens Per Second (TPS)**: Analyzing the efficiency of token generation per second.
* **Time Per Output Token (TPOT)**: Evaluating the time taken per output token to assess response efficiency.
| **Tag** | **Total Cost** | **Total Requests** | **Avg Duration (ms)** | **Total Duration (ms)** | **Total Input Tokens** | **Total Output Tokens** | **Avg TTFT (ms)** | **TPS** | **TPOT** |
| ----------- | -------------- | ------------------ | --------------------- | ----------------------- | ---------------------- | ----------------------- | ----------------- | ------- | -------- |
| \[gemini] | 0.037731 | 4 | 2390.0 | 9559.802 | 11197 | 345 | 2390.0 | 1207.35 | 0.03 |
| \[gpt-4o] | 0.057021 | 4 | 4574.9 | 18299.686 | 17235 | 443 | 4574.9 | 966.03 | 0.04 |
| \[combined] | 0.034887 | 4 | 2765.5 | 11061.997 | 10009 | 405 | 2765.5 | 941.42 | 0.03 |
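The TPS and TPOT columns can be cross-checked against the other columns. The values in the table are consistent with TPS being all tokens (input plus output) divided by total duration in seconds, and TPOT being total duration in seconds divided by output tokens. That reading is inferred from the numbers in this table, not from a documented LangDB formula.

```python
# Rows copied from the table above: total duration (ms), input and output tokens.
RUNS = {
    "gemini": {"duration_ms": 9559.802, "input_tokens": 11197, "output_tokens": 345},
    "gpt-4o": {"duration_ms": 18299.686, "input_tokens": 17235, "output_tokens": 443},
    "combined": {"duration_ms": 11061.997, "input_tokens": 10009, "output_tokens": 405},
}


def tokens_per_second(run):
    """All tokens (input + output) divided by total duration in seconds."""
    seconds = run["duration_ms"] / 1000
    return (run["input_tokens"] + run["output_tokens"]) / seconds


def time_per_output_token(run):
    """Total duration in seconds divided by the number of output tokens."""
    return (run["duration_ms"] / 1000) / run["output_tokens"]


for tag, run in RUNS.items():
    print(tag, round(tokens_per_second(run), 2), round(time_per_output_token(run), 2))
```

Recomputing derived metrics like this is a quick sanity check when you export analytics data and want to confirm how each column is defined.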
#### Key Takeaways from the Scenarios
**1. Gemini-Only Workflow**
* **Cost**: Slightly higher than combined but cheaper than GPT-4o.
* **Speed**: Fastest average duration (2390 ms) and highest TPS (1207.35).
**2. GPT-4o-Only Workflow**
* **Cost**: Highest overall.
* **Speed**: Slowest duration (4574.9 ms) and lower TPS (966.03).
**3. Combined Workflow**
* **Cost**: Lowest of all workflows.
* **Speed**: Moderate duration (2765.5 ms), but the lowest TPS of the three (941.42).
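The trade-off can be quantified directly from the table. This short sketch computes how the combined setup compares with each single-model setup in relative terms; the figures are taken from the table, and `percent_change` is a hypothetical helper name.

```python
# Total cost and average duration (ms) per setup, copied from the table above.
COST = {"gemini": 0.037731, "gpt-4o": 0.057021, "combined": 0.034887}
AVG_DURATION_MS = {"gemini": 2390.0, "gpt-4o": 4574.9, "combined": 2765.5}


def percent_change(new, baseline):
    """Signed change of `new` relative to `baseline`, in percent."""
    return (new - baseline) / baseline * 100


for baseline in ("gemini", "gpt-4o"):
    cost = percent_change(COST["combined"], COST[baseline])
    duration = percent_change(AVG_DURATION_MS["combined"], AVG_DURATION_MS[baseline])
    print(f"combined vs {baseline}: cost {cost:+.1f}%, duration {duration:+.1f}%")
```

On these four-request runs, the combined setup is roughly 7.5% cheaper than Gemini-only and about 39% cheaper than GPT-4o-only, while its average duration is about 16% longer than Gemini-only.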
***
## Why Tracing Matters for Multi-Agent Workflows
In a multi-agent workflow, different models handle specialized tasks. Without proper observability, **you don't know how efficiently each model is performing**. This lack of visibility can lead to:
* **Escalating Costs**: Premium models being overused for trivial tasks.
* **Performance Bottlenecks**: Delays caused by slow or inefficient task execution.
* **Operational Inefficiency**: No clear insights on how to optimize workflows.
## Why LangDB is Essential for Multi-Agent Workflows
### With LangDB, you can:
1. **Monitor Key Metrics**: Track costs, latency, number of requests, and token usage for each workflow.
2. **Analyze Performance Trends**: Understand response times, time to first token, and processing speeds across different models.
3. **Gain Detailed Insights**: Use detailed analytics to identify inefficiencies and optimize workflows accordingly.
***
## Next Steps
To explore multi-agent workflows with LangDB:
1. Visit the [LangDB Samples](https://github.com/langdb/langdb-samples) Repository for setup instructions and examples.
2. Try out [LangDB](https://app.langdb.ai/) to monitor and optimize your workflows in real time.
3. Experiment with combining different models to achieve the best results for your unique use case.
Explore LangDB’s capabilities by visiting the [LangDB Samples Repository](https://github.com/langdb/langdb-samples) or [trying the platform](https://app.langdb.ai/) for multi-agent tracing and analytics.
---
## Context7 + Sequential Thinking for Smarter Coding Workflows
This use case demonstrates how developers can supercharge coding workflows by combining **Context7** (for live documentation) with **Sequential Thinking** (for structured code planning and debugging) — all inside LLM-native code editors like Cursor or Windsurf.
### This Showcase Demonstrates:
* Using **Context7** to inject accurate, up-to-date code documentation inside your coding prompt.
* Using **Sequential Thinking** to reason step-by-step through logic, architecture, or debugging.
* Integrating both tools to eliminate hallucinations and maintain continuity across coding sessions.
### Why It Matters
* Prevents outdated or incorrect suggestions from LLMs.
* Injects precise documentation without leaving your editor.
* Helps developers break down, plan, and revise complex code logic directly in the IDE.
### How It Works
**Editor Prompt ➔ Context7 Docs ➔ Code Planning with Sequential Thinking**
Your AI editor fetches framework or library documentation using Context7, then enables a structured thought process using Sequential Thinking to guide implementation or debugging.
To enable this workflow, simply add the following MCP servers to your LangDB Virtual MCP:
* `context7` – for live documentation injection
* `sequential-thinking` – for structured reasoning steps
No additional configuration is required.
Check out the MCP servers:
* Context7: [https://app.langdb.ai/mcp-servers/context7](https://app.langdb.ai/mcp-servers/context7)
* Sequential Thinking: [https://app.langdb.ai/mcp-servers/sequential-thinking](https://app.langdb.ai/mcp-servers/sequential-thinking)
## Interaction Flow
```
I’m working on a SvelteKit app and want to understand how to use afterNavigate from @sveltejs/kit. Can you break down its usage step-by-step — when to use it, how to register it, and what kind of tasks it’s best suited for (like scroll restoration or analytics)? Please include code examples using