Learn more about LangDB's Enterprise Options
LangDB offers two enterprise licensing models to fit your organization's needs:
Best for: Development teams and startups running AI workloads in private VPCs who want a fully managed experience.
Deployment: Entire infrastructure is deployed on GCP or AWS and fully managed by LangDB.
For AWS, an AWS account with shared access will be provisioned.
For GCP, a new project will be provisioned.
Infrastructure: Fully Managed.
Best for: Enterprises running large-scale AI operations who want maximum flexibility and control.
Deployment: LangDB provides a highly performant binary that can be deployed in your own infrastructure (on-prem, cloud, or hybrid).
x86_64, aarch64 for Ubuntu
Infrastructure: Bring your own
Solutioning Add-On: Available at an hourly rate if needed.
For details or clarification, please book a meeting on our website.
Explore LangDB’s dedicated tenant architecture with secure metadata storage, real-time observability, cost management, and scalable MCP execution.
This page describes the core architecture of the LangDB AI Gateway, a unified platform for interfacing with a wide variety of Large Language Models (LLMs) and building agentic applications with enterprise-grade observability, cost control, scalability, MCP features, and more.
AI Gateway
Unified interface to 300+ LLMs using the OpenAI API format. Built-in observability and tracing. A free and open-source version is available at https://github.com/langdb/ai-gateway. The enterprise version adds multi-tenancy, advanced cost control, and rate limiting; contact LangDB for access.
Metadata Store (PostgreSQL)
Stores metadata related to API usage, configurations, and more. For scalable/multi-tenant deployments, use managed PostgreSQL (e.g., AWS RDS, GCP Cloud SQL).
Cache Store (Redis)
Implements rolling cost control and rate limiting for API usage. The enterprise version supports Redis integration for cost control and rate limiting.
Observability & Analytics Store (ClickHouse)
Provides observability by storing and analyzing traces/logs. Supports OpenTelemetry. For large-scale deployments, use ClickHouse Cloud. Traces are stored in the langdb.traces table.
Note:
Metadata Store: Powered by PostgreSQL (consider AWS RDS, GCP Cloud SQL for enterprise)
Cache Store: Powered by Redis (enterprise only)
Observability & Analytics Store: Powered by ClickHouse (consider ClickHouse Cloud for scale)
LangDB provisions a dedicated environment for each tenant. This environment is isolated per tenant and is set up in a separate AWS account or GCP project, managed by LangDB. Customers connect securely to their provisioned environment from their own VPCs, ensuring strong network isolation and security.
LangDB itself operates a thin, shared public cloud environment (the "control plane") that is primarily responsible for:
Provisioning new tenant environments
Managing access control and user/tenant provisioning
Handling external federated account connections (e.g., SSO)
Hosting the LangDB Dashboard frontend application for configuration, monitoring, and management
All operational workloads, data storage, and LLM/MCP execution occur within the tenant-specific environment. The shared LangDB cloud is not involved in data processing or LLM execution, but only in provisioning, access management, and dashboard hosting.
Integrates with customer identity providers (Active Directory, SAML, SSO).
Users (AI Apps, Agents, Administrators, Developers) interact with LangDB via secure endpoints.
Centralized dashboard for configuration, monitoring, and management.
Handles user and tenant provisioning, access control, and external federated account connections.
All provisioning and access is centrally managed via LangDB Cloud and Dashboard.
Each tenant (enterprise deployment) is provisioned in a dedicated AWS account or GCP project.
Communication between tenant environment and LangDB is secured and managed.
Provisioning is automated via Terraform.
Stores all configuration and metadata required for operation, including:
Virtual models
Virtual MCP servers
Projects
Guardrails
Routers
Used for fast, in-memory operations related to:
Rate limiting & cost control
LLM usage tracking
MCP usage tracking
Stores analytics and observability data:
Traces (API calls, LLM invocations, etc.)
Metrics (performance, usage, etc.)
User and tenant provisioning is centrally controlled via LangDB Cloud and Dashboard.
External federated accounts (e.g., enterprise SSO) can be connected to LangDB Cloud for seamless access management.
Data retention policies mainly apply to observability data (traces, metrics) stored in ClickHouse.
Retention is enforced per subscription tier; traces are automatically cleared after the retention period expires.
MCP servers are deployed in a serverless fashion using AWS Lambda or GCP Cloud Run for scalability and cost efficiency.
Deploy LangDB AI Gateway on GCP, connecting to external Postgres, Redis, and ClickHouse for scalable, cloud-native operations.
Use YAML config to securely define storage, networking, and service settings for your LangDB AI Gateway deployments.
ai-gateway.yaml is the primary way to configure secrets and specific features of AI Gateway.
# Refer to the ai-gateway.yaml for
# advanced configurations available
# ....
# Configuration for storage
database_config:
  url: "postgres://langdb:XXXX@localhost:5438/postgres"
redis_config:
  url: "redis://localhost:6379"
langdb_clickhouse_config:
  url: localhost:8123
rest_api:
  port: 8083
  host: 0.0.0.0
  cors: true
## Configure storage location
# storage_config: !Local "file:///<storage-path>"
langdb_cloud_ui:
  url: http://localhost:3000
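Once this file is in place, the gateway reads it at startup. As a minimal usage sketch (assuming the enterprise binary is installed and named as in the self-hosting section), you start the gateway by pointing it at the config:

ai-gateway-enterprise serve -c ai-gateway.yaml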
Deploy LangDB AI Gateway on Kubernetes using Helm, connecting to external Postgres, Redis, and ClickHouse for scalable, cloud-native operations.
This guide walks you through deploying the LangDB AI Gateway enterprise edition on Kubernetes using Helm. Refer to the individual database links for deploying and scaling clusters for Postgres, Redis, and ClickHouse.
Check out the repository:
This will deploy:
ai-gateway (using the default image)
Connections to external Postgres, Redis, and ClickHouse
values.yaml: this will automatically mount your config.yaml into the container at /app/config.yaml, and the ai-gateway will use it on startup.
Run the following command to install the AI Gateway:
This deploys:
ai-gateway using the default image.
Connections to external Postgres, Redis, and ClickHouse instances.
By default, the service is exposed as a ClusterIP. To access it externally, you can port-forward; then access the gateway at http://localhost:8080.
To remove the deployment:
Check out the full source repository here:
git clone [email protected]:langdb/helm-chart.git
cd helm-chart/helm/ai-gateway
env:
  CLICKHOUSE_HOST: <external-clickhouse-host>
  REDIS_HOST: <external-redis-host>
  POSTGRES_HOST: <external-postgres-host>
  POSTGRES_USER: <your-user>
  POSTGRES_PASSWORD: <your-password>
  POSTGRES_DB: <your-db>
config: |
  # ai-gateway configuration
  http:
    host: "0.0.0.0"
    port: 8080
helm install ai-gateway .
kubectl port-forward svc/ai-gateway 8080:80
helm uninstall ai-gateway
Track and query LangDB traces with ClickHouse, using bloom filter indexes for fast thread and run-specific analytics.
Clickhouse is used for observability in LangDB. It provides high-performance analytics capabilities that allow us to track and analyze system behavior, performance metrics, and user activities across the platform.
The following CREATE query defines the traces table that stores distributed tracing information for the LangDB AI Gateway.
CREATE TABLE IF NOT EXISTS langdb.traces
(
trace_id UUID,
span_id UInt64,
parent_span_id UInt64,
operation_name LowCardinality(String),
kind String,
start_time_us UInt64,
finish_time_us UInt64,
finish_date Date,
attribute Map(String, String),
tenant_id Nullable(String),
project_id String,
thread_id String,
tags Map(String, String),
parent_trace_id Nullable(UUID),
run_id Nullable(UUID)
)
ENGINE = MergeTree
ORDER BY (finish_date, finish_time_us, trace_id)
SETTINGS index_granularity = 8192;
-- Add bloom filter index for thread_id
ALTER TABLE langdb.traces ADD INDEX idx_thread_id thread_id TYPE bloom_filter GRANULARITY 4;
-- Add composite index for tenant_id, project_id, and operation_name
ALTER TABLE langdb.traces ADD INDEX idx_tenant_project (tenant_id, project_id, operation_name) TYPE bloom_filter GRANULARITY 4;
The thread_id field, with its dedicated bloom filter index, allows for efficient filtering of traces based on specific execution threads. The run_id field enables filtering and grouping traces by specific execution runs.
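For illustration, a query along these lines (the thread ID value is a placeholder) retrieves all spans for a single conversation thread and benefits from the bloom filter index on thread_id:

SELECT trace_id, operation_name, start_time_us, finish_time_us
FROM langdb.traces
WHERE thread_id = '<thread-id>'
  AND finish_date >= today() - 7
ORDER BY finish_time_us;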
Implement tenant-level isolation with LangDB’s robust multitenancy architecture, featuring row policies in ClickHouse and secure metadata controls in Postgres.
This document outlines the multitenancy implementation in LangDB, explaining how data isolation is maintained across different tenants.
LangDB implements a robust multitenancy model that ensures complete isolation of tenant data while maintaining efficient resource utilization. This approach is implemented across different data storage systems used in the platform.
Clickhouse is used for analytics and observability in LangDB. The multitenancy implementation in Clickhouse includes:
Each tenant in LangDB has a dedicated Clickhouse user and role
These custom roles enforce access permissions specific to the tenant's data
Authentication and authorization are managed at the tenant level
Prevents cross-tenant data access even at the database level
All read operations in Clickhouse are governed by row policies
Row policies filter data based on the tenant_name column
When a tenant's credentials are used for database access, the row policy automatically restricts results to only that tenant's data
This provides a zero-trust isolation model where the application doesn't need to include tenant filters (a sketch of such a policy follows below this list)
All inserts into Clickhouse tables automatically populate the tenant column
The tenant column is populated based on the authenticated user context
Direct inserts by tenants are not allowed, preventing potential data integrity issues
Insert operations are performed via service accounts with appropriate tenant context
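As a hedged sketch, a per-tenant row policy on the tenant_name column described above might be defined in ClickHouse along these lines; the policy name, tenant value, and role name are illustrative placeholders, not the actual production definitions:

-- Illustrative only: names and tenant value are placeholders
CREATE ROW POLICY tenant_acme_policy ON langdb.traces
FOR SELECT
USING tenant_name = 'acme'
TO acme_readonly_role;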
Postgres is used as the primary metadata storage in LangDB. The multitenancy implementation in Postgres includes:
Tenant isolation is implemented at the application logic level
All database queries include tenant-specific filters
Application code ensures that queries only return records belonging to the authenticated tenant
Modifications are restricted to only the tenant's own data through application context
Tenant identifier is a required column in all tenant-specific tables
All database operations include tenant context validation
Application middleware enforces tenant context for every database operation
This multitenancy model is consistently implemented across LangDB's AWS and GCP deployments, ensuring that tenant data remains securely isolated regardless of the cloud provider.
Set up tenants and users in LangDB via direct signup or federated SSO, with dedicated infrastructure for enterprise deployments.
Tenant provisioning happens through the LangDB dashboard, where you can register a company and upgrade to an Enterprise License.
Reach out to our support staff to configure your tenant environment on AWS or GCP.
For self-hosted enterprise versions, you'll be asked to provide discovery URLs to be registered with the LangDB Control Environment.
The main LangDB Cloud is multi-tenant, with shared infrastructure for all tenants.
Enterprise deployments are provisioned per tenant, with dedicated infrastructure and network isolation.
Provisioning an individual tenant involves setting up an entire AWS account or GCP project per tenant, managed via Terraform, which then communicates securely with LangDB Cloud.
Two user onboarding modes are supported.
Users can sign up directly and invite additional users to their tenant.
This is the easiest mode to set up.
You can restrict signups to specific email domains for security.
Reach out to us for linking your federated account to your tenant.
Users who are part of the directory can register on the sub-domain.
Currently roles are managed through LangDB dashboard.
Dynamic Role Mapping feature is in active development.
This example demonstrates a multi-layered routing strategy for a SaaS company that balances performance for premium users, cost for standard users, and flexibility for internal development.
Goals:
Provide the fastest possible responses for "premium" customers on support-related queries.
Minimize costs for "standard" tier users.
Allow the internal "development" team to test a new, experimental model without affecting customers.
Routing Configuration (router.json):
Rule 1: premium_support_fast_track
Conditions: This rule applies only when a request comes from a user in the "premium" tier AND the request topic has been identified as "support". It uses an all operator to combine conditions.
Targets: It routes the request to a pool of high-performance models (anthropic/claude-4-opus, openai/gpt-o3) and selects the one with the lowest time-to-first-token (ttft), ensuring the fastest response.
Rule 2: standard_user_cost_optimized
Conditions: This is a broader rule that catches all requests from "standard" tier users.
Targets: It uses a pool of cost-effective models (mistral/mistral-large-latest, anthropic/claude-4-sonnet) and selects the one with the minimum price, optimizing for spend.
Rule 3: internal_dev_testing
Conditions: This rule applies to any user in the "development" group.
Targets: It directs their requests to google/gemini-2.5-pro, isolating test traffic from the production user base.
Self-host LangDB AI Gateway Enterprise locally with ClickHouse, PostgreSQL, and Redis for full control over tracing, caching, and analytics.
ClickHouse (for request tracing & analytics)
PostgreSQL (for metadata and user management)
Redis (for caching and rate‑limiting)
You can self-host our enterprise version using one of two options.
Supported Platforms:
x86_64
aarch64
Test the gateway with a simple chat completion:
Invoke an MCP server alongside your request:
Refer to the relevant documentation sections for an overview of the various features.
Launch LangDB locally via Docker Compose for fast development, and explore AWS or GCP guides for scalable cloud deployments.
ai-gateway-enterprise serve -c ai-gateway.yaml
docker run -it \
  -p 8080:8080 \
  <private-url>/ai-gateway-enterprise serve \
  -c ai-gateway.yaml
# Chat completion with GPT-4
curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o-mini",
"messages": [{"role": "user", "content": "What is the capital of France?"}]
}'
# Or try Claude
curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "claude-3-opus",
"messages": [
{"role": "user", "content": "What is the capital of France?"}
]
}'
curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o-mini",
"messages": [{"role": "user", "content": "Ping the server using the tool and return the response"}],
"mcp_servers": [{"server_url": "http://localhost:3004"}]
}'
version: '3.8'
services:
  ai-gateway:
    # We will share a private image that contains the ai-gateway-enterprise edition.
    # For reference, check out our free image available in our GitHub repo:
    # https://github.com/langdb/ai-gateway
    image: <private-url>/ai-gateway-enterprise:latest
    ports:
      - "8083:8083"
    extra_hosts:
      - "host.docker.internal:host-gateway"
    volumes:
      # ai-gateway.yaml is expected in the configuration folder.
      - config:/usr/langdb/
    container_name: "langdb-ai-gateway"
  clickhouse:
    image: clickhouse/clickhouse-server:latest
    ports:
      - "8123:8123"
      - "9000:9000"
    ulimits:
      nofile:
        soft: 262144
        hard: 262144
    extra_hosts:
      - "host.docker.internal:host-gateway"
    container_name: "langdb-clickhouse"
  postgres:
    image: postgres:latest
    container_name: langdb-cloud-enterprise-pg
    environment:
      POSTGRES_USER: langdb
      # Note: Include your postgres password as specified in ai-gateway.yaml
      POSTGRES_PASSWORD: XXXXX
      POSTGRES_DB: langdb_staging
      ALLOW_IP_RANGE: 0.0.0.0/0
    ports:
      - "5438:5432"
    command: postgres -c 'max_connections=1000'
    volumes:
      - postgres_data:/var/lib/postgresql/data
  redis:
    image: redis:latest
    restart: always
    ports:
      - "6379:6379"
    volumes:
      - redis_data:/root/redis
    environment:
      # Note: Include your redis password as specified in ai-gateway.yaml
      - REDIS_PASSWORD=XXXXX
      - REDIS_PORT=6379
      - REDIS_DATABASES=16
volumes:
  config:
  postgres_data:
  redis_data:
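With an ai-gateway.yaml placed in the mounted config volume, the stack can then be brought up with the standard Compose command:

docker compose up -d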
{
  "routes": [
    {
      "name": "premium_support_fast_track",
      "conditions": {
        "all": [
          { "metadata.user.tier": { "eq": "premium" } },
          { "metadata.request.topic": { "eq": "support" } }
        ]
      },
      "targets": {
        "$any": ["anthropic/claude-4-opus", "openai/gpt-o3"],
        "sort": { "ttft": "MIN" }
      }
    },
    {
      "name": "standard_user_cost_optimized",
      "conditions": {
        "metadata.user.tier": { "eq": "standard" }
      },
      "targets": {
        "$any": ["mistral/mistral-large-latest", "anthropic/claude-4-sonnet"],
        "sort": { "price": "MIN" }
      }
    },
    {
      "name": "internal_dev_testing",
      "conditions": {
        "metadata.user.group": { "eq": "development" }
      },
      "targets": [
        { "model": "google/gemini-2.5-pro" }
      ]
    }
  ]
}
Interceptors are custom logic that can run before or after a request is routed, allowing you to enrich, validate, or transform requests and responses. Guardrails are a common type of interceptor used to enforce policies.
| Stage | Purpose | Example uses | Example interceptors |
| --- | --- | --- | --- |
| Pre-request | Analyze or enrich the request | Classify topic, check for risk, personalize | semantic_guardrail, toxicity_guardrail |
| Post-request | Analyze or modify the response | Moderate output, add fallback, redact sensitive info | fallback_response |
When an interceptor runs, it can inject its results into the routing context, making them available for your conditional logic.
| Context variable | Description | Example value | Use case |
| --- | --- | --- | --- |
| semantic_guardrail.result.topic | Detected topic from a guardrail | "billing" | Route to topic-specialized models |
| toxicity_guardrail.result | Toxicity score from a guardrail | 0.8 | Block or reroute harmful content |
| rate_limiter.result | Result of a rate limit check | true | Enforce usage quotas and prevent abuse |
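For example, a rule could route on the topic a guardrail detected. This is an illustrative fragment in the same router.json rule format used elsewhere in this guide; the rule name and target model are placeholders:

{
  "name": "billing_topic_route",
  "conditions": {
    "semantic_guardrail.result.topic": { "eq": "billing" }
  },
  "targets": [
    { "model": "openai/gpt-4o-mini" }
  ]
}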
Note on Guardrails: Guardrails like semantic_guardrail and toxicity_guardrail are powerful examples of custom guardrails. Check out the corresponding section for more details.
Control trace data retention in LangDB with scalable, cost-effective strategies using ClickHouse background TTL processes and tiered materialized views.
This document outlines LangDB's data retention strategy for tracing information stored in ClickHouse. The strategy employs materialized views to manage data retention periods based on user subscription tiers efficiently. Data eviction is implemented using ClickHouse's TTL (Time-To-Live) mechanisms and background processes:
TTL Definitions: Each table includes TTL expressions that specify when data should expire based on timestamp fields
Background Merge Process: ClickHouse automatically runs background processes that merge data parts and remove expired data during these merge operations
Resource-Efficient: The eviction process runs asynchronously during system low-load periods, minimizing impact on query performance
LangDB uses a robust system for storing and analyzing trace data:
Primary Storage: All trace data is initially stored in the langdb.traces table in ClickHouse
Materialized Views: Tier-specific materialized views filter and retain data based on user subscription levels
Retention Policies: Automated TTL (Time-To-Live) mechanisms enforce retention periods
Professional Tier View
Enterprise Tier View
New trace data is inserted into the base langdb.traces table
Materialized views automatically filter and copy relevant data to tier-specific tables
TTL mechanisms automatically remove data older than the specified retention period
Data access APIs query the appropriate table based on the user's subscription tier
Efficiency: Only store data for the period necessary based on customer tier
Performance: Queries run against smaller, tier-specific tables rather than the entire dataset
Compliance: Clear retention boundaries help with regulatory compliance
Cost-Effective: Optimizes storage costs by aligning retention with customer value
While the retention strategy focuses on operational access to trace data, a separate backup strategy ensures data can be recovered in case of system failures:
Daily snapshots of ClickHouse data
Backup retention aligned with the longest tier retention period (365 days)
Geo-redundant storage of backups
The retention system includes:
Monitoring dashboards for data volume by tier
Alerts for unexpected growth or retention failures
Regular audits to ensure compliance with retention policies
Implementation of custom retention periods for specific enterprise customers
Cold storage options for extended archival needs
Advanced sampling techniques to retain representative trace data beyond standard periods
This example showcases a sophisticated routing configuration that uses pre-request interceptors to enforce usage quotas and guardrails, while handling region-specific compliance and prioritizing performance for premium users.
Goals:
Enforce a daily rate limit on all users to prevent abuse.
Check all requests for policy violations using a semantic guardrail.
Provide high-performance models for premium users in the EU, but only if they are reliable.
Ensure GDPR compliance by using a specialized model for requests with that requirement.
Provide a clear error message to users who have exceeded their quota.
Routing Configuration (router.json):
Configuration Breakdown:
Interceptors: Before any routing rules are evaluated, two interceptors run:
rate_limiter: Checks if the user has exceeded 1,000 requests today. Its result (true or false) is added to the request context.
semantic_guardrail: Scans the request for any content that violates policies (this example assumes it passes). Note that semantic_guardrail is a custom guardrail you would need to create.
Rule 1: premium_eu_high_performance
Conditions: This rule requires four conditions to be met: the user is "premium", is in the "EU", has not been rate-limited, and the target model provider has a low error_rate.
Targets: If all conditions pass, it routes to the fastest available high-performance model.
Rule 2: gdpr_compliance_fallback
Conditions: This rule acts as a high-priority catch-all for any request that requires GDPR compliance, regardless of user tier.
Targets: It forces the request to a specific, compliant model, ensuring regulatory needs are met.
Rule 3: rate_limit_exceeded_block
Conditions: This rule checks for the false result from the rate_limiter.
Targets: Instead of routing to a model, it uses message_mapper to block the request and return a custom error message directly to the user.
CREATE MATERIALIZED VIEW langdb.traces_professional_mv
TO langdb.traces_professional
AS SELECT *
FROM langdb.traces;
CREATE TABLE langdb.traces_professional (
/* Same structure as base table */
) ENGINE = MergeTree()
ORDER BY (timestamp, user_id)
TTL timestamp + toIntervalDay(30);
CREATE MATERIALIZED VIEW langdb.traces_enterprise_mv
TO langdb.traces_enterprise
AS SELECT *
FROM langdb.traces;
CREATE TABLE langdb.traces_enterprise (
/* Same structure as base table */
) ENGINE = MergeTree()
ORDER BY (timestamp, user_id)
TTL timestamp + toIntervalDay(90);
{
  "pre_request": [
    { "name": "rate_limiter", "type": "interceptor", "limit": 1000, "period": "day", "target": "user_id" },
    { "name": "semantic_guardrail", "type": "guardrail" }
  ],
  "routes": [
    {
      "name": "premium_eu_high_performance",
      "conditions": {
        "all": [
          { "metadata.user.tier": { "eq": "premium" } },
          { "metadata.region": { "eq": "EU" } },
          { "rate_limiter.result": { "eq": true } },
          { "provider.health.error_rate": { "lt": 0.02 } }
        ]
      },
      "targets": {
        "$any": ["anthropic/claude-4-opus", "openai/gpt-o3"],
        "sort": { "ttft": "MIN" }
      }
    },
    {
      "name": "gdpr_compliance_fallback",
      "conditions": {
        "metadata.compliance_tags": { "$in": ["GDPR"] }
      },
      "targets": [ { "model": "eu-specialist/gdpr-compliant-model" } ]
    },
    {
      "name": "rate_limit_exceeded_block",
      "conditions": {
        "rate_limiter.result": { "eq": false }
      },
      "message_mapper": {
        "modifier": "block",
        "content": "You have exceeded your daily quota. Please try again tomorrow."
      }
    }
  ]
}
LangDB provides a rich set of real-time metrics for making dynamic, data-driven routing decisions. These metrics are aggregated at the provider level, giving you a live view of model performance.
| Metric | Description | Example | Use case |
| --- | --- | --- | --- |
| requests | Total number of requests processed | 1500 | Monitor traffic and usage patterns |
| input_tokens | Number of tokens in the request prompt | 500 | Analyze prompt complexity and cost |
| output_tokens | Number of tokens in the model response | 1200 | Track response length and cost |
| total_tokens | Sum of input and output tokens | 1700 | Get a complete picture of token usage |
| latency | Average end-to-end response time (ms) | 1100 | Route based on overall performance |
| ttft | Time to First Token (ms) | 450 | Optimize for user-perceived speed |
| llm_usage | Aggregated cost of LLM usage | 2.54 | Track and control spend in real time |
| tps | Tokens Per Second (output_tokens/latency) | 300 | Measure model generation speed |
| error_rate | Fraction of failed requests | 0.01 | Route around unreliable models |
Variables provide contextual information from the incoming request and user metadata. Unlike metrics, they are not performance indicators but are essential for conditional logic.
| Variable | Description | Example | Use case |
| --- | --- | --- | --- |
| ip | IP address of the requester | "203.0.113.42" | Geo-fencing, fraud detection |
| region | Geographical region of the request | "EU" | Data residency, compliance |
| user_agent | Client application | "Google ADK/ CrewAI" | Agentic library used |
| user_id | Unique user identifier | "u-12345" | Auditing, per-user quotas |
| user_tier | User segment (e.g., premium, free) | "premium" | Personalization, rate limiting, SLAs |
| group | User group or segment | "beta_testers" | Feature rollout, A/B testing |
| model_family | Model family or provider | "openai/gpt-4" | Brand preference, compliance |
| capabilities | Supported features (vision, code) | ["vision", "code"] | Feature-based routing |
| compliance_tags | Regulatory/compliance attributes | ["GDPR", "HIPAA"] | Regulatory routing |
| Sort strategy | What it does | When to use | JSON example |
| --- | --- | --- | --- |
| sort: { price: MIN } | Picks the cheapest model | Cost control, bulk/low-priority tasks | "sort": { "price": "MIN" } |
| sort: { ttft_ms: MIN } | Picks the fastest model (latency) | VIP, real-time, or user-facing tasks | "sort": { "ttft": "MIN" } |
| sort: { error_rate: MIN } | Picks the most reliable model | Mission-critical or regulated workflows | "sort": { "error_rate": "MIN" } |
| sort: { token: MIN } | Picks the model with the lowest token cost (token-based) | Optimize for token spend (current) | "sort": { "token": "MIN" } |
Note: Currently, token-based optimization is available and recommended for controlling LLM costs.
Deploy LangDB on AWS with high availability, full observability, VPC isolation, and managed services for scalable, secure AI operations.
This section describes how the AI Gateway and its supporting components are deployed on AWS, ensuring enterprise-grade scalability, observability, and security.
The LangDB service is deployed in AWS with a robust, scalable architecture designed for high availability, security, and performance. The system is built using AWS managed services to minimize operational overhead while maintaining full control over the application.
AWS Region: All resources are deployed within a single AWS region for low-latency communication
VPC: A dedicated Virtual Private Cloud isolates the application resources
ALB: Application Load Balancer serves as the entry point, routing requests from https://api.{region}.langdb.ai to the appropriate services
ECS Cluster: Container orchestration for the LangDB service
Multiple LangDB service instances distributed across availability zones for redundancy
Auto-scaling capabilities based on load
Containerized deployment for consistency across environments
RDS: Managed relational database service for persistent storage. Dedicated storage for metadata
ElastiCache (Redis) Cluster: In-memory caching layer
Used for cache and cost control
Multiple nodes for high availability
Clickhouse Cloud: Analytics database for high-performance data processing
Deployed in the same AWS region but outside the VPC
Managed service for analytical queries and data warehousing
Cognito: User authentication and identity management
Lambda: Serverless functions for authentication workflows
SES: Simple Email Service for email communications related to authentication
Secrets Vault: AWS Secrets Manager for secure storage of
Provider keys
Other sensitive credentials
Client requests hit the ALB via https://api.{region}.langdb.ai
ALB routes requests to available LangDB service instances in the ECS cluster
LangDB services interact with:
RDS for persistent data
ElastiCache for caching, cost control and rate limiting
Metadata Storage for metadata operations
Clickhouse Cloud for analytics and data warehousing
Authentication is handled through Cognito, with Lambda functions for custom authentication flows
Sensitive information is securely retrieved from Secrets Manager as needed
All components except Clickhouse Cloud are contained within a VPC for network isolation
Secure connections to Clickhouse Cloud are established from within the VPC
Authentication is managed through AWS Cognito
Secrets are stored in AWS Secrets Manager
All communication between services uses encryption in transit
Scalability: The architecture supports horizontal scaling of LangDB service instances
High Availability: Multiple instances across availability zones
Managed Services: Leveraging AWS managed services reduces operational overhead
LangDB infrastructure is deployed and managed using Terraform, providing infrastructure-as-code capabilities with the following benefits:
Modular Structure: The deployment code is organized into reusable Terraform modules that encapsulate specific infrastructure components (networking, compute, storage, etc.)
Environment-Specific Variables: Using .tfvars files to manage environment-specific configurations (dev, staging, prod)
State Management: Terraform state is stored remotely to enable collaboration and version control
Configuration Management: Environment-specific variables are defined in .tfvars files (an illustrative example follows this list)
Resource Provisioning: Terraform creates and configures all AWS resources, including:
VPC and networking components
Fargate instances and container configurations
Postgres databases and Redis clusters
Authentication services and Lambda functions
Secrets Manager entries and access controls
Dependency Management: Terraform handles resource dependencies, ensuring proper creation order
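For illustration, an environment-specific .tfvars file might look like the following; the variable names are hypothetical and not LangDB's actual Terraform module inputs:

# prod.tfvars (illustrative variable names only)
environment       = "prod"
aws_region        = "us-east-1"
ecs_desired_count = 3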
Ongoing infrastructure maintenance is managed through Terraform:
Scaling resources up/down based on demand
Applying security patches and updates
Modifying configurations for performance optimization
Adding new resources or services as needed
| Component | Purpose | AWS Service | Scaling |
| --- | --- | --- | --- |
| LLM Gateway | Unified interface to 300+ LLMs using the OpenAI API format. Built-in observability and tracing. | Amazon ECS (Elastic Container Service) | ECS Auto Scaling based on CPU/memory or custom CloudWatch metrics. |
| Metadata Store (PostgreSQL) | Stores metadata related to API usage, configurations, and more. | Amazon RDS (PostgreSQL) | Vertical scaling (instance size), Multi-AZ support. Read replicas can be configured for better read performance. |
| Cache Store (Redis) | Implements rolling cost control and rate limiting for API usage. | Amazon ElastiCache (Redis) | Scale by adding shards/replicas, Multi-AZ support. |
| Observability & Analytics Store (ClickHouse) | Provides observability by storing and analyzing traces/logs. Supports OpenTelemetry. | ClickHouse Cloud (external) | Scales independently; ensure sufficient network throughput for trace/log ingestion. |
| Load Balancing | Distributes incoming traffic to ECS tasks, enabling high availability and SSL termination. | Amazon ALB (Application Load Balancer) | Scales automatically to handle incoming traffic; supports multi-AZ deployments. |
Manage public and private LLM models in LangDB with flexible APIs, custom parameter schemas, and dynamic request/response mapping.
The platform provides a flexible management API that allows you to publish and manage machine learning models at both the tenant and project levels. This enables organizations to control access and visibility of models, supporting both public and private use cases.
Public Models:
Can be added without specifying a project ID.
Accessible to all users on this deployment.
In enterprise deployments, public models are added monthly.
Specific model requests can be made by contacting our support.
Private Models:
Require a project_id and a provider_id for a known provider with a pre-configured secret.
Access is restricted to the specified project and provider.
When publishing a model, you can specify:
Request/Response Mapping:
By default, models are expected to be OpenAI-compatible.
You can also specify custom request/response processors using dynamic scripts (see 'Coming Soon' below).
Model Parameters Schema:
A JSON schema describing the parameters that can be sent with requests to the model.
The management API allows you to register new models with the platform. Below is an example of how to use the API to publish a new model.
curl -X POST https://api.xxx.langdb.ai/admin/models \
  -H "Authorization: Bearer <your_token>" \
  -H "X-Admin-Key: <admin_key>" \
  -H "Content-Type: application/json" \
  -d '{
    "model_name": "my-model",
    "description": "Description of the LLM Model",
    "provider_id": "123e4567-e89b-12d3-a456-426614174000",
    "project_id": "<project_id>",
    "public": false,
    "request_response_mapping": "openai-compatible",
    "model_type": "completions",
    "input_token_price": "0.00001",
    "output_token_price": "0.00003",
    "context_size": 128000,
    "capabilities": ["tools"],
    "input_types": ["text", "image"],
    "output_types": ["text", "image"],
    "tags": [],
    "owner_name": "openai",
    "priority": 0,
    "model_name_in_provider": "my-model-v1.2",
    "parameters": {
      "top_k": {
        "min": 0,
        "step": 1,
        "type": "int",
        "default": 0,
        "required": false,
        "description": "Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens."
      },
      "top_p": {
        "max": 1,
        "min": 0,
        "step": 0.05,
        "type": "float",
        "default": 1,
        "required": false,
        "description": "An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both."
      }
    }
  }'
| Field | Type | Description |
| --- | --- | --- |
| model_name | String | The display name of the model |
| description | String | A detailed description of the model's capabilities and use cases |
| provider_info_id | UUID | The UUID of the provider that offers this model |
| project_id | UUID | Which project this model belongs to |
| public | Boolean | Whether the model is publicly discoverable or private |
| request_response_mapping | String | "openai-compatible" or a custom mapping script |
| model_type | String | The type of model (e.g., "completions", "image", "embedding") |
| owner_name | String | The name of the model's owner or creator |
| priority | i32 | Priority level for the model in listings (higher numbers indicate higher priority) |
| input_token_price | Nullable float | Price per input token |
| output_token_price | Nullable float | Price per output token |
| context_size | Nullable u32 | Maximum context window size in tokens |
| capabilities | String[] | List of model capabilities (e.g., "tools") |
| input_types | String[] | Supported input formats (e.g., "text", "image", "audio") |
| output_types | String[] | Supported output formats (e.g., "text", "image", "audio") |
| tags | String[] | Classification tags for the model |
| type_prices | Map<String, float> | JSON string containing prices for different usage types (used for image generation model pricing) |
| mp_price | Nullable float | Price by megapixel (used for image generation model pricing) |
| model_name_in_provider | String | The model's identifier in the provider's system |
| parameters | Map<String, Map<String, any>> | Additional configuration parameters as JSON |
Check out the full API specification:
Replace <platform-url>, <your_token>, and <project_id> with your actual values.
Set public to true for public models (omit project_id and provider_id), or false for private models.
The parameters_schema field allows you to define the expected parameters for your model.
The platform will soon support dynamic request/response mapping using custom scripts. This feature will allow you to define how requests are transformed before being sent to the model, and how responses are processed before being returned to the client. This will enable support for a wide variety of model APIs and custom workflows.
Stay tuned!
Use JSON rules or an embedded script to manage AI models, cut costs, and boost performance. A guide for enterprise-level routing.
LangDB Routing enables organisations to control how user requests are handled by AI models, optimizing for cost, performance, compliance, and user experience. By defining routing rules in JSON or through an embedded script, businesses can automate decision-making, ensure reliability, and maximize value from their AI investments.
Routing in LangDB is the process of directing each user request to the most appropriate AI model or service, based on business logic, user profile, request content, or real-time system metrics. This enables:
Cost savings by using cheaper models for non-critical tasks
Performance optimization by routing to the fastest or most reliable models
Compliance by enforcing data residency or content policies
Personalization by serving different user segments with tailored models
| Use case | Business goal | Key variables/metrics | Example logic |
| --- | --- | --- | --- |
| SLA-Driven Tiering | Guarantee premium performance for high-value customers. | user_tier, ttft_ms | Route user_tier: "premium" to models with the lowest ttft_ms. |
| Geographic Compliance | Ensure data sovereignty and meet regulatory requirements (e.g., GDPR). | region, compliance_tags | If region: "EU", route to models with compliance_tags: "GDPR". |
| Intelligent Cost Management | Reduce operational expenses for internal or low-priority tasks. | user_group, price | If user_group: "internal", sort available models by price: MIN. |
| Model A/B Testing | Evaluate new AI models on a subset of live traffic before full rollout. | user_id (hashed), percentage split | Route 10% of traffic to a new model and 90% to the current default. |
| Content-Aware Routing | Improve accuracy by using specialized models for specific topics. | semantic_guardrail.result.topic | If topic: "finance", route to a finance-tuned model. |
| Brand Safety Enforcement | Prevent brand damage by blocking or redirecting inappropriate content. | toxicity_guardrail.result | If toxicity_score > 0.8, block the request or route to a safe-reply model. |
Check out the full examples below:
| Key | Description | Example | Why it matters |
| --- | --- | --- | --- |
| routes | List of routing rules | [ ... ] | Controls all routing logic |
| name | Name of the rule | "vip_fast_lane" | For audit, clarity, and reporting |
| conditions | When to apply this rule | { "user_tier": { "eq": "VIP" } } | Target specific users, topics, etc. |
| targets | Which model(s) to use if the rule matches | { "$any": [ ... ] } | Pool of models for flexibility |
| $any | Pool of models to choose from | ["openai/gpt-4.5", ... ] | Enables failover and optimization |
| sort | How to pick the best model | { "price": "MIN" } | Optimize for cost, speed, or reliability |
| pre_request | Checks/enrichments before routing | [ ... ] | Add business logic, compliance, etc. |
| post_response | Actions after the model response | [ ... ] | Moderate, fallback, or redact responses |
| message_mapper | Modify request/response | { ... } | Customizes user experience |
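Putting these keys together, a minimal router.json skeleton might look like this; the rule name and model names are illustrative, and the full examples elsewhere in this guide show complete configurations with interceptors and multiple rules:

{
  "routes": [
    {
      "name": "vip_fast_lane",
      "conditions": { "metadata.user.tier": { "eq": "VIP" } },
      "targets": {
        "$any": ["anthropic/claude-4-opus", "openai/gpt-o3"],
        "sort": { "ttft": "MIN" }
      }
    }
  ]
}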
For advanced users, LangDB supports script-based routing using inline WASM (WebAssembly) scripts. These scripts can access all the same variables and metrics as JSON rules, enabling highly flexible, programmable routing logic for complex enterprise needs.
All router metrics (including routing decisions, latencies, error rates, and more) are available via OpenTelemetry and can be exported to your observability or monitoring stack for real-time analytics and alerting.
Start simple: Begin with clear, high-value rules (e.g., VIP fast lane, cost control)
Use metrics: Leverage available metrics to optimize for business goals
Test and iterate: Monitor outcomes and refine rules for better results
Document rules: Use the name field and comments for clarity
Plan for compliance: Use region and content checks for regulatory needs
Monitor costs: Use token-based optimization to control spend