
Enterprise Licensing Options

Learn more about LangDB's Enterprise Options

LangDB offers two enterprise licensing models to fit your organization's needs:


Enterprise Managed

Best for: Development teams and startups running AI workloads in private VPCs who want a fully managed experience.

  • Deployment: Entire infrastructure is deployed on GCP or AWS and fully managed by LangDB.

    • For AWS, an AWS account with shared access will be provisioned.

    • For GCP, a new project will be provisioned.

  • Infrastructure: Fully Managed.


Enterprise Flexible

Best for: Enterprises running large-scale AI operations who want maximum flexibility and control.

  • Deployment: LangDB provides a highly performant binary that can be deployed in your own infrastructure (on-prem, cloud, or hybrid).

    • x86_64, aarch64 for Ubuntu

  • Infrastructure: Bring your own


Solutioning Add-On: Available at an hourly rate if needed.

For details or clarification, please book a meeting on our website.

Architecture Overview

Explore LangDB’s dedicated tenant architecture with secure metadata storage, real-time observability, cost management, and scalable MCP execution.

This page describes the core architecture of the LangDB AI Gateway, a unified platform for interfacing with a wide variety of Large Language Models (LLMs) and building agentic applications with enterprise-grade observability, cost control, scalability, MCP features, and more.

Core Components

| Component | Purpose / Description | Enterprise Features / Notes |
| --- | --- | --- |
| AI Gateway | Unified interface to 300+ LLMs using the OpenAI API format. Built-in observability and tracing. Free & Open Source version available at https://github.com/langdb/ai-gateway. | Multi-tenancy, advanced cost control, and rate limiting. Contact LangDB for access. |
| Metadata Store (PostgreSQL) | Stores metadata related to API usage, configurations, and more. | For scalable/multi-tenant deployments, use managed PostgreSQL (e.g., AWS RDS, GCP Cloud SQL). |
| Cache Store (Redis) | Implements rolling cost control and rate limiting for API usage. | Enterprise version supports Redis integration for cost control and rate limiting. |
| Observability & Analytics Store (ClickHouse) | Provides observability by storing and analyzing traces/logs. Supports OpenTelemetry. | For large-scale deployments, use ClickHouse Cloud. Traces are stored in the langdb.traces table. |

Note:

  • Metadata Store: Powered by PostgreSQL (consider AWS RDS, GCP Cloud SQL for enterprise)

  • Cache Store: Powered by Redis (enterprise only)

  • Observability & Analytics Store: Powered by ClickHouse (consider ClickHouse Cloud for scale)

Environment Overview

LangDB provisions a dedicated environment for each tenant. This environment is isolated per tenant and is set up in a separate AWS account or GCP project, managed by LangDB. Customers connect securely to their provisioned environment from their own VPCs, ensuring strong network isolation and security.

LangDB itself operates a thin, shared public cloud environment (the "control plane") that is primarily responsible for:

  • Provisioning new tenant environments

  • Managing access control and user/tenant provisioning

  • Handling external federated account connections (e.g., SSO)

  • Hosting the LangDB Dashboard frontend application for configuration, monitoring, and management

All operational workloads, data storage, and LLM/MCP execution occur within the tenant-specific environment. The shared LangDB cloud is not involved in data processing or LLM execution, but only in provisioning, access management, and dashboard hosting.

Customer Environment

  • Integrates with customer identity providers (Active Directory, SAML, SSO).

  • Users (AI Apps, Agents, Administrators, Developers) interact with LangDB via secure endpoints.

LangDB Dashboard

  • Centralized dashboard for configuration, monitoring, and management.

  • Handles user and tenant provisioning, access control, and external federated account connections.

  • All provisioning and access are centrally managed via LangDB Cloud and the Dashboard.

Tenant Environment (Execution Layer)

  • Each tenant (enterprise deployment) is provisioned in a dedicated AWS account or GCP project.

  • Communication between tenant environment and LangDB is secured and managed.

  • Provisioning is automated via Terraform.


Store Descriptions

Metadata Store (PostgreSQL)

Stores all configuration and metadata required for operation, including:

  • Virtual models

  • Virtual MCP servers

  • Projects

  • Guardrails

  • Routers

Redis (Cache Store)

Used for fast, in-memory operations related to:

  • Rate limiting & cost control

  • LLM usage tracking

  • MCP usage tracking

ClickHouse (Analytics & Observability Store)

Stores analytics and observability data:

  • Traces (API calls, LLM invocations, etc.)

  • Metrics (performance, usage, etc.)


User and Tenant Provisioning

  • User and tenant provisioning is centrally controlled via LangDB Cloud and Dashboard.

  • External federated accounts (e.g., enterprise SSO) can be connected to LangDB Cloud for seamless access management.


Data Retention

  • Data retention policies mainly apply to observability data (traces, metrics) stored in ClickHouse.

  • Retention is enforced per subscription tier; traces are automatically cleared after the retention period expires.


MCP Server Deployment

  • MCP servers are deployed in a serverless fashion using AWS Lambda or GCP Cloud Run for scalability and cost efficiency.

Open-source gateway repository: https://github.com/langdb/ai-gateway

Architectural Overview for LangDB

Deploying on GCP (Beta)

Deploy LangDB AI Gateway on GCP, connecting to external Postgres, Redis, and ClickHouse for scalable, cloud-native operations.

Coming Soon

ai-gateway.yaml

Use YAML config to securely define storage, networking, and service settings for your LangDB AI Gateway deployments.

ai-gateway.yaml is the primary way to configure secrets and specific features of AI Gateway.

# Sample ai-gateway.yaml

# Refer to the ai-gateway.yaml for
# advanced configurations available 
# ....

# Configuration for storage
database_config:
  url: "postgres://langdb:XXXX@localhost:5438/postgres"
redis_config:
  url: "redis://localhost:6379"
langdb_clickhouse_config:
  url: localhost:8123
  
rest_api:
  port: 8083
  host: 0.0.0.0
  cors: true


## Configure storage location
# storage_config: !Local "file:///<storage-path>"

langdb_cloud_ui: 
  url: http://localhost:3000

Using Kubernetes (Beta)

Deploy LangDB AI Gateway on Kubernetes using Helm, connecting to external Postgres, Redis, and ClickHouse for scalable, cloud-native operations.

Work in Progress.

This guide walks you through deploying the LangDB AI Gateway enterprise edition using Helm. Refer to the individual database links for deploying and scaling clusters for Postgres, Redis, and ClickHouse.

Clone the Repository

git clone [email protected]:langdb/helm-chart.git
cd helm-chart/helm/ai-gateway

Configure values.yaml

env:
  CLICKHOUSE_HOST: <external-clickhouse-host>
  REDIS_HOST: <external-redis-host>
  POSTGRES_HOST: <external-postgres-host>
  POSTGRES_USER: <your-user>
  POSTGRES_PASSWORD: <your-password>
  POSTGRES_DB: <your-db>
config: |
  # ai-gateway configuration
  http:
    host: "0.0.0.0"
    port: 8080

This will automatically mount your config.yaml into the container at /app/config.yaml, and the ai-gateway will use it on startup.

Deploy Using Helm

Run the following command to install the AI Gateway:

helm install ai-gateway .

This deploys:

  • ai-gateway using the default image.

  • Connections to external Postgres, Redis, and ClickHouse instances.

Accessing AI Gateway

By default, the service is exposed as a ClusterIP. To access it externally, you can port-forward:

kubectl port-forward svc/ai-gateway 8080:80

Then access the gateway at http://localhost:8080.

Uninstall

To remove the deployment:

helm uninstall ai-gateway

Check out the full source repository here: https://github.com/langdb/helm-chart

Clickhouse Queries

Track and query LangDB traces with ClickHouse, using bloom filter indexes for fast thread and run-specific analytics.

Overview

Clickhouse is used for observability in LangDB. It provides high-performance analytics capabilities that allow us to track and analyze system behavior, performance metrics, and user activities across the platform.

Table Schemas

Traces Table

The following CREATE query defines the traces table, which stores distributed tracing information for the LangDB AI Gateway.

CREATE TABLE IF NOT EXISTS langdb.traces
(
    trace_id        UUID,
    span_id         UInt64,
    parent_span_id  UInt64,
    operation_name  LowCardinality(String),
    kind            String,
    start_time_us   UInt64,
    finish_time_us  UInt64,
    finish_date     Date,
    attribute       Map(String, String),
    tenant_id       Nullable(String),
    project_id      String,
    thread_id       String,
    tags            Map(String, String),
    parent_trace_id Nullable(UUID),
    run_id          Nullable(UUID)
)
ENGINE = MergeTree
ORDER BY (finish_date, finish_time_us, trace_id)
SETTINGS index_granularity = 8192;

-- Add bloom filter index for thread_id
ALTER TABLE langdb.traces ADD INDEX idx_thread_id thread_id TYPE bloom_filter GRANULARITY 4;

-- Add composite index for tenant_id, project_id, and operation_name
ALTER TABLE langdb.traces ADD INDEX idx_tenant_project (tenant_id, project_id, operation_name) TYPE bloom_filter GRANULARITY 4;

Common Filters

  • The thread_id field, with its dedicated bloom filter index, allows for efficient filtering of traces based on specific execution threads.

  • The run_id field enables filtering and grouping traces by specific execution runs.
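
The schema and indexes above support queries like the following sketch (the thread and run IDs are placeholders):

-- Fetch all spans for one conversation thread (accelerated by the bloom filter index)
SELECT trace_id, operation_name, start_time_us, finish_time_us
FROM langdb.traces
WHERE thread_id = '<thread-id>'
ORDER BY finish_time_us;

-- Aggregate span durations for a single execution run
SELECT
    operation_name,
    count() AS spans,
    avg(finish_time_us - start_time_us) AS avg_duration_us
FROM langdb.traces
WHERE run_id = toUUID('<run-id>')
GROUP BY operation_name;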

Multi Tenancy

Implement tenant-level isolation with LangDB’s robust multitenancy architecture, featuring row policies in ClickHouse and secure metadata controls in Postgres.

This document outlines the multitenancy implementation in LangDB, explaining how data isolation is maintained across different tenants.

Overview

LangDB implements a robust multitenancy model that ensures complete isolation of tenant data while maintaining efficient resource utilization. This approach is implemented across different data storage systems used in the platform.

Clickhouse (Observability)

Clickhouse is used for analytics and observability in LangDB. The multitenancy implementation in Clickhouse includes:

Custom Role and User for Every Tenant

  • Each tenant in LangDB has a dedicated Clickhouse user and role

  • These custom roles enforce access permissions specific to the tenant's data

  • Authentication and authorization are managed at the tenant level

  • Prevents cross-tenant data access even at the database level

Row Policy Based Tenant Isolation

  • All read operations in Clickhouse are governed by row policies

  • Row policies filter data based on the tenant_name column

  • When a tenant's credentials are used for database access, the row policy automatically restricts results to only that tenant's data

  • This provides a zero-trust isolation model where the application doesn't need to include tenant filters
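
As a minimal sketch (the tenant, role, and password values are illustrative, not LangDB's actual provisioning output), the per-tenant isolation described above maps to ClickHouse statements of this shape:

-- Hypothetical per-tenant role and user
CREATE ROLE IF NOT EXISTS tenant_acme_role;
CREATE USER IF NOT EXISTS tenant_acme IDENTIFIED BY '<password>' DEFAULT ROLE tenant_acme_role;
GRANT SELECT ON langdb.traces TO tenant_acme_role;

-- Reads through this role only ever see rows where tenant_name = 'acme'
CREATE ROW POLICY IF NOT EXISTS tenant_acme_policy ON langdb.traces
    FOR SELECT USING tenant_name = 'acme' TO tenant_acme_role;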

Controlled Insert Operations

  • All inserts into Clickhouse tables automatically populate the tenant column

  • The tenant column is populated based on the authenticated user context

  • Direct inserts by tenants are not allowed, preventing potential data integrity issues

  • Insert operations are performed via service accounts with appropriate tenant context

Postgres (Metadata)

Postgres is used as the primary metadata storage in LangDB. The multitenancy implementation in Postgres includes:

Application-Level Tenant Isolation

  • Tenant isolation is implemented at the application logic level

  • All database queries include tenant-specific filters

  • Application code ensures that queries only return records belonging to the authenticated tenant

  • Modifications are restricted to only the tenant's own data through application context

Metadata Security Measures

  • Tenant identifier is a required column in all tenant-specific tables

  • All database operations include tenant context validation

  • Application middleware enforces tenant context for every database operation

Implementation Across Environments

This multitenancy model is implemented consistently across LangDB's AWS and GCP deployments, ensuring that tenant data remains securely isolated regardless of the cloud provider.

Tenant & User Provisioning

Set up tenants and users in LangDB via direct signup or federated SSO, with dedicated infrastructure for enterprise deployments.

Tenant provisioning happens through the LangDB dashboard, where you can register a company and upgrade to an Enterprise License.

  • Reach out to our support staff for configuring your tenant environment using AWS or GCP.

  • For self-hosted enterprise versions, you'll be asked to provide discovery URLs to be registered with the LangDB Control Environment.

Tenancy

  • The main LangDB Cloud is multi-tenant, with shared infrastructure for all tenants.

  • Enterprise deployments are provisioned per tenant, with dedicated infrastructure and network isolation.

  • Provisioning an individual tenant involves setting up an entire AWS account or GCP project per tenant, managed via Terraform, which then communicates securely with LangDB Cloud.

User Setup

Two user setup modes are supported.

Direct User Setup

  • Users can sign up directly and invite additional users to their tenant.

  • Easiest to set up.

  • You can restrict signups to specific email domains for security.

Federated User Setup (SSO / SAML / OpenID)

  • Reach out to us for linking your federated account to your tenant.

  • Users that are part of the directory can register on the sub-domain.

  • Currently roles are managed through LangDB dashboard.

  • Dynamic Role Mapping feature is in active development.

Example: Building an Enterprise Routing Configuration

This example demonstrates a multi-layered routing strategy for a SaaS company that balances performance for premium users, cost for standard users, and flexibility for internal development.

Goals:

  1. Provide the fastest possible responses for "premium" customers on support-related queries.

  2. Minimize costs for "standard" tier users.

  3. Allow the internal "development" team to test a new, experimental model without affecting customers.

Routing Configuration (router.json):

{
  "routes": [
    {
      "name": "premium_support_fast_track",
      "conditions": {
        "all": [
          { "metadata.user.tier": { "eq": "premium" } },
          { "metadata.request.topic": { "eq": "support" } }
        ]
      },
      "targets": {
        "$any": ["anthropic/claude-4-opus", "openai/gpt-o3"],
        "sort": { "ttft": "MIN" }
      }
    },
    {
      "name": "standard_user_cost_optimized",
      "conditions": {
        "metadata.user.tier": { "eq": "standard" }
      },
      "targets": {
        "$any": ["mistral/mistral-large-latest", "anthropic/claude-4-sonnet"],
        "sort": { "price": "MIN" }
      }
    },
    {
      "name": "internal_dev_testing",
      "conditions": {
        "metadata.user.group": { "eq": "development" }
      },
      "targets": [
        { "model": "google/gemini-2.5-pro" }
      ]
    }
  ]
}

Configuration Breakdown:

  • Rule 1: premium_support_fast_track

    • Conditions: This rule applies only when a request comes from a user in the "premium" tier AND the request topic has been identified as "support". This uses an all operator to combine conditions.

    • Targets: It routes the request to a pool of high-performance models (anthropic/claude-4-opus, openai/gpt-o3) and selects the one with the lowest time-to-first-token (ttft), ensuring the fastest response.

  • Rule 2: standard_user_cost_optimized

    • Conditions: This is a broader rule that catches all requests from "standard" tier users.

    • Targets: It uses a pool of cost-effective models (mistral/mistral-large-latest, anthropic/claude-4-sonnet) and selects the one with the minimum price, optimizing for spend.

  • Rule 3: internal_dev_testing

    • Conditions: This rule applies to any user in the "development" group.

    • Targets: It directs their requests to google/gemini-2.5-pro, isolating test traffic from the production user base.

Running Locally

Self-host LangDB AI Gateway Enterprise locally with ClickHouse, PostgreSQL, and Redis for full control over tracing, caching, and analytics.

Dependencies

  • ClickHouse (for request tracing & analytics)

  • PostgreSQL (for metadata and user management)

  • Redis (for caching and rate‑limiting)

Launch Options

You can self-host our enterprise version in two ways.

Using binary

Supported Platforms:

  • x86_64

  • aarch64

ai-gateway-enterprise serve -c ai-gateway.yaml

Using docker

docker run -it \
    -p 8080:8080 \
    <private-url>/ai-gateway-enterprise serve \
    -c ai-gateway.yaml

Make Your First Request

Test the gateway with a simple chat completion:

# Chat completion with GPT-4
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "What is the capital of France?"}]
  }'

# Or try Claude
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-3-opus",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'

Using MCP Servers

Invoke an MCP server alongside your request:

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Ping the server using the tool and return the response"}],
    "mcp_servers": [{"server_url": "http://localhost:3004"}]
  }'

Next Steps:

Refer to https://docs.langdb.ai/ for understanding various features.

Using Docker Compose

Launch LangDB locally via Docker Compose for fast development, and explore AWS or GCP guides for scalable cloud deployments.

This guide walks through a simple deployment using Docker Compose. For scalable cloud-native deployments, see our AWS and GCP guides.

docker-compose.yaml

version: '3.8'

services:
  ai-gateway:
    # We will share a private image that contains the ai-gateway enterprise edition.
    # For reference, check out our free image available in our GitHub repo:
    # https://github.com/langdb/ai-gateway
    image: <private-url>/ai-gateway-enterprise:latest
    ports:
      - "8083:8083"
    extra_hosts:
      - "host.docker.internal:host-gateway"
    volumes:
      # ai-gateway.yaml is expected in the configuration folder.
      - config:/usr/langdb/
    container_name: "langdb-ai-gateway"

  clickhouse:
    image: clickhouse/clickhouse-server:latest
    ports:
      - "8123:8123"
      - "9000:9000"
    ulimits:
      nofile:
        soft: 262144
        hard: 262144
    extra_hosts:
      - "host.docker.internal:host-gateway"
    container_name: "langdb-clickhouse"

  postgres:
    image: postgres:latest
    container_name: langdb-cloud-enterprise-pg
    environment:
      POSTGRES_USER: langdb
      # Note: Include your postgres password as specified in ai-gateway.yaml
      POSTGRES_PASSWORD: XXXXX
      POSTGRES_DB: langdb_staging
      ALLOW_IP_RANGE: 0.0.0.0/0
    ports:
      - "5438:5432"
    command: postgres -c 'max_connections=1000'
    volumes:
      - postgres_data:/var/lib/postgresql/data

  redis:
    image: redis:latest
    restart: always
    ports:
      - "6379:6379"
    volumes:
      - redis_data:/root/redis
    environment:
      # Note: Include your redis password as specified in ai-gateway.yaml
      - REDIS_PASSWORD=XXXXX
      - REDIS_PORT=6379
      - REDIS_DATABASES=16

volumes:
  config:
  postgres_data:
  redis_data:

Refer to ai-gateway.yaml for configuring ai-gateway.

Next Steps

  • For cloud-scale deployment, see the AWS and GCP guides.

  • For full observability and ClickHouse tracing, refer to the Clickhouse Queries section.


Interceptors & Guardrails

Interceptors are custom logic that can run before or after a request is routed, allowing you to enrich, validate, or transform requests and responses. Guardrails are a common type of interceptor used to enforce policies.

| Type | Purpose | Business Value | Example Use Case/Config |
| --- | --- | --- | --- |
| Pre-request | Analyze or enrich request | Classify topic, check for risk, personalize | semantic_guardrail, toxicity_guardrail |
| Post-request | Analyze or modify response | Moderate output, add fallback, redact sensitive info | fallback_response |

When an interceptor runs, it can inject its results into the routing context, making them available for your conditional logic.

| Result Variable | Description | Example Value | Business Use |
| --- | --- | --- | --- |
| semantic_guardrail.result.topic | Detected topic from a guardrail | "billing" | Route to topic-specialized models |
| toxicity_guardrail.result | Toxicity score from a guardrail | 0.8 | Block or reroute harmful content |
| rate_limiter.result | Result of a rate limit check | true | Enforce usage quotas and prevent abuse |

Note on Guardrails: Guardrails like semantic_guardrail and toxicity_guardrail are powerful examples of custom guardrails. Check out the Guardrails section for more details.
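
For instance, a rule can branch on an interceptor's injected result; a minimal sketch (the rule name and target model are illustrative):

{
  "name": "billing_topic_route",
  "conditions": {
    "semantic_guardrail.result.topic": { "eq": "billing" }
  },
  "targets": [ { "model": "openai/gpt-4.5" } ]
}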

Configuring Data Retention

Control trace data retention in LangDB with scalable, cost-effective strategies using ClickHouse background TTL processes and tiered materialized views.

Overview

This document outlines LangDB's data retention strategy for tracing information stored in ClickHouse. The strategy employs materialized views to manage data retention periods based on user subscription tiers efficiently. Data eviction is implemented using ClickHouse's TTL (Time-To-Live) mechanisms and background processes:

  • TTL Definitions: Each table includes TTL expressions that specify when data should expire based on timestamp fields

  • Background Merge Process: ClickHouse automatically runs background processes that merge data parts and remove expired data during these merge operations

  • Resource-Efficient: The eviction process runs asynchronously during system low-load periods, minimizing impact on query performance

Tracing Data Architecture

LangDB uses a robust system for storing and analyzing trace data:

  • Primary Storage: All trace data is initially stored in the langdb.traces table in ClickHouse

  • Materialized Views: Tier-specific materialized views filter and retain data based on user subscription levels

  • Retention Policies: Automated TTL (Time-To-Live) mechanisms enforce retention periods

Implementation using Materialized Views

Tier-Specific Materialized Views

Professional Tier View

CREATE TABLE langdb.traces_professional (
    /* Same structure as base table */
) ENGINE = MergeTree()
ORDER BY (timestamp, user_id)
TTL timestamp + toIntervalDay(30);

CREATE MATERIALIZED VIEW langdb.traces_professional_mv
TO langdb.traces_professional
AS SELECT *
FROM langdb.traces;

Enterprise Tier View

CREATE TABLE langdb.traces_enterprise (
    /* Same structure as base table */
) ENGINE = MergeTree()
ORDER BY (timestamp, user_id)
TTL timestamp + toIntervalDay(90);

CREATE MATERIALIZED VIEW langdb.traces_enterprise_mv
TO langdb.traces_enterprise
AS SELECT *
FROM langdb.traces;

Data Access Flow

  1. New trace data is inserted into the base langdb.traces table

  2. Materialized views automatically filter and copy relevant data to tier-specific tables

  3. TTL mechanisms automatically remove data older than the specified retention period

  4. Data access APIs query the appropriate table based on the user's subscription tier

Benefits of This Approach

  • Efficiency: Only store data for the period necessary based on customer tier

  • Performance: Queries run against smaller, tier-specific tables rather than the entire dataset

  • Compliance: Clear retention boundaries help with regulatory compliance

  • Cost-Effective: Optimizes storage costs by aligning retention with customer value

Backup and Disaster Recovery

While the retention strategy focuses on operational access to trace data, a separate backup strategy ensures data can be recovered in case of system failures:

  • Daily snapshots of ClickHouse data

  • Backup retention aligned with the longest tier retention period (365 days)

  • Geo-redundant storage of backups

Monitoring and Management

The retention system includes:

  • Monitoring dashboards for data volume by tier

  • Alerts for unexpected growth or retention failures

  • Regular audits to ensure compliance with retention policies

Future Enhancements

  • Implementation of custom retention periods for specific enterprise customers

  • Cold storage options for extended archival needs

  • Advanced sampling techniques to retain representative trace data beyond standard periods

Example: Routing with Interceptors and Compliance

This example showcases a sophisticated routing configuration that uses pre-request interceptors to enforce usage quotas and guardrails, while handling region-specific compliance and prioritizing performance for premium users.

Goals:

  1. Enforce a daily rate limit on all users to prevent abuse.

  2. Check all requests for policy violations using a semantic guardrail.

  3. Provide high-performance models for premium users in the EU, but only if they are reliable.

  4. Ensure GDPR compliance by using a specialized model for requests with that requirement.

  5. Provide a clear error message to users who have exceeded their quota.

Routing Configuration (router.json):

{
  "pre_request": [
    { "name": "rate_limiter", "type": "interceptor", "limit": 1000, "period": "day", "target": "user_id" },
    { "name": "semantic_guardrail", "type": "guardrail" }
  ],
  "routes": [
    {
      "name": "premium_eu_high_performance",
      "conditions": {
        "all": [
          { "metadata.user.tier": { "eq": "premium" } },
          { "metadata.region": { "eq": "EU" } },
          { "rate_limiter.result": { "eq": true } },
          { "provider.health.error_rate": { "lt": 0.02 } }
        ]
      },
      "targets": {
        "$any": ["anthropic/claude-4-opus", "openai/gpt-o3"],
        "sort": { "ttft": "MIN" }
      }
    },
    {
      "name": "gdpr_compliance_fallback",
      "conditions": {
        "metadata.compliance_tags": { "$in": ["GDPR"] }
      },
      "targets": [ { "model": "eu-specialist/gdpr-compliant-model" } ]
    },
    {
      "name": "rate_limit_exceeded_block",
      "conditions": {
        "rate_limiter.result": { "eq": false }
      },
      "message_mapper": {
        "modifier": "block",
        "content": "You have exceeded your daily quota. Please try again tomorrow."
      }
    }
  ]
}

Configuration Breakdown:

  • Interceptors: Before any routing rules are evaluated, two interceptors run:

    • rate_limiter: Checks if the user has exceeded 1,000 requests today. Its result (true or false) is added to the request context.

    • semantic_guardrail: Scans the request for any content that violates policies. (This example assumes it passes). Note that semantic_guardrail is a custom guardrail you would need to create.

  • Rule 1: premium_eu_high_performance

    • Conditions: This rule requires four conditions to be met: the user is "premium", is in the "EU", has not been rate-limited, and the target model provider has a low error_rate.

    • Targets: If all conditions pass, it routes to the fastest available high-performance model.

  • Rule 2: gdpr_compliance_fallback

    • Conditions: This rule acts as a high-priority catch-all for any request that requires GDPR compliance, regardless of user tier.

    • Targets: It forces the request to a specific, compliant model, ensuring regulatory needs are met.

  • Rule 3: rate_limit_exceeded_block

    • Conditions: This rule checks for the false result from the rate_limiter.

    • Targets: Instead of routing to a model, it uses message_mapper to block the request and return a custom error message directly to the user.


Variables & Functions

Available Metrics

LangDB provides a rich set of real-time metrics for making dynamic, data-driven routing decisions. These metrics are aggregated at the provider level, giving you a live view of model performance.

| Metric Name | Description | Example Value | Business Value |
| --- | --- | --- | --- |
| requests | Total number of requests processed | 1500 | Monitor traffic and usage patterns |
| input_tokens | Number of tokens in the request prompt | 500 | Analyze prompt complexity and cost |
| output_tokens | Number of tokens in the model response | 1200 | Track response length and cost |
| total_tokens | Sum of input and output tokens | 1700 | Get a complete picture of token usage |
| latency | Average end-to-end response time (ms) | 1100 | Route based on overall performance |
| ttft | Time to First Token (ms) | 450 | Optimize for user-perceived speed |
| llm_usage | Aggregated cost of LLM usage | 2.54 | Track and control spend in real-time |
| tps | Tokens Per Second (output_tokens/latency) | 300 | Measure model generation speed |
| error_rate | Fraction of failed requests | 0.01 | Route around unreliable models |
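
These metrics can be used both to gate a rule and to rank a target pool. A sketch of a single route (the rule name is illustrative; the condition and target syntax reuses the routing examples in these docs):

{
  "name": "reliable_fast_pool",
  "conditions": { "provider.health.error_rate": { "lt": 0.02 } },
  "targets": {
    "$any": ["anthropic/claude-4-opus", "openai/gpt-o3"],
    "sort": { "ttft": "MIN" }
  }
}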

Available Variables

Variables provide contextual information from the incoming request and user metadata. Unlike metrics, they are not performance indicators but are essential for conditional logic.

Request Information

| Variable | Description | Example Value | Business Use |
| --- | --- | --- | --- |
| ip | IP address of the requester | "203.0.113.42" | Geo-fencing, fraud detection |
| region | Geographical region of request | "EU" | Data residency, compliance |
| user_agent | Client application | "Google ADK / CrewAI" | Agentic library used |

User Information

| Variable | Description | Example Value | Business Use |
| --- | --- | --- | --- |
| user_id | Unique user identifier | "u-12345" | Auditing, per-user quotas |
| user_tier | User segment (e.g., premium, free) | "premium" | Personalization, rate limiting, SLAs |
| group | User group or segment | "beta_testers" | Feature rollout, A/B testing |
| region | Geographical region of request | "EU" | Data residency, compliance |

Provider Metadata

| Variable | Description | Example Value | Business Use |
| --- | --- | --- | --- |
| model_family | Model family or provider | "openai/gpt-4" | Brand preference, compliance |
| capabilities | Supported features (vision, code) | ["vision", "code"] | Feature-based routing |
| compliance_tags | Regulatory/compliance attributes | ["GDPR", "HIPAA"] | Regulatory routing |

Optimisation Functions

| Function | What It Does | When to Use | Example JSON Usage |
| --- | --- | --- | --- |
| sort: { price: MIN } | Picks the cheapest model | Cost control, bulk/low-priority tasks | "sort": { "price": "MIN" } |
| sort: { ttft: MIN } | Picks the fastest model (lowest latency to first token) | VIP, real-time, or user-facing tasks | "sort": { "ttft": "MIN" } |
| sort: { error_rate: MIN } | Picks the most reliable model | Mission-critical or regulated workflows | "sort": { "error_rate": "MIN" } |
| sort: { token: MIN } | Picks the model with the lowest token cost | Optimize for token spend (current) | "sort": { "token": "MIN" } |

Note: Currently, token-based optimization is available and recommended for controlling LLM costs.

Deploying on AWS Cloud

Deploy LangDB on AWS with high availability, full observability, VPC isolation, and managed services for scalable, secure AI operations.

AWS Deployment

This section describes how the AI Gateway and its supporting components are deployed on AWS, ensuring enterprise-grade scalability, observability, and security.

Software Components

| Component | Purpose / Description | AWS Service | Scaling |
| --- | --- | --- | --- |
| LLM Gateway | Unified interface to 300+ LLMs using the OpenAI API format. Built-in observability and tracing. | Amazon ECS (Elastic Container Service) | ECS Auto Scaling based on CPU/memory or custom CloudWatch metrics. |
| Metadata Store (PostgreSQL) | Stores metadata related to API usage, configurations, and more. | Amazon RDS (PostgreSQL) | Vertical scaling (instance size), Multi-AZ support. Read replicas can be configured for better read performance. |
| Cache Store (Redis) | Implements rolling cost control and rate limiting for API usage. | Amazon ElastiCache (Redis) | Scale by adding shards/replicas, Multi-AZ support. |
| Observability & Analytics Store (ClickHouse) | Provides observability by storing and analyzing traces/logs. Supports OpenTelemetry. | ClickHouse Cloud (external) | Scales independently; ensure sufficient network throughput for trace/log ingestion. |
| Load Balancing | Distributes incoming traffic to ECS tasks, enabling high availability and SSL termination. | Amazon ALB (Application Load Balancer) | Scales automatically to handle incoming traffic; supports multi-AZ deployments. |

Architecture Overview

The LangDB service is deployed in AWS with a robust, scalable architecture designed for high availability, security, and performance. The system is built using AWS managed services to minimize operational overhead while maintaining full control over the application.

Components

Networking and Entry Points

  • AWS Region: All resources are deployed within a single AWS region for low-latency communication

  • VPC: A dedicated Virtual Private Cloud isolates the application resources

  • ALB: Application Load Balancer serves as the entry point, routing requests from https://api.{region}.langdb.ai to the appropriate services

Core Services

  • ECS Cluster: Container orchestration for the LangDB service

    • Multiple LangDB service instances distributed across availability zones for redundancy

    • Auto-scaling capabilities based on load

    • Containerized deployment for consistency across environments

Data Storage

  • RDS: Managed relational database service for persistent storage. Dedicated storage for metadata

  • ElastiCache (Redis) Cluster: In-memory caching layer

    • Used for cache and cost control

    • Multiple nodes for high availability

  • Clickhouse Cloud: Analytics database for high-performance data processing

    • Deployed in the same AWS region but outside the VPC

    • Managed service for analytical queries and data warehousing

Authentication & Security

  • Cognito: User authentication and identity management

  • Lambda: Serverless functions for authentication workflows

  • SES: Simple Email Service for email communications related to authentication

Secrets Management

  • Secrets Vault: AWS Secrets Manager for secure storage of

    • Provider keys

    • Other sensitive credentials

Data Flow

  1. Client requests hit the ALB via https://api.{region}.langdb.ai

  2. ALB routes requests to available LangDB service instances in the ECS cluster

  3. LangDB services interact with:

    • RDS for persistent data

    • ElastiCache for caching, cost control and rate limiting

    • Metadata Storage for metadata operations

    • Clickhouse Cloud for analytics and data warehousing

  4. Authentication is handled through Cognito, with Lambda functions for custom authentication flows

  5. Sensitive information is securely retrieved from Secrets Manager as needed

Security Considerations

  • All components except Clickhouse Cloud are contained within a VPC for network isolation

  • Secure connections to Clickhouse Cloud are established from within the VPC

  • Authentication is managed through AWS Cognito

  • Secrets are stored in AWS Secrets Manager

  • All communication between services uses encryption in transit

Operational Benefits

  • Scalability: The architecture supports horizontal scaling of LangDB service instances

  • High Availability: Multiple instances across availability zones

  • Managed Services: Leveraging AWS managed services reduces operational overhead

Deployment Process

LangDB infrastructure is deployed and managed using Terraform, providing infrastructure-as-code capabilities with the following benefits:

Terraform Architecture

  • Modular Structure: The deployment code is organized into reusable Terraform modules that encapsulate specific infrastructure components (networking, compute, storage, etc.)

  • Environment-Specific Variables: Using .tfvars files to manage environment-specific configurations (dev, staging, prod)

  • State Management: Terraform state is stored remotely to enable collaboration and version control

Deployment Workflow

  1. Configuration Management: Environment-specific variables are defined in .tfvars files

  2. Resource Provisioning: Terraform creates and configures all AWS resources, including:

    • VPC and networking components

    • Fargate instances and container configurations

    • Postgres databases and Redis clusters

    • Authentication services and Lambda functions

    • Secrets Manager entries and access controls

  3. Dependency Management: Terraform handles resource dependencies, ensuring proper creation order
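
As an illustrative sketch only (the variable names are hypothetical, not LangDB's actual Terraform module interface), an environment-specific .tfvars file for such a deployment might look like:

# prod.tfvars — hypothetical variable names, for illustration
region              = "eu-west-1"
environment         = "prod"
ecs_desired_count   = 3
rds_instance_class  = "db.r6g.large"
redis_node_type     = "cache.r6g.large"
clickhouse_endpoint = "<clickhouse-cloud-endpoint>"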

Maintenance & Updates

Ongoing infrastructure maintenance is managed through Terraform:

  • Scaling resources up/down based on demand

  • Applying security patches and updates

  • Modifying configurations for performance optimization

  • Adding new resources or services as needed


Architecture of LangDB on AWS/GCP Deployment

Working with Models

Manage public and private LLM models in LangDB with flexible APIs, custom parameter schemas, and dynamic request/response mapping.

Managing and Adding New Models

The platform provides a flexible management API that allows you to publish and manage machine learning models at both the tenant and project levels. This enables organizations to control access and visibility of models, supporting both public and private use cases.

Model Listing on LangDB

Model Types

  • Public Models:

    • Can be added without specifying a project ID.

    • Accessible to all users on this deployment.

    • In enterprise deployments, public models are added monthly.

    • Specific model requests can be made by contacting our support.

  • Private Models:

    • Require a project_id and a provider_id for a known provider with a pre-configured secret.

    • Access is restricted to the specified project and provider.

Model Parameters

When publishing a model, you can specify:

  • Request/Response Mapping:

    • By default, models are expected to be OpenAI-compatible.

    • You can also specify custom request/response processors using dynamic scripts (see 'Coming Soon' below).

  • Model Parameters Schema:

    • A JSON schema describing the parameters that can be sent with requests to the model.

Management API to Publish New Models

The management API allows you to register new models with the platform. Below is an example of how to use the API to publish a new model.

Sample cURL Request

curl -X POST https://api.xxx.langdb.ai/admin/models \
  -H "Authorization: Bearer <your_token>" \
  -H "X-Admin-Key: <admin_key>"\
  -H "Content-Type: application/json" \
  -d '{
    "model_name": "my-model",
    "description" "Description of the LLM Model",
    "provider_id": "123e4567-e89b-12d3-a456-426614174000",
    "project_id": "<project_id>",
    "public": false,
    "request_response_mapping": "openai-compatible", // or custom script
    "model_type": "completions",
    "input_token_price": "0.00001",
    "output_token_price": "0.00003",
    "context_size": 128000,
    "capabilities": ["tools"],
    "input_types": ["text", "image"],
    "output_types": ["text", "image"],
    "tags": [],
    "owner_name": "openai",
    "priority": 0,
    "model_name_in_provider": "my-model-v1.2",
    "parameters": {
       "top_k":{
          "min":0,
          "step":1,
          "type":"int",
          "default":0,
          "required":false,
          "description":"Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens."
       },
       "top_p":{
          "max":1,
          "min":0,
          "step":0.05,
          "type":"float",
          "default":1,
          "required":false,
          "description":"An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both."
       }
    }
  }'

API properties

| Field | Type | Description |
| --- | --- | --- |
| model_name | String | The display name of the model |
| description | String | A detailed description of the model's capabilities and use cases |
| provider_info_id | UUID | The UUID of the provider that offers this model |
| project_id | UUID | Which project this model belongs to |
| public | Boolean | Whether the model is publicly discoverable or private |
| request_response_mapping | String | "openai-compatible" or a custom mapping script |
| model_type | String | The type of model (e.g., "completions", "image", "embedding") |
| owner_name | String | The name of the model's owner or creator |
| priority | i32 | Priority level for the model in listings (higher numbers indicate higher priority) |
| input_token_price | Nullable float | Price per input token |
| output_token_price | Nullable float | Price per output token |
| context_size | Nullable u32 | Maximum context window size in tokens |
| capabilities | String[] | List of model capabilities (e.g., "tools") |
| input_types | String[] | Supported input formats (e.g., "text", "image", "audio") |
| output_types | String[] | Supported output formats (e.g., "text", "image", "audio") |
| tags | String[] | Classification tags for the model |
| type_prices | Map<String, float> | Prices for different usage types (used for image generation model pricing) |
| mp_price | Nullable float | Price per megapixel (used for image generation model pricing) |
| model_name_in_provider | String | The model's identifier in the provider's system |
| parameters | Map<String, Map<String, any>> | Additional configuration parameters as JSON |

Check out the full API Specification in the API Reference.

Usage of API

  • Replace <platform-url>, <your_token>, and <project_id> with your actual values.

  • Set public to true for public models (omit project_id and provider_id), or false for private models.

  • The parameters field allows you to define the expected parameters for your model.

Setting Parameters for a sample request on LangDB

Dynamic Request/Response Mapping (Coming Soon)

The platform will soon support dynamic request/response mapping using custom scripts. This feature will allow you to define how requests are transformed before being sent to the model, and how responses are processed before being returned to the client. This will enable support for a wide variety of model APIs and custom workflows.

Stay tuned!


Routing Engine (Enterprise Only)

Use JSON rules or an embedded script to manage AI models, cut costs, and boost performance. A guide for enterprise-level routing.

LangDB Routing enables organisations to control how user requests are handled by AI models, optimizing for cost, performance, compliance, and user experience. By defining routing rules in JSON or through an embedded script, businesses can automate decision-making, ensure reliability, and maximize value from their AI investments.

What is Routing in LangDB?

Routing in LangDB is the process of directing each user request to the most appropriate AI model or service, based on business logic, user profile, request content, or real-time system metrics. This enables:

  • Cost savings by using cheaper models for non-critical tasks

  • Performance optimization by routing to the fastest or most reliable models

  • Compliance by enforcing data residency or content policies

  • Personalization by serving different user segments with tailored models

Example Use Cases

| Enterprise Use Case | Business Goal | Key Variables & Metrics | Routing Logic Summary |
| --- | --- | --- | --- |
| SLA-Driven Tiering | Guarantee premium performance for high-value customers. | user_tier, ttft | Route user_tier: "premium" to models with the lowest ttft. |
| Geographic Compliance | Ensure data sovereignty and meet regulatory requirements (e.g., GDPR). | region, compliance_tags | If region: "EU", route to models with compliance_tags: "GDPR". |
| Intelligent Cost Management | Reduce operational expenses for internal or low-priority tasks. | user_group, price | If user_group: "internal", sort available models by price: MIN. |
| Model A/B Testing | Evaluate new AI models on a subset of live traffic before full rollout. | user_id (hashed), percentage split | Route 10% of traffic to a new model and 90% to the current default. |
| Content-Aware Routing | Improve accuracy by using specialized models for specific topics. | semantic_guardrail.result.topic | If topic: "finance", route to a finance-tuned model. |
| Brand Safety Enforcement | Prevent brand damage by blocking or redirecting inappropriate content. | toxicity_guardrail.result | If toxicity_score > 0.8, block the request or route to a safe-reply model. |

Check out the full examples below:

Routing Rule Anatomy

| JSON Field | Meaning | Example Value/Usage | Business Impact |
| --- | --- | --- | --- |
| routes | List of routing rules | [ ... ] | Controls all routing logic |
| name | Name of the rule | "vip_fast_lane" | For audit, clarity, and reporting |
| conditions | When to apply this rule | { "user_tier": { "eq": "VIP" } } | Target specific users, topics, etc. |
| targets | Which model(s) to use if rule matches | { "$any": [ ... ] } | Pool of models for flexibility |
| $any | Pool of models to choose from | ["openai/gpt-4.5", ... ] | Enables failover and optimization |
| sort | How to pick the best model | { "price": "MIN" } | Optimize for cost, speed, or reliability |
| pre_request | Checks/enrichments before routing | [ ... ] | Add business logic, compliance, etc. |
| post_response | Actions after model response | [ ... ] | Moderate, fallback, or redact responses |
| message_mapper | Modify request/response | { ... } | Customizes user experience |
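
Putting these fields together, a minimal single-rule configuration (a sketch reusing the example values above; the second pool entry is borrowed from the examples elsewhere in these docs) looks like:

{
  "routes": [
    {
      "name": "vip_fast_lane",
      "conditions": {
        "user_tier": { "eq": "VIP" }
      },
      "targets": {
        "$any": ["openai/gpt-4.5", "anthropic/claude-4-opus"],
        "sort": { "ttft": "MIN" }
      }
    }
  ]
}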

Additional Features

Script-Based Routing for Advanced Users

For advanced users, LangDB supports script-based routing using inline WASM (WebAssembly) scripts. These scripts can access all the same variables and metrics as JSON rules, enabling highly flexible, programmable routing logic for complex enterprise needs.

Router Metrics & Observability

All router metrics (including routing decisions, latencies, error rates, and more) are available via OpenTelemetry and can be exported to your observability or monitoring stack for real-time analytics and alerting.

Best Practices

  • Start simple: Begin with clear, high-value rules (e.g., VIP fast lane, cost control)

  • Use metrics: Leverage available metrics to optimize for business goals

  • Test and iterate: Monitor outcomes and refine rules for better results

  • Document rules: Use the name field and comments for clarity

  • Plan for compliance: Use region and content checks for regulatory needs

  • Monitor costs: Use token-based optimization to control spend

Example: Building an Enterprise Routing Configuration
Example: Routing with Interceptors and Compliance
Routing Lifecycle