Learn more about LangDB's Enterprise Options
LangDB offers two enterprise licensing models to fit your organization's needs:
Best for: Development teams and startups running AI workloads in private VPCs who want a fully managed experience.
Deployment: Entire infrastructure is deployed on GCP or AWS and fully managed by LangDB.
For AWS, an AWS account with shared access will be provisioned.
For GCP, a new project will be provisioned.
Infrastructure: Fully Managed.
Best for: Enterprises running large-scale AI operations who want maximum flexibility and control.
Deployment: LangDB provides a highly performant binary that can be deployed in your own infrastructure (on-prem, cloud, or hybrid).
x86_64, aarch64 for Ubuntu
Infrastructure: Bring your own
Solutioning Add-On: Available at an hourly rate if needed.
For details or clarification, please book a meeting on our website.
Explore LangDB’s dedicated tenant architecture with secure metadata storage, real-time observability, cost management, and scalable MCP execution.
This page describes the core architecture of the LangDB AI Gateway, a unified platform for interfacing with a wide variety of Large Language Models (LLMs) and building agentic applications with enterprise-grade observability, cost control, scalability, MCP features, and more.
AI Gateway
Unified interface to 300+ LLMs using the OpenAI API format. Built-in observability and tracing. A free and open-source version is available at https://github.com/langdb/ai-gateway. The enterprise version adds multi-tenancy, advanced cost control, and rate limiting; contact LangDB for access.
Metadata Store (PostgreSQL)
Stores metadata related to API usage, configurations, and more. For scalable/multi-tenant deployments, use managed PostgreSQL (e.g., AWS RDS, GCP Cloud SQL).
Cache Store (Redis)
Implements rolling cost control and rate limiting for API usage. The enterprise version supports Redis integration for cost control and rate limiting.
Observability & Analytics Store (ClickHouse)
Provides observability by storing and analyzing traces/logs. Supports OpenTelemetry. For large-scale deployments, use ClickHouse Cloud. Traces are stored in the langdb.traces table.
Note:
Metadata Store: Powered by PostgreSQL (consider AWS RDS, GCP Cloud SQL for enterprise)
Cache Store: Powered by Redis (enterprise only)
Observability & Analytics Store: Powered by ClickHouse (consider ClickHouse Cloud for scale)
LangDB provisions a dedicated environment for each tenant. This environment is isolated per tenant and is set up in a separate AWS account or GCP project, managed by LangDB. Customers connect securely to their provisioned environment from their own VPCs, ensuring strong network isolation and security.
LangDB itself operates a thin, shared public cloud environment (the "control plane") that is primarily responsible for:
Provisioning new tenant environments
Managing access control and user/tenant provisioning
Handling external federated account connections (e.g., SSO)
Hosting the LangDB Dashboard frontend application for configuration, monitoring, and management
All operational workloads, data storage, and LLM/MCP execution occur within the tenant-specific environment. The shared LangDB cloud is not involved in data processing or LLM execution, but only in provisioning, access management, and dashboard hosting.
Integrates with customer identity providers (Active Directory, SAML, SSO).
Users (AI Apps, Agents, Administrators, Developers) interact with LangDB via secure endpoints.
Centralized dashboard for configuration, monitoring, and management.
Handles user and tenant provisioning, access control, and external federated account connections.
All provisioning and access is centrally managed via LangDB Cloud and Dashboard.
Each tenant (enterprise deployment) is provisioned in a dedicated AWS account or GCP project.
Communication between tenant environment and LangDB is secured and managed.
Provisioning is automated via Terraform.
Stores all configuration and metadata required for operation, including:
Virtual models
Virtual MCP servers
Projects
Guardrails
Routers
Used for fast, in-memory operations related to:
Rate limiting & cost control
LLM usage tracking
MCP usage tracking
Stores analytics and observability data:
Traces (API calls, LLM invocations, etc.)
Metrics (performance, usage, etc.)
User and tenant provisioning is centrally controlled via LangDB Cloud and Dashboard.
External federated accounts (e.g., enterprise SSO) can be connected to LangDB Cloud for seamless access management.
Data retention policies mainly apply to observability data (traces, metrics) stored in ClickHouse.
Retention is enforced per subscription tier; traces are automatically cleared after the retention period expires.
MCP servers are deployed in a serverless fashion using AWS Lambda or GCP Cloud Run for scalability and cost efficiency.
Deploy LangDB AI Gateway on GCP, connecting to external Postgres, Redis, and ClickHouse for scalable, cloud-native operations.
Use YAML config to securely define storage, networking, and service settings for your LangDB AI Gateway deployments.
ai-gateway.yaml is the primary way to configure secrets and specific features of AI Gateway.
# Refer to the ai-gateway.yaml for
# advanced configurations available
# ....
# Configuration for storage
database_config:
  url: "postgres://langdb:XXXX@localhost:5438/postgres"
redis_config:
  url: "redis://localhost:6379"
langdb_clickhouse_config:
  url: localhost:8123
rest_api:
  port: 8083
  host: 0.0.0.0
  cors: true
## Configure storage location
# storage_config: !Local "file:///<storage-path>"
langdb_cloud_ui:
  url: http://localhost:3000
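Once this file is in place, the gateway reads it at startup. As a minimal usage sketch (assuming the enterprise binary is installed and named as in the self-hosting section), you start the gateway by pointing it at the config:

ai-gateway-enterprise serve -c ai-gateway.yaml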
Deploy LangDB AI Gateway on Kubernetes using Helm, connecting to external Postgres, Redis, and ClickHouse for scalable, cloud-native operations.
This guide walks you through deploying the LangDB AI Gateway enterprise edition on Kubernetes using Helm. Refer to the individual database links for deploying and scaling clusters for Postgres, Redis, and ClickHouse.
Check out the repository:
This will deploy:
ai-gateway (using the default image)
Connections to external Postgres, Redis, and ClickHouse
values.yaml: this will automatically mount your config.yaml into the container at /app/config.yaml, and the ai-gateway will use it on startup.
Run the following command to install the AI Gateway:
This deploys:
ai-gateway using the default image.
Connections to external Postgres, Redis, and ClickHouse instances.
By default, the service is exposed as a ClusterIP. To access it externally, you can port-forward; then access the gateway at http://localhost:8080.
To remove the deployment:
Check out the full source repository here:
git clone [email protected]:langdb/helm-chart.git
cd helm-chart/helm/ai-gateway
env:
  CLICKHOUSE_HOST: <external-clickhouse-host>
  REDIS_HOST: <external-redis-host>
  POSTGRES_HOST: <external-postgres-host>
  POSTGRES_USER: <your-user>
  POSTGRES_PASSWORD: <your-password>
  POSTGRES_DB: <your-db>
config: |
  # ai-gateway configuration
  http:
    host: "0.0.0.0"
    port: 8080
helm install ai-gateway .
kubectl port-forward svc/ai-gateway 8080:80
helm uninstall ai-gateway
Track and query LangDB traces with ClickHouse, using bloom filter indexes for fast thread and run-specific analytics.
Clickhouse is used for observability in LangDB. It provides high-performance analytics capabilities that allow us to track and analyze system behavior, performance metrics, and user activities across the platform.
The following CREATE query defines the traces table that stores distributed tracing information for the LangDB AI Gateway.
CREATE TABLE IF NOT EXISTS langdb.traces
(
trace_id UUID,
span_id UInt64,
parent_span_id UInt64,
operation_name LowCardinality(String),
kind String,
start_time_us UInt64,
finish_time_us UInt64,
finish_date Date,
attribute Map(String, String),
tenant_id Nullable(String),
project_id String,
thread_id String,
tags Map(String, String),
parent_trace_id Nullable(UUID),
run_id Nullable(UUID)
)
ENGINE = MergeTree
ORDER BY (finish_date, finish_time_us, trace_id)
SETTINGS index_granularity = 8192;
-- Add bloom filter index for thread_id
ALTER TABLE langdb.traces ADD INDEX idx_thread_id thread_id TYPE bloom_filter GRANULARITY 4;
-- Add composite index for tenant_id, project_id, and operation_name
ALTER TABLE langdb.traces ADD INDEX idx_tenant_project (tenant_id, project_id, operation_name) TYPE bloom_filter GRANULARITY 4;
The thread_id field, with its dedicated bloom filter index, allows for efficient filtering of traces based on specific execution threads. The run_id field enables filtering and grouping traces by specific execution runs.
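For illustration, a query along these lines (the thread ID value is a placeholder) retrieves all spans for a single conversation thread and benefits from the bloom filter index on thread_id:

SELECT trace_id, operation_name, start_time_us, finish_time_us
FROM langdb.traces
WHERE thread_id = '<thread-id>'
  AND finish_date >= today() - 7
ORDER BY finish_time_us;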
Implement tenant-level isolation with LangDB’s robust multitenancy architecture, featuring row policies in ClickHouse and secure metadata controls in Postgres.
This document outlines the multitenancy implementation in LangDB, explaining how data isolation is maintained across different tenants.
LangDB implements a robust multitenancy model that ensures complete isolation of tenant data while maintaining efficient resource utilization. This approach is implemented across different data storage systems used in the platform.
Clickhouse is used for analytics and observability in LangDB. The multitenancy implementation in Clickhouse includes:
Each tenant in LangDB has a dedicated Clickhouse user and role
These custom roles enforce access permissions specific to the tenant's data
Authentication and authorization are managed at the tenant level
Prevents cross-tenant data access even at the database level
All read operations in Clickhouse are governed by row policies
Row policies filter data based on the tenant_name column
When a tenant's credentials are used for database access, the row policy automatically restricts results to only that tenant's data
This provides a zero-trust isolation model where the application doesn't need to include tenant filters (a sketch of such a policy follows below this list)
All inserts into Clickhouse tables automatically populate the tenant column
The tenant column is populated based on the authenticated user context
Direct inserts by tenants are not allowed, preventing potential data integrity issues
Insert operations are performed via service accounts with appropriate tenant context
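As a hedged sketch, a per-tenant row policy on the tenant_name column described above might be defined in ClickHouse along these lines; the policy name, tenant value, and role name are illustrative placeholders, not the actual production definitions:

-- Illustrative only: names and tenant value are placeholders
CREATE ROW POLICY tenant_acme_policy ON langdb.traces
FOR SELECT
USING tenant_name = 'acme'
TO acme_readonly_role;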
Postgres is used as the primary metadata storage in LangDB. The multitenancy implementation in Postgres includes:
Tenant isolation is implemented at the application logic level
All database queries include tenant-specific filters
Application code ensures that queries only return records belonging to the authenticated tenant
Modifications are restricted to only the tenant's own data through application context
Tenant identifier is a required column in all tenant-specific tables
All database operations include tenant context validation
Application middleware enforces tenant context for every database operation
This multitenancy model is consistently implemented across LangDB's AWS and GCP deployments, ensuring that tenant data remains securely isolated regardless of the cloud provider.
Set up tenants and users in LangDB via direct signup or federated SSO, with dedicated infrastructure for enterprise deployments.
Tenant provisioning happens through the LangDB dashboard, where you can register a company and upgrade to an Enterprise License.
Reach out to our support staff to configure your tenant environment on AWS or GCP.
For self-hosted enterprise versions, you'll be asked to provide discovery URLs to be registered with the LangDB Control Environment.
The main LangDB Cloud is multi-tenant, with shared infrastructure for all tenants.
Enterprise deployments are provisioned per tenant, with dedicated infrastructure and network isolation.
Provisioning an individual tenant involves setting up an entire AWS account or GCP project per tenant, managed via Terraform, which then communicates securely with LangDB Cloud.
Two user onboarding modes are supported.
Users can sign up directly and invite additional users to their tenant.
This is the easiest mode to set up.
You can restrict signups to specific email domains for security.
Reach out to us for linking your federated account to your tenant.
Users who are part of the directory can register on the sub-domain.
Currently roles are managed through LangDB dashboard.
Dynamic Role Mapping feature is in active development.
This example demonstrates a multi-layered routing strategy for a SaaS company that balances performance for premium users, cost for standard users, and flexibility for internal development.
Goals:
Provide the fastest possible responses for "premium" customers on support-related queries.
Minimize costs for "standard" tier users.
Allow the internal "development" team to test a new, experimental model without affecting customers.
Routing Configuration (router.json):
Rule 1: premium_support_fast_track
Conditions: This rule applies only when a request comes from a user in the "premium" tier AND the request topic has been identified as "support". It uses an all operator to combine conditions.
Targets: It routes the request to a pool of high-performance models (anthropic/claude-4-opus, openai/gpt-o3) and selects the one with the lowest time-to-first-token (ttft), ensuring the fastest response.
Rule 2: standard_user_cost_optimized
Conditions: This is a broader rule that catches all requests from "standard" tier users.
Targets: It uses a pool of cost-effective models (mistral/mistral-large-latest, anthropic/claude-4-sonnet) and selects the one with the minimum price, optimizing for spend.
Rule 3: internal_dev_testing
Conditions: This rule applies to any user in the "development" group.
Targets: It directs their requests to google/gemini-2.5-pro, isolating test traffic from the production user base.
Self-host LangDB AI Gateway Enterprise locally with ClickHouse, PostgreSQL, and Redis for full control over tracing, caching, and analytics.
ClickHouse (for request tracing & analytics)
PostgreSQL (for metadata and user management)
Redis (for caching and rate‑limiting)
You can self-host our enterprise version using one of two options.
Supported Platforms:
x86_64
aarch64
Test the gateway with a simple chat completion:
Invoke an MCP server alongside your request:
Refer to the relevant documentation sections for an overview of the various features.
Launch LangDB locally via Docker Compose for fast development, and explore AWS or GCP guides for scalable cloud deployments.
ai-gateway-enterprise serve -c ai-gateway.yaml
docker run -it \
  -p 8080:8080 \
  <private-url>/ai-gateway-enterprise serve \
  -c ai-gateway.yaml
# Chat completion with GPT-4
curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o-mini",
"messages": [{"role": "user", "content": "What is the capital of France?"}]
}'
# Or try Claude
curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "claude-3-opus",
"messages": [
{"role": "user", "content": "What is the capital of France?"}
]
}'
curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o-mini",
"messages": [{"role": "user", "content": "Ping the server using the tool and return the response"}],
"mcp_servers": [{"server_url": "http://localhost:3004"}]
}'
version: '3.8'
services:
  ai-gateway:
    # We will share a private image that contains the ai-gateway-enterprise edition.
    # For reference, check out our free image available in our GitHub repo:
    # https://github.com/langdb/ai-gateway
    image: <private-url>/ai-gateway-enterprise:latest
    ports:
      - "8083:8083"
    extra_hosts:
      - "host.docker.internal:host-gateway"
    volumes:
      # ai-gateway.yaml is expected in the configuration folder.
      - config:/usr/langdb/
    container_name: "langdb-ai-gateway"
  clickhouse:
    image: clickhouse/clickhouse-server:latest
    ports:
      - "8123:8123"
      - "9000:9000"
    ulimits:
      nofile:
        soft: 262144
        hard: 262144
    extra_hosts:
      - "host.docker.internal:host-gateway"
    container_name: "langdb-clickhouse"
  postgres:
    image: postgres:latest
    container_name: langdb-cloud-enterprise-pg
    environment:
      POSTGRES_USER: langdb
      # Note: Include your postgres password as specified in ai-gateway.yaml
      POSTGRES_PASSWORD: XXXXX
      POSTGRES_DB: langdb_staging
      ALLOW_IP_RANGE: 0.0.0.0/0
    ports:
      - "5438:5432"
    command: postgres -c 'max_connections=1000'
    volumes:
      - postgres_data:/var/lib/postgresql/data
  redis:
    image: redis:latest
    restart: always
    ports:
      - "6379:6379"
    volumes:
      - redis_data:/root/redis
    environment:
      # Note: Include your redis password as specified in ai-gateway.yaml
      - REDIS_PASSWORD=XXXXX
      - REDIS_PORT=6379
      - REDIS_DATABASES=16
volumes:
  config:
  postgres_data:
  redis_data:
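With an ai-gateway.yaml placed in the mounted config volume, the stack can then be brought up with the standard Compose command:

docker compose up -d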
{
  "routes": [
    {
      "name": "premium_support_fast_track",
      "conditions": {
        "all": [
          { "metadata.user.tier": { "eq": "premium" } },
          { "metadata.request.topic": { "eq": "support" } }
        ]
      },
      "targets": {
        "$any": ["anthropic/claude-4-opus", "openai/gpt-o3"],
        "sort": { "ttft": "MIN" }
      }
    },
    {
      "name": "standard_user_cost_optimized",
      "conditions": {
        "metadata.user.tier": { "eq": "standard" }
      },
      "targets": {
        "$any": ["mistral/mistral-large-latest", "anthropic/claude-4-sonnet"],
        "sort": { "price": "MIN" }
      }
    },
    {
      "name": "internal_dev_testing",
      "conditions": {
        "metadata.user.group": { "eq": "development" }
      },
      "targets": [
        { "model": "google/gemini-2.5-pro" }
      ]
    }
  ]
}
Interceptors are custom logic that can run before or after a request is routed, allowing you to enrich, validate, or transform requests and responses. Guardrails are a common type of interceptor used to enforce policies.
| Stage | Purpose | Example uses | Example interceptors |
| --- | --- | --- | --- |
| Pre-request | Analyze or enrich the request | Classify topic, check for risk, personalize | semantic_guardrail, toxicity_guardrail |
| Post-request | Analyze or modify the response | Moderate output, add fallback, redact sensitive info | fallback_response |
When an interceptor runs, it can inject its results into the routing context, making them available for your conditional logic.
| Context variable | Description | Example value | Use case |
| --- | --- | --- | --- |
| semantic_guardrail.result.topic | Detected topic from a guardrail | "billing" | Route to topic-specialized models |
| toxicity_guardrail.result | Toxicity score from a guardrail | 0.8 | Block or reroute harmful content |
| rate_limiter.result | Result of a rate limit check | true | Enforce usage quotas and prevent abuse |
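For example, a rule could route on the topic a guardrail detected. This is an illustrative fragment in the same router.json rule format used elsewhere in this guide; the rule name and target model are placeholders:

{
  "name": "billing_topic_route",
  "conditions": {
    "semantic_guardrail.result.topic": { "eq": "billing" }
  },
  "targets": [
    { "model": "openai/gpt-4o-mini" }
  ]
}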
Note on Guardrails: Guardrails like semantic_guardrail and toxicity_guardrail are powerful examples of custom guardrails. Check out the corresponding section for more details.
Control trace data retention in LangDB with scalable, cost-effective strategies using ClickHouse background TTL processes and tiered materialized views.
This document outlines LangDB's data retention strategy for tracing information stored in ClickHouse. The strategy employs materialized views to manage data retention periods based on user subscription tiers efficiently. Data eviction is implemented using ClickHouse's TTL (Time-To-Live) mechanisms and background processes:
TTL Definitions: Each table includes TTL expressions that specify when data should expire based on timestamp fields
Background Merge Process: ClickHouse automatically runs background processes that merge data parts and remove expired data during these merge operations
Resource-Efficient: The eviction process runs asynchronously during system low-load periods, minimizing impact on query performance
LangDB uses a robust system for storing and analyzing trace data:
Primary Storage: All trace data is initially stored in the langdb.traces table in ClickHouse
Materialized Views: Tier-specific materialized views filter and retain data based on user subscription levels
Retention Policies: Automated TTL (Time-To-Live) mechanisms enforce retention periods
Professional Tier View
Enterprise Tier View
New trace data is inserted into the base langdb.traces table
Materialized views automatically filter and copy relevant data to tier-specific tables
TTL mechanisms automatically remove data older than the specified retention period
Data access APIs query the appropriate table based on the user's subscription tier
Efficiency: Only store data for the period necessary based on customer tier
Performance: Queries run against smaller, tier-specific tables rather than the entire dataset
Compliance: Clear retention boundaries help with regulatory compliance
Cost-Effective: Optimizes storage costs by aligning retention with customer value
While the retention strategy focuses on operational access to trace data, a separate backup strategy ensures data can be recovered in case of system failures:
Daily snapshots of ClickHouse data
Backup retention aligned with the longest tier retention period (365 days)
Geo-redundant storage of backups
The retention system includes:
Monitoring dashboards for data volume by tier
Alerts for unexpected growth or retention failures
Regular audits to ensure compliance with retention policies
Implementation of custom retention periods for specific enterprise customers
Cold storage options for extended archival needs
Advanced sampling techniques to retain representative trace data beyond standard periods
This example showcases a sophisticated routing configuration that uses pre-request interceptors to enforce usage quotas and guardrails, while handling region-specific compliance and prioritizing performance for premium users.
Goals:
Enforce a daily rate limit on all users to prevent abuse.
Check all requests for policy violations using a semantic guardrail.
Provide high-performance models for premium users in the EU, but only if they are reliable.
Ensure GDPR compliance by using a specialized model for requests with that requirement.
Provide a clear error message to users who have exceeded their quota.
Routing Configuration (router.json):
Configuration Breakdown:
Interceptors: Before any routing rules are evaluated, two interceptors run:
rate_limiter: Checks if the user has exceeded 1,000 requests today. Its result (true or false) is added to the request context.
semantic_guardrail: Scans the request for any content that violates policies (this example assumes it passes). Note that semantic_guardrail is a custom guardrail you would need to create.
Rule 1: premium_eu_high_performance
Conditions: This rule requires four conditions to be met: the user is "premium", is in the "EU", has not been rate-limited, and the target model provider has a low error_rate.
Targets: If all conditions pass, it routes to the fastest available high-performance model.
Rule 2: gdpr_compliance_fallback
Conditions: This rule acts as a high-priority catch-all for any request that requires GDPR compliance, regardless of user tier.
Targets: It forces the request to a specific, compliant model, ensuring regulatory needs are met.
Rule 3: rate_limit_exceeded_block
Conditions: This rule checks for the false result from the rate_limiter.
Targets: Instead of routing to a model, it uses message_mapper to block the request and return a custom error message directly to the user.
CREATE MATERIALIZED VIEW langdb.traces_professional_mv
TO langdb.traces_professional
AS SELECT *
FROM langdb.traces;
CREATE TABLE langdb.traces_professional (
/* Same structure as base table */
) ENGINE = MergeTree()
ORDER BY (timestamp, user_id)
TTL timestamp + toIntervalDay(30);
CREATE MATERIALIZED VIEW langdb.traces_enterprise_mv
TO langdb.traces_enterprise
AS SELECT *
FROM langdb.traces;
CREATE TABLE langdb.traces_enterprise (
/* Same structure as base table */
) ENGINE = MergeTree()
ORDER BY (timestamp, user_id)
TTL timestamp + toIntervalDay(90);
{
  "pre_request": [
    { "name": "rate_limiter", "type": "interceptor", "limit": 1000, "period": "day", "target": "user_id" },
    { "name": "semantic_guardrail", "type": "guardrail" }
  ],
  "routes": [
    {
      "name": "premium_eu_high_performance",
      "conditions": {
        "all": [
          { "metadata.user.tier": { "eq": "premium" } },
          { "metadata.region": { "eq": "EU" } },
          { "rate_limiter.result": { "eq": true } },
          { "provider.health.error_rate": { "lt": 0.02 } }
        ]
      },
      "targets": {
        "$any": ["anthropic/claude-4-opus", "openai/gpt-o3"],
        "sort": { "ttft": "MIN" }
      }
    },
    {
      "name": "gdpr_compliance_fallback",
      "conditions": {
        "metadata.compliance_tags": { "$in": ["GDPR"] }
      },
      "targets": [ { "model": "eu-specialist/gdpr-compliant-model" } ]
    },
    {
      "name": "rate_limit_exceeded_block",
      "conditions": {
        "rate_limiter.result": { "eq": false }
      },
      "message_mapper": {
        "modifier": "block",
        "content": "You have exceeded your daily quota. Please try again tomorrow."
      }
    }
  ]
}
LangDB provides a rich set of real-time metrics for making dynamic, data-driven routing decisions. These metrics are aggregated at the provider level, giving you a live view of model performance.
| Metric | Description | Example | Use case |
| --- | --- | --- | --- |
| requests | Total number of requests processed | 1500 | Monitor traffic and usage patterns |
| input_tokens | Number of tokens in the request prompt | 500 | Analyze prompt complexity and cost |
| output_tokens | Number of tokens in the model response | 1200 | Track response length and cost |
| total_tokens | Sum of input and output tokens | 1700 | Get a complete picture of token usage |
| latency | Average end-to-end response time (ms) | 1100 | Route based on overall performance |
| ttft | Time to First Token (ms) | 450 | Optimize for user-perceived speed |
| llm_usage | Aggregated cost of LLM usage | 2.54 | Track and control spend in real time |
| tps | Tokens Per Second (output_tokens/latency) | 300 | Measure model generation speed |
| error_rate | Fraction of failed requests | 0.01 | Route around unreliable models |
Variables provide contextual information from the incoming request and user metadata. Unlike metrics, they are not performance indicators but are essential for conditional logic.
| Variable | Description | Example | Use case |
| --- | --- | --- | --- |
| ip | IP address of the requester | "203.0.113.42" | Geo-fencing, fraud detection |
| region | Geographical region of the request | "EU" | Data residency, compliance |
| user_agent | Client application | "Google ADK/ CrewAI" | Agentic library used |
| user_id | Unique user identifier | "u-12345" | Auditing, per-user quotas |
| user_tier | User segment (e.g., premium, free) | "premium" | Personalization, rate limiting, SLAs |
| group | User group or segment | "beta_testers" | Feature rollout, A/B testing |
| model_family | Model family or provider | "openai/gpt-4" | Brand preference, compliance |
| capabilities | Supported features (vision, code) | ["vision", "code"] | Feature-based routing |
| compliance_tags | Regulatory/compliance attributes | ["GDPR", "HIPAA"] | Regulatory routing |
| Sort strategy | What it does | When to use | JSON example |
| --- | --- | --- | --- |
| sort: { price: MIN } | Picks the cheapest model | Cost control, bulk/low-priority tasks | "sort": { "price": "MIN" } |
| sort: { ttft_ms: MIN } | Picks the fastest model (latency) | VIP, real-time, or user-facing tasks | "sort": { "ttft": "MIN" } |
| sort: { error_rate: MIN } | Picks the most reliable model | Mission-critical or regulated workflows | "sort": { "error_rate": "MIN" } |
| sort: { token: MIN } | Picks the model with the lowest token cost (token-based) | Optimize for token spend (current) | "sort": { "token": "MIN" } |
Note: Currently, token-based optimization is available and recommended for controlling LLM costs.
Deploy LangDB on AWS with high availability, full observability, VPC isolation, and managed services for scalable, secure AI operations.
This section describes how the AI Gateway and its supporting components are deployed on AWS, ensuring enterprise-grade scalability, observability, and security.
The LangDB service is deployed in AWS with a robust, scalable architecture designed for high availability, security, and performance. The system is built using AWS managed services to minimize operational overhead while maintaining full control over the application.
AWS Region: All resources are deployed within a single AWS region for low-latency communication
VPC: A dedicated Virtual Private Cloud isolates the application resources
ALB: Application Load Balancer serves as the entry point, routing requests from https://api.{region}.langdb.ai to the appropriate services
ECS Cluster: Container orchestration for the LangDB service
Multiple LangDB service instances distributed across availability zones for redundancy
Auto-scaling capabilities based on load
Containerized deployment for consistency across environments
RDS: Managed relational database service for persistent storage. Dedicated storage for metadata
ElastiCache (Redis) Cluster: In-memory caching layer
Used for cache and cost control
Multiple nodes for high availability
Clickhouse Cloud: Analytics database for high-performance data processing
Deployed in the same AWS region but outside the VPC
Managed service for analytical queries and data warehousing
Cognito: User authentication and identity management
Lambda: Serverless functions for authentication workflows
SES: Simple Email Service for email communications related to authentication
Secrets Vault: AWS Secrets Manager for secure storage of
Provider keys
Other sensitive credentials
Client requests hit the ALB via https://api.{region}.langdb.ai
ALB routes requests to available LangDB service instances in the ECS cluster
LangDB services interact with:
RDS for persistent data
ElastiCache for caching, cost control and rate limiting
Metadata Storage for metadata operations
Clickhouse Cloud for analytics and data warehousing
Authentication is handled through Cognito, with Lambda functions for custom authentication flows
Sensitive information is securely retrieved from Secrets Manager as needed
All components except Clickhouse Cloud are contained within a VPC for network isolation
Secure connections to Clickhouse Cloud are established from within the VPC
Authentication is managed through AWS Cognito
Secrets are stored in AWS Secrets Manager
All communication between services uses encryption in transit
Scalability: The architecture supports horizontal scaling of LangDB service instances
High Availability: Multiple instances across availability zones
Managed Services: Leveraging AWS managed services reduces operational overhead
LangDB infrastructure is deployed and managed using Terraform, providing infrastructure-as-code capabilities with the following benefits:
Modular Structure: The deployment code is organized into reusable Terraform modules that encapsulate specific infrastructure components (networking, compute, storage, etc.)
Environment-Specific Variables: Using .tfvars files to manage environment-specific configurations (dev, staging, prod)
State Management: Terraform state is stored remotely to enable collaboration and version control
Configuration Management: Environment-specific variables are defined in .tfvars files (an illustrative example follows this list)
Resource Provisioning: Terraform creates and configures all AWS resources, including:
VPC and networking components
Fargate instances and container configurations
Postgres databases and Redis clusters
Authentication services and Lambda functions
Secrets Manager entries and access controls
Dependency Management: Terraform handles resource dependencies, ensuring proper creation order
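For illustration, an environment-specific .tfvars file might look like the following; the variable names are hypothetical and not LangDB's actual Terraform module inputs:

# prod.tfvars (illustrative variable names only)
environment       = "prod"
aws_region        = "us-east-1"
ecs_desired_count = 3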
Ongoing infrastructure maintenance is managed through Terraform:
Scaling resources up/down based on demand
Applying security patches and updates
Modifying configurations for performance optimization
Adding new resources or services as needed
| Component | Purpose | AWS Service | Scaling |
| --- | --- | --- | --- |
| LLM Gateway | Unified interface to 300+ LLMs using the OpenAI API format. Built-in observability and tracing. | Amazon ECS (Elastic Container Service) | ECS Auto Scaling based on CPU/memory or custom CloudWatch metrics. |
| Metadata Store (PostgreSQL) | Stores metadata related to API usage, configurations, and more. | Amazon RDS (PostgreSQL) | Vertical scaling (instance size), Multi-AZ support. Read replicas can be configured for better read performance. |
| Cache Store (Redis) | Implements rolling cost control and rate limiting for API usage. | Amazon ElastiCache (Redis) | Scale by adding shards/replicas, Multi-AZ support. |
| Observability & Analytics Store (ClickHouse) | Provides observability by storing and analyzing traces/logs. Supports OpenTelemetry. | ClickHouse Cloud (external) | Scales independently; ensure sufficient network throughput for trace/log ingestion. |
| Load Balancing | Distributes incoming traffic to ECS tasks, enabling high availability and SSL termination. | Amazon ALB (Application Load Balancer) | Scales automatically to handle incoming traffic; supports multi-AZ deployments. |
Manage public and private LLM models in LangDB with flexible APIs, custom parameter schemas, and dynamic request/response mapping.
The platform provides a flexible management API that allows you to publish and manage machine learning models at both the tenant and project levels. This enables organizations to control access and visibility of models, supporting both public and private use cases.
Public Models:
Can be added without specifying a project ID.
Accessible to all users on this deployment.
In enterprise deployments, public models are added monthly.
Specific model requests can be made by contacting our support.
Private Models:
Require a project_id and a provider_id for a known provider with a pre-configured secret.
Access is restricted to the specified project and provider.
When publishing a model, you can specify:
Request/Response Mapping:
By default, models are expected to be OpenAI-compatible.
You can also specify custom request/response processors using dynamic scripts (see 'Coming Soon' below).
Model Parameters Schema:
A JSON schema describing the parameters that can be sent with requests to the model.
The management API allows you to register new models with the platform. Below is an example of how to use the API to publish a new model.
curl -X POST https://api.xxx.langdb.ai/admin/models \
  -H "Authorization: Bearer <your_token>" \
  -H "X-Admin-Key: <admin_key>" \
  -H "Content-Type: application/json" \
  -d '{
    "model_name": "my-model",
    "description": "Description of the LLM Model",
    "provider_id": "123e4567-e89b-12d3-a456-426614174000",
    "project_id": "<project_id>",
    "public": false,
    "request_response_mapping": "openai-compatible",
    "model_type": "completions",
    "input_token_price": "0.00001",
    "output_token_price": "0.00003",
    "context_size": 128000,
    "capabilities": ["tools"],
    "input_types": ["text", "image"],
    "output_types": ["text", "image"],
    "tags": [],
    "owner_name": "openai",
    "priority": 0,
    "model_name_in_provider": "my-model-v1.2",
    "parameters": {
      "top_k": {
        "min": 0,
        "step": 1,
        "type": "int",
        "default": 0,
        "required": false,
        "description": "Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens."
      },
      "top_p": {
        "max": 1,
        "min": 0,
        "step": 0.05,
        "type": "float",
        "default": 1,
        "required": false,
        "description": "An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both."
      }
    }
  }'
| Field | Type | Description |
| --- | --- | --- |
| model_name | String | The display name of the model |
| description | String | A detailed description of the model's capabilities and use cases |
| provider_info_id | UUID | The UUID of the provider that offers this model |
| project_id | UUID | Which project this model belongs to |
| public | Boolean | Whether the model is publicly discoverable or private |
| request_response_mapping | String | "openai-compatible" or a custom mapping script |
| model_type | String | The type of model (e.g., "completions", "image", "embedding") |
| owner_name | String | The name of the model's owner or creator |
| priority | i32 | Priority level for the model in listings (higher numbers indicate higher priority) |
| input_token_price | Nullable float | Price per input token |
| output_token_price | Nullable float | Price per output token |
| context_size | Nullable u32 | Maximum context window size in tokens |
| capabilities | String[] | List of model capabilities (e.g., "tools") |
| input_types | String[] | Supported input formats (e.g., "text", "image", "audio") |
| output_types | String[] | Supported output formats (e.g., "text", "image", "audio") |
| tags | String[] | Classification tags for the model |
| type_prices | Map<String, float> | JSON string containing prices for different usage types (used for image generation model pricing) |
| mp_price | Nullable float | Price by megapixel (used for image generation model pricing) |
| model_name_in_provider | String | The model's identifier in the provider's system |
| parameters | Map<String, Map<String, any>> | Additional configuration parameters as JSON |
Check out the full API specification:
Replace <platform-url>, <your_token>, and <project_id> with your actual values.
Set public to true for public models (omit project_id and provider_id), or false for private models.
The parameters_schema field allows you to define the expected parameters for your model.
The platform will soon support dynamic request/response mapping using custom scripts. This feature will allow you to define how requests are transformed before being sent to the model, and how responses are processed before being returned to the client. This will enable support for a wide variety of model APIs and custom workflows.
Stay tuned!
Use JSON rules or an embedded script to manage AI models, cut costs, and boost performance. A guide for enterprise-level routing.
LangDB Routing enables organisations to control how user requests are handled by AI models, optimizing for cost, performance, compliance, and user experience. By defining routing rules in JSON or through an embedded script, businesses can automate decision-making, ensure reliability, and maximize value from their AI investments.
Routing in LangDB is the process of directing each user request to the most appropriate AI model or service, based on business logic, user profile, request content, or real-time system metrics. This enables:
Cost savings by using cheaper models for non-critical tasks
Performance optimization by routing to the fastest or most reliable models
Compliance by enforcing data residency or content policies
Personalization by serving different user segments with tailored models
| Use case | Business goal | Key variables/metrics | Example logic |
| --- | --- | --- | --- |
| SLA-Driven Tiering | Guarantee premium performance for high-value customers. | user_tier, ttft_ms | Route user_tier: "premium" to models with the lowest ttft_ms. |
| Geographic Compliance | Ensure data sovereignty and meet regulatory requirements (e.g., GDPR). | region, compliance_tags | If region: "EU", route to models with compliance_tags: "GDPR". |
| Intelligent Cost Management | Reduce operational expenses for internal or low-priority tasks. | user_group, price | If user_group: "internal", sort available models by price: MIN. |
| Model A/B Testing | Evaluate new AI models on a subset of live traffic before full rollout. | user_id (hashed), percentage split | Route 10% of traffic to a new model and 90% to the current default. |
| Content-Aware Routing | Improve accuracy by using specialized models for specific topics. | semantic_guardrail.result.topic | If topic: "finance", route to a finance-tuned model. |
| Brand Safety Enforcement | Prevent brand damage by blocking or redirecting inappropriate content. | toxicity_guardrail.result | If toxicity_score > 0.8, block the request or route to a safe-reply model. |
Check out the full examples below:
| Key | Description | Example | Why it matters |
| --- | --- | --- | --- |
| routes | List of routing rules | [ ... ] | Controls all routing logic |
| name | Name of the rule | "vip_fast_lane" | For audit, clarity, and reporting |
| conditions | When to apply this rule | { "user_tier": { "eq": "VIP" } } | Target specific users, topics, etc. |
| targets | Which model(s) to use if the rule matches | { "$any": [ ... ] } | Pool of models for flexibility |
| $any | Pool of models to choose from | ["openai/gpt-4.5", ... ] | Enables failover and optimization |
| sort | How to pick the best model | { "price": "MIN" } | Optimize for cost, speed, or reliability |
| pre_request | Checks/enrichments before routing | [ ... ] | Add business logic, compliance, etc. |
| post_response | Actions after the model response | [ ... ] | Moderate, fallback, or redact responses |
| message_mapper | Modify request/response | { ... } | Customizes user experience |
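Putting these keys together, a minimal router.json skeleton might look like this; the rule name and model names are illustrative, and the full examples elsewhere in this guide show complete configurations with interceptors and multiple rules:

{
  "routes": [
    {
      "name": "vip_fast_lane",
      "conditions": { "metadata.user.tier": { "eq": "VIP" } },
      "targets": {
        "$any": ["anthropic/claude-4-opus", "openai/gpt-o3"],
        "sort": { "ttft": "MIN" }
      }
    }
  ]
}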
For advanced users, LangDB supports script-based routing using inline WASM (WebAssembly) scripts. These scripts can access all the same variables and metrics as JSON rules, enabling highly flexible, programmable routing logic for complex enterprise needs.
All router metrics (including routing decisions, latencies, error rates, and more) are available via OpenTelemetry and can be exported to your observability or monitoring stack for real-time analytics and alerting.
Start simple: Begin with clear, high-value rules (e.g., VIP fast lane, cost control)
Use metrics: Leverage available metrics to optimize for business goals
Test and iterate: Monitor outcomes and refine rules for better results
Document rules: Use the name field and comments for clarity
Plan for compliance: Use region and content checks for regulatory needs
Monitor costs: Use token-based optimization to control spend