Routing Engine (Enterprise Only)

Use JSON rules or an embedded script to manage AI models, cut costs, and boost performance. A guide for enterprise-level routing.

Routing Lifecycle

LangDB's Routing Engine enables organizations to control how user requests are handled by AI models, optimizing for cost, performance, compliance, and user experience. By defining routing rules in JSON, businesses can automate decision-making, ensure reliability, and maximize value from their AI investments.

What is Routing and What are its Components?

Routing is the process of directing an incoming request to the most appropriate AI model based on a set of rules. The LangDB router is composed of several key components that work together to execute this logic:

  • Routes: These are the core building blocks of your router. A router is essentially a list of routes that are evaluated in order. The first route whose conditions are met is executed.

  • Conditions: The logic that determines whether a route should be triggered. Conditions can evaluate request data, user metadata, or results from pre-request hooks.

  • Targets: The destination for a request if a route's conditions are met. This is typically one or more AI models.

  • Interceptors (Guardrails & Rate Limiters): These are pre-request hooks that can inspect, modify, or enrich a request before routing rules are evaluated. Their results can be used in conditions.

  • Message Mapper: A component used to block a request or modify the final response, often for handling errors like rate limits.

Example Use Cases

Enterprise Use Case
Business Goal
Key Variables & Metrics
Routing Logic Summary

SLA-Driven Tiering

Guarantee premium performance for high-value customers.

extra.user.tier, ttft

Route extra.user.tier: "premium" to models with the lowest ttft.

Geographic Compliance

Ensure data sovereignty and meet regulatory requirements (e.g., GDPR).

metadata.region, extra.user.tags

If metadata.region: "EU", route to models for users with tags: ["GDPR"].

Intelligent Cost Management

Reduce operational expenses for internal or low-priority tasks.

metadata.group_name, price

If metadata.group_name: "internal", sort available models by "sort_by": "price".

Content-Aware Routing

Improve accuracy by using specialized models for specific topics.

pre_request.semantic_guardrail.result.topic

If topic: "finance", route to a finance-tuned model.

Brand Safety Enforcement

Prevent brand damage by blocking or redirecting inappropriate content.

pre_request.toxicity_guardrail.result.passed

If passed: false, block the request or route to a safe-reply model.

For more detailed examples, see the pages below:

ExamplesExample: Building an Enterprise Routing ConfigurationExample: Routing with Interceptors and Compliance

Anatomy of a Routing Request

A routing request is a standard chat completion request with two key additions:

  1. The model must be set to "router/dynamic".

  2. A router object containing your routing logic must be included in the request body.

Here’s how the various components fit into a complete request. The example below shows a simple configuration with two routes: one for premium users and a fallback for everyone else.

{
  "model": "router/dynamic",
  "messages": [
    {
      "role": "user",
      "content": "Our production API is down, I need help now!"
    }
  ],
  "extra": {
    "user": { "tier": "premium" }
  },
  "router": {
    "type": "conditional",
    "routes": [
      {
        "name": "premium_support_fast_track",
        "conditions": {
          "all": [
            { "extra.user.tier": { "$eq": "premium" } }
          ]
        },
        "targets": {
          "$any": ["anthropic/claude-4-opus", "openai/gpt-o3"],
          "sort_by": "ttft",
          "sort_order": "min"
        }
      },
      {
        "name": "default_fallback",
        "conditions": { "all": [] },
        "targets": "openai/gpt-4o-mini"
      }
    ]
  }
}

Routing Components Explained

This section breaks down each of the major components of the router object.

Routes

The routes property contains an array of route objects. These are evaluated sequentially from top to bottom, and the first route whose conditions are met will be executed. Every route must have a name, conditions, and targets.

Conditions

The conditions block defines when a route should be activated. It uses a flexible JSON-based syntax.

  • Logical Operators: You can combine multiple conditions using all (AND) or any (OR).

  • Comparison Operators: Conditions use operators like $eq (equal), $neq (not equal), $in (in array), $lt (less than), $gt (greater than) to evaluate variables.

  • Lazy Evaluation of Guardrails: It's important to note that guardrails are evaluated lazily. A guardrail interceptor will only be executed if the router encounters a condition that requires its result (pre_request.{guardrail_name}.*). This prevents unnecessary latency.

Targets

The targets block defines what happens when a route is matched. It specifies one or more models to which the request can be sent.

Specifying Models

You can specify models in your targets list in several ways, giving you flexibility in how you define your candidate pool.

  • Exact Name with Provider: openai/gpt-4o

    • This is the most specific and recommended method. It uniquely identifies a single model from a single provider.

  • Provider Wildcard: openai/*

    • This selects all available models from a specific provider (e.g., all models from OpenAI). This is useful for creating provider-level routing rules or fallbacks.

  • Model Name Only: claude-3-opus

    • This selects all models with that name from any available provider. For example, if both Anthropic and another provider offered claude-3-opus, both would be added to the candidate pool. This is particularly useful when you want to use sorting to find the best provider for a specific model based on real-time metrics like price or ttft. Use with caution, as it can be ambiguous if providers have different capabilities for the same model name.

Filtering Models

Before selecting a model, you can filter the list of potential targets in $any using the filter property. This is useful for ensuring models meet certain real-time performance criteria.

"targets": {
    "$any": ["anthropic/claude-4-opus", "openai/gpt-o3"],
    "filter": {
      "error_rate": { "$lt": 0.02 }
    }
}

This example ensures that only models with an error rate below 2% are considered.

Sorting Models

After filtering, the router can sort the remaining candidate models to find the best one based on a specific metric.

  • sort_by: The metric to use for sorting. Common values are price, ttft (time to first token), and error_rate.

  • sort_order: The direction to sort, either min (for lowest cost/latency) or max.

"targets": {
    "$any": ["mistral/mistral-large-latest", "anthropic/claude-4-sonnet"],
    "sort_by": "price",
    "sort_order": "min"
}

This example selects the cheapest model from the pool.

Interceptors (Guardrails & Rate Limiters)

Interceptors are hooks that run before the main routing logic is evaluated. They are defined in the pre_request array.

  • Guardrails enforce content and safety policies (e.g., checking for toxicity or classifying topics).

  • Rate Limiters enforce usage quotas to prevent abuse.

The results of these interceptors are made available in the pre_request variable space for use in your conditions. For detailed configuration, see Interceptors & Guardrails.

Message Mapper

The Message Mapper is used to take direct control of the response. Its most common use case is to block a request that has failed an interceptor check and return a custom error message.

{
  "name": "rate_limit_exceeded_block",
  "conditions": {
    "pre_request.rate_limiter.passed": { "$eq": false }
  },
  "message_mapper": {
    "modifier": "block",
    "content": "You have exceeded your daily quota."
  }
}

Metadata and Variables

Effective routing relies on rich contextual information. LangDB provides two main sources of data for your conditions: extra.user.* data, which you pass in the request, and metadata.* data, which is automatically populated by LangDB. For a complete list, see Variables & Functions.

Components Summary

Component
Purpose
Key Configuration

Routes

A list of rules evaluated in order.

name, conditions, targets

Conditions

The "if" statement for a route.

all, any, operators ($eq, $lt), variables

Targets

The destination model(s) for a route.

$any, filter, sort_by, sort_order

Interceptors

Pre-request hooks for validation or enrichment.

pre_request array, type (guardrail, interceptor)

Message Mapper

Blocks or modifies the final response.

modifier: "block", content


Performance Impact

Different routing components have different performance characteristics.

  • Guardrails & Interceptors:

    • Simple checks like a Rate Limiter are very fast, typically involving a quick lookup in Redis.

    • More complex guardrails, especially those that are "LLM-as-a-judge" (i.e., they make an LLM call to evaluate the prompt), will introduce significant latency to the request, equal to the duration of that LLM call.

  • Model Sorting:

    • Sorting models based on performance metrics (ttft, error_rate) or price requires fetching this data from a metrics store (like Redis). This is a very fast operation but does add a small, fixed overhead to each routed request.


Tracing & Observability

Routing decisions are fully transparent and traceable within the LangDB ecosystem.

  • Routing Performance: The performance of the routing logic itself is tracked, so you can monitor for any overhead.

  • Candidate & Picked Models: For every request, the trace records:

    • The initial pool of candidate models for a matched route.

    • The list of models remaining after filtering.

    • The final picked model after sorting.

  • OpenTelemetry: All router metrics (decisions, latencies, error rates) are exported via OpenTelemetry, allowing you to integrate with your existing observability stack (e.g., DataDog, New Relic) for real-time analytics and alerting.


Last updated

Was this helpful?