Routing

Deploy routers to dynamically direct queries to different LLMs based on predefined criteria. Routing lets you optimize for cost, latency, or performance, or split traffic by percentage.

Benefits

  • Cost Optimization: Save up to 70% in query processing costs by dynamically routing to cost-effective models.

  • Latency Reduction: Minimize response times by routing queries to the fastest available models.

  • Flexible Strategies: Support for multiple routing strategies to meet diverse business needs.

Routing Strategies

You can deploy routing based on the following strategies.

  • Cost: Routes queries between a strong and a weak model based on willingness_to_pay. Example models: ['gpt-4o', 'mixtral-8x7b-instruct-v0.1']

  • Latency: Routes queries to the model with the lowest response time to minimize delays.

  • Random: Distributes queries randomly across available models, useful for balancing load or testing.

  • Percentage: Splits traffic between two models based on predefined percentages for A/B testing or load balancing. Parameters: model_a — first model and its traffic percentage (e.g., "gpt-4o", 60%); model_b — second model and its traffic percentage (e.g., "claude-3-5", 40%).
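The percentage strategy behaves like a weighted random pick between the two models. The sketch below is a minimal local illustration of that behavior, not LangDB's actual implementation; the model names and split values are taken from the example above.

```python
import random

def pick_model(model_a: str, pct_a: float, model_b: str, pct_b: float) -> str:
    """Weighted random choice mirroring a percentage split (e.g. 60/40)."""
    assert abs(pct_a + pct_b - 100) < 1e-9, "percentages must sum to 100"
    return model_a if random.uniform(0, 100) < pct_a else model_b

# Over many draws, roughly 60% of queries land on model_a.
random.seed(0)
picks = [pick_model("gpt-4o", 60, "claude-3-5", 40) for _ in range(10_000)]
share_a = picks.count("gpt-4o") / len(picks)
```

Each incoming query is routed independently, so observed traffic converges to the configured split only in aggregate.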

Benchmarks

To assess the cost-effectiveness of different routing strategies, we ran a benchmark using the openai/grade-school-math dataset. This experiment illustrates how varying willingness_to_pay values influence routing costs, using gpt-4o and mixtral-8x7b-instruct-v0.1 for automated routing. We also used an LLM-as-a-judge to evaluate the quality trade-off relative to the cost savings.

Sample Experiment

Willingness to Pay    Cost Reduction
0.33                  73%
0.30                  64%
0.20                  6.7%
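As a back-of-the-envelope illustration of where such reductions come from (the prices and routing fraction below are hypothetical, not LangDB's billing or the benchmark's measured values): the expected per-query cost is a weighted average of the two models' prices, so sending most queries to the cheaper model drives the reduction.

```python
def blended_cost(strong_price: float, weak_price: float, frac_to_weak: float) -> float:
    """Expected per-query cost when frac_to_weak of traffic goes to the weak model."""
    return strong_price * (1 - frac_to_weak) + weak_price * frac_to_weak

# Hypothetical per-1M-token prices and routing fraction, for illustration only.
strong, weak = 5.00, 0.50
cost = blended_cost(strong, weak, 0.8)   # 80% of queries routed to the weak model
reduction = 1 - cost / strong            # fraction saved vs. always using the strong model
```

Raising willingness_to_pay shifts more traffic to the strong model, which lowers frac_to_weak and shrinks the reduction.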

Code Snippet

Here's how you can leverage routing in LangDB:

Example: Python

# Import the OpenAI client library
from openai import OpenAI

# Set up the LangDB API base URL and bearer token
api_base = "https://api.us-east-1.langdb.ai"  # LangDB API base URL
api_key = "xxxxx"  # Replace with your LangDB token
default_headers = {"x-project-id": "xxxxx"}  # LangDB Project ID
# Replace with your Project
client = OpenAI(
    base_url=api_base,
    api_key=api_key,
)
# Define the conversation messages
messages = [
    {
        "role": "user",
        "content": "What is the capital of France??",
    },
]
# Make the API call to LangDB's Completions API
response = client.chat.completions.create(
    model="router/dynamic",  # Use the model
    messages=messages,  # Define the interaction
    extra_headers=default_headers,
    extra_body={
        "extra": {
            "name": "Test",
            "strategy": {"type": "cost", "willingness_to_pay": 0.3},
            "models": ["gpt-4o", "mixtral-8x7b-instruct-v0.1"],
        }
    },
)
# Extract and print the assistant's reply
assistant_reply = response.choices[0].message
print(assistant_reply.content)