Deploy routers to dynamically direct queries to different LLMs based on predefined criteria. This feature lets you optimize for cost, latency, or performance, or split traffic across models by percentage.
Benefits
- Cost Optimization: Save up to 70% in query processing costs by dynamically routing to cost-effective models.
- Latency Reduction: Minimize response times by routing queries to the fastest available models.
- Optimal Model Selection: Use the best model for each query, with support for multiple routing strategies to meet diverse business needs.
Routing Strategies
You can deploy routing based on several strategies, as follows:
Strategy | Description |
---|---|
Cost | Routes queries to the most cost-effective model that can handle them, guided by a `willingness_to_pay` threshold. |
Latency | Routes queries to the model with the lowest response time to minimize delays. |
Random | Distributes queries randomly across available models, useful for balancing load or testing. |
Percentage | Splits traffic between two models based on predefined percentages for A/B testing or load balancing. |
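Each strategy is selected via the `strategy` object passed in `extra_body` (see the full example under Code Snippet). As a minimal sketch of what the alternative payloads might look like: apart from the cost strategy, which appears verbatim later on this page, these shapes are assumptions rather than confirmed API parameters.

```python
# Strategy payloads for LangDB's dynamic router. Only the cost payload is
# taken from the example below; the others are assumed to follow the same
# {"type": ...} shape and are not confirmed API parameters.
cost_strategy = {"type": "cost", "willingness_to_pay": 0.3}
latency_strategy = {"type": "latency"}  # assumption: no extra parameters
random_strategy = {"type": "random"}    # assumption: no extra parameters
# The percentage strategy needs a traffic split; its exact field names are
# not shown on this page, so they are omitted here.
```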
Benchmarks
To assess the cost-effectiveness of different routing strategies, we ran benchmarks on the openai/grade-school-math dataset. The experiment illustrates how varying `willingness_to_pay` values affect the cost of automated routing between `gpt-4o` and `mixtral-8x7b-instruct-v0.1`. We also used LLM-as-a-judge to measure how much response quality is traded for the cost savings.
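LLM-as-a-judge here means asking a strong model to score each routed answer against the dataset's reference answer. Below is a minimal, generic sketch of the idea, not LangDB's exact evaluation harness; the judge model, prompt, and 1-5 scale are all assumptions.

```python
from openai import OpenAI

client = OpenAI(base_url="https://api.us-east-1.langdb.ai", api_key="xxxxx")

def judge(question: str, reference: str, candidate: str) -> str:
    """Ask a judge model to score a candidate answer against a reference."""
    prompt = (
        "Rate how well the candidate answer matches the reference answer "
        "on a scale of 1-5. Reply with the score only.\n"
        f"Question: {question}\n"
        f"Reference: {reference}\n"
        f"Candidate: {candidate}"
    )
    response = client.chat.completions.create(
        model="gpt-4o",  # assumption: any strong model can serve as the judge
        messages=[{"role": "user", "content": prompt}],
        extra_headers={"x-project-id": "xxxxx"},
    )
    return response.choices[0].message.content
```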
Sample Experiment
Willingness to Pay | Cost Reduction |
---|---|
0.33 | 73% |
0.30 | 64% |
0.20 | 6.7% |
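A sweep like the one above can be reproduced by looping over `willingness_to_pay` values. The sketch below reuses the client setup and `extra_body` shape from the full example in the next section; the question string and experiment name are placeholders, and cost accounting itself is left to your billing data.

```python
from openai import OpenAI

# Same client setup as in the Code Snippet section below.
client = OpenAI(base_url="https://api.us-east-1.langdb.ai", api_key="xxxxx")
headers = {"x-project-id": "xxxxx"}  # Replace with your LangDB project ID

# Sweep the willingness_to_pay values from the sample experiment.
for wtp in (0.33, 0.30, 0.20):
    response = client.chat.completions.create(
        model="router/dynamic",
        messages=[{"role": "user", "content": "A grade-school math question"}],  # placeholder
        extra_headers=headers,
        extra_body={
            "extra": {
                "name": f"wtp-sweep-{wtp}",  # hypothetical experiment name
                "strategy": {"type": "cost", "willingness_to_pay": wtp},
                "models": ["gpt-4o", "mixtral-8x7b-instruct-v0.1"],
            }
        },
    )
    print(wtp, response.choices[0].message.content)
```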
Code Snippet
Here's how you can leverage routing in LangDB:
Example: Python
```python
# Import the OpenAI client library
from openai import OpenAI

# Set up the LangDB API base URL, bearer token, and project header
api_base = "https://api.us-east-1.langdb.ai"  # LangDB API base URL
api_key = "xxxxx"  # Replace with your LangDB token
default_headers = {"x-project-id": "xxxxx"}  # Replace with your LangDB project ID

client = OpenAI(
    base_url=api_base,
    api_key=api_key,
)

# Define the conversation messages
messages = [
    {
        "role": "user",
        "content": "What is the capital of France?",
    },
]

# Make the API call to LangDB's Completions API
response = client.chat.completions.create(
    model="router/dynamic",  # Use LangDB's dynamic router instead of a fixed model
    messages=messages,       # Define the interaction
    extra_headers=default_headers,
    extra_body={
        "extra": {
            "name": "Test",
            "strategy": {"type": "cost", "willingness_to_pay": 0.3},
            "models": ["gpt-4o", "mixtral-8x7b-instruct-v0.1"],
        }
    },
)

# Extract and print the assistant's reply
assistant_reply = response.choices[0].message
print(assistant_reply.content)
```