Load Balancer Routing
Distribute requests across multiple models to ensure high availability, balance load, and optimize performance. Use real-time metrics to select the best available model.
Use Cases
High availability requirements
Load distribution across models
Performance optimization
Failover scenarios
Configuration
{
  "model": "router/dynamic",
  "router": {
    "type": "conditional",
    "routes": [
      {
        "name": "Balanced",
        "targets": {
          "$any": [
            "openai/gpt-4.1-nano",
            "gemini/gemini-2.0-flash",
            "bedrock/llama3-2-3b-instruct-v1.0"
          ],
          "sort_by": "requests",
          "sort_order": "min"
        }
      }
    ]
  }
}
How It Works
Model Pool: Defines three models for load distribution (GPT-4.1-nano, Gemini 2.0 Flash, Llama 3.2 3B)
Load Balancing: Each request is routed to the model with the lowest current value of the requests metric ("sort_by": "requests" with "sort_order": "min")
Automatic Distribution: Over time, traffic spreads across the pool according to each model's current usage
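Clients never pick a target themselves: they address the router model, and the gateway resolves it to one of the pooled models using the metrics above. A minimal request-body sketch, assuming an OpenAI-compatible chat payload (the message content is illustrative):

{
  "model": "router/dynamic",
  "messages": [
    { "role": "user", "content": "Summarize the attached report." }
  ]
}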
Variables Used
requests: Current load metric (used for sorting)
Customization
Adjust health thresholds
Add more models to the pool
Use different sorting strategies such as ttft or price (see the sketch after this list)
Implement weighted load balancing
Add geographic considerations
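For example, routing to whichever model currently has the lowest time-to-first-token is a one-field change to the route shown above. This sketch keeps the same targets block and only swaps the sort key (ttft here; price works the same way):

"targets": {
  "$any": [
    "openai/gpt-4.1-nano",
    "gemini/gemini-2.0-flash",
    "bedrock/llama3-2-3b-instruct-v1.0"
  ],
  "sort_by": "ttft",
  "sort_order": "min"
}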