Configure Fallback Routing with LangDB

Set up fallback routing with LangDB to keep AI applications online during traffic spikes or model outages by automatically switching models.


Ensure your AI applications stay online even during traffic spikes or model outages by configuring Fallback Routing. This guide walks you through setting up fallback routers using LangDB's routing feature.

What is Fallback Routing?

Fallback Routing allows LangDB to automatically switch to a backup model when your preferred model is slow, down, or overloaded. This helps you:

  • Avoid downtime

  • Improve reliability

  • Scale applications without manual intervention

Example: Basic Fallback Routing

Let’s say you want to use DeepSeek-Reasoner, but switch to GPT-4o if it becomes unavailable.

You can configure this through the LangDB UI, or programmatically with the following router definition:

{
    "model": "router/dynamic",
    "router": {
        "name": "fallback-router",
        "type": "fallback",
        "targets": [
            { "model": "deepseek-reasoner", "temperature": 0.7, "max_tokens": 400 },
            { "model": "gpt-4o", "temperature": 0.8, "max_tokens": 500 }
        ]
    }
}
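Since this router definition is just part of the request body, you can embed it in an ordinary chat-completion payload. The sketch below builds such a payload in Python; the exact endpoint URL and header names are assumptions, so check the LangDB API reference before sending real requests.

```python
import json

# Hypothetical sketch: embed the fallback router configuration from above
# in a chat-completion request body. Endpoint and auth details are assumed.
payload = {
    "model": "router/dynamic",
    "router": {
        "name": "fallback-router",
        "type": "fallback",
        "targets": [
            {"model": "deepseek-reasoner", "temperature": 0.7, "max_tokens": 400},
            {"model": "gpt-4o", "temperature": 0.8, "max_tokens": 500},
        ],
    },
    "messages": [
        {"role": "user", "content": "Summarize fallback routing in one line."}
    ],
}

body = json.dumps(payload)
# POST `body` to your LangDB chat-completions endpoint with your API key, e.g.:
#   requests.post(API_URL, data=body,
#                 headers={"Authorization": f"Bearer {API_KEY}"})
```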

Behavior

  • Requests are sent to deepseek-reasoner first.

  • If that request fails, the router automatically falls back to gpt-4o.

Example: Fallback with Load-Balancing

In the previous example, we implemented a simple fallback mechanism. However, a more robust solution would be to distribute queries across multiple providers of DeepSeek-R1 while maintaining a fallback to GPT-4o if both providers fail. This method helps balance traffic efficiently while ensuring uninterrupted AI services.

Here’s how you can configure Fallback Routing with Percentage-Based Load Balancing:

{
    "model": "router/dynamic",
    "router": {
        "name": "fallback-percentage-router",
        "type": "fallback",
        "targets": [
            {
                "model": "router/dynamic",
                "router": {
                    "name": "percentage-balanced",
                    "type": "percentage",
                    "model_a": [
                        { "model": "fireworksai/deepseek-r1", "temperature": 0.7, "max_tokens": 400 },
                        0.5
                    ],
                    "model_b": [
                        { "model": "deepseek/deepseek-reasoner", "temperature": 0.7, "max_tokens": 400 },
                        0.5
                    ]
                }
            },
            { "model": "gpt-4o", "temperature": 0.8, "max_tokens": 500 }
        ]
    }
}

How This Works:

  • Primary Route: The system distributes requests evenly (50-50%) between two providers of DeepSeek-R1 to balance the load.

  • Fallback Route: If both DeepSeek-R1 providers are unavailable or fail, all requests are automatically rerouted to GPT-4o, ensuring continuous service.
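The percentage tier amounts to a weighted random choice between providers. The sketch below shows that selection step in Python; the weights and model names mirror the config above, and the `rng` parameter is only there to make the behavior easy to demonstrate deterministically.

```python
import random

# Illustrative sketch of the percentage-based selection inside the primary
# route: pick one provider by weight. Weights are assumed to sum to 1.0.
def pick_primary(weighted, rng=random.random):
    """weighted: list of (model, weight) pairs."""
    r = rng()
    cumulative = 0.0
    for model, weight in weighted:
        cumulative += weight
        if r < cumulative:
            return model
    return weighted[-1][0]  # guard against floating-point rounding


# The 50-50 split from the configuration above:
providers = [
    ("fireworksai/deepseek-r1", 0.5),
    ("deepseek/deepseek-reasoner", 0.5),
]
```

If a request to the chosen provider fails, the enclosing fallback router reroutes it to gpt-4o, as described above.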

This approach provides load balancing and reliable fallback protection, making it ideal for AI applications facing high demand and occasional model unavailability.

In more complex scenarios, you can configure a multi-level fallback system with percentage-based distribution. This approach allows requests to be routed dynamically based on pricing, performance, or reliability, ensuring efficiency while preventing downtime.

By leveraging dynamic routing, you can:

  • Prevent downtime by automatically switching to backup models.

  • Optimize performance and cost with smart load balancing.

  • Ensure scalability without manual intervention.

With LangDB’s flexible and powerful routing capabilities, you can build AI applications that are not only intelligent but also robust and fail-safe.

Check out Routing Strategies for more routing options.

Routing Strategies