Rate Limiting
Apply rate limiting, cost control and more.
Rate limiting prevents API abuse by capping the number of requests allowed within a given time window. You can configure hourly, daily and monthly total limits.
This ensures fair usage and helps keep the system performant and stable.
# Limit to 1000 requests per hour, per day and per month
ai-gateway serve \
  --rate-hourly 1000 \
  --rate-daily 1000 \
  --rate-monthly 1000
Or configure the limits in config.yaml:
rate_limit:
  hourly: 100
  daily: 1000
  monthly: 10000
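This page doesn't describe the gateway's internal algorithm. The following is a rough illustration of how hourly, daily and monthly totals can be enforced together. It is a minimal Python sketch that assumes simple fixed windows: each window is aligned to a multiple of its length, and a month is treated as 30 days. The class `FixedWindowLimiter` and its `check` method are hypothetical names, not part of ai-gateway. A request is admitted only if every configured window still has room, and it then counts against all of them. Otherwise the sketch returns HTTP status 429.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Window length in seconds for each period. Treating a month as 30 days is an
# assumption made for this sketch.
WINDOW_SECONDS = {"hourly": 3600, "daily": 86400, "monthly": 30 * 86400}


@dataclass
class FixedWindowLimiter:
    """Hypothetical limiter keeping one fixed-window counter per period."""

    limits: Dict[str, int]  # e.g. {"hourly": 100, "daily": 1000, "monthly": 10000}
    # period -> (start of current window, requests counted in it)
    _counts: Dict[str, Tuple[float, int]] = field(default_factory=dict, init=False)

    def check(self, now: Optional[float] = None) -> int:
        """Return 200 if the request is admitted, 429 if any limit is exhausted."""
        now = time.time() if now is None else now
        pending = {}
        for period, limit in self.limits.items():
            length = WINDOW_SECONDS[period]
            start = now - (now % length)  # windows start at multiples of their length
            prev_start, count = self._counts.get(period, (start, 0))
            if prev_start != start:
                count = 0  # a new window has begun, so its counter resets
            if count >= limit:
                return 429  # reject without counting against any window
            pending[period] = (start, count)
        # Admitted: count the request against every window at once.
        for period, (start, count) in pending.items():
            self._counts[period] = (start, count + 1)
        return 200
```

Consider limits of 2 per hour and 3 per day. The third request in the first hour is rejected. After the hour rolls over, one more request fits, and then the daily total is exhausted. Fixed windows are the simplest choice but allow bursts at window boundaries. Sliding windows or token buckets smooth this out at the cost of keeping more state.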
When a rate limit is exceeded, the API returns a 429 (Too Many Requests) response.
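This page doesn't specify how clients should react to a 429. A common pattern is to retry after a delay. The sketch below honors a Retry-After value given in seconds, when one is present. Otherwise it falls back to exponential backoff with full jitter. This page doesn't say whether the gateway sends Retry-After, so treat that as an assumption. `call_with_backoff` and its `send` callback are hypothetical names. `send` stands in for whatever HTTP client performs one request and returns the status code plus the Retry-After header value (or `None`).

```python
import random
import time
from typing import Callable, Optional, Tuple


def call_with_backoff(
    send: Callable[[], Tuple[int, Optional[str]]],
    max_retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call `send` until it returns a non-429 status or retries run out.

    `send` is a hypothetical stand-in for one request to the gateway; it
    returns (status_code, retry_after_header_or_None).
    """
    attempt = 0
    while True:
        status, retry_after = send()
        if status != 429 or attempt >= max_retries:
            return status  # success, another error, or retries exhausted
        if retry_after is not None and retry_after.isdigit():
            # Server-suggested wait in seconds (HTTP-date values are not handled here).
            delay = float(retry_after)
        else:
            # Full jitter: wait a random time in [0, base_delay * 2**attempt].
            delay = random.uniform(0, base_delay * 2 ** attempt)
        sleep(delay)
        attempt += 1
```

Injecting `send` and `sleep` keeps the retry policy independent of any particular HTTP library and makes it easy to test. Capping `max_retries` means a client that keeps hitting the limit eventually surfaces the 429 instead of retrying forever.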
Why Rate Limiting Matters
Prevents excessive LLM API usage: Controls the number of requests per user to avoid resource exhaustion.
Optimizes model inference efficiency: Ensures that LLM requests are processed smoothly without congestion.