Rate Limiting

Apply rate limiting, cost control and more.

Rate limiting prevents API abuse by capping the number of requests allowed within a given time window. You can configure hourly, daily, and monthly request limits.

This ensures fair usage and helps maintain system performance and stability.
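To make the interaction between the three limits concrete, here is a minimal sketch of a fixed-window counter. It is an illustration only: the gateway's actual algorithm is not described here, and the `FixedWindowLimiter` class, the `WINDOWS` table, and the 30-day month are assumptions made for the example. A request is allowed only if every configured window still has room.

```python
import time
from dataclasses import dataclass, field

# Window lengths in seconds. Treating a month as 30 days is an assumption.
WINDOWS = {"hourly": 3600, "daily": 86400, "monthly": 30 * 86400}


@dataclass
class FixedWindowLimiter:
    """Hypothetical limiter: counts requests per fixed window."""

    limits: dict  # e.g. {"hourly": 100, "daily": 1000, "monthly": 10000}
    counts: dict = field(default_factory=dict)  # (name, window index) -> requests seen

    def allow(self, now=None):
        """Return True and record the request if every window has room."""
        now = time.time() if now is None else now
        # One counter key per configured limit, identified by the current window index.
        keys = {name: (name, int(now // WINDOWS[name])) for name in self.limits}
        if any(self.counts.get(key, 0) >= self.limits[name] for name, key in keys.items()):
            return False  # a server would answer 429 at this point
        for key in keys.values():
            self.counts[key] = self.counts.get(key, 0) + 1
        return True
```

Note that the tightest window wins: if the hourly and daily limits are both 1000, a client that uses its full hourly allowance in the first hour is blocked for the rest of the day. The sketch never discards old counters, which a long-running service would need to do.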

# Set hourly, daily, and monthly request limits
ai-gateway serve \
    --rate-hourly 1000 \
    --rate-daily 1000 \
    --rate-monthly 1000

Or in config.yaml:

rate_limit:
  hourly: 100
  daily: 1000
  monthly: 10000

When a rate limit is exceeded, the API will return a 429 (Too Many Requests) response.
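On the client side, a 429 is usually handled by waiting and retrying. The sketch below uses only the Python standard library; `get_with_retry` and `backoff_delay` are hypothetical helper names, and whether the gateway sends a `Retry-After` header is an assumption, so the code falls back to exponential backoff when the header is missing or not a number of seconds.

```python
import time
import urllib.error
import urllib.request


def backoff_delay(attempt, retry_after=None, base=1.0, cap=60.0):
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        try:
            # Honor a numeric Retry-After value, clamped to [0, cap].
            return max(0.0, min(float(retry_after), cap))
        except ValueError:
            pass  # Retry-After may be an HTTP date; fall back to backoff
    return min(base * 2 ** attempt, cap)


def get_with_retry(url, max_attempts=5):
    """GET `url`, retrying only when the server answers 429."""
    for attempt in range(max_attempts):
        try:
            with urllib.request.urlopen(url) as resp:
                return resp.read()
        except urllib.error.HTTPError as err:
            # Re-raise other errors, and give up after the last attempt.
            if err.code != 429 or attempt == max_attempts - 1:
                raise
            time.sleep(backoff_delay(attempt, err.headers.get("Retry-After")))
```

Short retries like these only help with brief bursts. Once an hourly, daily, or monthly limit is exhausted, the client has to wait until that limit resets, so it is usually better to surface the error than to keep retrying.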
