Rate limits protect the API from abuse and ensure fair usage.

Rate Limit Tiers

| Endpoint | Rate Limit |
| --- | --- |
| `/api/health` | 120 requests/minute |
| `/api/v1/chat/completions` | 60 requests/minute |
| `/api/v1/media/*` | 60 requests/minute |
| `/api/v1/vision` | 60 requests/minute |
| `/api/v1/fashion/run` | 30 requests/minute |
| `/api/v1/fashion/subscribe` | 30 requests/minute |
| `/api/v1/fashion/status/:id` | 40 requests/10 seconds |
| `/api/chat` | 10 requests/minute |
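
One way to stay under these limits is to throttle on the client before sending. The sketch below is a minimal sliding-window limiter; the class name `SlidingWindowLimiter` and its methods are hypothetical and not part of the API. It also assumes a sliding window, which may not match how the server actually counts requests.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Client-side throttle: allow at most `max_requests` per `window_seconds`."""

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.sent = deque()  # timestamps of requests still inside the window

    def wait_time(self):
        """Seconds to wait before the next request is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self.sent and now - self.sent[0] >= self.window_seconds:
            self.sent.popleft()
        if len(self.sent) < self.max_requests:
            return 0.0
        # The oldest request must leave the window before another one fits.
        return self.window_seconds - (now - self.sent[0])

    def record(self):
        """Note that a request was just sent."""
        self.sent.append(self.clock())


# Limits copied from the table: 60 requests/minute and 40 requests/10 seconds.
chat_limiter = SlidingWindowLimiter(max_requests=60, window_seconds=60)
status_limiter = SlidingWindowLimiter(max_requests=40, window_seconds=10)
```

Before each call, sleep for `limiter.wait_time()` seconds and then call `limiter.record()`. A separate limiter per endpoint keeps the tighter `/api/chat` limit from being masked by the looser ones.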

Rate Limit Headers

Every API response includes rate limit information:

X-RateLimit-Limit: 60
X-RateLimit-Remaining: 55
X-RateLimit-Reset: 1706540060

| Header | Description |
| --- | --- |
| `X-RateLimit-Limit` | Maximum requests per window |
| `X-RateLimit-Remaining` | Requests remaining in current window |
| `X-RateLimit-Reset` | Unix timestamp when the window resets |
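
These headers make it possible to pause before hitting a 429 rather than after. The helper below is a minimal sketch; the name `seconds_until_reset` is hypothetical, and it assumes `X-RateLimit-Reset` is expressed in seconds, as the sample value suggests.

```python
import time


def seconds_until_reset(headers, now=None):
    """Return (remaining, wait_seconds) derived from the rate limit headers.

    `headers` is any mapping of header names to string values, for example
    `response.headers` from the `requests` library.
    """
    now = time.time() if now is None else now
    remaining = int(headers["X-RateLimit-Remaining"])
    reset_at = int(headers["X-RateLimit-Reset"])
    # Only wait when the window's budget is used up; never return a negative wait.
    wait = max(0.0, reset_at - now) if remaining == 0 else 0.0
    return remaining, wait
```

When `remaining` reaches 0, sleeping for `wait_seconds` before the next call should avoid the 429 altogether. Note that a plain `dict` is case-sensitive, while `response.headers` in `requests` is not.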

Handling Rate Limits

When you exceed the rate limit, the API returns an HTTP 429 (Too Many Requests) response with the following body:

{
  "error": {
    "code": "RATE_LIMIT_EXCEEDED",
    "message": "Rate limit exceeded. Try again later.",
    "details": {
      "retryAfter": 60
    }
  }
}

Exponential Backoff Example

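import time
import requests

def make_request_with_retry(url, headers, data, max_retries=3):
    """POST with retries on HTTP 429, backing off exponentially (capped at 300 s)."""
    for attempt in range(max_retries):
        response = requests.post(url, headers=headers, json=data)

        if response.status_code != 429:
            return response

        # Out of retries: give up without sleeping first.
        if attempt == max_retries - 1:
            break

        # Use the server's retryAfter hint (treated as seconds) as the base delay.
        try:
            retry_after = response.json().get('error', {}).get('details', {}).get('retryAfter', 60)
        except ValueError:  # body was not valid JSON
            retry_after = 60
        wait_time = min(retry_after * (2 ** attempt), 300)
        print(f"Rate limited. Waiting {wait_time}s...")
        time.sleep(wait_time)

    raise RuntimeError("Max retries exceeded")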
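
To see how quickly the waits grow, the delay formula `min(retry_after * (2 ** attempt), 300)` can be evaluated on its own. The helper below is a small illustration only; `backoff_delays` is a hypothetical name, and the defaults mirror the values used in the example above.

```python
def backoff_delays(retry_after=60, max_retries=3, cap=300):
    """Delay in seconds that the formula assigns to each attempt index."""
    return [min(retry_after * (2 ** attempt), cap) for attempt in range(max_retries)]


print(backoff_delays())               # [60, 120, 240]
print(backoff_delays(max_retries=5))  # [60, 120, 240, 300, 300]
```

With a 60-second `retryAfter`, the 300-second cap takes effect from the fourth attempt onward, so raising `max_retries` past 3 adds a flat 5 minutes per extra attempt.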

Best Practices

Implement exponential backoff: wait progressively longer between retries.
Cache responses: use the Gateway API's built-in caching to avoid redundant calls.
Use streaming: for long responses, streaming is more efficient.
Use Unkey keys: API keys managed via Unkey have their own server-side rate limits.
