Concurred API

Rate Limits

API rate limits and quotas

Rate limits protect the API from abuse and ensure fair usage.

Rate Limit Tiers

| Endpoint | Rate Limit |
| --- | --- |
| /api/health | 120 requests/minute |
| /api/v1/chat/completions | 60 requests/minute |
| /api/v1/media/* | 60 requests/minute |
| /api/v1/vision | 60 requests/minute |
| /api/v1/fashion/run | 30 requests/minute |
| /api/v1/fashion/subscribe | 30 requests/minute |
| /api/v1/fashion/status/:id | 40 requests/10 seconds |
| /api/chat | 10 requests/minute |
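
One way to stay under these limits is to throttle requests on the client before they are sent. The sketch below is a minimal sliding-window limiter, not part of the API itself: the class name `SlidingWindowLimiter`, its parameters, and the two limiter variables are all hypothetical. The documentation does not say whether the server counts requests in fixed or sliding windows, so this assumes a sliding window, which is the more conservative choice.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Client-side throttle: allow at most `limit` calls per `window` seconds."""

    def __init__(self, limit, window, clock=time.monotonic, sleep=time.sleep):
        self.limit = limit
        self.window = window
        self._clock = clock      # injectable so the limiter can be tested
        self._sleep = sleep
        self._sent = deque()     # timestamps of recent calls, oldest first

    def acquire(self):
        """Block until a call is allowed, then record it."""
        while True:
            now = self._clock()
            # Drop timestamps that have left the window.
            while self._sent and now - self._sent[0] >= self.window:
                self._sent.popleft()
            if len(self._sent) < self.limit:
                self._sent.append(now)
                return
            # Wait until the oldest recorded call leaves the window.
            self._sleep(self.window - (now - self._sent[0]))


# Limits copied from the table above; the variable names are illustrative.
chat_limiter = SlidingWindowLimiter(limit=60, window=60)    # /api/v1/chat/completions
status_limiter = SlidingWindowLimiter(limit=40, window=10)  # /api/v1/fashion/status/:id
```

Calling `chat_limiter.acquire()` before each request to `/api/v1/chat/completions` would then delay the caller whenever 60 requests have already been sent in the last 60 seconds. Network latency and clock differences mean a 429 is still possible, so this complements retry handling rather than replacing it.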

Rate Limit Headers

Every API response includes rate limit information:

X-RateLimit-Limit: 60
X-RateLimit-Remaining: 55
X-RateLimit-Reset: 1706540060

| Header | Description |
| --- | --- |
| X-RateLimit-Limit | Maximum requests per window |
| X-RateLimit-Remaining | Requests remaining in current window |
| X-RateLimit-Reset | Unix timestamp when the window resets |
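
These headers can be used to pause before the limit is hit rather than after. The following is a minimal sketch: the function name `rate_limit_wait` is hypothetical, and it assumes `X-RateLimit-Reset` is expressed in seconds since the Unix epoch, as the example value above suggests.

```python
import time


def rate_limit_wait(headers, now=None):
    """Return (remaining, seconds_to_wait) based on the rate limit headers.

    `headers` is any mapping of header names to string values, for example
    `response.headers` from the `requests` library.
    """
    now = time.time() if now is None else now
    # Fall back to "no wait needed" if a header is missing.
    remaining = int(headers.get("X-RateLimit-Remaining", 1))
    reset_at = int(headers.get("X-RateLimit-Reset", 0))
    if remaining > 0:
        return remaining, 0.0
    # No requests left: wait until the window resets (never a negative time).
    return remaining, max(0.0, reset_at - now)
```

A caller might check the result after each response and sleep for `seconds_to_wait` when it is greater than zero.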

Handling Rate Limits

When you exceed the rate limit, the API returns an HTTP 429 (Too Many Requests) response with a JSON body like this:

{
  "error": {
    "code": "RATE_LIMIT_EXCEEDED",
    "message": "Rate limit exceeded. Try again later.",
    "details": {
      "retryAfter": 60
    }
  }
}

Exponential Backoff Example

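import time
import requests

def make_request_with_retry(url, headers, data, max_retries=3):
    """POST `data` as JSON, retrying with exponential backoff on HTTP 429."""
    for attempt in range(max_retries):
        response = requests.post(url, headers=headers, json=data)

        if response.status_code != 429:
            return response

        # Skip the wait after the final attempt, since no retry follows it.
        if attempt == max_retries - 1:
            break

        # Start from the server's retryAfter hint (default 60s),
        # double it on each attempt, and cap the wait at 300s.
        retry_after = response.json().get('error', {}).get('details', {}).get('retryAfter', 60)
        wait_time = min(retry_after * (2 ** attempt), 300)
        print(f"Rate limited. Waiting {wait_time}s...")
        time.sleep(wait_time)

    raise RuntimeError("Max retries exceeded")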

Best Practices

  1. Implement exponential backoff — Wait progressively longer between retries
  2. Cache responses — Use the Gateway API's built-in caching to avoid redundant calls
  3. Use streaming — For long responses, streaming is more efficient
  4. Use Unkey keys — API keys managed via Unkey have their own server-side rate limits
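
When many clients are rate limited at the same moment, identical backoff schedules can make them all retry together. A common refinement of the first practice is to add a small random jitter to each delay. The sketch below is an assumption layered on top of this documentation, not something the API requires: the function name `backoff_delay`, the 10% jitter fraction, and the `rng` parameter are all hypothetical. The base delay uses the same doubling and 300-second cap as the example above, and the jitter is only ever added, so the delay never drops below the server's `retryAfter` hint.

```python
import random


def backoff_delay(retry_after, attempt, cap=300, jitter=0.1, rng=random.random):
    """Return a delay in seconds for a zero-based retry attempt.

    The base delay doubles with each attempt and is capped at `cap`;
    up to `jitter` (a fraction of the base) is then added at random.
    """
    base = min(retry_after * (2 ** attempt), cap)
    return base * (1 + jitter * rng())
```

With `retry_after=60`, the base delays for attempts 0, 1, and 2 are 60, 120, and 240 seconds, each extended by at most 10%.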

Need Higher Limits?

Contact us to discuss enterprise rate limits for high-volume applications.
