Rate Limits
API rate limits and quotas
Rate limits protect the API from abuse and ensure fair usage.
Rate Limit Tiers
| Endpoint | Rate Limit |
|---|---|
| /api/health | 120 requests/minute |
| /api/v1/chat/completions | 60 requests/minute |
| /api/v1/media/* | 60 requests/minute |
| /api/v1/vision | 60 requests/minute |
| /api/v1/fashion/run | 30 requests/minute |
| /api/v1/fashion/subscribe | 30 requests/minute |
| /api/v1/fashion/status/:id | 40 requests/10 seconds |
| /api/chat | 10 requests/minute |
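One way to stay under these limits is to space requests evenly on the client side. The sketch below turns a few of the limits from the table into a minimum delay between calls. The `LIMITS` mapping and the `min_interval` helper are illustrative names, not part of the API, and even spacing is only one possible pacing strategy.

```python
# A few limits copied from the table, as (max requests, window in seconds).
# LIMITS and min_interval are hypothetical names used for illustration only.
LIMITS = {
    "/api/health": (120, 60),
    "/api/v1/chat/completions": (60, 60),
    "/api/v1/fashion/run": (30, 60),
    "/api/v1/fashion/status/:id": (40, 10),
    "/api/chat": (10, 60),
}


def min_interval(endpoint: str) -> float:
    """Return the even spacing, in seconds, that keeps calls within the limit."""
    max_requests, window_seconds = LIMITS[endpoint]
    return window_seconds / max_requests
```

For example, `min_interval("/api/chat")` gives 6.0 seconds between calls, while polling `/api/v1/fashion/status/:id` allows one call every 0.25 seconds.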
Rate Limit Headers
Every API response includes rate limit information:
| Header | Description |
|---|---|
| X-RateLimit-Limit | Maximum requests per window |
| X-RateLimit-Remaining | Requests remaining in current window |
| X-RateLimit-Reset | Unix timestamp when the window resets |
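A client can read these headers to decide whether to pause before the next request. The following is a minimal sketch, assuming the headers arrive as a string-to-string mapping with the exact names shown in the table and that `X-RateLimit-Reset` is expressed in seconds. The function names are illustrative.

```python
import time
from typing import Mapping, Optional


def should_pause(headers: Mapping[str, str]) -> bool:
    """True when no requests remain in the current window."""
    return int(headers["X-RateLimit-Remaining"]) <= 0


def seconds_until_reset(headers: Mapping[str, str],
                        now: Optional[float] = None) -> float:
    """Seconds until the window resets, or 0.0 if the reset time has passed.

    Assumes X-RateLimit-Reset is a Unix timestamp in seconds.
    """
    current = time.time() if now is None else now
    reset_at = float(headers["X-RateLimit-Reset"])
    return max(0.0, reset_at - current)
```

HTTP header names are case-insensitive, so with a plain `dict` you may need to normalize the keys first. Most HTTP client libraries already expose a case-insensitive headers object.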
Handling Rate Limits
When you exceed the rate limit, the API returns a 429 (Too Many Requests) response.
Exponential Backoff Example
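The sketch below retries a request when it receives a 429, doubling the wait after each attempt and adding a small random jitter. It is one possible approach, not an official client. The `send` callable, the `(status, body)` return shape, and the default delays are assumptions chosen for illustration.

```python
import random
import time
from typing import Callable, Tuple


def call_with_backoff(
    send: Callable[[], Tuple[int, str]],
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send() and retry on HTTP 429, doubling the wait each time.

    send is a hypothetical callable that performs one request and
    returns (status_code, body).
    """
    for attempt in range(max_retries + 1):
        status, body = send()
        if status != 429:
            return status, body
        if attempt == max_retries:
            break
        # Wait base_delay * 1, 2, 4, ... capped at max_delay,
        # plus up to 10% random jitter so clients do not retry in lockstep.
        delay = min(max_delay, base_delay * 2 ** attempt)
        sleep(delay + random.uniform(0, delay * 0.1))
    raise RuntimeError(f"still rate limited after {max_retries} retries")
```

To use it, wrap a single HTTP call in a function that returns the status code and body, then pass that function as `send`. The `sleep` parameter exists only so the waiting can be replaced in tests.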
Best Practices
- Implement exponential backoff — Wait progressively longer between retries
- Cache responses — Use the Gateway API's built-in caching to avoid redundant calls
- Use streaming — For long responses, streaming is more efficient
- Use Unkey keys — API keys managed via Unkey have their own server-side rate limits
Need Higher Limits?
Contact us to discuss enterprise rate limits for high-volume applications.