Gateway API
OpenAI-compatible API for direct model access
A drop-in replacement for the OpenAI SDK. Works with any OpenAI-compatible client.
Gateway vs Chat API
Use the Gateway API (/api/v1/chat/completions) for direct, single-model access with advanced features like fallback routing, caching, load balancing, and guardrails. It returns standard OpenAI-format responses.
Use the Chat API (/api/chat) for multi-model battles, debates with voting, and autonomous web search. It returns custom SSE events.
Create Chat Completion
POST /api/v1/chat/completions
Request
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| `model` | string | Yes | Model ID or alias (see Models for the full list) |
| `messages` | array | Yes | Array of message objects (supports multimodal content with images) |
| `stream` | boolean | No | Enable SSE streaming (default: `false`) |
| `temperature` | number | No | 0-2, controls randomness (default: 0.7) |
| `max_tokens` | number | No | Max tokens to generate |
| `fallback` | string[] | No | Fallback models if the primary fails (e.g. `["gpt", "gemini"]`) |
| `retries` | number | No | Max retries on transient errors, with exponential backoff (default: 2, max: 5) |
| `timeout` | number | No | Request timeout in ms (default: 60000, max: 300000) |
| `load_balance` | object | No | Load-balancing config (see below) |
| `guardrails` | object | No | Input/output guardrails (see below) |
| `prompt_id` | string | No | Langfuse prompt template name |
| `prompt_version` | number | No | Prompt template version (default: the production version) |
| `prompt_variables` | object | No | Variables to substitute in the prompt template |
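A minimal request body using these parameters might look like the following sketch (model aliases and values are illustrative):

```python
import json

# Minimal body for POST /api/v1/chat/completions.
# Only "model" and "messages" are required; the rest are optional.
payload = {
    "model": "gpt",  # model ID or alias
    "messages": [
        {"role": "user", "content": "Summarize HTTP caching in one sentence."}
    ],
    "temperature": 0.7,      # default
    "max_tokens": 256,
    "fallback": ["gemini"],  # tried in order if the primary fails
    "retries": 2,            # default; max 5
    "timeout": 60000,        # ms; default 60000, max 300000
}

body = json.dumps(payload)  # send as the JSON request body
```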
Request Headers
| Header | Description |
|---|---|
| `X-Cache: no-cache` | Skip the response cache |
| `X-Cache-TTL: 3600` | Cache TTL in seconds (default: 3600, max: 86400) |
| `X-Guardrails: pii,content_moderation` | Alternative to the body `guardrails` field |
Response
Response Headers
| Header | Description |
|---|---|
| `X-Request-ID` | Unique request identifier |
| `X-Model-Used` | Actual model that served the request |
| `X-Cache` | Cache status, `HIT` or `MISS` (non-streaming only) |
| `X-Fallback-From` | Original model, if a fallback was used |
| `X-Retry-Count` | Number of retries attempted |
| `X-Guardrail-Status` | Guardrail results (e.g. `pii:redact,content_moderation:pass`) |
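The `X-Guardrail-Status` value is a comma-separated list of `name:outcome` pairs; a small parser sketch:

```python
def parse_guardrail_status(header_value):
    """Parse 'pii:redact,content_moderation:pass' into a dict."""
    result = {}
    for item in header_value.split(","):
        name, _, outcome = item.partition(":")
        result[name.strip()] = outcome.strip()
    return result

parse_guardrail_status("pii:redact,content_moderation:pass")
# → {'pii': 'redact', 'content_moderation': 'pass'}
```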
Gateway Features
Fallback Routing
Automatically try backup models if the primary fails:
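For example, a request that prefers Claude but falls back to GPT and then Gemini (aliases illustrative):

```python
# If the primary model fails, the Gateway retries the same request
# against each fallback model in order; the model that actually
# answered is reported in the X-Model-Used response header, and the
# original model in X-Fallback-From.
payload = {
    "model": "claude",
    "messages": [{"role": "user", "content": "Hello"}],
    "fallback": ["gpt", "gemini"],
}
```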
Response Caching
Responses are cached automatically for non-streaming requests (1 hour default). Control via headers:
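A small helper for choosing those headers, as a sketch (defaults and the 86400s cap are from the header table above):

```python
MAX_TTL = 86400  # one-day cap on X-Cache-TTL

def cache_headers(ttl_seconds=None, bypass=False):
    """Build the cache-control headers for a Gateway request."""
    if bypass:
        return {"X-Cache": "no-cache"}   # skip the cache entirely
    if ttl_seconds is None:
        return {}                        # server default: 3600 seconds
    return {"X-Cache-TTL": str(min(ttl_seconds, MAX_TTL))}
```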
Load Balancing
Distribute requests across models:
Strategies: `weighted`, `round-robin`, `least-latency`.
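The exact shape of the `load_balance` object is sketched here as an assumption: a `strategy` field plus a hypothetical per-model weight map for the weighted case.

```python
# Assumed shape of the load_balance config, for illustration only:
# "strategy" names one of the documented strategies; "models" is a
# hypothetical weight map used by the weighted strategy.
payload = {
    "model": "gpt",
    "messages": [{"role": "user", "content": "Hi"}],
    "load_balance": {
        "strategy": "weighted",  # or "round-robin", "least-latency"
        "models": {"gpt": 0.7, "gemini": 0.3},
    },
}
```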
Guardrails
Pre-process input and post-process output:
Available guardrails:
- `pii`: Detects and redacts emails, phone numbers, SSNs, credit cards, and IPs
- `content_moderation`: Blocks dangerous content
- `schema_validation`: Validates output against a JSON schema
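Guardrails can also be requested per-call through the `X-Guardrails` header, the documented alternative to the body field; a minimal sketch:

```python
# Request PII redaction and content moderation via the header form.
guardrails = ["pii", "content_moderation"]
headers = {"X-Guardrails": ",".join(guardrails)}
# The outcome per guardrail comes back in X-Guardrail-Status.
```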
Prompt Templates (Langfuse)
Use managed prompt templates:
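A request that resolves a Langfuse-managed template server-side might look like this sketch (the template name and variables are hypothetical):

```python
payload = {
    "model": "gpt",
    "prompt_id": "support-triage",   # hypothetical template name
    "prompt_version": 3,             # omit to use the production version
    "prompt_variables": {"customer_name": "Ada", "tier": "pro"},
    "messages": [{"role": "user", "content": "My login is broken."}],
}
```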
BYOK (Bring Your Own Keys)
Store your own provider API keys via the dashboard. When making requests, your key is automatically used instead of the platform key. See Authentication > BYOK for setup.
Vision Support
The Gateway supports images using OpenAI's multimodal message format. Vision-enabled models (GPT, Claude, Gemini, Grok) see the image directly. Non-vision models receive an AI-generated description transparently. See Models > Vision Support for the full matrix.
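A message carrying an image uses OpenAI's content-parts format (the image URL here is a placeholder):

```python
# Multimodal message: a text part plus an image_url part.
payload = {
    "model": "claude",
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in this image?"},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/cat.png"}},
        ],
    }],
}
```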
Streaming
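Set `"stream": true` to receive SSE chunks in the standard OpenAI format. A minimal delta parser, sketched against illustrative sample chunks:

```python
import json

def iter_deltas(sse_lines):
    """Yield content deltas from OpenAI-style SSE 'data:' lines."""
    for line in sse_lines:
        if not line.startswith("data: "):
            continue
        data = line[len("data: "):]
        if data == "[DONE]":        # end-of-stream sentinel
            return
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            yield delta["content"]

# Illustrative sample of what a streamed response looks like.
sample = [
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    "data: [DONE]",
]
print("".join(iter_deltas(sample)))  # Hello
```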
System Messages
Each model handles system messages in its native format (e.g., Claude uses the `system` parameter, OpenAI uses `instructions`). This is transparent: just use the standard `"role": "system"` format.
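So a request looks the same regardless of the target model:

```python
# The Gateway translates the system message into each provider's
# native format; the caller always sends the standard role.
payload = {
    "model": "claude",
    "messages": [
        {"role": "system", "content": "You are a terse assistant."},
        {"role": "user", "content": "Explain DNS."},
    ],
}
```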
Limitations
The Gateway API is a transparent passthrough — it does not include autonomous web search. For web-search-augmented responses with citations, use the Chat API with battle or fight mode.