Gateway API
OpenAI-compatible API for direct model access
A drop-in replacement for the OpenAI SDK. Works with any OpenAI-compatible client.
Gateway vs Chat API
Use the Gateway API (/api/v1/chat/completions) for direct, single-model access with advanced features like fallback routing, caching, load balancing, and guardrails. It returns standard OpenAI-format responses.
Use the Chat API (/api/chat) for multi-model battles, debates with voting, and autonomous web search. It returns custom SSE events.
Create Chat Completion
POST /api/v1/chat/completions
Request
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| `model` | string | Yes | Model ID or alias (see Models for the full list) |
| `messages` | array | Yes | Array of message objects (supports multimodal content with images) |
| `stream` | boolean | No | Enable SSE streaming (default: `false`) |
| `temperature` | number | No | 0-2, controls randomness (default: 0.7) |
| `max_tokens` | number | No | Max tokens to generate |
| `fallback` | string[] | No | Fallback models if the primary fails (e.g. `["gpt", "gemini"]`) |
| `retries` | number | No | Max retries on transient errors, with exponential backoff (default: 2, max: 5) |
| `timeout` | number | No | Request timeout in ms (default: 60000, max: 300000) |
| `load_balance` | object | No | Load-balancing config (see below) |
| `guardrails` | object | No | Input/output guardrails (see below) |
| `prompt_id` | string | No | Langfuse prompt template name |
| `prompt_version` | number | No | Prompt template version (default: the production version) |
| `prompt_variables` | object | No | Variables to substitute in the prompt template |
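A minimal request body using these parameters might look like the following sketch (model aliases and values are illustrative):

```python
import json

# Minimal body for POST /api/v1/chat/completions.
# Only "model" and "messages" are required; the rest are optional.
payload = {
    "model": "gpt",  # model ID or alias
    "messages": [
        {"role": "user", "content": "Summarize HTTP caching in one sentence."}
    ],
    "temperature": 0.7,      # default
    "max_tokens": 256,
    "fallback": ["gemini"],  # tried in order if the primary fails
    "retries": 2,            # default; max 5
    "timeout": 60000,        # ms; default 60000, max 300000
}

body = json.dumps(payload)  # send as the JSON request body
```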
Request Headers
| Header | Description |
|---|---|
| `X-Cache: no-cache` | Skip the response cache |
| `X-Cache-TTL: 3600` | Cache TTL in seconds (default: 3600, max: 86400) |
| `X-Guardrails: pii,content_moderation` | Alternative to the body `guardrails` field |
Response
Response Headers
| Header | Description |
|---|---|
| `X-Request-ID` | Unique request identifier |
| `X-Model-Used` | Actual model that served the request |
| `X-Cache` | Cache status, `HIT` or `MISS` (non-streaming only) |
| `X-Fallback-From` | Original model, if a fallback was used |
| `X-Retry-Count` | Number of retries attempted |
| `X-Guardrail-Status` | Guardrail results (e.g. `pii:redact,content_moderation:pass`) |
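The `X-Guardrail-Status` value is a comma-separated list of `name:outcome` pairs; a small parser sketch:

```python
def parse_guardrail_status(header_value):
    """Parse 'pii:redact,content_moderation:pass' into a dict."""
    result = {}
    for item in header_value.split(","):
        name, _, outcome = item.partition(":")
        result[name.strip()] = outcome.strip()
    return result

parse_guardrail_status("pii:redact,content_moderation:pass")
# → {'pii': 'redact', 'content_moderation': 'pass'}
```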
Gateway Features
Fallback Routing
Automatically try backup models if the primary fails:
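For example, a request that prefers Claude but falls back to GPT and then Gemini (aliases illustrative):

```python
# If the primary model fails, the Gateway retries the same request
# against each fallback model in order; the model that actually
# answered is reported in the X-Model-Used response header, and the
# original model in X-Fallback-From.
payload = {
    "model": "claude",
    "messages": [{"role": "user", "content": "Hello"}],
    "fallback": ["gpt", "gemini"],
}
```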
Response Caching
Responses are cached automatically for non-streaming requests (1 hour default). Control via headers:
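A small helper for choosing those headers, as a sketch (defaults and the 86400s cap are from the header table above):

```python
MAX_TTL = 86400  # one-day cap on X-Cache-TTL

def cache_headers(ttl_seconds=None, bypass=False):
    """Build the cache-control headers for a Gateway request."""
    if bypass:
        return {"X-Cache": "no-cache"}   # skip the cache entirely
    if ttl_seconds is None:
        return {}                        # server default: 3600 seconds
    return {"X-Cache-TTL": str(min(ttl_seconds, MAX_TTL))}
```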
Load Balancing
Distribute requests across models:
Strategies: `weighted`, `round-robin`, `least-latency`.
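The exact shape of the `load_balance` object is sketched here as an assumption: a `strategy` field plus a hypothetical per-model weight map for the weighted case.

```python
# Assumed shape of the load_balance config, for illustration only:
# "strategy" names one of the documented strategies; "models" is a
# hypothetical weight map used by the weighted strategy.
payload = {
    "model": "gpt",
    "messages": [{"role": "user", "content": "Hi"}],
    "load_balance": {
        "strategy": "weighted",  # or "round-robin", "least-latency"
        "models": {"gpt": 0.7, "gemini": 0.3},
    },
}
```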
Guardrails
Pre-process input and post-process output:
Available guardrails:
- `pii`: Detects and redacts emails, phone numbers, SSNs, credit cards, and IPs
- `content_moderation`: Blocks dangerous content
- `schema_validation`: Validates output against a JSON schema
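Guardrails can also be requested per-call through the `X-Guardrails` header, the documented alternative to the body field; a minimal sketch:

```python
# Request PII redaction and content moderation via the header form.
guardrails = ["pii", "content_moderation"]
headers = {"X-Guardrails": ",".join(guardrails)}
# The outcome per guardrail comes back in X-Guardrail-Status.
```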
Prompt Templates (Langfuse)
Use managed prompt templates:
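A request that resolves a Langfuse-managed template server-side might look like this sketch (the template name and variables are hypothetical):

```python
payload = {
    "model": "gpt",
    "prompt_id": "support-triage",   # hypothetical template name
    "prompt_version": 3,             # omit to use the production version
    "prompt_variables": {"customer_name": "Ada", "tier": "pro"},
    "messages": [{"role": "user", "content": "My login is broken."}],
}
```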
BYOK (Bring Your Own Keys)
Store your own provider API keys via the dashboard. When making requests, your key is automatically used instead of the platform key. See Authentication > BYOK for setup.
Vision Support
The Gateway supports images using OpenAI's multimodal message format. Vision-enabled models (GPT, Claude, Gemini, Grok) see the image directly. Non-vision models receive an AI-generated description transparently. See Models > Vision Support for the full matrix.
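A message carrying an image uses OpenAI's content-parts format (the image URL here is a placeholder):

```python
# Multimodal message: a text part plus an image_url part.
payload = {
    "model": "claude",
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in this image?"},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/cat.png"}},
        ],
    }],
}
```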
Streaming
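Set `"stream": true` to receive SSE chunks in the standard OpenAI format. A minimal delta parser, sketched against illustrative sample chunks:

```python
import json

def iter_deltas(sse_lines):
    """Yield content deltas from OpenAI-style SSE 'data:' lines."""
    for line in sse_lines:
        if not line.startswith("data: "):
            continue
        data = line[len("data: "):]
        if data == "[DONE]":        # end-of-stream sentinel
            return
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            yield delta["content"]

# Illustrative sample of what a streamed response looks like.
sample = [
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    "data: [DONE]",
]
print("".join(iter_deltas(sample)))  # Hello
```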
System Messages
Each model handles system messages in its native format (e.g., Claude uses the `system` parameter, OpenAI uses `instructions`). This is transparent: just use the standard `"role": "system"` format.
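So a request looks the same regardless of the target model:

```python
# The Gateway translates the system message into each provider's
# native format; the caller always sends the standard role.
payload = {
    "model": "claude",
    "messages": [
        {"role": "system", "content": "You are a terse assistant."},
        {"role": "user", "content": "Explain DNS."},
    ],
}
```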
Limitations
The Gateway API is a transparent passthrough — it does not include autonomous web search. For web-search-augmented responses with citations, use the Chat API with battle or fight mode.