Concurred API

Gateway API

OpenAI-compatible API for direct model access

A drop-in replacement for the OpenAI API that works with any OpenAI-compatible client.

Gateway vs Chat API

Use the Gateway API (/api/v1/chat/completions) for direct, single-model access with advanced features like fallback routing, caching, load balancing, and guardrails. It returns standard OpenAI-format responses.

Use the Chat API (/api/chat) for multi-model battles, debates with voting, and autonomous web search. It returns custom SSE events.
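Because the Gateway speaks the OpenAI wire format, you can call it with plain fetch and no SDK. A minimal sketch, assuming a placeholder host (substitute your deployment's base URL) and a Bearer-token auth header:

```typescript
// Minimal sketch of calling the Gateway endpoint with plain fetch.
// GATEWAY_BASE is a placeholder -- substitute your deployment's host.
const GATEWAY_BASE = "https://YOUR_GATEWAY_HOST";

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Build the fetch arguments for a non-streaming completion request.
function buildCompletionRequest(model: string, messages: ChatMessage[], apiKey: string) {
  return {
    url: `${GATEWAY_BASE}/api/v1/chat/completions`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({ model, messages, stream: false }),
    },
  };
}

// Usage:
//   const { url, init } = buildCompletionRequest("gpt", [{ role: "user", content: "Hello" }], key);
//   const data = await (await fetch(url, init)).json();
```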

Create Chat Completion

POST /api/v1/chat/completions

Request

{
  "model": "gpt",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"}
  ],
  "stream": false,
  "temperature": 0.7,
  "max_tokens": 1000
}

Parameters

Parameter | Type | Required | Description
model | string | Yes | Model ID or alias (see Models for the full list)
messages | array | Yes | Array of message objects (supports multimodal content with images)
stream | boolean | No | Enable SSE streaming (default: false)
temperature | number | No | 0-2, controls randomness (default: 0.7)
max_tokens | number | No | Maximum tokens to generate
fallback | string[] | No | Fallback models if the primary fails (e.g. ["gpt", "gemini"])
retries | number | No | Max retries on transient errors, with exponential backoff (default: 2, max: 5)
timeout | number | No | Request timeout in ms (default: 60000, max: 300000)
load_balance | object | No | Load-balancing config (see below)
guardrails | object | No | Input/output guardrails (see below)
prompt_id | string | No | Langfuse prompt template name
prompt_version | number | No | Prompt template version (default: production)
prompt_variables | object | No | Variables to substitute in the prompt template
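The retries parameter uses exponential backoff, meaning the wait doubles before each successive attempt. The documentation does not specify the base delay, so the sketch below assumes a hypothetical 500 ms base and an 8 s cap purely for illustration:

```typescript
// Illustrative sketch of an exponential-backoff retry schedule, as implied
// by the `retries` parameter. The 500 ms base delay and 8000 ms cap are
// assumptions for illustration, not documented Gateway values.
function backoffDelaysMs(retries: number, baseMs = 500, capMs = 8000): number[] {
  const delays: number[] = [];
  for (let attempt = 0; attempt < retries; attempt++) {
    // Double the delay each attempt, never exceeding the cap.
    delays.push(Math.min(baseMs * 2 ** attempt, capMs));
  }
  return delays;
}

// backoffDelaysMs(2) -> [500, 1000]
```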

Request Headers

Header | Description
X-Cache: no-cache | Skip the response cache
X-Cache-TTL: 3600 | Cache TTL in seconds (default: 3600, max: 86400)
X-Guardrails: pii,content_moderation | Alternative to body guardrails

Response

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1706540000,
  "model": "gpt-5.2",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "Hello! How can I help you today?"
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 20,
    "completion_tokens": 10,
    "total_tokens": 30
  }
}

Response Headers

Header | Description
X-Request-ID | Unique request identifier
X-Model-Used | Actual model that served the request
X-Cache: HIT/MISS | Cache status (non-streaming only)
X-Fallback-From | Original model if a fallback was used
X-Retry-Count | Number of retries attempted
X-Guardrail-Status | Guardrail results (e.g. pii:redact,content_moderation:pass)

Gateway Features

Fallback Routing

Automatically try backup models if the primary fails:

{
  "model": "claude",
  "messages": [{"role": "user", "content": "Hello"}],
  "fallback": ["gpt", "gemini"]
}
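The fallback chain behaves like a sequential try-each loop: the primary model is attempted first, then each fallback in order, and the first success wins. A sketch of that logic, with a hypothetical callModel stand-in for a single upstream request:

```typescript
// Sketch of fallback routing: try the primary model, then each fallback
// in order, returning the first success. `callModel` is a hypothetical
// stand-in for one upstream provider request.
type CallModel = (model: string) => Promise<string>;

async function completeWithFallback(
  primary: string,
  fallbacks: string[],
  callModel: CallModel,
): Promise<{ model: string; content: string }> {
  let lastError: unknown;
  for (const model of [primary, ...fallbacks]) {
    try {
      return { model, content: await callModel(model) };
    } catch (err) {
      lastError = err; // remember the failure and move on to the next model
    }
  }
  throw lastError; // every model failed
}
```

When a fallback serves the request, the response carries X-Fallback-From with the original model name.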

Response Caching

Responses are cached automatically for non-streaming requests (1 hour default). Control via headers:

# Skip cache
curl -H "X-Cache: no-cache" ...
 
# Custom TTL (24 hours)
curl -H "X-Cache-TTL: 86400" ...
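Conceptually, each cached response carries an expiry timestamp derived from the TTL, and an expired entry behaves like a miss. The cache itself is server-side; this toy sketch only illustrates the TTL semantics signalled by X-Cache-TTL and X-Cache: HIT/MISS:

```typescript
// Toy sketch of TTL-based caching semantics. The real cache lives on the
// Gateway server; this only illustrates how a TTL turns into HIT/MISS.
class TtlCache<V> {
  private store = new Map<string, { value: V; expiresAt: number }>();

  constructor(private ttlSeconds: number) {}

  set(key: string, value: V, now = Date.now()): void {
    this.store.set(key, { value, expiresAt: now + this.ttlSeconds * 1000 });
  }

  get(key: string, now = Date.now()): V | undefined {
    const entry = this.store.get(key);
    if (!entry || entry.expiresAt <= now) return undefined; // MISS (absent or expired)
    return entry.value; // HIT
  }
}
```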

Load Balancing

Distribute requests across models:

{
  "model": "gpt",
  "messages": [{"role": "user", "content": "Hello"}],
  "load_balance": {
    "strategy": "weighted",
    "targets": [
      {"model": "gpt", "weight": 70},
      {"model": "claude", "weight": 30}
    ]
  }
}

Strategies: weighted, round-robin, least-latency.
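Under the weighted strategy, each request lands on a target with probability proportional to its weight, so the config above sends roughly 70% of traffic to gpt and 30% to claude. A sketch of that selection logic (the injectable rand parameter is just for deterministic testing):

```typescript
// Sketch of the "weighted" strategy: pick a target with probability
// proportional to its weight. `rand` in [0, 1) is injectable so the
// selection can be tested deterministically.
interface Target {
  model: string;
  weight: number;
}

function pickWeighted(targets: Target[], rand: () => number = Math.random): string {
  const total = targets.reduce((sum, t) => sum + t.weight, 0);
  let r = rand() * total;
  for (const t of targets) {
    if (r < t.weight) return t.model;
    r -= t.weight; // skip past this target's slice of the range
  }
  return targets[targets.length - 1].model; // floating-point safety net
}
```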

Guardrails

Pre-process input and post-process output:

{
  "model": "gpt",
  "messages": [{"role": "user", "content": "My email is john@example.com"}],
  "guardrails": {
    "enabled": ["pii", "content_moderation"]
  }
}

Available guardrails:

  • pii — Detects and redacts emails, phones, SSN, credit cards, IPs
  • content_moderation — Blocks dangerous content
  • schema_validation — Validates output against JSON schema
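To make the pii guardrail concrete, here is a sketch of email redaction only. The real guardrail covers the other PII types listed above; the regex and the [REDACTED_EMAIL] placeholder are illustrative assumptions, not the Gateway's actual implementation:

```typescript
// Illustrative sketch of email redaction, one slice of what a `pii`
// guardrail does. The regex and the [REDACTED_EMAIL] token are
// assumptions for illustration, not the Gateway's actual behavior.
const EMAIL_RE = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

function redactEmails(text: string): string {
  return text.replace(EMAIL_RE, "[REDACTED_EMAIL]");
}
```

With the example request above, the model would see "My email is [REDACTED_EMAIL]" and X-Guardrail-Status would report pii:redact.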

Prompt Templates (Langfuse)

Use managed prompt templates:

{
  "model": "claude",
  "messages": [{"role": "user", "content": "Review this code"}],
  "prompt_id": "code-review",
  "prompt_variables": {"language": "python"}
}
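Langfuse prompt templates use {{variable}} placeholders; the prompt_variables object fills them in server-side. A sketch of that substitution, assuming mustache-style placeholders:

```typescript
// Sketch of prompt-variable substitution for {{variable}}-style
// placeholders (the convention Langfuse templates use). Unknown
// placeholders are left intact rather than replaced with blanks.
function renderTemplate(template: string, vars: Record<string, string>): string {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (match, name) =>
    name in vars ? vars[name] : match,
  );
}
```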

BYOK (Bring Your Own Keys)

Store your own provider API keys via the dashboard. When making requests, your key is automatically used instead of the platform key. See Authentication > BYOK for setup.


Vision Support

The Gateway supports images using OpenAI's multimodal message format. Vision-enabled models (GPT, Claude, Gemini, Grok) see the image directly. Non-vision models receive an AI-generated description transparently. See Models > Vision Support for the full matrix.

{
  "model": "claude",
  "messages": [{
    "role": "user",
    "content": [
      {"type": "text", "text": "What is in this image?"},
      {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}
    ]
  }]
}
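Besides hosted URLs, the OpenAI multimodal format also accepts base64 data URLs, which is useful for local files. A sketch of building such a message in Node (the helper name is hypothetical):

```typescript
// Sketch of building a multimodal user message from local image bytes
// via a base64 data URL, an alternative to the hosted-URL form above.
// `imageMessage` is a hypothetical helper, not part of any SDK.
function imageMessage(prompt: string, imageBytes: Uint8Array, mime = "image/jpeg") {
  const b64 = Buffer.from(imageBytes).toString("base64");
  return {
    role: "user" as const,
    content: [
      { type: "text", text: prompt },
      { type: "image_url", image_url: { url: `data:${mime};base64,${b64}` } },
    ],
  };
}
```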

Streaming

import OpenAI from 'openai';

// Point the standard OpenAI SDK at the Gateway; adjust the base URL to
// match your deployment.
const client = new OpenAI({
  baseURL: 'https://YOUR_GATEWAY_HOST/api/v1',
  apiKey: process.env.CONCURRED_API_KEY,
});

const response = await client.chat.completions.create({
  model: 'gemini',
  messages: [{ role: 'user', content: 'Write a poem' }],
  stream: true,
});

for await (const chunk of response) {
  process.stdout.write(chunk.choices[0]?.delta?.content || '');
}

System Messages

Each model handles system messages in its native format (e.g., Claude uses the system parameter, OpenAI uses instructions). This is transparent — just use the standard "role": "system" format.

Limitations

The Gateway API is a transparent passthrough — it does not include autonomous web search. For web-search-augmented responses with citations, use the Chat API with battle or fight mode.
