Chat completions

The main endpoint. It uses the same request and response shape as the OpenAI Chat Completions API, so existing code and SDKs work unchanged apart from the base URL.

Endpoint

POST https://api.relay.com/v1/chat/completions

Request

{
  "model": "deepseek-v4-flash",
  "messages": [
    {"role": "system", "content": "You are concise."},
    {"role": "user", "content": "Explain prompt caching."}
  ],
  "temperature": 0.7,
  "stream": false
}
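As a rough illustration, the request above could be built in Python with only the standard library. This is a sketch under assumptions: the `Authorization: Bearer` header is assumed by analogy with the OpenAI API (this page does not describe authentication), and `build_request`, `send`, and the `YOUR_API_KEY` placeholder are hypothetical names used for the example.

```python
import json
import urllib.request

API_URL = "https://api.relay.com/v1/chat/completions"


def build_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for the chat completions endpoint."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: Bearer-token auth, as in the OpenAI API.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


payload = {
    "model": "deepseek-v4-flash",
    "messages": [
        {"role": "system", "content": "You are concise."},
        {"role": "user", "content": "Explain prompt caching."},
    ],
    "temperature": 0.7,
    "stream": False,
}

request = build_request("YOUR_API_KEY", payload)
# result = send(request)  # uncomment to make the network call with a real key
```

Building the request separately from sending it makes it easy to inspect the exact body and headers before any network traffic happens.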

Parameters

Field	Notes
`model`	Any id from the marketplace
`messages`	Same shape as OpenAI (system / user / assistant)
`stream`	SSE streaming when `true`
`temperature`, `max_tokens`, `top_p`	Standard generation controls
`cache`	Prompt caching; on by default, set `false` to disable
`route`	`cheapest` (default) · `fastest` · a pinned provider
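The parameters above can be assembled with a small helper such as the following sketch. `build_payload` is a hypothetical function, not part of any SDK. It omits `cache` and `route` when they are not set so that the documented defaults apply, and it passes the `route` value through as-is because the format of a pinned provider id is not specified on this page.

```python
from typing import Optional


def build_payload(
    model: str,
    messages: list,
    *,
    stream: bool = False,
    cache: Optional[bool] = None,
    route: Optional[str] = None,
    **generation,
) -> dict:
    """Assemble a request body from the fields in the parameter table."""
    # Only the generation controls listed in the table are accepted here.
    allowed = {"temperature", "max_tokens", "top_p"}
    unknown = set(generation) - allowed
    if unknown:
        raise ValueError(f"unsupported generation controls: {sorted(unknown)}")

    payload = {"model": model, "messages": messages, "stream": stream, **generation}
    if cache is not None:
        payload["cache"] = cache  # caching is on by default; False disables it
    if route is not None:
        payload["route"] = route  # "cheapest" (default), "fastest", or a provider
    return payload


messages = [{"role": "user", "content": "Explain prompt caching."}]
default_payload = build_payload("deepseek-v4-flash", messages, temperature=0.7)
tuned_payload = build_payload(
    "deepseek-v4-flash", messages, max_tokens=200, cache=False, route="fastest"
)
```

Here `default_payload` leaves caching and routing to the server defaults, while `tuned_payload` disables the cache and asks for the fastest route.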

Response

{
  "id": "chatcmpl_...",
  "model": "deepseek-v4-flash",
  "choices": [{"message": {"role": "assistant", "content": "..."}}],
  "usage": {"prompt_tokens": 24, "completion_tokens": 58},
  "x_relay": {"provider": "DeepInfra", "cached": true, "saved_usd": 0.0007}
}

The `x_relay` block reports which provider served the call (`provider`), whether the prompt cache was hit (`cached`), and how much the call saved in US dollars (`saved_usd`).
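A minimal sketch of reading these fields from a decoded response follows. `summarize` is a hypothetical helper, and it treats `usage` and `x_relay` as optional, which is a defensive assumption rather than documented behavior. The sample dictionary mirrors the response shown above, with the elided `id` and `content` values left as they appear there.

```python
def summarize(response: dict) -> dict:
    """Pull the reply text, token total, and x_relay details out of a response."""
    usage = response.get("usage", {})
    relay = response.get("x_relay", {})
    return {
        "text": response["choices"][0]["message"]["content"],
        "total_tokens": usage.get("prompt_tokens", 0)
        + usage.get("completion_tokens", 0),
        "provider": relay.get("provider"),
        "cached": relay.get("cached", False),
        "saved_usd": relay.get("saved_usd", 0.0),
    }


sample_response = {
    "id": "chatcmpl_...",
    "model": "deepseek-v4-flash",
    "choices": [{"message": {"role": "assistant", "content": "..."}}],
    "usage": {"prompt_tokens": 24, "completion_tokens": 58},
    "x_relay": {"provider": "DeepInfra", "cached": True, "saved_usd": 0.0007},
}

summary = summarize(sample_response)
print(summary["provider"], summary["cached"], summary["total_tokens"])
```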

Streaming

Set `"stream": true` to receive server-sent events (SSE) in the same format as OpenAI streaming responses.
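Assuming the stream matches the OpenAI format, each event arrives as a `data: {...}` line whose JSON carries the new text in `choices[0].delta.content`, and the stream ends with `data: [DONE]`. The sketch below parses such lines. `iter_content` is a hypothetical helper, and the sample lines are hand-written for illustration, not captured output.

```python
import json
from typing import Iterable, Iterator


def iter_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield the text fragments carried by OpenAI-style SSE lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        choices = chunk.get("choices") or []
        if not choices:
            continue
        content = choices[0].get("delta", {}).get("content")
        if content:
            yield content


sample_lines = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Prompt "}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "caching"}}]}',
    "",
    "data: [DONE]",
]

text = "".join(iter_content(sample_lines))
print(text)
```

In real use the lines would come from the HTTP response body, read line by line and decoded as UTF-8, instead of from a list.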
