Chat completions

This is the main endpoint. Its request and response shapes match the OpenAI Chat Completions API, so existing code and SDKs work unchanged.

Endpoint

POST https://api.relay.com/v1/chat/completions

Request

{
  "model": "deepseek-v4-flash",
  "messages": [
    {"role": "system", "content": "You are concise."},
    {"role": "user", "content": "Explain prompt caching."}
  ],
  "temperature": 0.7,
  "stream": false
}
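
As a rough illustration, the request above could be assembled in Python with only the standard library. This is a sketch, not an official client: the helper name build_chat_request is hypothetical, and the Authorization header assumes OpenAI-style bearer authentication, which this page does not specify.

```python
import json
import urllib.request

# Endpoint as given on this page.
RELAY_URL = "https://api.relay.com/v1/chat/completions"


def build_chat_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for the chat completions endpoint."""
    return urllib.request.Request(
        RELAY_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: OpenAI-style bearer token; not confirmed by this page.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


# Same body as the JSON request shown above.
payload = {
    "model": "deepseek-v4-flash",
    "messages": [
        {"role": "system", "content": "You are concise."},
        {"role": "user", "content": "Explain prompt caching."},
    ],
    "temperature": 0.7,
    "stream": False,
}

request = build_chat_request("YOUR_API_KEY", payload)

# To send it (needs a valid key and network access):
# with urllib.request.urlopen(request) as resp:
#     reply = json.load(resp)
#     print(reply["choices"][0]["message"]["content"])
```

Building the request separately from sending it keeps the example inspectable without network access; the commented lines show where the call itself would go.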

Parameters

Field                           Notes
model                           Any id from the marketplace
messages                        Same shape as OpenAI (system / user / assistant)
stream                          SSE streaming when true
temperature, max_tokens, top_p  Standard generation controls
cache                           Prompt caching; on by default, set false to disable
route                           cheapest (default) · fastest · a pinned provider
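
The cache and route fields are the two additions to the OpenAI shape, so a small helper can layer them onto an otherwise standard body. The sketch below is illustrative only: with_relay_options is a hypothetical name, and it uses the documented "fastest" value because the exact string for pinning a provider is not given on this page.

```python
import json
from typing import Optional


def with_relay_options(payload: dict, *, cache: Optional[bool] = None,
                       route: Optional[str] = None) -> dict:
    """Return a copy of payload with the optional cache/route fields set.

    Fields left as None are omitted, so the documented defaults apply
    (cache on, route "cheapest").
    """
    result = dict(payload)  # shallow copy; the caller's dict is not modified
    if cache is not None:
        result["cache"] = cache
    if route is not None:
        result["route"] = route
    return result


base = {
    "model": "deepseek-v4-flash",
    "messages": [{"role": "user", "content": "Explain prompt caching."}],
}

# Disable prompt caching and prefer the fastest provider for this call.
body = with_relay_options(base, cache=False, route="fastest")
print(json.dumps(body, indent=2))
```

Omitting a field rather than sending its default keeps the body identical to a plain OpenAI request whenever no routing or caching override is needed.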

Response

{
  "id": "chatcmpl_...",
  "model": "deepseek-v4-flash",
  "choices": [{"message": {"role": "assistant", "content": "..."}}],
  "usage": {"prompt_tokens": 24, "completion_tokens": 58},
  "x_relay": {"provider": "DeepInfra", "cached": true, "saved_usd": 0.0007}
}

The x_relay block reports which provider served the call (provider), whether the cache was hit (cached), and how much the call saved in US dollars (saved_usd).
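
One way to use these fields is to log a one-line summary per call. The following is a minimal sketch that reads the sample response above; summarize_relay is a hypothetical helper, and the fallbacks for missing fields are an assumption rather than documented behavior.

```python
# The sample response from this page, as a Python dict.
sample_response = {
    "id": "chatcmpl_...",
    "model": "deepseek-v4-flash",
    "choices": [{"message": {"role": "assistant", "content": "..."}}],
    "usage": {"prompt_tokens": 24, "completion_tokens": 58},
    "x_relay": {"provider": "DeepInfra", "cached": True, "saved_usd": 0.0007},
}


def summarize_relay(response: dict) -> str:
    """Summarize provider, cache status, savings, and token usage for one call."""
    meta = response.get("x_relay") or {}
    usage = response.get("usage") or {}
    provider = meta.get("provider", "unknown")
    cache_status = "hit" if meta.get("cached") else "miss"
    saved = meta.get("saved_usd", 0.0)
    total_tokens = usage.get("prompt_tokens", 0) + usage.get("completion_tokens", 0)
    return f"{provider}: cache {cache_status}, saved ${saved:.4f}, {total_tokens} tokens"


print(summarize_relay(sample_response))
# prints: DeepInfra: cache hit, saved $0.0007, 82 tokens
```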

Streaming

Set "stream": true to receive server-sent events, identical to the OpenAI streaming format.
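
Since the stream follows the OpenAI format, each event arrives as a "data: " line carrying a JSON chunk whose text lives in choices[].delta.content, and the stream ends with "data: [DONE]". A minimal parsing sketch, assuming that format and run against hand-written sample lines rather than a live response (iter_content_deltas is a hypothetical helper):

```python
import json
from typing import Iterable, Iterator


def iter_content_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style SSE lines until [DONE]."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data lines
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel in the OpenAI format
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


# Hand-written sample events for illustration; not captured from the API.
sample_stream = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Prompt "}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "caching"}}]}',
    "",
    "data: [DONE]",
]

text = "".join(iter_content_deltas(sample_stream))
print(text)
```

With a live response, the same function could be fed decoded lines, for example (line.decode("utf-8") for line in resp) when resp comes from urllib.request.urlopen.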