Chat completions
The main endpoint. Its request and response shapes match the OpenAI Chat Completions API, so existing OpenAI-compatible code and SDKs should work once they point at this endpoint.
Endpoint
POST https://api.relay.com/v1/chat/completions
Request
{
  "model": "deepseek-v4-flash",
  "messages": [
    {"role": "system", "content": "You are concise."},
    {"role": "user", "content": "Explain prompt caching."}
  ],
  "temperature": 0.7,
  "stream": false
}
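As a rough illustration, the request above could be assembled in Python with only the standard library. The `build_request` helper and the `YOUR_API_KEY` placeholder are hypothetical, and the `Authorization: Bearer` header is an assumption based on the OpenAI convention; this page does not specify authentication.

```python
import json
import urllib.request

RELAY_URL = "https://api.relay.com/v1/chat/completions"


def build_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for the chat completions endpoint."""
    return urllib.request.Request(
        RELAY_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: OpenAI-style bearer authentication.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


payload = {
    "model": "deepseek-v4-flash",
    "messages": [
        {"role": "system", "content": "You are concise."},
        {"role": "user", "content": "Explain prompt caching."},
    ],
    "temperature": 0.7,
    "stream": False,
}

request = build_request("YOUR_API_KEY", payload)

# Sending the request requires network access and a valid key:
# with urllib.request.urlopen(request) as response:
#     body = json.load(response)
```

Building the request separately from sending it keeps the payload easy to inspect or log before any network call is made.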
Parameters
| Field | Notes |
|---|---|
| model | Any id from the marketplace |
| messages | Same shape as OpenAI (system / user / assistant) |
| stream | SSE streaming when true |
| temperature, max_tokens, top_p | Standard generation controls |
| cache | Prompt caching; on by default, set false to disable |
| route | cheapest (default) · fastest · a pinned provider |
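The sketch below shows one way to assemble a payload from these fields while leaving defaults implicit. The `build_payload` helper is hypothetical, not part of any SDK. It omits `cache` and `route` when they match the documented defaults, and it passes `route` through as a plain string because the exact value for pinning a provider is not given here.

```python
GENERATION_FIELDS = {"temperature", "max_tokens", "top_p"}


def build_payload(model, messages, *, stream=False, cache=True,
                  route="cheapest", **generation):
    """Assemble a request body, sending only non-default Relay options."""
    unknown = set(generation) - GENERATION_FIELDS
    if unknown:
        raise ValueError(f"unsupported generation fields: {sorted(unknown)}")

    payload = {"model": model, "messages": messages, "stream": stream}
    payload.update(generation)

    if not cache:
        # Caching is on by default, so the field is only sent to disable it.
        payload["cache"] = False
    if route != "cheapest":
        # "cheapest" is the documented default and can be left out.
        payload["route"] = route
    return payload


messages = [{"role": "user", "content": "Explain prompt caching."}]

default_payload = build_payload("deepseek-v4-flash", messages, temperature=0.7)
tuned_payload = build_payload(
    "deepseek-v4-flash", messages, cache=False, route="fastest", max_tokens=256
)
```

Here `default_payload` carries no `cache` or `route` keys, while `tuned_payload` disables caching and asks for the fastest route.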
Response
{
  "id": "chatcmpl_...",
  "model": "deepseek-v4-flash",
  "choices": [{"message": {"role": "assistant", "content": "..."}}],
  "usage": {"prompt_tokens": 24, "completion_tokens": 58},
  "x_relay": {"provider": "DeepInfra", "cached": true, "saved_usd": 0.0007}
}
The x_relay block reports which provider served the call (provider), whether the prompt cache was hit (cached), and how much the call saved in US dollars (saved_usd).
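A minimal sketch of reading these fields from a response, using the sample body shown above. The `summarize` helper is hypothetical, and it treats `x_relay` and `usage` as optional so that a response without them does not raise; whether those blocks can actually be absent is an assumption.

```python
import json

SAMPLE_RESPONSE = """
{
  "id": "chatcmpl_...",
  "model": "deepseek-v4-flash",
  "choices": [{"message": {"role": "assistant", "content": "..."}}],
  "usage": {"prompt_tokens": 24, "completion_tokens": 58},
  "x_relay": {"provider": "DeepInfra", "cached": true, "saved_usd": 0.0007}
}
"""


def summarize(response: dict) -> dict:
    """Pull the reply text, token totals, and Relay metadata out of a response."""
    usage = response.get("usage", {})
    relay = response.get("x_relay", {})  # assumption: may be missing
    return {
        "text": response["choices"][0]["message"]["content"],
        "total_tokens": usage.get("prompt_tokens", 0)
        + usage.get("completion_tokens", 0),
        "provider": relay.get("provider"),
        "cached": relay.get("cached", False),
        "saved_usd": relay.get("saved_usd", 0.0),
    }


summary = summarize(json.loads(SAMPLE_RESPONSE))
print(summary)
```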
Streaming
Set "stream": true to receive server-sent events, identical to the OpenAI streaming format.
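To illustrate consuming such a stream, the sketch below parses event lines in the OpenAI streaming layout: each event is a `data: {json}` line whose text arrives in `choices[].delta.content`, and the stream ends with `data: [DONE]`. The `iter_deltas` helper and the sample lines are hypothetical; in real use the lines would come from the HTTP response body.

```python
import json
from typing import Iterable, Iterator


def iter_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style server-sent event lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


# Hand-written sample events, not captured from a real response.
sample_stream = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Prompt "}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "caching"}}]}',
    "",
    "data: [DONE]",
]

text = "".join(iter_deltas(sample_stream))
```

Writing the parser as a generator lets a caller display each fragment as it arrives instead of waiting for the full reply.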