ONE API · CHEAPEST OPEN & AFFORDABLE MODELS

The lowest effective cost
for open & affordable AI models

DeepSeek, Qwen, Kimi, GLM, Llama, Mistral and more — through a single OpenAI-compatible API. Built-in caching, batch and smart routing cut your real bill far below raw rates. Transparent pricing. No silent model downgrades.

Get your API key — free credits Try the playground →

0tokens served today

0curated models

0upstream providers

0routing uptime

0avg effective savings*

*illustrative demo figures — replace with real metrics before launch.

The Savings Engine

Raw token price is only half the story. We lower the bill you actually pay — automatically.

🗂️

Prompt Caching

Repeated prefixes cost up to 90% less on cache reads. On by default.

📦

Batch Lane

Async jobs run at ~50% off. Perfect for evals, labeling, nightly pipelines.

🧭

Cheapest-First Routing

Every request goes to the cheapest healthy provider — and we show you which.

🪜

Smart Cascading

Easy tasks fall to smaller models; only hard ones hit the big ones.

Stack caching + batch and effective cost drops to roughly 25% of on-demand rates on eligible traffic.

Featured models

A taste of the catalog. Click a card for providers, latency and sample code.

Browse all models →

Why Relay

💸 Lowest effective cost

Caching, batch and routing cut your real bill — not just the sticker price.

🔍 Radically transparent

See the cheapest provider, the price, and exactly what each call cost. No hidden markup, no silent downgrades.

🤝 Built for builders

OpenAI-compatible, one-line switch, free credits to start. Plus custom agents for business.

Start cutting your AI bill today

Get an API key, claim free credits, and switch from OpenAI in one line.

The lowest effective costfor open & affordable AI models