In one line: OpenRouter is the wholesale market for LLMs. It wires OpenAI / Anthropic / Google / Meta / DeepSeek / Mistral / many open-weights hosts behind one OpenAI-compatible endpoint: cost savings + redundancy + price comparison in one stop.
## What it is
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="sk-or-...",  # an OpenRouter key, not a vendor key
)

messages = [{"role": "user", "content": "Hello"}]

# Specify any model; one client reaches every vendor
client.chat.completions.create(model="anthropic/claude-3.5-sonnet", messages=messages)
client.chat.completions.create(model="deepseek/deepseek-chat", messages=messages)
client.chat.completions.create(model="meta-llama/llama-3.1-70b-instruct", messages=messages)
```

Model name format: `vendor/model-id`.
## Analogy
Going direct to each vendor = opening a membership card per shop — N stores, N cards, N quotas.
OpenRouter = a single all-shops card — one card swipes everywhere, one bill.
## Key capabilities
## How it works
OpenRouter sits between your client and each vendor, handling auth / billing / routing / quota in the middle.
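To make the middle-man role concrete, here is a minimal sketch of the raw HTTP call the SDK makes: a standard OpenAI-style chat-completions POST, just pointed at OpenRouter. The `HTTP-Referer` / `X-Title` headers are optional OpenRouter extras for app attribution; everything else is the plain OpenAI schema.

```python
import requests

# A sketch of the raw request behind the SDK call: OpenRouter receives
# a standard chat-completions POST, authenticates and meters it, then
# routes it to the vendor named in "model".
resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={
        "Authorization": "Bearer sk-or-...",
        "HTTP-Referer": "https://example.com",  # optional app attribution
        "X-Title": "my-app",                    # optional app attribution
    },
    json={
        "model": "deepseek/deepseek-chat",
        "messages": [{"role": "user", "content": "Hello"}],
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```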
## Practical notes

- Multi-model fallback (see the Python sketch after this list):

  ```json
  {
    "models": [
      "anthropic/claude-3.5-sonnet",
      "openai/gpt-4o",
      "deepseek/deepseek-chat"
    ]
  }
  ```

  Models are listed in fallback order: a 5xx or rate-limit on the first auto-switches to the next.

- Tracking metrics: the OpenRouter dashboard shows success rate / latency / cost per model.
- Data policy: each provider's "trains on requests?" status is on the model card; for sensitive data, set `--data-policy strict`.
- Streaming behaviour: vendor differences (reasoning content / tool deltas) are normalised, but not 100%; read the response metadata to see the actual backend (see the lookup sketch after this list).
- Rate limits: OpenRouter has its own caps on top of each provider's caps, so spread bursts over time.
- From China: a direct connection can be slow; a self-hosted proxy plus a Cloudflare Tunnel is a common workaround.
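As a concrete version of the fallback note above, here is a minimal Python sketch. It assumes the SDK's `extra_body` escape hatch for passing OpenRouter's non-standard `models` field; treat the exact interplay of `model` and `models` as an assumption to check against the docs.

```python
from openai import OpenAI

client = OpenAI(base_url="https://openrouter.ai/api/v1", api_key="sk-or-...")

# Fallback sketch: "models" is OpenRouter-specific, so it rides in
# extra_body; the router walks the list in order when the primary
# model returns a 5xx or is rate-limited.
resp = client.chat.completions.create(
    model="anthropic/claude-3.5-sonnet",  # primary
    messages=[{"role": "user", "content": "ping"}],
    extra_body={
        "models": ["openai/gpt-4o", "deepseek/deepseek-chat"],  # fallbacks, in order
    },
)
print(resp.model)  # the model that actually answered
```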
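And for the streaming/metadata note, a sketch of checking which backend actually served a request via OpenRouter's generation-stats lookup; the endpoint shape is from their API, but the response fields shown are assumptions, not a guaranteed schema.

```python
import requests

# Sketch: look up a completion's generation metadata by id to see the
# backend that actually handled it (useful when routing or fallback
# may have swapped models under you).
gen_id = "gen-..."  # placeholder: the id returned on the completion response
meta = requests.get(
    "https://openrouter.ai/api/v1/generation",
    params={"id": gen_id},
    headers={"Authorization": "Bearer sk-or-..."},
    timeout=30,
).json()
print(meta)  # includes which provider/model actually served the call
```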
## Easy confusions

- Going through OpenRouter: convenient, but every call traverses their servers.
- Going direct to each vendor: data avoids a third party, but you provide and manage each vendor's key.