OpenRouter (model aggregation)

One key, many vendors — unified billing, automatic routing, automatic fallback.

Key idea

In one line: OpenRouter is the wholesale market for LLMs — wires up OpenAI / Anthropic / Google / Meta / DeepSeek / Mistral / many open-weights hosts behind one OpenAI-compatible endpoint. Cost savings + redundancy + price comparison in one stop.

What it is

from openai import OpenAI
 
c = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="sk-or-...",
)
 
# Specify any model
c.chat.completions.create(model="anthropic/claude-3.5-sonnet", messages=[...])
c.chat.completions.create(model="deepseek/deepseek-chat",     messages=[...])
c.chat.completions.create(model="meta-llama/llama-3.1-70b-instruct", messages=[...])

Model name format: vendor/model-id.

Analogy

Going direct to each vendor = opening a membership card per shop — N stores, N cards, N quotas.
OpenRouter = a single all-shops card — one card swipes everywhere, one bill.

Key capabilities

  • Unified billing — all models draw from one OpenRouter balance. **One invoice.**
  • Routing / fallback — pass a `models` array; if the primary fails, OpenRouter falls back automatically.
  • Per-model pricing — the /models endpoint returns live input/output prices for every model.
  • Provider preference — the same open-weights model is often hosted by Together / Fireworks / Lepton; you can state a preference.
  • Feature passthrough — streaming, tools, and vision mostly pass through unchanged from the upstream vendor.
  • App identification — OpenRouter uses the HTTP-Referer and X-Title headers to identify the calling application.
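To make the pricing lookup concrete, here is a minimal sketch of parsing the /models response. The payload below is an illustrative sample in the documented shape (`data` → entries with `id` and a `pricing` object of per-token USD strings); real data comes from `GET https://openrouter.ai/api/v1/models`, and the exact figures shown are assumptions, not live prices.

```python
import json

# Illustrative sample of the /models response shape; real data comes from
# GET https://openrouter.ai/api/v1/models (prices are strings, USD per token).
sample = json.loads("""
{"data": [
  {"id": "anthropic/claude-3.5-sonnet",
   "pricing": {"prompt": "0.000003", "completion": "0.000015"}},
  {"id": "deepseek/deepseek-chat",
   "pricing": {"prompt": "0.00000014", "completion": "0.00000028"}}
]}
""")

def price_per_million(model: dict) -> tuple[float, float]:
    """Convert per-token USD strings into (input, output) prices per 1M tokens."""
    p = model["pricing"]
    return float(p["prompt"]) * 1e6, float(p["completion"]) * 1e6

for m in sample["data"]:
    prompt_usd, completion_usd = price_per_million(m)
    print(f'{m["id"]}: ${prompt_usd:.2f} in / ${completion_usd:.2f} out per 1M tokens')
```

A loop like this is enough to sort the catalogue by cost before picking a fallback chain.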

How it works

Your client speaks the OpenAI protocol to OpenRouter; OpenRouter sits in the middle and handles auth, billing, routing, and quota against each upstream vendor.

Practical notes

  • Multi-model fallback:

    {
      "models": [
        "anthropic/claude-3.5-sonnet",
        "openai/gpt-4o",
        "deepseek/deepseek-chat"
      ]
    }

    Models are listed in fallback order: a 5xx or rate-limit error from the first triggers an automatic switch to the next.

  • Tracking metrics: OpenRouter dashboard shows success rate / latency / cost per model.

  • Data policy: each provider's "trains on requests?" status is shown on the model card — for sensitive data, restrict routing to providers with a no-training policy via your account's data-policy settings.

  • Streaming behaviour: vendor differences (reasoning content, tool-call deltas) are normalised but not 100% — read the response metadata to see which backend actually served the request.

  • Rate limits: OpenRouter applies its own caps on top of each provider's caps — spread bursts over time.

  • From China: direct connection can be slow; a self-hosted proxy + Cloudflare Tunnel is a common workaround.
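Pulling the notes above together, one way to attach the fallback list and the identification headers to an OpenAI-SDK call is via the SDK's `extra_body` / `extra_headers` parameters. The helper below is a sketch: `models`, `HTTP-Referer`, and `X-Title` are the fields described in this section, while the URL, title, and prompt are placeholder values.

```python
def build_request(prompt: str, fallback_models: list[str],
                  app_url: str, app_title: str) -> dict:
    """Assemble kwargs for client.chat.completions.create().

    - "model" names the primary; "models" (sent via extra_body) is the
      fallback chain OpenRouter walks on 5xx / rate-limit errors.
    - HTTP-Referer / X-Title identify the calling app on the dashboard.
    """
    return {
        "model": fallback_models[0],
        "messages": [{"role": "user", "content": prompt}],
        "extra_body": {"models": fallback_models},
        "extra_headers": {"HTTP-Referer": app_url, "X-Title": app_title},
    }

kwargs = build_request(
    "Summarise this ticket.",
    ["anthropic/claude-3.5-sonnet", "openai/gpt-4o", "deepseek/deepseek-chat"],
    app_url="https://example.com",  # placeholder
    app_title="MyApp",              # placeholder
)
# With an OpenAI client whose base_url is https://openrouter.ai/api/v1:
#   client.chat.completions.create(**kwargs)
```

Keeping the chain in one place like this also makes it easy to reorder models when the dashboard shows one backend degrading.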

Easy confusions

  • OpenRouter (aggregator) — a SaaS with unified billing. Convenient, but every call traverses their servers.
  • LiteLLM (local proxy) — an OpenAI-compatible proxy you run yourself. Data avoids a third party, but you supply each vendor's key.

Further reading