OpenRouter (model aggregation)

One key, many vendors — unified billing, automatic routing, automatic fallback.

Key idea

In one line: OpenRouter is the wholesale market for LLMs — wires up OpenAI / Anthropic / Google / Meta / DeepSeek / Mistral / many open-weights hosts behind one OpenAI-compatible endpoint. Cost savings + redundancy + price comparison in one stop.

What it is

from openai import OpenAI
 
c = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="sk-or-...",
)
 
# Specify any model
c.chat.completions.create(model="anthropic/claude-3.5-sonnet", messages=[...])
c.chat.completions.create(model="deepseek/deepseek-chat",     messages=[...])
c.chat.completions.create(model="meta-llama/llama-3.1-70b-instruct", messages=[...])

Model name format: vendor/model-id.

Analogy

Going direct to each vendor = opening a membership card per shop — N stores, N cards, N quotas.
OpenRouter = a single all-shops card — one card swipes everywhere, one bill.

Key capabilities

  • Unified billing — all models draw from one OpenRouter balance. **One invoice.**
  • Routing / fallback — pass a `models` array; if the primary fails, OpenRouter falls back automatically.
  • Per-model pricing — the /models endpoint returns live input/output prices for every model.
  • Provider preference — the same open-weights model is often hosted by Together / Fireworks / Lepton; you can state a preference.
  • Feature passthrough — streaming, tools, and vision mostly pass through unchanged from the upstream vendor.
  • App identification — OpenRouter uses the HTTP-Referer and X-Title headers to identify the calling application.
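To make the pricing lookup concrete, here is a minimal sketch of parsing the /models response. The payload below is an illustrative sample in the documented shape (`data` → entries with `id` and a `pricing` object of per-token USD strings); real data comes from `GET https://openrouter.ai/api/v1/models`, and the exact figures shown are assumptions, not live prices.

```python
import json

# Illustrative sample of the /models response shape; real data comes from
# GET https://openrouter.ai/api/v1/models (prices are strings, USD per token).
sample = json.loads("""
{"data": [
  {"id": "anthropic/claude-3.5-sonnet",
   "pricing": {"prompt": "0.000003", "completion": "0.000015"}},
  {"id": "deepseek/deepseek-chat",
   "pricing": {"prompt": "0.00000014", "completion": "0.00000028"}}
]}
""")

def price_per_million(model: dict) -> tuple[float, float]:
    """Convert per-token USD strings into (input, output) prices per 1M tokens."""
    p = model["pricing"]
    return float(p["prompt"]) * 1e6, float(p["completion"]) * 1e6

for m in sample["data"]:
    prompt_usd, completion_usd = price_per_million(m)
    print(f'{m["id"]}: ${prompt_usd:.2f} in / ${completion_usd:.2f} out per 1M tokens')
```

A loop like this is enough to sort the catalogue by cost before picking a fallback chain.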

How it works

Your client speaks the OpenAI protocol to OpenRouter; OpenRouter sits in the middle and handles auth, billing, routing, and quota against each upstream vendor.

Practical notes

  • Multi-model fallback:

    {
      "models": [
        "anthropic/claude-3.5-sonnet",
        "openai/gpt-4o",
        "deepseek/deepseek-chat"
      ]
    }

    Models are listed in fallback order: a 5xx or rate-limit error from the first triggers an automatic switch to the next.

  • Tracking metrics: OpenRouter dashboard shows success rate / latency / cost per model.

  • Data policy: each provider's "trains on requests?" status is shown on the model card — for sensitive data, restrict routing to providers with a no-training policy via your account's data-policy settings.

  • Streaming behaviour: vendor differences (reasoning content, tool-call deltas) are normalised but not 100% — read the response metadata to see which backend actually served the request.

  • Rate limits: OpenRouter applies its own caps on top of each provider's caps — spread bursts over time.

  • From China: direct connection can be slow; a self-hosted proxy + Cloudflare Tunnel is a common workaround.
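Pulling the notes above together, one way to attach the fallback list and the identification headers to an OpenAI-SDK call is via the SDK's `extra_body` / `extra_headers` parameters. The helper below is a sketch: `models`, `HTTP-Referer`, and `X-Title` are the fields described in this section, while the URL, title, and prompt are placeholder values.

```python
def build_request(prompt: str, fallback_models: list[str],
                  app_url: str, app_title: str) -> dict:
    """Assemble kwargs for client.chat.completions.create().

    - "model" names the primary; "models" (sent via extra_body) is the
      fallback chain OpenRouter walks on 5xx / rate-limit errors.
    - HTTP-Referer / X-Title identify the calling app on the dashboard.
    """
    return {
        "model": fallback_models[0],
        "messages": [{"role": "user", "content": prompt}],
        "extra_body": {"models": fallback_models},
        "extra_headers": {"HTTP-Referer": app_url, "X-Title": app_title},
    }

kwargs = build_request(
    "Summarise this ticket.",
    ["anthropic/claude-3.5-sonnet", "openai/gpt-4o", "deepseek/deepseek-chat"],
    app_url="https://example.com",  # placeholder
    app_title="MyApp",              # placeholder
)
# With an OpenAI client whose base_url is https://openrouter.ai/api/v1:
#   client.chat.completions.create(**kwargs)
```

Keeping the chain in one place like this also makes it easy to reorder models when the dashboard shows one backend degrading.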

Easy confusions

  • OpenRouter (aggregator) — a SaaS with unified billing. Convenient, but every call traverses their servers.
  • LiteLLM (local proxy) — an OpenAI-compatible proxy you run yourself. Data avoids a third party, but you supply each vendor's key.

Further reading