Key Idea
In one line: OpenAI's /v1/chat/completions has become the de facto LLM API standard. DeepSeek / Qwen / GLM / Moonshot / SiliconFlow / Together / OpenRouter / vLLM / Ollama / LM Studio: almost everyone is compatible. One codebase swaps providers by changing only base_url and api_key.
What it is#
from openai import OpenAI

# Any vendor
client = OpenAI(
    base_url="https://api.deepseek.com/v1",  # change URL = change vendor
    api_key="sk-xxx",
)
resp = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "hello"}],
    stream=True,
)
for chunk in resp:
    print(chunk.choices[0].delta.content or "", end="")
Analogy#
Like the USB port: phones, mice, and cameras have different shapes inside, but all expose USB — swapping devices just means plugging in a cable. OpenAI's API is the USB of the LLM industry.
Major compatible providers#
- OpenAI itself
  - https://api.openai.com/v1 (gpt-4o, o-series, gpt-5)
- Anthropic
  - https://api.anthropic.com (not byte-identical; it has its own protocol, but the official SDK feels similar)
- DeepSeek
  - https://api.deepseek.com (deepseek-chat / deepseek-reasoner; cheap and strong)
- Qwen / Alibaba
  - https://dashscope.aliyuncs.com/compatible-mode/v1 (qwen-max / qwen-plus / qwen3)
- Zhipu GLM
  - https://open.bigmodel.cn/api/paas/v4 (glm-4.6, glm-4.5-air)
- Moonshot Kimi
  - https://api.moonshot.cn/v1 (moonshot-v1-32k etc.)
- OpenRouter
  - https://openrouter.ai/api/v1 (a single endpoint to dozens of vendors' models)
- SiliconFlow / Together / Fireworks / Groq
  - Aggregate open-source models, billed per token.
- Local
  - vLLM / Ollama / LM Studio / Llama.cpp server / Mistral.rs, all OpenAI-compatible.
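Because every entry above speaks the same wire format, a provider switch can be captured as data rather than code. A minimal sketch, assuming illustrative env-var names like `DEEPSEEK_API_KEY` (the function returns the kwargs you would pass to `openai.OpenAI(...)`):

```python
import os

# base_url per provider; API keys are read from the environment at call time.
ENDPOINTS = {
    "openai":   "https://api.openai.com/v1",
    "deepseek": "https://api.deepseek.com/v1",
    "qwen":     "https://dashscope.aliyuncs.com/compatible-mode/v1",
    "ollama":   "http://localhost:11434/v1",  # local server; key is just a placeholder
}

def client_kwargs(provider: str) -> dict:
    """Return the kwargs for openai.OpenAI() for a given provider name."""
    return {
        "base_url": ENDPOINTS[provider],
        "api_key": os.environ.get(f"{provider.upper()}_API_KEY", "placeholder"),
    }
```

With this, `OpenAI(**client_kwargs("deepseek"))` and `OpenAI(**client_kwargs("ollama"))` run the exact same calling code against a cloud vendor or a local server.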
How it works#
Subtle differences#
Although compatible, each vendor has subtle quirks:
- tools (function calling): OpenAI and Anthropic differ in schema; DeepSeek and Qwen are largely OpenAI-compatible.
- Structured output: `response_format: { type: "json_schema" }` support varies.
- Streaming chunks: reasoning models like DeepSeek-R1 emit their thoughts in `delta.reasoning_content`.
- Multimodal: OpenAI uses `image_url`; Qwen also accepts `image_url`; some vendors use proprietary schemas.
- Token / context limits: max input/output lengths vary per vendor.
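The streaming quirk is easy to handle defensively: read `reasoning_content` with a fallback so one loop works for both reasoning and non-reasoning models. A sketch, where `delta` is the `chunk.choices[0].delta` object from the streaming example above:

```python
def delta_text(delta) -> str:
    """Concatenate DeepSeek-style reasoning tokens (if present) and regular
    content from one streaming delta; vendors without reasoning_content
    simply contribute the empty string for that part."""
    reasoning = getattr(delta, "reasoning_content", None) or ""
    content = getattr(delta, "content", None) or ""
    return reasoning + content
```

In the streaming loop, `print(delta_text(chunk.choices[0].delta), end="")` then works unchanged across vendors.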
Practical notes#
- Externalise `LLM_PROVIDER` + `BASE_URL` in config: swapping vendors is an env change only.
- Don't hard-code model names. In production use aliases (your layer calls it `chat-fast`, mapping to the actual model).
- Rate limits / retries / backoff vary per vendor; centralise them in your client. OpenRouter offers fallback routing.
- Cost: DeepSeek-V3 / GLM / Qwen mainstream models are 5–20× cheaper than OpenAI at the same tier. Develop on OpenAI, ship on cheaper / open-source models.
- Compliance: data residency and whether your data is used for training vary; read the terms.
- Streaming disconnects: SSE breaks on flaky networks; the client must retry and handle idempotency.
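The retry advice can be centralised in one small wrapper; this is a generic sketch (function and parameter names are illustrative, not any vendor SDK feature):

```python
import random
import time

def with_retries(call, max_attempts=4, base_delay=0.5, retriable=(Exception,)):
    """Run a zero-arg callable, e.g. lambda: client.chat.completions.create(...),
    retrying on retriable errors with exponential backoff plus a little jitter.
    Keeping this in one place means per-vendor rate-limit quirks don't leak
    into calling code."""
    for attempt in range(max_attempts):
        try:
            return call()
        except retriable:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * 2 ** attempt + random.uniform(0, 0.1))
```

In practice you would narrow `retriable` to the SDK's rate-limit and timeout exception types rather than catching everything.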
Easy confusions#
- OpenAI-compatible API: `chat.completions` and the basic endpoints. All vendors target this surface.
- OpenAI-only features: Realtime API, Assistants, Files, Fine-tuning, Batch. Most third parties don't implement these, or implement them only partially.