Key Idea
In one line: OpenAI's /v1/chat/completions has become the de facto LLM API standard. DeepSeek / Qwen / GLM / Moonshot / SiliconFlow / Together / OpenRouter / vLLM / Ollama / LM Studio: almost everyone is compatible. One codebase swaps providers by changing only base_url and api_key.
What it is#
from openai import OpenAI

# Any vendor
client = OpenAI(
    base_url="https://api.deepseek.com/v1",  # change URL = change vendor
    api_key="sk-xxx",
)
resp = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "hello"}],
    stream=True,
)
for chunk in resp:
    print(chunk.choices[0].delta.content or "", end="")
Analogy#
Like the USB port: phones, mice, and cameras have different shapes inside, but all expose USB — swapping devices just means plugging in a cable. OpenAI's API is the USB of the LLM industry.
Major compatible providers#
- OpenAI itself
  - https://api.openai.com/v1 (gpt-4o, o-series, gpt-5)
- Anthropic
  - https://api.anthropic.com (not byte-identical; it has its own protocol, but the official SDK feels similar)
- DeepSeek
  - https://api.deepseek.com (deepseek-chat / deepseek-reasoner; cheap and strong)
- Qwen / Alibaba
  - https://dashscope.aliyuncs.com/compatible-mode/v1 (qwen-max / qwen-plus / qwen3)
- Zhipu GLM
  - https://open.bigmodel.cn/api/paas/v4 (glm-4.6, glm-4.5-air)
- Moonshot Kimi
  - https://api.moonshot.cn/v1 (moonshot-v1-32k etc.)
- OpenRouter
  - https://openrouter.ai/api/v1 (a single endpoint to dozens of vendors' models)
- SiliconFlow / Together / Fireworks / Groq
  - Aggregate open-source models, billed per token.
- Local
  - vLLM / Ollama / LM Studio / Llama.cpp server / Mistral.rs, all OpenAI-compatible.
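Because every entry above speaks the same wire format, a provider switch can be captured as data rather than code. A minimal sketch, assuming illustrative env-var names like `DEEPSEEK_API_KEY` (the function returns the kwargs you would pass to `openai.OpenAI(...)`):

```python
import os

# base_url per provider; API keys are read from the environment at call time.
ENDPOINTS = {
    "openai":   "https://api.openai.com/v1",
    "deepseek": "https://api.deepseek.com/v1",
    "qwen":     "https://dashscope.aliyuncs.com/compatible-mode/v1",
    "ollama":   "http://localhost:11434/v1",  # local server; key is just a placeholder
}

def client_kwargs(provider: str) -> dict:
    """Return the kwargs for openai.OpenAI() for a given provider name."""
    return {
        "base_url": ENDPOINTS[provider],
        "api_key": os.environ.get(f"{provider.upper()}_API_KEY", "placeholder"),
    }
```

With this, `OpenAI(**client_kwargs("deepseek"))` and `OpenAI(**client_kwargs("ollama"))` run the exact same calling code against a cloud vendor or a local server.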
How it works#
Subtle differences#
Although compatible, each vendor has subtle quirks:
- tools (function calling): OpenAI and Anthropic differ in schema; DeepSeek and Qwen are largely OpenAI-compatible.
- Structured output: `response_format: { type: "json_schema" }` support varies.
- Streaming chunks: reasoning models like DeepSeek-R1 emit their thoughts in `delta.reasoning_content`.
- Multimodal: OpenAI uses `image_url`; Qwen also accepts `image_url`; some vendors use proprietary schemas.
- Token / context limits: max input/output lengths vary per vendor.
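The streaming quirk is easy to handle defensively: read `reasoning_content` with a fallback so one loop works for both reasoning and non-reasoning models. A sketch, where `delta` is the `chunk.choices[0].delta` object from the streaming example above:

```python
def delta_text(delta) -> str:
    """Concatenate DeepSeek-style reasoning tokens (if present) and regular
    content from one streaming delta; vendors without reasoning_content
    simply contribute the empty string for that part."""
    reasoning = getattr(delta, "reasoning_content", None) or ""
    content = getattr(delta, "content", None) or ""
    return reasoning + content
```

In the streaming loop, `print(delta_text(chunk.choices[0].delta), end="")` then works unchanged across vendors.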
Practical notes#
- Externalise `LLM_PROVIDER` + `BASE_URL` in config: swapping vendors is an env change only.
- Don't hard-code model names. In production use aliases (your layer calls it `chat-fast`, mapping to the actual model).
- Rate limits / retries / backoff vary per vendor; centralise them in your client. OpenRouter offers fallback routing.
- Cost: DeepSeek-V3 / GLM / Qwen mainstream models are 5–20× cheaper than OpenAI at the same tier. Develop on OpenAI, ship on cheaper / open-source models.
- Compliance: data residency and whether your data is used for training vary; read the terms.
- Streaming disconnects: SSE breaks on flaky networks; the client must retry and handle idempotency.
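The retry advice can be centralised in one small wrapper; this is a generic sketch (function and parameter names are illustrative, not any vendor SDK feature):

```python
import random
import time

def with_retries(call, max_attempts=4, base_delay=0.5, retriable=(Exception,)):
    """Run a zero-arg callable, e.g. lambda: client.chat.completions.create(...),
    retrying on retriable errors with exponential backoff plus a little jitter.
    Keeping this in one place means per-vendor rate-limit quirks don't leak
    into calling code."""
    for attempt in range(max_attempts):
        try:
            return call()
        except retriable:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * 2 ** attempt + random.uniform(0, 0.1))
```

In practice you would narrow `retriable` to the SDK's rate-limit and timeout exception types rather than catching everything.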
Easy confusions#
- OpenAI-compatible API: `chat.completions` and the basic endpoints. All vendors target this surface.
- OpenAI-only features: Realtime API, Assistants, Files, Fine-tuning, Batch. Most third parties don't implement these, or implement them only partially.