Key Idea
In one line: DeepSeek / Qwen / GLM / Kimi are the most-used LLM APIs from China — OpenAI-compatible, 5–20× cheaper than Western counterparts, locally compliant. Each has its own niche: DeepSeek for strong reasoning, Qwen for breadth + open-weights, GLM for balanced bilingual + multimodal, Kimi for very long context.
Vendor cheat sheet#
- DeepSeek
- deepseek-chat (V3 family) + deepseek-reasoner (R1 family). The price/performance king; full tool-calling and JSON mode support.
- Qwen / Tongyi
- qwen-max / qwen-plus / qwen-flash + open-weights Qwen3 / Qwen-VL / Qwen-Coder. The richest ecosystem.
- Zhipu GLM
- GLM-4.6 (reasoning) / GLM-4.5-Air (lightweight) / GLM-4V (multimodal) / CodeGeeX. Balanced ZH/EN; strong on agent tasks.
- Moonshot Kimi
- k2 / moonshot-v1-* — early king of long context (128k–200k).
- MiniMax
- abab series + video gen. Multimodal-leaning.
- Doubao
- ByteDance, fast, full text + vision lineup.
- Hunyuan / Pangu / Spark
- Tencent / Huawei / iFlytek in-house models.
- SiliconFlow / Volcano Ark / DashScope
- Aggregator services — one account, many vendors.
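Because every vendor above exposes an OpenAI-compatible endpoint, the cheat sheet can be captured as plain data. A minimal sketch: the base URLs are the public compatibility endpoints used later in this note, the model names are each vendor's default pick here, and `client_kwargs` is an illustrative helper, not part of any SDK.

```python
# Vendor quick-reference as data: OpenAI-compatible base URL + flagship chat model.
VENDORS = {
    "deepseek": {"base_url": "https://api.deepseek.com/v1", "model": "deepseek-chat"},
    "qwen": {"base_url": "https://dashscope.aliyuncs.com/compatible-mode/v1", "model": "qwen-plus"},
    "glm": {"base_url": "https://open.bigmodel.cn/api/paas/v4", "model": "glm-4.6"},
    "kimi": {"base_url": "https://api.moonshot.cn/v1", "model": "moonshot-v1-32k"},
}

def client_kwargs(vendor: str, api_key: str) -> dict:
    """Build the kwargs an OpenAI-compatible client needs for a given vendor."""
    cfg = VENDORS[vendor]
    return {"base_url": cfg["base_url"], "api_key": api_key}

print(client_kwargs("deepseek", "sk-..."))
```

Keeping this table as data (rather than scattered client constructors) is what makes the later "swap models in one place" advice cheap to follow.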
Analogy#
Different APIs are like different food-delivery platforms: menus (models) are similar; prices, delivery speed, promos vary. The same dish (capability) on a different platform may cost half as much.
Code examples#
```python
from openai import OpenAI

# DeepSeek
ds = OpenAI(base_url="https://api.deepseek.com/v1", api_key="sk-...")
ds.chat.completions.create(model="deepseek-chat", messages=[...])
# The reasoning model uses a different model name
ds.chat.completions.create(model="deepseek-reasoner", messages=[...])

# Qwen (OpenAI compatibility mode)
qw = OpenAI(
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
    api_key="sk-...",
)
qw.chat.completions.create(model="qwen-plus", messages=[...])

# Zhipu GLM
zp = OpenAI(base_url="https://open.bigmodel.cn/api/paas/v4", api_key="...")
zp.chat.completions.create(model="glm-4.6", messages=[...])

# Kimi
km = OpenAI(base_url="https://api.moonshot.cn/v1", api_key="sk-...")
km.chat.completions.create(model="moonshot-v1-32k", messages=[...])
```

Key differences#
- Reasoning mode (reasoner / thinking): DeepSeek-R1, GLM-4.6, and Qwen3 thinking stream an extra `reasoning_content` field in each `delta`.
- Context length: Kimi pioneered 128k; today the Qwen3 / GLM / DeepSeek mainstream is 128k–256k.
- JSON / tools (structured output): mostly OpenAI-compatible schemas, but minor field differences need testing.
- Multimodal (vision / audio): Qwen-VL, GLM-4V, Doubao, and Kimi-VL all accept images; video and audio support is catching up fast.
- Compliance (ICP / data residency): Chinese vendors store data domestically, which is required for consumer apps shipping in China.
- Rate limits: tiered by balance / usage level; check the docs before scaling.
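When streaming from a reasoning model, the hidden chain of thought and the visible answer arrive interleaved, so they need to be collected separately. A minimal sketch: the `Delta` stub below only imitates the shape of a streaming chunk's delta (it is illustrative, not an SDK class); `reasoning_content` is the extra field these vendors add.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Delta:
    """Stand-in for one streaming chunk's delta (illustrative, not an SDK type)."""
    content: Optional[str] = None
    reasoning_content: Optional[str] = None

def split_stream(deltas):
    """Collect the hidden reasoning and the visible answer into separate strings."""
    reasoning, answer = [], []
    for d in deltas:
        if getattr(d, "reasoning_content", None):  # absent on non-reasoning models
            reasoning.append(d.reasoning_content)
        if d.content:
            answer.append(d.content)
    return "".join(reasoning), "".join(answer)

stream = [
    Delta(reasoning_content="Let me check: 2+2"),
    Delta(reasoning_content=" = 4."),
    Delta(content="The answer"),
    Delta(content=" is 4."),
]
thinking, reply = split_stream(stream)
```

The `getattr` guard matters in practice: the same loop then works unchanged when you point the client at a non-reasoning model whose deltas lack the field.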
Choosing#
Prices change frequently; when choosing a vendor, run your real workload through the top candidates for about a week and compare quality, latency, and cost before committing.
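A week-long A/B can be as simple as logging latency and token cost per candidate behind one call site. A sketch under stated assumptions: `call_model` is a stub standing in for a real API call, and the per-1k-token prices are placeholders, not actual quotes.

```python
import time
import random
from collections import defaultdict

PRICE_PER_1K = {"deepseek-chat": 0.001, "qwen-plus": 0.002}  # placeholder prices

def call_model(model: str, prompt: str) -> dict:
    """Stub: a real version would hit the vendor API and return usage stats."""
    time.sleep(random.uniform(0.001, 0.002))  # simulate network latency
    return {"text": "ok", "total_tokens": len(prompt.split()) + 20}

stats = defaultdict(lambda: {"calls": 0, "latency": 0.0, "cost": 0.0})

def ab_call(model: str, prompt: str) -> str:
    """Call one candidate model and record latency + estimated cost."""
    t0 = time.perf_counter()
    out = call_model(model, prompt)
    s = stats[model]
    s["calls"] += 1
    s["latency"] += time.perf_counter() - t0
    s["cost"] += out["total_tokens"] / 1000 * PRICE_PER_1K[model]
    return out["text"]

# Alternate candidates over the test window, then compare stats per model.
for i in range(10):
    ab_call(["deepseek-chat", "qwen-plus"][i % 2], "summarize this ticket")
```

At the end of the week, `stats` gives per-model averages to weigh against output quality ratings.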
Practical notes#
- Gateway abstraction. One wrapper, route per business / task. Model swaps touch one place.
- Local fallback. When the cloud fails, drop to local vLLM / Ollama with the open-weights variant (Qwen / DeepSeek / GLM are all available).
- Hybrid pipelines. Strong reasoner (DeepSeek-R1 / GLM-4.6) plans, small model (qwen-flash) executes high-volume calls.
- Enterprise: Volcano Ark / Aliyun DashScope / Tencent TI offer aggregation + private deployments with SLAs.
- Safety prompts: Chinese models are stricter on political / sensitive topics; avoid triggers in prompts.
- Batch tasks. DeepSeek and Qwen offer batch APIs at half price.
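The gateway, fallback, and hybrid-pipeline notes above can be reduced to one routing table. A minimal sketch: the `local` entries stand in for a self-hosted vLLM / Ollama endpoint, and the local model names are illustrative placeholders, not exact open-weights identifiers.

```python
ROUTES = {
    # task -> ordered candidates (endpoint label, model); first available wins
    "plan":   [("cloud", "deepseek-reasoner"), ("local", "qwen3-8b")],
    "bulk":   [("cloud", "qwen-flash"),        ("local", "qwen3-8b")],
    "answer": [("cloud", "glm-4.6"),           ("local", "deepseek-r1-distill")],
}

def route(task: str, cloud_up: bool) -> tuple[str, str]:
    """Pick the first candidate whose endpoint is available."""
    for endpoint, model in ROUTES[task]:
        if endpoint == "local" or cloud_up:
            return endpoint, model
    raise RuntimeError(f"no route for task {task!r}")

assert route("plan", cloud_up=True) == ("cloud", "deepseek-reasoner")
assert route("plan", cloud_up=False) == ("local", "qwen3-8b")
```

The hybrid-pipeline idea falls out of the same table: the planner step routes as `"plan"` to a strong reasoner, while high-volume execution routes as `"bulk"` to a cheap small model.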
Easy confusions#
- API call (online): no ops, billed per token; the fastest way to launch.
- Open-weights (self-hosted): deploy Qwen / DeepSeek / GLM yourself; data stays on-prem, but you manage the GPUs.