Key Idea
In one line: DeepSeek / Qwen / GLM / Kimi are the most-used LLM APIs from China — OpenAI-compatible, 5–20× cheaper than Western counterparts, locally compliant. Each has its own niche: DeepSeek for strong reasoning, Qwen for breadth + open-weights, GLM for balanced bilingual + multimodal, Kimi for very long context.
Vendor cheat sheet#
- DeepSeek
- deepseek-chat (V3 family) + deepseek-reasoner (R1 family). The price/performance king; full tool-calling and JSON mode support.
- Qwen / Tongyi
- qwen-max / qwen-plus / qwen-flash + open-weights Qwen3 / Qwen-VL / Qwen-Coder. The richest ecosystem.
- Zhipu GLM
- GLM-4.6 (reasoning) / GLM-4.5-Air (lightweight) / GLM-4V (multimodal) / CodeGeeX. Balanced ZH/EN; strong on agent tasks.
- Moonshot Kimi
- k2 / moonshot-v1-* — early king of long context (128k–200k).
- MiniMax
- abab series + video gen. Multimodal-leaning.
- Doubao
- ByteDance, fast, full text + vision lineup.
- Hunyuan / Pangu / Spark
- Tencent / Huawei / iFlytek in-house models.
- SiliconFlow / Volcano Ark / DashScope
- Aggregator services — one account, many vendors.
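Because every vendor above exposes an OpenAI-compatible endpoint, the cheat sheet can be captured as plain data. A minimal sketch: the base URLs are the public compatibility endpoints used later in this note, the model names are each vendor's default pick here, and `client_kwargs` is an illustrative helper, not part of any SDK.

```python
# Vendor quick-reference as data: OpenAI-compatible base URL + flagship chat model.
VENDORS = {
    "deepseek": {"base_url": "https://api.deepseek.com/v1", "model": "deepseek-chat"},
    "qwen": {"base_url": "https://dashscope.aliyuncs.com/compatible-mode/v1", "model": "qwen-plus"},
    "glm": {"base_url": "https://open.bigmodel.cn/api/paas/v4", "model": "glm-4.6"},
    "kimi": {"base_url": "https://api.moonshot.cn/v1", "model": "moonshot-v1-32k"},
}

def client_kwargs(vendor: str, api_key: str) -> dict:
    """Build the kwargs an OpenAI-compatible client needs for a given vendor."""
    cfg = VENDORS[vendor]
    return {"base_url": cfg["base_url"], "api_key": api_key}

print(client_kwargs("deepseek", "sk-..."))
```

Keeping this table as data (rather than scattered client constructors) is what makes the later "swap models in one place" advice cheap to follow.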
Analogy#
Different APIs are like different food-delivery platforms: menus (models) are similar; prices, delivery speed, promos vary. The same dish (capability) on a different platform may cost half as much.
Code examples#
```python
from openai import OpenAI

# DeepSeek
ds = OpenAI(base_url="https://api.deepseek.com/v1", api_key="sk-...")
ds.chat.completions.create(model="deepseek-chat", messages=[...])
# The reasoning model uses a different model name
ds.chat.completions.create(model="deepseek-reasoner", messages=[...])

# Qwen (OpenAI compatibility mode)
qw = OpenAI(
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
    api_key="sk-...",
)
qw.chat.completions.create(model="qwen-plus", messages=[...])

# Zhipu GLM
zp = OpenAI(base_url="https://open.bigmodel.cn/api/paas/v4", api_key="...")
zp.chat.completions.create(model="glm-4.6", messages=[...])

# Kimi
km = OpenAI(base_url="https://api.moonshot.cn/v1", api_key="sk-...")
km.chat.completions.create(model="moonshot-v1-32k", messages=[...])
```

Key differences#
- Reasoning mode (reasoner / thinking): DeepSeek-R1, GLM-4.6, and Qwen3 thinking stream an extra `reasoning_content` field in each `delta`.
- Context length: Kimi pioneered 128k; today the Qwen3 / GLM / DeepSeek mainstream is 128k–256k.
- JSON / tools (structured output): mostly OpenAI-compatible schemas, but minor field differences need testing.
- Multimodal (vision / audio): Qwen-VL, GLM-4V, Doubao, and Kimi-VL all accept images; video and audio support is catching up fast.
- Compliance (ICP / data residency): Chinese vendors store data domestically, which is required for consumer apps shipping in China.
- Rate limits: tiered by balance / usage level; check the docs before scaling.
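When streaming from a reasoning model, the hidden chain of thought and the visible answer arrive interleaved, so they need to be collected separately. A minimal sketch: the `Delta` stub below only imitates the shape of a streaming chunk's delta (it is illustrative, not an SDK class); `reasoning_content` is the extra field these vendors add.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Delta:
    """Stand-in for one streaming chunk's delta (illustrative, not an SDK type)."""
    content: Optional[str] = None
    reasoning_content: Optional[str] = None

def split_stream(deltas):
    """Collect the hidden reasoning and the visible answer into separate strings."""
    reasoning, answer = [], []
    for d in deltas:
        if getattr(d, "reasoning_content", None):  # absent on non-reasoning models
            reasoning.append(d.reasoning_content)
        if d.content:
            answer.append(d.content)
    return "".join(reasoning), "".join(answer)

stream = [
    Delta(reasoning_content="Let me check: 2+2"),
    Delta(reasoning_content=" = 4."),
    Delta(content="The answer"),
    Delta(content=" is 4."),
]
thinking, reply = split_stream(stream)
```

The `getattr` guard matters in practice: the same loop then works unchanged when you point the client at a non-reasoning model whose deltas lack the field.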
Choosing#
Prices change frequently; when choosing a vendor, run your real workload through the top candidates for about a week and compare quality, latency, and cost before committing.
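A week-long A/B can be as simple as logging latency and token cost per candidate behind one call site. A sketch under stated assumptions: `call_model` is a stub standing in for a real API call, and the per-1k-token prices are placeholders, not actual quotes.

```python
import time
import random
from collections import defaultdict

PRICE_PER_1K = {"deepseek-chat": 0.001, "qwen-plus": 0.002}  # placeholder prices

def call_model(model: str, prompt: str) -> dict:
    """Stub: a real version would hit the vendor API and return usage stats."""
    time.sleep(random.uniform(0.001, 0.002))  # simulate network latency
    return {"text": "ok", "total_tokens": len(prompt.split()) + 20}

stats = defaultdict(lambda: {"calls": 0, "latency": 0.0, "cost": 0.0})

def ab_call(model: str, prompt: str) -> str:
    """Call one candidate model and record latency + estimated cost."""
    t0 = time.perf_counter()
    out = call_model(model, prompt)
    s = stats[model]
    s["calls"] += 1
    s["latency"] += time.perf_counter() - t0
    s["cost"] += out["total_tokens"] / 1000 * PRICE_PER_1K[model]
    return out["text"]

# Alternate candidates over the test window, then compare stats per model.
for i in range(10):
    ab_call(["deepseek-chat", "qwen-plus"][i % 2], "summarize this ticket")
```

At the end of the week, `stats` gives per-model averages to weigh against output quality ratings.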
Practical notes#
- Gateway abstraction. One wrapper, route per business / task. Model swaps touch one place.
- Local fallback. When the cloud fails, drop to local vLLM / Ollama with the open-weights variant (Qwen / DeepSeek / GLM are all available).
- Hybrid pipelines. Strong reasoner (DeepSeek-R1 / GLM-4.6) plans, small model (qwen-flash) executes high-volume calls.
- Enterprise: Volcano Ark / Aliyun DashScope / Tencent TI offer aggregation + private deployments with SLAs.
- Safety prompts: Chinese models are stricter on political / sensitive topics; avoid triggers in prompts.
- Batch tasks. DeepSeek and Qwen offer batch APIs at half price.
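The gateway, fallback, and hybrid-pipeline notes above can be reduced to one routing table. A minimal sketch: the `local` entries stand in for a self-hosted vLLM / Ollama endpoint, and the local model names are illustrative placeholders, not exact open-weights identifiers.

```python
ROUTES = {
    # task -> ordered candidates (endpoint label, model); first available wins
    "plan":   [("cloud", "deepseek-reasoner"), ("local", "qwen3-8b")],
    "bulk":   [("cloud", "qwen-flash"),        ("local", "qwen3-8b")],
    "answer": [("cloud", "glm-4.6"),           ("local", "deepseek-r1-distill")],
}

def route(task: str, cloud_up: bool) -> tuple[str, str]:
    """Pick the first candidate whose endpoint is available."""
    for endpoint, model in ROUTES[task]:
        if endpoint == "local" or cloud_up:
            return endpoint, model
    raise RuntimeError(f"no route for task {task!r}")

assert route("plan", cloud_up=True) == ("cloud", "deepseek-reasoner")
assert route("plan", cloud_up=False) == ("local", "qwen3-8b")
```

The hybrid-pipeline idea falls out of the same table: the planner step routes as `"plan"` to a strong reasoner, while high-volume execution routes as `"bulk"` to a cheap small model.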
Easy confusions#
- API call (online): no ops, billed per token; the fastest way to launch.
- Open-weights (self-hosted): deploy Qwen / DeepSeek / GLM yourself; data stays on-prem, but you manage the GPUs.