Models
CheapestInference serves three frontier open-source models, all available throughout your reserved time blocks on the active pool — there is no separate full-catalog tier. Rate limits are set at the key level, not per model.
List models
Query the live, authoritative model list:
```bash
curl https://api.cheapestinference.com/v1/models \
  -H "Authorization: Bearer YOUR_API_KEY"
```

Each model object includes an `id`, `owned_by`, and a `type` field (`"chat"` or `"embedding"`) so you can filter programmatically:

```json
{
  "id": "kimi-k2.6",
  "object": "model",
  "created": 1677610602,
  "owned_by": "cheapestinference",
  "type": "chat"
}
```

Available models

| Model | Provider | Model ID | Context | Cost basis (in / out per 1M) |
|---|---|---|---|---|
| Kimi K2.6 | Moonshot | kimi-k2.6 | 256K | $0.45 / $2.25 |
| GLM 4.7 | Zhipu (Z.ai) | glm-4.7 | 198K | $0.40 / $1.75 |
| MiniMax M2.5 | MiniMax | MiniMax-M2.5 | 192K | $0.27 / $0.95 |
The “cost basis” is our underlying per-token cost from the inference provider, useful for comparison. On a time-block subscription you pay a flat monthly fee (from $39/mo), not per-token charges. See Plans & Limits.
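To make the comparison concrete, the cost basis of a workload can be estimated from the per-million rates in the table. The sketch below is illustrative only: the `COST_BASIS` dictionary copies the table's figures, while the helper name `cost_basis_usd` and the token counts are made up for the example. It estimates the underlying provider cost, not what a time-block subscriber is billed.

```python
# Per-1M-token cost basis (input, output) in USD, copied from the table above.
COST_BASIS = {
    "kimi-k2.6": (0.45, 2.25),
    "glm-4.7": (0.40, 1.75),
    "MiniMax-M2.5": (0.27, 0.95),
}


def cost_basis_usd(model_id: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost basis of one workload (hypothetical helper)."""
    in_rate, out_rate = COST_BASIS[model_id]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 1M input tokens and 1M output tokens per model.
    for model_id in COST_BASIS:
        estimate = cost_basis_usd(model_id, 1_000_000, 1_000_000)
        print(f"{model_id}: ${estimate:.2f}")
```

For this workload the estimate is simply the sum of the two rates in each row, which makes the table easy to sanity-check by hand.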
Per-model details:
- Kimi K2.6 API — Moonshot’s flagship agentic/coding model
- GLM 4.7 API — Zhipu’s coding model
- MiniMax M2.5 API — high-value general + coding model
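Since the `/v1/models` response is the authoritative list, a client can check a model ID against it before sending requests. The sketch below filters an already-parsed response by its `type` field. It assumes the response wraps model objects in a `data` array, as OpenAI-style list responses do; this page only shows a single model object, so treat that envelope, the sample payload (including the hypothetical `example-embedding` entry), and the helper name `ids_of_type` as assumptions.

```python
import json

# Sample payload with an ASSUMED OpenAI-style list envelope.
# "example-embedding" is a hypothetical entry used only to show filtering.
SAMPLE_RESPONSE = """
{
  "object": "list",
  "data": [
    {"id": "kimi-k2.6", "object": "model", "owned_by": "cheapestinference", "type": "chat"},
    {"id": "glm-4.7", "object": "model", "owned_by": "cheapestinference", "type": "chat"},
    {"id": "example-embedding", "object": "model", "owned_by": "cheapestinference", "type": "embedding"}
  ]
}
"""


def ids_of_type(payload: dict, model_type: str) -> list[str]:
    """Return the IDs of all models whose `type` field equals model_type."""
    return [m["id"] for m in payload.get("data", []) if m.get("type") == model_type]


if __name__ == "__main__":
    payload = json.loads(SAMPLE_RESPONSE)
    chat_ids = ids_of_type(payload, "chat")
    print(chat_ids)
    # Guard against typos in a configured model ID before making requests.
    print("kimi-k2.6" in chat_ids)
```

In a real client the payload would come from the `curl` call shown under "List models" rather than from a hard-coded string.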
Using models
Specify the model ID in your request:
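```python
# OpenAI SDK
from openai import OpenAI

# Base URL inferred from the /v1/models example above
client = OpenAI(base_url="https://api.cheapestinference.com/v1", api_key="YOUR_API_KEY")

response = client.chat.completions.create(
    model="kimi-k2.6",  # or "glm-4.7", "MiniMax-M2.5"
    messages=[{"role": "user", "content": "Hello"}],
)
```

All three models work through the OpenAI-compatible endpoint (`/v1/chat/completions`) and the Anthropic-compatible endpoint (`/anthropic/v1/messages`). The API handles format translation automatically.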
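The two endpoints named above expect differently shaped request bodies. As a rough sketch, the code below builds the URL and JSON body for the same prompt in each style without sending anything. The host is taken from the `curl` example; the helper names are hypothetical, and the `max_tokens` field is included because Anthropic's Messages format requires it, which is assumed (not confirmed by this page) to apply here as well. Authentication headers for the Anthropic-compatible endpoint are not documented on this page, so they are left out.

```python
import json

BASE_URL = "https://api.cheapestinference.com"  # host from the curl example


def openai_request(model: str, prompt: str) -> tuple[str, dict]:
    """Build (url, body) for the OpenAI-style chat completions endpoint."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return f"{BASE_URL}/v1/chat/completions", body


def anthropic_request(model: str, prompt: str, max_tokens: int = 256) -> tuple[str, dict]:
    """Build (url, body) for the Anthropic-compatible messages endpoint.

    max_tokens is an ASSUMED requirement carried over from Anthropic's format.
    """
    body = {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
    return f"{BASE_URL}/anthropic/v1/messages", body


if __name__ == "__main__":
    # Same model ID and prompt, two request shapes.
    for build in (openai_request, anthropic_request):
        url, body = build("glm-4.7", "Hello")
        print(url)
        print(json.dumps(body, indent=2))
```

The model ID is identical in both bodies; only the path and the surrounding fields differ.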