Models

CheapestInference serves three frontier open-source models. All three are available throughout your reserved time blocks on the active pool; there is no separate full-catalog tier. Rate limits are set per API key, not per model.

Query the live, authoritative model list:

curl https://api.cheapestinference.com/v1/models \
  -H "Authorization: Bearer YOUR_API_KEY"

Each model object includes an id, owned_by, and a type field ("chat" or "embedding") so you can filter programmatically:

{
  "id": "kimi-k2.6",
  "object": "model",
  "created": 1677610602,
  "owned_by": "cheapestinference",
  "type": "chat"
}
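
As a rough illustration of filtering on the type field, the sketch below fetches the model list and keeps only the IDs of a given type. The helper names fetch_models and ids_of_type are hypothetical, and the code assumes the endpoint wraps the model objects in an OpenAI-style {"object": "list", "data": [...]} envelope, which this page does not confirm.

```python
import json
import urllib.request

# Host and path taken from the curl example above.
BASE_URL = "https://api.cheapestinference.com/v1"


def fetch_models(api_key: str) -> list[dict]:
    """GET /v1/models and return the list of model objects.

    Assumption: the response is either an OpenAI-style envelope
    {"object": "list", "data": [...]} or a bare JSON list.
    """
    req = urllib.request.Request(
        f"{BASE_URL}/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        payload = json.load(resp)
    return payload["data"] if isinstance(payload, dict) else payload


def ids_of_type(models: list[dict], model_type: str) -> list[str]:
    """Return the IDs of models whose "type" field equals model_type."""
    return [m["id"] for m in models if m.get("type") == model_type]
```

Calling ids_of_type(fetch_models("YOUR_API_KEY"), "chat") would then yield only the chat model IDs; objects without a type field are skipped rather than raising an error.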
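
| Model | Provider | Model ID | Context | Cost basis (in / out per 1M tokens) |
| --- | --- | --- | --- | --- |
| Kimi K2.6 | Moonshot | kimi-k2.6 | 256K | $0.45 / $2.25 |
| GLM 4.7 | Zhipu (Z.ai) | glm-4.7 | 198K | $0.40 / $1.75 |
| MiniMax M2.5 | MiniMax | MiniMax-M2.5 | 192K | $0.27 / $0.95 |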

The “cost basis” is our underlying per-token cost from the inference provider, useful for comparison. On a time-block subscription you pay a flat monthly fee (from $39/mo), not per-token charges. See Plans & Limits.
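
To make the comparison concrete, here is a minimal sketch that converts a token count into the equivalent cost basis for each model. The rates are copied from the table above; the function name cost_basis_usd is hypothetical, and the result is a comparison figure only, not an amount billed on a time-block subscription.

```python
# Cost basis in USD per 1M tokens as (input, output), copied from the table above.
COST_BASIS = {
    "kimi-k2.6": (0.45, 2.25),
    "glm-4.7": (0.40, 1.75),
    "MiniMax-M2.5": (0.27, 0.95),
}


def cost_basis_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the underlying per-token cost, in USD, for the given usage."""
    in_rate, out_rate = COST_BASIS[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


if __name__ == "__main__":
    # Compare all three models for 2M input tokens and 500K output tokens.
    for model_id in COST_BASIS:
        usd = cost_basis_usd(model_id, 2_000_000, 500_000)
        print(f"{model_id}: ${usd:.3f}")
```

For 2M input and 500K output tokens this works out to roughly $2.03 for kimi-k2.6, $1.68 for glm-4.7, and $1.02 for MiniMax-M2.5.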

Per-model details:

Specify the model ID in your request:

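# OpenAI SDK
from openai import OpenAI

# Base URL taken from the curl example above.
client = OpenAI(base_url="https://api.cheapestinference.com/v1", api_key="YOUR_API_KEY")
response = client.chat.completions.create(
    model="kimi-k2.6",  # or "glm-4.7", "MiniMax-M2.5"
    messages=[{"role": "user", "content": "Hello"}],
)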

All three models work through both the OpenAI-compatible endpoint (/v1/chat/completions) and the Anthropic-compatible endpoint (/anthropic/v1/messages). The API translates between the two formats automatically.
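
The two request shapes can be sketched side by side. The helpers below only build the URL and JSON body for each endpoint and send nothing; their names are hypothetical, and the code assumes the Anthropic-compatible path lives on the same host as the curl example. Authentication headers for the Anthropic-compatible endpoint are not specified on this page, so they are left out.

```python
# Host taken from the curl example; assumed to serve both endpoints.
BASE_URL = "https://api.cheapestinference.com"


def openai_request(model: str, prompt: str) -> tuple[str, dict]:
    """Build the URL and JSON body for the OpenAI chat completions format."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return f"{BASE_URL}/v1/chat/completions", body


def anthropic_request(
    model: str, prompt: str, max_tokens: int = 1024
) -> tuple[str, dict]:
    """Build the URL and JSON body for the Anthropic Messages format.

    Unlike the OpenAI format, the Messages format requires max_tokens.
    """
    body = {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
    return f"{BASE_URL}/anthropic/v1/messages", body
```

The same model ID goes into both bodies; only the path and the required max_tokens field differ in this simple single-message case.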