Models

CheapestInference serves three frontier open-source models. All three are available throughout your reserved time blocks on the active pool; there is no separate full-catalog tier. Rate limits are set per API key, not per model.

List models

Query the live, authoritative model list:

curl https://api.cheapestinference.com/v1/models \
  -H "Authorization: Bearer YOUR_API_KEY"

Each model object includes an id, owned_by, and a type field ("chat" or "embedding") so you can filter programmatically:

{
  "id": "kimi-k2.6",
  "object": "model",
  "created": 1677610602,
  "owned_by": "cheapestinference",
  "type": "chat"
}
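As a rough sketch of filtering on the type field, the snippet below fetches the list and keeps only the IDs of one type. It assumes the endpoint wraps model objects in the OpenAI-style envelope `{"object": "list", "data": [...]}`, which this page does not show. The function names and the embedding entry in the sample are hypothetical, used only for illustration.

```python
import json
import urllib.request

API_BASE = "https://api.cheapestinference.com/v1"


def fetch_models(api_key: str) -> dict:
    """GET /v1/models and return the parsed JSON body."""
    req = urllib.request.Request(
        f"{API_BASE}/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def model_ids_of_type(payload: dict, model_type: str) -> list[str]:
    """Return the ids of models whose "type" field equals model_type.

    Assumes the OpenAI-style envelope {"object": "list", "data": [...]}.
    """
    return [
        m["id"]
        for m in payload.get("data", [])
        if m.get("type") == model_type
    ]


# Offline sample in the assumed envelope.
# "example-embedding" is a hypothetical entry, not a real model ID.
sample = {
    "object": "list",
    "data": [
        {"id": "kimi-k2.6", "object": "model", "type": "chat"},
        {"id": "example-embedding", "object": "model", "type": "embedding"},
    ],
}
chat_ids = model_ids_of_type(sample, "chat")
```

Against the live API, the call would look like `model_ids_of_type(fetch_models("YOUR_API_KEY"), "chat")`.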

Available models

Model	Provider	Model ID	Context	Cost basis (USD in / out per 1M tokens)
Kimi K2.6	Moonshot	`kimi-k2.6`	256K	$0.45 / $2.25
GLM 4.7	Zhipu (Z.ai)	`glm-4.7`	198K	$0.40 / $1.75
MiniMax M2.5	MiniMax	`MiniMax-M2.5`	192K	$0.27 / $0.95

The “cost basis” is our underlying per-token cost from the inference provider, useful for comparison. On a time-block subscription you pay a flat monthly fee (from $39/mo), not per-token charges. See Plans & Limits.
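To make the comparison concrete, here is a minimal sketch that turns a token volume into its cost-basis equivalent, using the rates from the table above. The function name and the example workload are illustrative assumptions. The result is a comparison figure only, not an amount billed on a time-block subscription.

```python
# Cost basis in USD per 1M tokens (input, output), copied from the table above.
COST_BASIS = {
    "kimi-k2.6": (0.45, 2.25),
    "glm-4.7": (0.40, 1.75),
    "MiniMax-M2.5": (0.27, 0.95),
}


def cost_basis_usd(model_id: str, input_tokens: int, output_tokens: int) -> float:
    """Return the per-token cost-basis equivalent of a workload, in USD."""
    in_rate, out_rate = COST_BASIS[model_id]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


# Hypothetical monthly workload: 10M input tokens and 2M output tokens.
monthly = cost_basis_usd("kimi-k2.6", 10_000_000, 2_000_000)
```

For that hypothetical workload on `kimi-k2.6`, the arithmetic is 10 × $0.45 + 2 × $2.25, or about $9.00 at cost basis.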

Per-model details:

Kimi K2.6 API — Moonshot’s flagship agentic/coding model
GLM 4.7 API — Zhipu’s coding model
MiniMax M2.5 API — high-value general + coding model

Using models

Specify the model ID in your request:

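# OpenAI SDK; base URL taken from the /v1/models example above
from openai import OpenAI

client = OpenAI(base_url="https://api.cheapestinference.com/v1", api_key="YOUR_API_KEY")
response = client.chat.completions.create(
    model="kimi-k2.6",  # or "glm-4.7", "MiniMax-M2.5"
    messages=[{"role": "user", "content": "Hello"}]
)
print(response.choices[0].message.content)  # the assistant's reply text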

All three models work through both the OpenAI-compatible endpoint (/v1/chat/completions) and the Anthropic-compatible endpoint (/anthropic/v1/messages). The API translates between the two request formats automatically.
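For the Anthropic-compatible route, a request could be built roughly as follows. This is a sketch under stated assumptions: the host is taken to be the same as in the /v1/models example, the body follows the Anthropic Messages format (model, max_tokens, messages), and the key is sent as a Bearer token as in the curl example. This page does not say whether that endpoint expects a different auth header such as x-api-key. The function name is hypothetical.

```python
import json
import urllib.request

# Path from this page; host assumed to match the /v1/models example.
ANTHROPIC_URL = "https://api.cheapestinference.com/anthropic/v1/messages"


def build_messages_request(
    api_key: str, model: str, prompt: str, max_tokens: int = 256
) -> urllib.request.Request:
    """Build (but do not send) a POST request in Anthropic Messages format."""
    body = {
        "model": model,
        "max_tokens": max_tokens,  # required by the Messages format
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        ANTHROPIC_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            # Assumption: same Bearer scheme as the curl example.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_messages_request("YOUR_API_KEY", "glm-4.7", "Hello")
```

Passing `req` to `urllib.request.urlopen` would send it. The same model IDs apply on both endpoints.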