Which model should you use?

The cheapest model, the snappiest model and the fastest-streaming model are usually three different models — and the winner flips with your workload. Pick your use case; we rank the catalog by what that workload actually feels: ₹ per task, time to first token and sustained throughput, measured through the production gateway (sweep of 2026-07-02), not read off datasheets.

Start from your application

Agent loops make dozens of short, tool-calling turns per task — time to first token dominates how fast the agent feels, throughput matters for big diffs, and cost adds up across the loop. Reasoning support is required.

| # | Model | Best route | ₹/Mtok (blended) | First token | tok/s |
| --- | --- | --- | --- | --- | --- |
| 1 | gpt-oss-120b (open weights, 🇮🇳 India route) | fireworks | ₹39 | 1.2s | 469 |
| 2 | qwen3-32b (open weights) | groq | ₹22 | 2.1s | 382 |
| 3 | gpt-oss-20b (open weights, 🇮🇳 India route) | krutrim | ₹24 | 4.7s | 606 |
| 4 | glm-4.7 (open weights) | openrouter BYOK | ₹136 | 1.9s | 68 |
| 5 | glm-4.7-flash (open weights, price-ranked) | zhipu BYOK | free | | |
| 6 | glm-4.5-flash (open weights, price-ranked) | zhipu BYOK | free | | |
| 7 | qwen3.5-9b (open weights, 🇮🇳 India route, price-ranked) | krutrim | ₹6.8 | | |
| 8 | gemini-2.5-flash | openrouter | ₹187 | 2.2s | 132 |

Blended ₹/Mtok is the price of the cheapest route at a 1:3 input:output token mix (generation-dominant). First token and tok/s come from the best measured route for each model. Rankings are a weighted percentile score per lens; details are in the methodology below.
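As a rough illustration of the two calculations in this note, here is a minimal Python sketch. It assumes the blended price is a token-weighted average of a route's input and output prices, and that a percentile score is the share of other models a given model beats or ties on one metric. The weights in `AGENT_WEIGHTS` and the prices passed to `blended_price` are made up for the example; the page does not publish the real weights. The three sample models and their figures are the top rows of the agent-loop table.

```python
def blended_price(input_price, output_price, input_parts=1, output_parts=3):
    """Token-weighted average price per Mtok at an input:output mix (default 1:3)."""
    total_parts = input_parts + output_parts
    return (input_parts * input_price + output_parts * output_price) / total_parts


def percentile_scores(values, lower_is_better):
    """Score each value in [0, 1] by the share of other entries it beats or ties."""
    n = len(values)
    if n < 2:
        return [1.0] * n
    scores = []
    for v in values:
        if lower_is_better:
            not_worse = sum(1 for other in values if v <= other)
        else:
            not_worse = sum(1 for other in values if v >= other)
        scores.append((not_worse - 1) / (n - 1))  # drop the self-comparison
    return scores


def weighted_score(lens_scores, weights):
    """Combine per-metric scores into one number using the given weights."""
    total = sum(weights.values())
    return sum(weights[k] * lens_scores[k] for k in weights) / total


# Figures from the first three rows of the agent-loop table.
models = ["gpt-oss-120b", "qwen3-32b", "gpt-oss-20b"]
price = percentile_scores([39, 22, 24], lower_is_better=True)
ttft = percentile_scores([1.2, 2.1, 4.7], lower_is_better=True)
tps = percentile_scores([469, 382, 606], lower_is_better=False)

# Hypothetical weights: first token counts most for agent loops.
AGENT_WEIGHTS = {"price": 0.25, "ttft": 0.5, "tps": 0.25}

scores = {
    m: weighted_score({"price": p, "ttft": t, "tps": s}, AGENT_WEIGHTS)
    for m, p, t, s in zip(models, price, ttft, tps)
}
ranked = sorted(models, key=lambda m: scores[m], reverse=True)
print(ranked)
```

Changing the weights changes the order, which is why the same catalog ranks differently for each use case.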

The full picture — every chat model, three lenses

The table below is sorted by ₹/Mtok, cheapest first. The same model often wins one lens and loses another.

| Model | Routes | ₹/Mtok | First token | tok/s |
| --- | --- | --- | --- | --- |
| glm-4.7-flash (reasoning) | zhipu, openrouter | free | | |
| glm-4.5-flash (reasoning) | zhipu | free | | |
| llama-3.1-8b-instruct | bharatrouter 🇮🇳, openrouter, groq, fireworks | ₹2.8 | 428ms (groq) | 548 |
| qwen2.5-coder-7b | bharatrouter 🇮🇳 | ₹3.5 | | |
| qwen2.5-7b-instruct | bharatrouter 🇮🇳 | ₹3.5 | | |
| qwen2.5-vl-7b-instruct | bharatrouter 🇮🇳 | ₹5.3 | | |
| gemma-4-e4b-it | krutrim 🇮🇳 | ₹6.8 | | |
| qwen3.5-9b (reasoning) | krutrim 🇮🇳 | ₹6.8 | | |
| glm-4-32b-0414-128k | zhipu | ₹10 | | |
| qwen3-32b (reasoning) | groq, openrouter, fireworks | ₹22 | 2.1s (groq) | 382 |
| gemma-4-26b-a4b-it | krutrim 🇮🇳 | ₹23 | | |
| qwen3.6-35b-a3b (reasoning) | krutrim 🇮🇳 | ₹23 | | |
| gpt-oss-20b (reasoning) | krutrim 🇮🇳, groq, fireworks | ₹24 | 4.7s (krutrim) | 606 |
| llama-3.3-70b | groq, openrouter, fireworks | ₹26 | 2.0s (openrouter) | 218 |
| gemma-4-31b-it | krutrim 🇮🇳 | ₹27 | | |
| llama-4-scout | groq, fireworks | ₹28 | | |
| glm-4.7-flashx (reasoning) | zhipu | ₹30 | | |
| gpt-oss-120b (reasoning) | krutrim 🇮🇳, groq, baseten, fireworks | ₹39 | 1.2s (fireworks) | 469 |
| gpt-4o-mini | openai | ₹47 | | |
| nemotron-super (reasoning) | baseten | ₹61 | | |
| deepseek-v3 | openrouter | ₹63 | 16.3s (openrouter) | 38 |
| glm-4.5-air (reasoning) | zhipu, openrouter, fireworks | ₹65 | | |
| glm-4.6v (reasoning) | zhipu, openrouter | ₹72 | | |
| glm-4.6 (reasoning) | zhipu, openrouter | ₹136 | | |
| glm-4.7 (reasoning) | baseten, zhipu, openrouter | ₹136 | 1.9s (openrouter) | 68 |
| qwen3.6-27b (reasoning) | krutrim 🇮🇳 | ₹147 | | |
| gpt-5-mini (reasoning) | openai | ₹150 | 10.2s (openai) | 63 |
| glm-5 (reasoning) | baseten, zhipu, openrouter | ₹153 | 12.2s (zhipu) | 53 |
| kimi-k2.5 (reasoning) | moonshot, baseten, openrouter | ₹155 | | |
| glm-4.5 (reasoning) | zhipu, openrouter | ₹173 | | |
| kimi-k2 | groq, openrouter | ₹180 | 688ms (groq) | 178 |
| nemotron-ultra (reasoning) | baseten | ₹187 | | |
| gemini-2.5-flash (reasoning) | openrouter | ₹187 | 2.2s (openrouter) | 132 |
| glm-5.2 (reasoning) | baseten, zhipu, openrouter | ₹238 | 5.7s (zhipu) | 68 |
| kimi-k2.6 (reasoning) | moonshot, baseten, openrouter | ₹244 | | |
| kimi-k2.7-code (reasoning) | moonshot, baseten, openrouter | ₹270 | 19.6s (openrouter) | 39 |
| deepseek-v4-pro (reasoning) | baseten, fireworks | ₹292 | | |
| glm-5-turbo (reasoning) | zhipu | ₹317 | | |
| glm-5v-turbo (reasoning) | zhipu, openrouter | ₹317 | | |
| glm-5.1 (reasoning) | baseten, zhipu, openrouter | ₹333 | | |
| glm-4.5-airx (reasoning) | zhipu | ₹351 | | |
| claude-haiku-4.5 (reasoning) | anthropic, openrouter | ₹384 | 16.8s (openrouter) | 102 |
| kimi-k2.7-code-highspeed (reasoning) | moonshot | ₹622 | | |
| glm-4.5-x (reasoning) | zhipu | ₹693 | | |
| gpt-5 (reasoning) | openai | ₹750 | | |
| claude-sonnet-5 (reasoning) | anthropic, openrouter | ₹1152 | | |
| claude-opus-4.8 (reasoning) | anthropic, openrouter | ₹1920 | | |

In the open — how these numbers are made

Performance numbers are medians from a multi-run streamed sweep through the production gateway on 2026-07-02. Every (model × provider) route gets the same ~300-token prompt, with rounds interleaved across hosts so no provider owns a time-of-day advantage. Time to first token is measured to the first streamed token, reasoning tokens included, because that is what you see. tok/s is the decode rate after the first token. Models not yet swept show “—” and rank on price with a neutral perf score. Routes we could not measure are listed openly in the API response (9 skipped this sweep), never silently dropped. Numbers refresh with each sweep; live per-route health is on /models.
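To make the two performance metrics concrete, the sketch below derives them from token arrival timestamps for one streamed run, then takes medians across runs. It is an illustration under assumptions, not the gateway's actual harness: the function names are hypothetical, timestamps are assumed to be in seconds on one clock, and the decode rate is taken as tokens after the first divided by the time from first to last token.

```python
from statistics import median


def stream_metrics(request_sent, token_times):
    """Return (time to first token in s, decode tok/s) for one streamed run.

    token_times holds the arrival time of every streamed token in ascending
    order, reasoning tokens included.
    """
    if not token_times:
        raise ValueError("run produced no tokens")
    first, last = token_times[0], token_times[-1]
    ttft = first - request_sent
    decode_tokens = len(token_times) - 1  # tokens after the first one
    decode_window = last - first
    rate = decode_tokens / decode_window if decode_window > 0 else None
    return ttft, rate


def route_medians(runs):
    """Median first-token time and decode rate over (request_sent, token_times) runs."""
    metrics = [stream_metrics(sent, times) for sent, times in runs]
    ttfts = [m[0] for m in metrics]
    rates = [m[1] for m in metrics if m[1] is not None]
    return median(ttfts), (median(rates) if rates else None)


# Three synthetic runs for one route (timestamps are invented for the example).
runs = [
    (0.0, [1.0, 1.5, 2.0]),
    (0.0, [2.0, 2.25, 2.5, 2.75, 3.0]),
    (0.0, [1.5, 2.5]),
]
median_ttft, median_rate = route_medians(runs)
print(median_ttft, median_rate)
```

Using the median rather than the mean keeps one slow or stalled run from dominating a route's reported number.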

Agents get this same chooser as JSON: GET /v1/compare/models — rankings, per-route pricing, measured perf and live failure rates, no auth required.
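A minimal sketch of calling that endpoint from Python with only the standard library follows. The path `/v1/compare/models` and the absence of auth come from the text above; the gateway base URL is not given on this page, so `https://gateway.example` is a placeholder, and the function names are hypothetical. The response schema is not documented here either, so the sketch returns the parsed JSON without assuming any field names.

```python
import json
from urllib.parse import urljoin
from urllib.request import Request, urlopen

COMPARE_PATH = "/v1/compare/models"


def compare_url(base_url):
    """Join a gateway base URL with the comparison endpoint path."""
    return urljoin(base_url.rstrip("/") + "/", COMPARE_PATH.lstrip("/"))


def fetch_comparison(base_url, timeout=10.0):
    """GET the model comparison and return the parsed JSON body (no auth header)."""
    request = Request(compare_url(base_url), headers={"Accept": "application/json"})
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Placeholder base URL; substitute the real gateway host before calling
# fetch_comparison().
print(compare_url("https://gateway.example"))
```

An agent would call `fetch_comparison(base_url)` once per planning step or cache the result between sweeps, since the text says the numbers only refresh with each sweep.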
