The cheapest model, the snappiest model and the fastest-streaming model are usually three different models, and the winner flips with your workload. Pick your use case and we rank the catalog by what that workload actually feels: ₹ per task, time to first token and sustained throughput, all measured through the production gateway (sweep of 2026-07-02) rather than read off datasheets.
Agent loops make dozens of short tool-calling turns per task, so time to first token dominates how fast the agent feels, throughput matters for big diffs, and cost adds up across the loop. Reasoning support is required.
| # | Model | Best route | ₹/Mtok (blended) | First token | tok/s |
|---|---|---|---|---|---|
| 1 | gpt-oss-120b open weights 🇮🇳 India route | fireworks | ₹39 | 1.2s | 469 |
| 2 | qwen3-32b open weights | groq | ₹22 | 2.1s | 382 |
| 3 | gpt-oss-20b open weights 🇮🇳 India route | krutrim | ₹24 | 4.7s | 606 |
| 4 | glm-4.7 open weights | openrouter BYOK | ₹136 | 1.9s | 68 |
| 5 | glm-4.7-flash open weights price-ranked | zhipu BYOK | free | — | — |
| 6 | glm-4.5-flash open weights price-ranked | zhipu BYOK | free | — | — |
| 7 | qwen3.5-9b open weights 🇮🇳 India route price-ranked | krutrim | ₹6.8 | — | — |
| 8 | gemini-2.5-flash | openrouter | ₹187 | 2.2s | 132 |
Blended ₹/Mtok is the cheapest route's price at a 1:3 input:output token mix (generation-dominant). First token and tok/s come from the best measured route per model. Rankings are a weighted percentile score per lens; details are in the methodology below.
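A minimal Python sketch of both calculations. The page gives the 1:3 mix and says unswept models get a neutral perf score, but it does not publish the lens weights or its exact percentile definition. The "share of other models beaten" percentile, the 0.5 neutral score and the weights passed to the hypothetical `rank` function are therefore assumptions for illustration, and the input/output prices in the test values are made up rather than taken from the catalog.

```python
from typing import Optional, Sequence


def blended_price(input_per_mtok: float, output_per_mtok: float,
                  input_parts: int = 1, output_parts: int = 3) -> float:
    """Blend per-Mtok input and output prices at an input:output token mix (1:3 by default)."""
    total = input_parts + output_parts
    return (input_parts * input_per_mtok + output_parts * output_per_mtok) / total


def cheapest_blended(routes: dict[str, tuple[float, float]]) -> tuple[str, float]:
    """Pick the route with the lowest blended price; routes maps name -> (input, output) ₹/Mtok."""
    return min(((name, blended_price(i, o)) for name, (i, o) in routes.items()),
               key=lambda pair: pair[1])


def percentile_scores(values: Sequence[Optional[float]],
                      higher_is_better: bool) -> list[float]:
    """Score each value in [0, 1] by the share of other measured values it beats.

    Ties count as half a win. Unmeasured entries (None) get a neutral 0.5 (assumption).
    """
    measured = [v for v in values if v is not None]
    scores: list[float] = []
    for v in values:
        if v is None or len(measured) < 2:
            scores.append(0.5)
            continue
        beaten = sum(1 for w in measured if (w < v if higher_is_better else w > v))
        ties = sum(1 for w in measured if w == v) - 1  # don't count the value itself
        scores.append((beaten + 0.5 * ties) / (len(measured) - 1))
    return scores


# (model, blended ₹/Mtok, first token in s, tok/s); None means "not yet swept"
Row = tuple[str, float, Optional[float], Optional[float]]


def rank(rows: list[Row], w_price: float, w_ttft: float,
         w_tps: float) -> list[tuple[str, float]]:
    """Combine per-metric percentiles with (hypothetical) lens weights, best first."""
    price = percentile_scores([r[1] for r in rows], higher_is_better=False)
    ttft = percentile_scores([r[2] for r in rows], higher_is_better=False)
    tps = percentile_scores([r[3] for r in rows], higher_is_better=True)
    total = w_price + w_ttft + w_tps
    scored = [(r[0], (w_price * p + w_ttft * t + w_tps * s) / total)
              for r, p, t, s in zip(rows, price, ttft, tps)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Giving unswept models 0.5 on both perf metrics neither rewards nor punishes them for missing data, so in practice they move up or down on price alone.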
Sort the full catalog by any metric column to see it through that lens alone. The same model often wins on one lens and loses on another.
| Model | Routes | ₹/Mtok (blended) | First token | tok/s |
|---|---|---|---|---|
| glm-4.7-flash reasoning | zhipu, openrouter | free | — | — |
| glm-4.5-flash reasoning | zhipu | free | — | — |
| llama-3.1-8b-instruct | bharatrouter 🇮🇳, openrouter, groq, fireworks | ₹2.8 | 428ms (groq) | 548 |
| qwen2.5-coder-7b | bharatrouter 🇮🇳 | ₹3.5 | — | — |
| qwen2.5-7b-instruct | bharatrouter 🇮🇳 | ₹3.5 | — | — |
| qwen2.5-vl-7b-instruct | bharatrouter 🇮🇳 | ₹5.3 | — | — |
| gemma-4-e4b-it | krutrim 🇮🇳 | ₹6.8 | — | — |
| qwen3.5-9b reasoning | krutrim 🇮🇳 | ₹6.8 | — | — |
| glm-4-32b-0414-128k | zhipu | ₹10 | — | — |
| qwen3-32b reasoning | groq, openrouter, fireworks | ₹22 | 2.1s (groq) | 382 |
| gemma-4-26b-a4b-it | krutrim 🇮🇳 | ₹23 | — | — |
| qwen3.6-35b-a3b reasoning | krutrim 🇮🇳 | ₹23 | — | — |
| gpt-oss-20b reasoning | krutrim 🇮🇳, groq, fireworks | ₹24 | 4.7s (krutrim) | 606 |
| llama-3.3-70b | groq, openrouter, fireworks | ₹26 | 2.0s (openrouter) | 218 |
| gemma-4-31b-it | krutrim 🇮🇳 | ₹27 | — | — |
| llama-4-scout | groq, fireworks | ₹28 | — | — |
| glm-4.7-flashx reasoning | zhipu | ₹30 | — | — |
| gpt-oss-120b reasoning | krutrim 🇮🇳, groq, baseten, fireworks | ₹39 | 1.2s (fireworks) | 469 |
| gpt-4o-mini | openai | ₹47 | — | — |
| nemotron-super reasoning | baseten | ₹61 | — | — |
| deepseek-v3 | openrouter | ₹63 | 16.3s (openrouter) | 38 |
| glm-4.5-air reasoning | zhipu, openrouter, fireworks | ₹65 | — | — |
| glm-4.6v reasoning | zhipu, openrouter | ₹72 | — | — |
| glm-4.6 reasoning | zhipu, openrouter | ₹136 | — | — |
| glm-4.7 reasoning | baseten, zhipu, openrouter | ₹136 | 1.9s (openrouter) | 68 |
| qwen3.6-27b reasoning | krutrim 🇮🇳 | ₹147 | — | — |
| gpt-5-mini reasoning | openai | ₹150 | 10.2s (openai) | 63 |
| glm-5 reasoning | baseten, zhipu, openrouter | ₹153 | 12.2s (zhipu) | 53 |
| kimi-k2.5 reasoning | moonshot, baseten, openrouter | ₹155 | — | — |
| glm-4.5 reasoning | zhipu, openrouter | ₹173 | — | — |
| kimi-k2 | groq, openrouter | ₹180 | 688ms (groq) | 178 |
| nemotron-ultra reasoning | baseten | ₹187 | — | — |
| gemini-2.5-flash reasoning | openrouter | ₹187 | 2.2s (openrouter) | 132 |
| glm-5.2 reasoning | baseten, zhipu, openrouter | ₹238 | 5.7s (zhipu) | 68 |
| kimi-k2.6 reasoning | moonshot, baseten, openrouter | ₹244 | — | — |
| kimi-k2.7-code reasoning | moonshot, baseten, openrouter | ₹270 | 19.6s (openrouter) | 39 |
| deepseek-v4-pro reasoning | baseten, fireworks | ₹292 | — | — |
| glm-5-turbo reasoning | zhipu | ₹317 | — | — |
| glm-5v-turbo reasoning | zhipu, openrouter | ₹317 | — | — |
| glm-5.1 reasoning | baseten, zhipu, openrouter | ₹333 | — | — |
| glm-4.5-airx reasoning | zhipu | ₹351 | — | — |
| claude-haiku-4.5 reasoning | anthropic, openrouter | ₹384 | 16.8s (openrouter) | 102 |
| kimi-k2.7-code-highspeed reasoning | moonshot | ₹622 | — | — |
| glm-4.5-x reasoning | zhipu | ₹693 | — | — |
| gpt-5 reasoning | openai | ₹750 | — | — |
| claude-sonnet-5 reasoning | anthropic, openrouter | ₹1152 | — | — |
| claude-opus-4.8 reasoning | anthropic, openrouter | ₹1920 | — | — |
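To reproduce a single-lens sort offline, here is a small Python sketch over a handful of rows copied from the catalog, with first token converted to seconds. The tuple layout, the `COLUMNS` map and the `sort_by_lens` name are our own, not part of the gateway. Unswept cells (shown as a dash in the table) stay at the bottom whichever direction a lens sorts.

```python
from typing import Optional

# (model, ₹/Mtok, first token in seconds, tok/s); None marks a route not yet swept
Row = tuple[str, float, Optional[float], Optional[float]]

CATALOG: list[Row] = [
    ("llama-3.1-8b-instruct", 2.8, 0.428, 548),
    ("qwen3-32b", 22, 2.1, 382),
    ("gpt-oss-20b", 24, 4.7, 606),
    ("gpt-oss-120b", 39, 1.2, 469),
    ("kimi-k2", 180, 0.688, 178),
    ("gpt-4o-mini", 47, None, None),
]

# lens -> (tuple index, whether a higher value is better)
COLUMNS = {"price": (1, False), "first_token": (2, False), "tok_s": (3, True)}


def sort_by_lens(rows: list[Row], lens: str) -> list[Row]:
    """Sort best-first on one metric; rows without a measurement go last."""
    idx, higher_is_better = COLUMNS[lens]
    measured = [r for r in rows if r[idx] is not None]
    unmeasured = [r for r in rows if r[idx] is None]
    measured.sort(key=lambda r: r[idx], reverse=higher_is_better)
    return measured + unmeasured
```

In this subset, kimi-k2 comes second on first token but last on price, which is the "wins one, loses another" effect in miniature.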
Perf numbers are medians from a multi-run streamed sweep through the production gateway on 2026-07-02: every (model × provider) route gets the same ~300-token prompt, rounds interleaved across hosts so no provider owns a time-of-day advantage. First token counts reasoning tokens (it's what you see). tok/s is the post-first-token decode rate. Models not yet swept show “—” and rank on price with a neutral perf score. Routes we could not measure are listed openly in the API response (9 skipped this sweep), never silently dropped. Numbers refresh with each sweep; live per-route health is on /models.
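A minimal Python sketch of how the two perf numbers can be derived from streamed runs and summarized as medians. It assumes the client records the request start time and one arrival timestamp per streamed token, and it treats reasoning tokens like any other token, which is our reading of "first token counts reasoning tokens". The function names are hypothetical, and interleaving rounds across hosts is a scheduling concern left out of the sketch.

```python
from statistics import median
from typing import Sequence


def run_metrics(request_start: float,
                token_times: Sequence[float]) -> tuple[float, float]:
    """Return (time to first token in s, post-first-token decode rate in tok/s) for one run.

    Assumes one timestamp per streamed token, reasoning tokens included.
    """
    if len(token_times) < 2:
        raise ValueError("need at least two tokens to measure a decode rate")
    first, last = token_times[0], token_times[-1]
    if last <= first:
        raise ValueError("token timestamps must increase")
    ttft = first - request_start
    # Decode rate ignores the wait for the first token: tokens after it / time after it.
    decode_rate = (len(token_times) - 1) / (last - first)
    return ttft, decode_rate


def route_medians(runs: Sequence[tuple[float, Sequence[float]]]) -> tuple[float, float]:
    """Median TTFT and median decode rate over several runs of one (model × provider) route."""
    ttfts, rates = zip(*(run_metrics(start, times) for start, times in runs))
    return median(ttfts), median(rates)
```

Using the median rather than the mean keeps a single congested run from dragging a route's number up or down.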
Agents get this same chooser as JSON from `GET /v1/compare/models`: rankings, per-route pricing, measured perf and live failure rates, with no auth required.
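A minimal Python sketch for calling the endpoint with only the standard library. The gateway host is not given on this page, so `https://gateway.example.com` below is a placeholder, and because the response schema is not documented here the sketch returns the decoded JSON without assuming any field names. The `Accept` header is a conventional choice, not something the page requires.

```python
import json
from typing import Any
from urllib.request import Request, urlopen

COMPARE_PATH = "/v1/compare/models"


def compare_url(base_url: str) -> str:
    """Join a gateway base URL (placeholder; not given on this page) with the compare path."""
    return base_url.rstrip("/") + COMPARE_PATH


def fetch_comparison(base_url: str, timeout: float = 10.0) -> Any:
    """GET the comparison JSON and return it decoded.

    No auth header is sent, since the endpoint is documented as requiring none.
    Nothing is assumed about the response's field names.
    """
    request = Request(compare_url(base_url), headers={"Accept": "application/json"})
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Usage, with your own gateway host substituted: `data = fetch_comparison("https://gateway.example.com")`, then inspect `data` to find the rankings and per-route fields.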