Which AI model should I use for a chat assistant?

A user is watching the screen: first token wins or loses the experience. Answers are short-to-medium, so sustained throughput matters less than snap. Top picks on BharatRouter right now: llama-3.1-8b-instruct, gpt-oss-120b, qwen3-32b.

Which AI model should I use for bulk extraction / classification?

Nobody is watching a batch job — ₹ per task is nearly everything, throughput sets how long the batch takes, and TTFT is irrelevant. Top picks on BharatRouter right now: llama-3.1-8b-instruct, gpt-oss-20b, glm-4.7-flash.

Which AI model should I use for long-form generation?

Reports, articles, big code files: the generation is thousands of tokens, so sustained decode rate dominates wall-clock, with cost a close second. Top picks on BharatRouter right now: llama-3.1-8b-instruct, gpt-oss-20b, gpt-oss-120b.
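A rough sketch of why decode rate dominates here: wall-clock time is approximately first-token latency plus output tokens divided by tok/s. The latency and throughput figures are the ones measured on this page; the 3,000-token output length is an assumed example, and the formula ignores queueing and network variance.

```python
def wall_clock_seconds(first_token_s: float, tok_per_s: float, output_tokens: int) -> float:
    """Estimate total generation time: wait for the first token, then decode the rest."""
    return first_token_s + output_tokens / tok_per_s

# Assumed example length for a long report or large code file.
OUTPUT_TOKENS = 3000

# (first token in seconds, tok/s) as listed in the tables on this page.
MEASURED = {
    "llama-3.1-8b-instruct": (0.428, 548),
    "gpt-oss-20b": (4.7, 606),
    "gpt-oss-120b": (1.2, 469),
    "llama-3.3-70b": (2.0, 218),
}

estimates = {
    name: wall_clock_seconds(ttft, rate, OUTPUT_TOKENS)
    for name, (ttft, rate) in MEASURED.items()
}

# Print fastest first.
for name, secs in sorted(estimates.items(), key=lambda kv: kv[1]):
    print(f"{name}: {secs:.1f}s")
```

At this length the decode term is several times larger than first-token latency for most of these models, which is why a slow first token hurts less here than in chat. The page's ranking also weighs cost, so it need not match this time-only ordering.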

Which AI model should I use for Indic-language work?

Candidates are filtered to models with first-class Indic support; among them the tradeoff is balanced — pick by residency needs first, then price. Top picks on BharatRouter right now: llama-3.1-8b-instruct, llama-3.3-70b.

Which AI model should I use for RAG / embeddings?

Embedding calls are high-volume and latency-tolerant inside an indexing pipeline — price per Mtok is effectively the whole decision. Top picks on BharatRouter right now: bge-m3, text-embedding-3-small.

Which AI model should I use for voice agents (STT/TTS)?

Speech quality and language coverage decide first; among viable voices, price per Mtok of text is the comparable number. Top picks on BharatRouter right now: parakeet, whisper-1, whisper-large-v3.

Which model should I use?

Start from your application

Coding agent | Chat assistant | Bulk extraction / classification | Long-form generation | Indic-language work | RAG / embeddings | Voice agents (STT/TTS)

Agent loops make dozens of short, tool-calling turns per task — time to first token dominates how fast the agent feels, throughput matters for big diffs, and cost adds up across the loop. Reasoning support is required.

Coding agent

#	Model	Best route	₹/Mtok (blended)	First token	tok/s
1	gpt-oss-120b open weights 🇮🇳 India route	fireworks	₹39	1.2s	469
2	qwen3-32b open weights	groq	₹22	2.1s	382
3	gpt-oss-20b open weights 🇮🇳 India route	krutrim	₹24	4.7s	606
4	glm-4.7 open weights	openrouter BYOK	₹136	1.9s	68
5	glm-4.7-flash open weights price-ranked	zhipu BYOK	free	—	—
6	glm-4.5-flash open weights price-ranked	zhipu BYOK	free	—	—
7	qwen3.5-9b open weights 🇮🇳 India route price-ranked	krutrim	₹6.8	—	—
8	gemini-2.5-flash	openrouter	₹187	2.2s	132

Chat assistant

#	Model	Best route	₹/Mtok (blended)	First token	tok/s
1	llama-3.1-8b-instruct open weights 🇮🇳 India route	groq	₹2.8	428ms	548
2	gpt-oss-120b open weights 🇮🇳 India route	fireworks	₹39	1.2s	469
3	qwen3-32b open weights	groq	₹22	2.1s	382
4	llama-3.3-70b open weights	openrouter	₹26	2.0s	218
5	kimi-k2 open weights	groq	₹180	688ms	178
6	gpt-oss-20b open weights 🇮🇳 India route	krutrim	₹24	4.7s	606
7	glm-4.7-flash open weights price-ranked	zhipu BYOK	free	—	—
8	glm-4.5-flash open weights price-ranked	zhipu BYOK	free	—	—

Bulk extraction / classification

#	Model	Best route	₹/Mtok (blended)	First token	tok/s
1	llama-3.1-8b-instruct open weights 🇮🇳 India route	groq	₹2.8	428ms	548
2	gpt-oss-20b open weights 🇮🇳 India route	krutrim	₹24	4.7s	606
3	glm-4.7-flash open weights price-ranked	zhipu BYOK	free	—	—
4	glm-4.5-flash open weights price-ranked	zhipu BYOK	free	—	—
5	qwen3-32b open weights	groq	₹22	2.1s	382
6	qwen2.5-coder-7b open weights 🇮🇳 India route price-ranked	bharatrouter	₹3.5	—	—
7	qwen2.5-7b-instruct open weights 🇮🇳 India route price-ranked	bharatrouter	₹3.5	—	—
8	qwen2.5-vl-7b-instruct open weights 🇮🇳 India route price-ranked	bharatrouter	₹5.3	—	—

Long-form generation

#	Model	Best route	₹/Mtok (blended)	First token	tok/s
1	llama-3.1-8b-instruct open weights 🇮🇳 India route	groq	₹2.8	428ms	548
2	gpt-oss-20b open weights 🇮🇳 India route	krutrim	₹24	4.7s	606
3	gpt-oss-120b open weights 🇮🇳 India route	fireworks	₹39	1.2s	469
4	qwen3-32b open weights	groq	₹22	2.1s	382
5	llama-3.3-70b open weights	openrouter	₹26	2.0s	218
6	glm-4.7-flash open weights price-ranked	zhipu BYOK	free	—	—
7	glm-4.5-flash open weights price-ranked	zhipu BYOK	free	—	—
8	qwen2.5-coder-7b open weights 🇮🇳 India route price-ranked	bharatrouter	₹3.5	—	—

Indic-language work

#	Model	Best route	₹/Mtok (blended)	First token	tok/s
1	llama-3.1-8b-instruct open weights 🇮🇳 India route	groq	₹2.8	428ms	548
2	llama-3.3-70b open weights	openrouter	₹26	2.0s	218

RAG / embeddings

#	Model	Best route	₹/Mtok (blended)	First token	tok/s
1	bge-m3 open weights 🇮🇳 India route price-ranked	bharatrouter	₹0.3	—	—
2	text-embedding-3-small price-ranked	openai	₹0.5	—	—

Voice agents (STT/TTS)

#	Model	Best route	₹/Mtok (blended)	First token	tok/s
1	parakeet open weights 🇮🇳 India route price-ranked	bharatrouter	free	—	—
2	whisper-1 open weights price-ranked	openai BYOK	free	—	—
3	whisper-large-v3 open weights price-ranked	groq BYOK	free	—	—
4	saarika 🇮🇳 India route price-ranked	sarvam BYOK	free	—	—
5	bulbul 🇮🇳 India route price-ranked	sarvam BYOK	free	—	—
6	eleven-tts price-ranked	elevenlabs BYOK	free	—	—
7	eleven-stt price-ranked	elevenlabs BYOK	free	—	—
8	tts 🇮🇳 India route price-ranked	bharatrouter	free	—	—

Blended ₹/Mtok = cheapest route at a 1:3 input:output token mix (generation-dominant). First token and tok/s are the best measured route per model. Rankings are a weighted percentile score per lens — details in the methodology below.
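One way to read the two definitions above, as a minimal sketch rather than the page's actual scoring code. It assumes "blended" means a weighted average of input and output prices at 1:3, that a percentile is the fraction of measured peers a model beats or ties, and that unmeasured metrics get a neutral 0.5. The lens weights in `CHAT_WEIGHTS` are hypothetical; the real weights are not given here, so the resulting order will not necessarily reproduce the rankings shown.

```python
from typing import Dict, List, Optional

def blended_price(input_per_mtok: float, output_per_mtok: float,
                  input_parts: int = 1, output_parts: int = 3) -> float:
    """Weighted-average price per Mtok at an input:output mix (1:3 by default)."""
    total = input_parts + output_parts
    return (input_parts * input_per_mtok + output_parts * output_per_mtok) / total

def percentile(value: Optional[float], population: List[Optional[float]],
               lower_is_better: bool) -> float:
    """Fraction of measured peers this value beats or ties; 0.5 if unmeasured."""
    measured = [v for v in population if v is not None]
    if value is None or not measured:
        return 0.5  # assumed neutral score for models not yet swept
    if lower_is_better:
        wins = sum(1 for v in measured if value <= v)
    else:
        wins = sum(1 for v in measured if value >= v)
    return wins / len(measured)

# Hypothetical weights for a latency-sensitive lens; not the page's real values.
CHAT_WEIGHTS = {"price": 0.2, "ttft": 0.6, "tps": 0.2}

def lens_score(model: Dict, models: List[Dict], weights: Dict[str, float]) -> float:
    """Combine per-metric percentiles into one score for a given lens."""
    price = percentile(model["price"], [m["price"] for m in models], True)
    ttft = percentile(model["ttft"], [m["ttft"] for m in models], True)
    tps = percentile(model["tps"], [m["tps"] for m in models], False)
    return weights["price"] * price + weights["ttft"] * ttft + weights["tps"] * tps

# Blended price, first token (s) and tok/s copied from this page; None = not swept.
MODELS = [
    {"name": "llama-3.1-8b-instruct", "price": 2.8, "ttft": 0.428, "tps": 548},
    {"name": "gpt-oss-120b", "price": 39, "ttft": 1.2, "tps": 469},
    {"name": "kimi-k2", "price": 180, "ttft": 0.688, "tps": 178},
    {"name": "qwen2.5-7b-instruct", "price": 3.5, "ttft": None, "tps": None},
]

ranked = sorted(MODELS, key=lambda m: lens_score(m, MODELS, CHAT_WEIGHTS), reverse=True)
print([m["name"] for m in ranked])
```

Changing the weights is what turns one dataset into different lenses: shifting weight from `ttft` to `price` moves cheap, unswept models up the list.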

The full picture — every chat model, three lenses

Each metric column is a lens: price, first-token latency, or decode speed. The table is sorted by price, cheapest first. The same model often wins one lens and loses another.

Model	Routes	₹/Mtok	First token	tok/s
glm-4.7-flash reasoning	zhipu, openrouter	free	—	—
glm-4.5-flash reasoning	zhipu	free	—	—
llama-3.1-8b-instruct	bharatrouter 🇮🇳, openrouter, groq, fireworks	₹2.8	428ms groq	548
qwen2.5-coder-7b	bharatrouter 🇮🇳	₹3.5	—	—
qwen2.5-7b-instruct	bharatrouter 🇮🇳	₹3.5	—	—
qwen2.5-vl-7b-instruct	bharatrouter 🇮🇳	₹5.3	—	—
gemma-4-e4b-it	krutrim 🇮🇳	₹6.8	—	—
qwen3.5-9b reasoning	krutrim 🇮🇳	₹6.8	—	—
glm-4-32b-0414-128k	zhipu	₹10	—	—
qwen3-32b reasoning	groq, openrouter, fireworks	₹22	2.1s groq	382
gemma-4-26b-a4b-it	krutrim 🇮🇳	₹23	—	—
qwen3.6-35b-a3b reasoning	krutrim 🇮🇳	₹23	—	—
gpt-oss-20b reasoning	krutrim 🇮🇳, groq, fireworks	₹24	4.7s krutrim	606
llama-3.3-70b	groq, openrouter, fireworks	₹26	2.0s openrouter	218
gemma-4-31b-it	krutrim 🇮🇳	₹27	—	—
llama-4-scout	groq, fireworks	₹28	—	—
glm-4.7-flashx reasoning	zhipu	₹30	—	—
gpt-oss-120b reasoning	krutrim 🇮🇳, groq, baseten, fireworks	₹39	1.2s fireworks	469
gpt-4o-mini	openai	₹47	—	—
nemotron-super reasoning	baseten	₹61	—	—
deepseek-v3	openrouter	₹63	16.3s openrouter	38
glm-4.5-air reasoning	zhipu, openrouter, fireworks	₹65	—	—
glm-4.6v reasoning	zhipu, openrouter	₹72	—	—
glm-4.6 reasoning	zhipu, openrouter	₹136	—	—
glm-4.7 reasoning	baseten, zhipu, openrouter	₹136	1.9s openrouter	68
qwen3.6-27b reasoning	krutrim 🇮🇳	₹147	—	—
gpt-5-mini reasoning	openai	₹150	10.2s openai	63
glm-5 reasoning	baseten, zhipu, openrouter	₹153	12.2s zhipu	53
kimi-k2.5 reasoning	moonshot, baseten, openrouter	₹155	—	—
glm-4.5 reasoning	zhipu, openrouter	₹173	—	—
kimi-k2	groq, openrouter	₹180	688ms groq	178
nemotron-ultra reasoning	baseten	₹187	—	—
gemini-2.5-flash reasoning	openrouter	₹187	2.2s openrouter	132
glm-5.2 reasoning	baseten, zhipu, openrouter	₹238	5.7s zhipu	68
kimi-k2.6 reasoning	moonshot, baseten, openrouter	₹244	—	—
kimi-k2.7-code reasoning	moonshot, baseten, openrouter	₹270	19.6s openrouter	39
deepseek-v4-pro reasoning	baseten, fireworks	₹292	—	—
glm-5-turbo reasoning	zhipu	₹317	—	—
glm-5v-turbo reasoning	zhipu, openrouter	₹317	—	—
glm-5.1 reasoning	baseten, zhipu, openrouter	₹333	—	—
glm-4.5-airx reasoning	zhipu	₹351	—	—
claude-haiku-4.5 reasoning	anthropic, openrouter	₹384	16.8s openrouter	102
kimi-k2.7-code-highspeed reasoning	moonshot	₹622	—	—
glm-4.5-x reasoning	zhipu	₹693	—	—
gpt-5 reasoning	openai	₹750	—	—
claude-sonnet-5 reasoning	anthropic, openrouter	₹1152	—	—
claude-opus-4.8 reasoning	anthropic, openrouter	₹1920	—	—

In the open — how these numbers are made

Perf numbers are medians from a multi-run streamed sweep through the production gateway on 2026-07-02: every (model × provider) route gets the same ~300-token prompt, rounds interleaved across hosts so no provider owns a time-of-day advantage. First token counts reasoning tokens (it's what you see). tok/s is the post-first-token decode rate. Models not yet swept show “—” and rank on price with a neutral perf score. Routes we could not measure are listed openly in the API response (9 skipped this sweep), never silently dropped. Numbers refresh with each sweep; live per-route health is on /models.
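The two perf metrics described above can be derived from token arrival timestamps. The sketch below is an illustration under assumptions, not the sweep's actual harness: the function names are hypothetical, and the timestamps are synthetic values chosen to make the arithmetic easy to follow, not measurements.

```python
from statistics import median
from typing import List, Optional, Tuple

def run_metrics(request_start: float, token_times: List[float]) -> Tuple[float, Optional[float]]:
    """Return (first-token latency in s, decode rate in tok/s) for one streamed run.

    token_times holds the arrival time of every streamed token, including
    reasoning tokens, since the first of those is what the user sees first.
    """
    first, last = token_times[0], token_times[-1]
    ttft = first - request_start
    decode_window = last - first
    # Post-first-token decode rate: tokens after the first, over the time they took.
    rate = (len(token_times) - 1) / decode_window if decode_window > 0 else None
    return ttft, rate

def route_medians(runs: List[Tuple[float, List[float]]]) -> Tuple[float, Optional[float]]:
    """Median first-token latency and median decode rate across runs of one route."""
    metrics = [run_metrics(start, times) for start, times in runs]
    ttfts = [t for t, _ in metrics]
    rates = [r for _, r in metrics if r is not None]
    return median(ttfts), (median(rates) if rates else None)

# Synthetic runs: (request start, token arrival times), all in seconds.
SYNTHETIC_RUNS = [
    (0.0, [0.5, 0.75, 1.0, 1.25, 1.5]),     # ttft 0.5 s, 4 tokens in 1.0 s
    (10.0, [10.25, 10.75, 11.25]),          # ttft 0.25 s, 2 tokens in 1.0 s
    (20.0, [21.0, 21.5, 22.0, 22.5, 23.0]), # ttft 1.0 s, 4 tokens in 2.0 s
]

median_ttft, median_rate = route_medians(SYNTHETIC_RUNS)
print(median_ttft, median_rate)
```

Using the median rather than the mean keeps one slow run, such as a cold start, from skewing a route's number.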

Agents get this same chooser as JSON: GET /v1/compare/models — rankings, per-route pricing, measured perf and live failure rates, no auth required.
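A minimal sketch of calling that endpoint with the Python standard library. Only the path `/v1/compare/models` and the absence of auth come from this page; the base URL below is a placeholder to replace with the real gateway host, and the response schema is not documented here, so the sketch returns the parsed JSON as-is without assuming field names.

```python
import json
import urllib.request

# Placeholder host: this page gives only the path, not the gateway's base URL.
BASE_URL = "https://api.example.com"

def build_compare_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build an unauthenticated GET request for the chooser endpoint."""
    url = base_url.rstrip("/") + "/v1/compare/models"
    return urllib.request.Request(
        url, method="GET", headers={"Accept": "application/json"}
    )

def fetch_compare(base_url: str = BASE_URL, timeout: float = 10.0):
    """Fetch the chooser data and return it as parsed JSON."""
    request = build_compare_request(base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

An agent would call `fetch_compare("https://<gateway-host>")` at startup and inspect the returned structure before relying on any particular key.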
