AI Model Ranking (LLM Leaderboard)
Fastest AI Models
Language models ranked by inference speed and throughput
Column key:
- Model - AI model name and provider organization
- Price/1M - cost per 1 million tokens: input (text you send) / output (text the model generates)
- MMLU-Pro - Massive Multitask Language Understanding (Professional): broad knowledge across 14 subjects including STEM, humanities, and social sciences
- Speed - inference throughput in tokens per second (how fast the model generates responses)
- GPQA - Graduate-level Google-Proof Q&A benchmark: PhD-level reasoning and advanced intelligence
- AIME 2025 - American Invitational Mathematics Examination 2025: advanced mathematical problem-solving ability
- Release - when the model was released (newer models may have more capabilities)

| Model | Price/1M (input/output) | MMLU-Pro | Speed | GPQA | AIME 2025 | Release | Compare |
|---|---|---|---|---|---|---|---|
| #1 Mercury 2 by Inception | $0.25 / $0.75 | - | 953 tok/s | 77.0% | - | Feb 20, 2026 | |
| #2 Granite 3.3 8B (Non-reasoning) by IBM | $0.03 / $0.25 | 46.8% | 484 tok/s | 33.8% | 6.7% | Apr 16, 2025 | |
| #3 Granite 4.0 H Small by IBM | $0.06 / $0.25 | 62.4% | 442 tok/s | 41.6% | 13.7% | Sep 22, 2025 | |
| #4 Qwen3.5 0.8B (Non-reasoning) by Alibaba | $0.01 / $0.05 | - | 319 tok/s | 23.6% | - | Mar 2, 2026 | |
| #5 Gemini 2.5 Flash-Lite (Non-reasoning) by Google | $0.10 / $0.40 | 72.4% | 304 tok/s | 47.4% | 35.3% | Jun 17, 2025 | |
| #6 Ministral 3 3B by Mistral | $0.10 / $0.10 | 52.4% | 302 tok/s | 35.8% | 22.0% | Dec 2, 2025 | |
| #7 Gemini 2.5 Flash-Lite (Reasoning) by Google | $0.10 / $0.40 | 75.9% | 295 tok/s | 62.5% | 53.3% | Jun 17, 2025 | |
| #8 Nova Micro by Amazon | $0.04 / $0.14 | 53.1% | 287 tok/s | 35.8% | 6.0% | Dec 3, 2024 | |
| #9 Qwen3.5 2B (Non-reasoning) by Alibaba | $0.02 / $0.10 | - | 283 tok/s | 43.8% | - | Mar 2, 2026 | |
| #10 Step 3.5 Flash by StepFun | $0.10 / $0.30 | - | 256 tok/s | 83.1% | - | Feb 2, 2026 | |
| #11 gpt-oss-120B (low) by OpenAI | $0.15 / $0.60 | 77.5% | 248 tok/s | 67.2% | 66.7% | Aug 5, 2025 | |
| #12 gpt-oss-20B (high) by OpenAI | $0.06 / $0.20 | 74.8% | 247 tok/s | 68.8% | 89.3% | Aug 5, 2025 | |
| #13 gpt-oss-120B (high) by OpenAI | $0.15 / $0.60 | 80.8% | 234 tok/s | 78.2% | 93.4% | Aug 5, 2025 | |
| #14 Qwen3.5 4B (Reasoning) by Alibaba | $0.03 / $0.15 | - | 233 tok/s | 77.1% | - | Mar 2, 2026 | |
| #15 Qwen3.5 4B (Non-reasoning) by Alibaba | $0.03 / $0.15 | - | 230 tok/s | 71.2% | - | Mar 2, 2026 | |
| #16 gpt-oss-20B (low) by OpenAI | $0.06 / $0.20 | 71.8% | 215 tok/s | 61.1% | 62.3% | Aug 5, 2025 | |
| #17 Grok 4 Fast (Non-reasoning) by xAI | $0.20 / $0.50 | 73.0% | 213 tok/s | 60.6% | 41.3% | Sep 19, 2025 | |
| #18 Nova 2.0 Omni (Non-reasoning) by Amazon | $0.30 / $2.50 | 71.9% | 212 tok/s | 55.5% | 37.0% | Nov 26, 2025 | |
| #19 Devstral Small (Jul '25) by Mistral | $0.10 / $0.30 | 62.2% | 211 tok/s | 41.4% | 29.3% | Jul 10, 2025 | |
| #20 Nova 2.0 Lite (low) by Amazon | $0.30 / $2.50 | 78.8% | 202 tok/s | 69.8% | 46.7% | Oct 29, 2025 | |
| #21 Gemini 3 Flash Preview (Non-reasoning) by Google | $0.50 / $3.00 | 88.2% | 201 tok/s | 81.2% | 55.7% | Dec 17, 2025 | |
| #22 Gemini 3.1 Flash-Lite Preview by Google | $0.25 / $1.50 | - | 197 tok/s | 82.2% | - | Mar 3, 2026 | |
| #23 Grok 4.20 0309 (Reasoning) by xAI | $2.00 / $6.00 | - | 197 tok/s | 88.5% | - | Mar 10, 2026 | |
| #24 Ministral 3 8B by Mistral | $0.15 / $0.15 | 64.2% | 196 tok/s | 47.1% | 31.7% | Dec 2, 2025 | |
| #25 Mistral 7B Instruct by Mistral | $0.25 / $0.25 | 24.5% | 196 tok/s | 17.7% | - | Sep 27, 2023 | |
| #26 Grok 3 mini Reasoning (high) by xAI | $0.30 / $0.50 | 82.8% | 196 tok/s | 79.1% | 84.7% | Feb 19, 2025 | |
| #27 GPT-5.1 Codex (high) by OpenAI | $1.25 / $10.00 | 86.0% | 196 tok/s | 86.0% | 95.7% | Nov 13, 2025 | |
| #28 Gemini 3 Flash Preview (Reasoning) by Google | $0.50 / $3.00 | 89.0% | 195 tok/s | 89.8% | 97.0% | Dec 17, 2025 | |
| #29 Grok 4.20 0309 v2 (Reasoning) by xAI | $2.00 / $6.00 | - | 194 tok/s | 91.1% | - | Apr 7, 2026 | |
| #30 Gemini 2.5 Flash (Reasoning) by Google | $0.30 / $2.50 | 83.2% | 193 tok/s | 79.0% | 73.3% | May 20, 2025 | |
| #31 Qwen3 0.6B (Reasoning) by Alibaba | $0.11 / $1.26 | 34.7% | 192 tok/s | 23.9% | 18.0% | Apr 28, 2025 | |
| #32 Nova Lite by Amazon | $0.06 / $0.24 | 59.0% | 192 tok/s | 43.3% | 7.0% | Dec 3, 2024 | |
| #33 Nova 2.0 Lite (Non-reasoning) by Amazon | $0.30 / $2.50 | 74.3% | 190 tok/s | 60.3% | 33.7% | Oct 29, 2025 | |
| #34 Qwen3 0.6B (Non-reasoning) by Alibaba | $0.11 / $0.42 | 23.1% | 190 tok/s | 23.1% | 10.3% | Apr 28, 2025 | |
| #35 GPT-5.1 Codex mini (high) by OpenAI | $0.25 / $2.00 | 82.0% | 189 tok/s | 81.3% | 91.7% | Nov 13, 2025 | |
| #36 Gemini 2.5 Flash (Non-reasoning) by Google | $0.30 / $2.50 | 80.9% | 189 tok/s | 68.3% | 60.3% | May 20, 2025 | |
| #37 Nova 2.0 Lite (medium) by Amazon | $0.30 / $2.50 | 81.3% | 189 tok/s | 76.8% | 88.7% | Oct 29, 2025 | |
| #38 Nova 2.0 Lite (high) by Amazon | $0.30 / $2.50 | 81.8% | 188 tok/s | 81.1% | 94.3% | Oct 29, 2025 | |
| #39 GPT-4.1 nano by OpenAI | $0.10 / $0.40 | 65.7% | 188 tok/s | 51.2% | 24.0% | Apr 14, 2025 | |
| #40 GPT-5 Codex (high) by OpenAI | $1.25 / $10.00 | 86.5% | 186 tok/s | 83.7% | 98.7% | Sep 23, 2025 | |
| #41 Jamba 1.6 Mini by AI21 Labs | $0.20 / $0.40 | 36.7% | 186 tok/s | 30.0% | - | Mar 6, 2025 | |
| #42 Grok 4 Fast (Reasoning) by xAI | $0.20 / $0.50 | 85.0% | 185 tok/s | 84.7% | 89.7% | Sep 19, 2025 | |
| #43 Grok 4.20 0309 v2 (Non-reasoning) by xAI | $2.00 / $6.00 | - | 185 tok/s | 77.6% | - | N/A | |
| #44 Llama 3.1 Instruct 8B by Meta | $0.10 / $0.10 | 47.6% | 184 tok/s | 25.9% | 4.3% | Jul 23, 2024 | |
| #45 Magistral Small 1.2 by Mistral | $0.50 / $1.50 | 76.8% | 179 tok/s | 66.3% | 80.3% | Sep 17, 2025 | |
| #46 GPT-5.4 nano (xhigh) by OpenAI | $0.20 / $1.25 | - | 178 tok/s | 81.7% | - | Mar 17, 2026 | |
| #47 Qwen3.5 9B (Non-reasoning) by Alibaba | $0.04 / $0.20 | - | 178 tok/s | 78.6% | - | Mar 2, 2026 | |
| #48 GPT-5.4 nano (medium) by OpenAI | $0.20 / $1.25 | - | 177 tok/s | 76.1% | - | Mar 17, 2026 | |
| #49 GPT-5.4 nano (Non-Reasoning) by OpenAI | $0.20 / $1.25 | - | 176 tok/s | 55.8% | - | Mar 17, 2026 | |
| #50 Qwen3 Next 80B A3B (Reasoning) by Alibaba | $0.50 / $6.00 | 82.4% | 175 tok/s | 75.9% | 84.3% | Sep 11, 2025 | |
| #51 Qwen3 Next 80B A3B Instruct by Alibaba | $0.50 / $2.00 | 81.9% | 173 tok/s | 73.8% | 66.3% | Sep 11, 2025 | |
| #52 Mistral Small 4 (Reasoning) by Mistral | $0.15 / $0.60 | - | 172 tok/s | 76.9% | - | Mar 16, 2026 | |
| #53 Grok 4.20 0309 (Non-reasoning) by xAI | $2.00 / $6.00 | - | 172 tok/s | 78.5% | - | Mar 10, 2026 | |
| #54 GPT-5.4 mini (medium) by OpenAI | $0.75 / $4.50 | - | 170 tok/s | 82.3% | - | Mar 17, 2026 | |
| #55 Qwen3.5 Omni Flash by Alibaba | $0.10 / $0.80 | - | 168 tok/s | 74.2% | - | Mar 30, 2026 | |
| #56 GPT-5.4 mini (xhigh) by OpenAI | $0.75 / $4.50 | - | 168 tok/s | 87.5% | - | Mar 17, 2026 | |
| #57 NVIDIA Nemotron Nano 9B V2 (Non-reasoning) by NVIDIA | $0.05 / $0.20 | 73.9% | 165 tok/s | 55.7% | 62.3% | Aug 18, 2025 | |
| #58 GPT-5 (ChatGPT) by OpenAI | $1.25 / $10.00 | 82.0% | 165 tok/s | 68.6% | 48.3% | Aug 7, 2025 | |
| #59 Grok Code Fast 1 by xAI | $0.20 / $1.50 | 79.3% | 162 tok/s | 72.7% | 43.3% | Aug 28, 2025 | |
| #60 Qwen3 Coder Next by Alibaba | $0.35 / $1.20 | - | 160 tok/s | 73.7% | - | Feb 3, 2026 | |
| #61 GPT-5.4 mini (Non-Reasoning) by OpenAI | $0.75 / $4.50 | - | 160 tok/s | 60.6% | - | Mar 17, 2026 | |
| #62 Mistral Small (Sep '24) by Mistral | $0.20 / $0.60 | 52.9% | 159 tok/s | 38.1% | - | Sep 17, 2024 | |
| #63 Mistral Small 3.1 by Mistral | $0.10 / $0.30 | 65.9% | 158 tok/s | 45.4% | 3.7% | Mar 17, 2025 | |
| #64 Mistral Small 3 by Mistral | $0.10 / $0.30 | 65.2% | 158 tok/s | 46.2% | 4.3% | Jan 30, 2025 | |
| #65 Mistral Small (Feb '24) by Mistral | $1.00 / $3.00 | 41.9% | 157 tok/s | 30.2% | - | Feb 26, 2024 | |
| #66 NVIDIA Nemotron 3 Super 120B A12B (Reasoning) by NVIDIA | $0.30 / $0.75 | - | 157 tok/s | 80.0% | - | Mar 11, 2026 | |
| #67 Mistral Small 4 (Non-reasoning) by Mistral | $0.15 / $0.60 | - | 156 tok/s | 57.1% | - | Mar 16, 2026 | |
| #68 Llama 3.2 Instruct 1B by Meta | $0.10 / $0.10 | 20.0% | 155 tok/s | 19.6% | - | Sep 25, 2024 | |
| #69 o3-mini by OpenAI | $1.10 / $4.40 | 79.1% | 153 tok/s | 74.8% | - | Jan 31, 2025 | |
| #70 Qwen3.5 122B A10B (Non-reasoning) by Alibaba | $0.40 / $3.20 | - | 152 tok/s | 82.7% | - | Feb 24, 2026 | |
| #71 Qwen3 30B A3B 2507 (Reasoning) by Alibaba | $0.20 / $2.40 | 80.5% | 151 tok/s | 70.7% | 56.3% | Jul 30, 2025 | |
| #72 o3-mini (high) by OpenAI | $1.10 / $4.40 | 80.2% | 147 tok/s | 77.3% | - | Jan 31, 2025 | |
| #73 Nova 2.0 Pro Preview (Non-reasoning) by Amazon | $1.25 / $10.00 | 77.2% | 146 tok/s | 63.6% | 30.7% | Nov 27, 2025 | |
| #74 Qwen3.5 35B A3B (Reasoning) by Alibaba | $0.25 / $2.00 | - | 145 tok/s | 84.5% | - | Feb 24, 2026 | |
| #75 Qwen3 VL 8B Instruct by Alibaba | $0.18 / $0.70 | 68.6% | 145 tok/s | 42.7% | 27.3% | Oct 14, 2025 | |
| #76 Claude 4.5 Haiku (Reasoning) by Anthropic | $1.00 / $5.00 | 76.0% | 145 tok/s | 67.2% | 83.7% | Oct 15, 2025 | |
| #77 Qwen3.5 122B A10B (Reasoning) by Alibaba | $0.40 / $3.20 | - | 144 tok/s | 85.7% | - | Feb 24, 2026 | |
| #78 Llama 4 Scout by Meta | $0.17 / $0.66 | 75.2% | 143 tok/s | 58.7% | 14.0% | Apr 5, 2025 | |
| #79 GPT-5 nano (medium) by OpenAI | $0.05 / $0.40 | 77.2% | 142 tok/s | 67.0% | 78.3% | Aug 7, 2025 | |
| #80 Devstral Medium by Mistral | $0.40 / $2.00 | 70.8% | 142 tok/s | 49.2% | 4.7% | Jul 10, 2025 | |
| #81 Qwen3.5 35B A3B (Non-reasoning) by Alibaba | $0.25 / $2.00 | - | 142 tok/s | 81.9% | - | Feb 24, 2026 | |
| #82 GPT-5 nano (high) by OpenAI | $0.05 / $0.40 | 78.0% | 141 tok/s | 67.6% | 83.7% | Aug 7, 2025 | |
| #83 NVIDIA Nemotron Nano 12B v2 VL (Non-reasoning) by NVIDIA | $0.20 / $0.60 | 64.9% | 141 tok/s | 43.9% | 26.7% | Oct 28, 2025 | |
| #84 Nova 2.0 Pro Preview (low) by Amazon | $1.25 / $10.00 | 82.2% | 141 tok/s | 75.1% | 63.3% | Nov 27, 2025 | |
| #85 Sarvam 105B (high) by Sarvam | N/A / N/A | - | 140 tok/s | 73.8% | - | Mar 6, 2026 | |
| #86 Qwen3 1.7B (Non-reasoning) by Alibaba | $0.11 / $0.42 | 41.1% | 140 tok/s | 28.3% | 7.3% | Apr 28, 2025 | |
| #87 Gemini 3 Pro Preview (high) by Google | $2.00 / $12.00 | 89.8% | 139 tok/s | 90.8% | 95.7% | Nov 18, 2025 | |
| #88 Qwen3 1.7B (Reasoning) by Alibaba | $0.11 / $1.26 | 57.0% | 139 tok/s | 35.6% | 38.7% | Apr 28, 2025 | |
| #89 GPT-5 nano (minimal) by OpenAI | $0.05 / $0.40 | 55.6% | 138 tok/s | 42.8% | 27.3% | Aug 7, 2025 | |
| #90 Qwen3 VL 8B (Reasoning) by Alibaba | $0.18 / $2.10 | 74.9% | 138 tok/s | 57.9% | 30.7% | Oct 14, 2025 | |
| #91 Nova 2.0 Pro Preview (medium) by Amazon | $1.25 / $10.00 | 83.0% | 137 tok/s | 78.5% | 89.0% | Nov 27, 2025 | |
| #92 o4-mini (high) by OpenAI | $1.10 / $4.40 | 83.2% | 137 tok/s | 78.4% | 90.7% | Apr 16, 2025 | |
| #93 NVIDIA Nemotron Nano 12B v2 VL (Reasoning) by NVIDIA | $0.20 / $0.60 | 75.9% | 136 tok/s | 57.2% | 75.0% | Oct 28, 2025 | |
| #94 Mistral Small 3.2 by Mistral | $0.10 / $0.30 | 68.1% | 134 tok/s | 50.5% | 27.0% | Jun 20, 2025 | |
| #95 Claude 3 Haiku by Anthropic | $0.25 / $1.25 | - | 130 tok/s | 37.4% | - | Mar 4, 2024 | |
| #96 Gemini 3.1 Pro Preview by Google | $2.00 / $12.00 | - | 128 tok/s | 94.1% | - | Feb 19, 2026 | |
| #97 Ministral 3 14B by Mistral | $0.20 / $0.20 | 69.3% | 128 tok/s | 57.2% | 30.0% | Dec 2, 2025 | |
| #98 Qwen3 VL 30B A3B (Reasoning) by Alibaba | $0.20 / $2.40 | 80.7% | 128 tok/s | 72.0% | 82.3% | Oct 3, 2025 | |
| #99 MiMo-V2-Flash (Reasoning) by Xiaomi | $0.10 / $0.30 | 84.3% | 127 tok/s | 84.6% | 96.3% | Dec 16, 2025 | |
| #100 Qwen3 VL 30B A3B Instruct by Alibaba | $0.20 / $0.80 | 76.4% | 127 tok/s | 69.5% | 72.3% | Oct 3, 2025 | |
Showing 100 of 474 models
Understanding the AI Model Leaderboard
This AI model leaderboard helps you compare and choose the best large language models (LLMs) for your needs. We track standardized AI benchmarks, token pricing, inference speed, and model capabilities across major AI providers, including OpenAI, Anthropic, Google, Meta, and DeepSeek.
How to Choose the Right AI Model for Your Use Case
For Research & Analysis
Prioritize models with high MMLU-Pro (70%+) and GPQA (60%+) scores for complex reasoning tasks, academic research, and technical documentation
For Cost Optimization
Sort by input/output pricing - smaller models often deliver 80% of flagship performance at 10% of the cost for simple tasks
For Math & STEM
Filter by Math Index or AIME 2025 scores (50%+) for quantitative analysis, engineering calculations, and scientific applications
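The same threshold logic can be applied programmatically. A minimal sketch in Python, with a few rows hand-copied from the table above (the data structure and field order are illustrative, not an actual API):

```python
# Each row: (model, input $/1M, output $/1M, mmlu_pro, speed_tok_s, gpqa, aime_2025)
# Values copied from the leaderboard table; None would mark unreported scores.
MODELS = [
    ("gpt-oss-120B (high)", 0.15, 0.60, 80.8, 234, 78.2, 93.4),
    ("Gemini 2.5 Flash-Lite (Reasoning)", 0.10, 0.40, 75.9, 295, 62.5, 53.3),
    ("Nova Micro", 0.04, 0.14, 53.1, 287, 35.8, 6.0),
    ("Grok 3 mini Reasoning (high)", 0.30, 0.50, 82.8, 196, 79.1, 84.7),
]

def research_candidates(models, min_mmlu=70.0, min_gpqa=60.0):
    """Apply the 'Research & Analysis' cutoffs: MMLU-Pro 70%+, GPQA 60%+."""
    return [m[0] for m in models
            if m[3] is not None and m[3] >= min_mmlu
            and m[5] is not None and m[5] >= min_gpqa]

print(research_candidates(MODELS))
# ['gpt-oss-120B (high)', 'Gemini 2.5 Flash-Lite (Reasoning)', 'Grok 3 mini Reasoning (high)']
```

Swapping the column indices and thresholds (e.g. index 6 and 50.0 for the AIME 2025 cutoff) gives the Math & STEM filter.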
All benchmark scores and pricing data are updated daily from Artificial Analysis to reflect the latest model versions and capabilities. Use the sort filters above to find AI models by intelligence, cost, coding ability, math performance, speed, or release date.
Frequently Asked Questions
What is MMLU-Pro and why is it the standard AI intelligence benchmark?
MMLU-Pro (Massive Multitask Language Understanding - Professional) is the most comprehensive AI benchmark, testing models across 14 academic subjects including mathematics, science, history, law, and ethics. Scores range from 46% (basic competency) to 87% (near-expert level). Models scoring above 75% demonstrate strong general intelligence suitable for professional applications, while scores below 60% indicate limitations in complex reasoning tasks.
What does GPQA measure and which models score highest?
GPQA (Graduate-level Google-Proof Q&A) tests PhD-level reasoning with questions designed to be "Google-proof" - requiring deep understanding rather than simple fact retrieval. Top models like GPT-5.1 (87.3%), GPT-5 mini (82.8%), and o3 (82.7%) excel at GPQA, making them ideal for research, technical analysis, and complex problem-solving. Models below 50% GPQA struggle with advanced reasoning and may provide superficial answers to complex questions.
What is AIME 2025 and how does it evaluate AI mathematical ability?
AIME 2025 (American Invitational Mathematics Examination) is an elite math competition benchmark that tests advanced problem-solving, algebra, geometry, and number theory. Scores above 80% (like GPT-5 Codex at 98.7% or GPT-5.1 at 94%) indicate exceptional mathematical reasoning suitable for engineering, scientific computing, and quantitative analysis. Models scoring below 50% may struggle with multi-step mathematical problems or require explicit problem breakdown.
How is AI model pricing calculated and what's considered cost-effective?
AI model pricing is measured per 1 million tokens (approximately 750,000 words). Input pricing covers text you send, while output pricing covers generated responses. Budget models like GPT-5 nano cost $0.05/$0.40 per million tokens, mid-tier models like Llama 3.3 70B cost $0.54/$0.71, while premium models like GPT-5 cost $1.25/$10. For typical applications with a 3:1 input-to-output ratio, budget models can be 10-20x cheaper than flagship models while maintaining 70-80% performance.
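The arithmetic behind these comparisons is simple. A hedged Python sketch, using prices from the table above and the 3:1 input-to-output ratio mentioned as a typical workload:

```python
def request_cost(input_tokens, output_tokens, price_in, price_out):
    """Dollar cost for one request; prices are $ per 1M tokens."""
    return input_tokens / 1e6 * price_in + output_tokens / 1e6 * price_out

# 400k tokens of traffic at a 3:1 input-to-output ratio:
# 300k input tokens + 100k output tokens.
nano = request_cost(300_000, 100_000, 0.05, 0.40)   # GPT-5 nano pricing
gpt5 = request_cost(300_000, 100_000, 1.25, 10.00)  # GPT-5 pricing
print(f"GPT-5 nano: ${nano:.3f}  GPT-5: ${gpt5:.3f}  ratio: {gpt5 / nano:.0f}x")
# GPT-5 nano: $0.055  GPT-5: $1.375  ratio: 25x
```

Note that output-heavy workloads (long generations, reasoning traces) shift the blended cost toward the output price, which is usually several times the input price.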
Which AI models are best for coding and programming tasks?
Sort by Coding Index to see top programming models. Our Coding Index combines LiveCodeBench, SciCode, and related coding benchmarks. Top performers include GPT-5.1 (57.5 index), GPT-5 Codex (53.5), and GPT-5 mini (51.4). These models excel at code generation, debugging, refactoring, and explaining complex algorithms. For budget-conscious developers, models with 40+ coding index scores offer excellent value for routine programming tasks.
How often are AI model benchmarks and rankings updated?
Our leaderboard syncs daily with Artificial Analysis API to ensure benchmark scores (MMLU-Pro, GPQA, AIME 2025), pricing, and inference speed data reflect the latest model versions. New model releases appear immediately under the "Newest" sort option. Benchmark scores can change when providers release updated versions - for example, GPT-5.1 released in November 2025 achieved 69.7 intelligence compared to GPT-5's 68.5 from August 2025.
What inference speed (tokens/second) do I need for my application?
Inference speed determines how fast models generate responses. For real-time chatbots and interactive applications, target 100+ tokens/second (models like gpt-oss-120B at roughly 250 tok/s). For background processing and batch jobs, 50-100 tok/s is sufficient. Premium reasoning models like GPT-5 (103 tok/s) balance speed and capability. Note that higher inference speed doesn't always mean better quality - slower models often deliver more thoughtful, detailed responses.
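To turn a throughput number into perceived latency, divide the expected response length by tokens per second. A rough sketch that ignores time-to-first-token and network overhead (speeds copied from the table above):

```python
def generation_seconds(response_tokens, tokens_per_second):
    """Approximate time to stream a full response.

    Ignores time-to-first-token and network latency, so real-world
    end-to-end times will be somewhat higher.
    """
    return response_tokens / tokens_per_second

# A ~500-token chat reply at speeds from the leaderboard:
for name, speed in [("Mercury 2", 953),
                    ("gpt-oss-120B (high)", 234),
                    ("Gemini 3.1 Pro Preview", 128)]:
    print(f"{name}: {generation_seconds(500, speed):.1f}s")
# Mercury 2: 0.5s
# gpt-oss-120B (high): 2.1s
# Gemini 3.1 Pro Preview: 3.9s
```

For interactive use, a full reply arriving within a second or two feels instant; for batch pipelines, throughput matters mainly for total job cost and duration, not responsiveness.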
Can I test these AI models for free before committing?
Yes! Try our free AI chat interface to test different models instantly without creating an account. Many providers also offer free tiers: OpenAI (ChatGPT with daily limits), Anthropic (Claude with usage caps), Google (Gemini free tier), and open-source models like Llama 3.3. Compare performance on your specific use case before upgrading to paid plans.