AI Model Ranking (LLM Leaderboard)
Fastest AI Models
Language models ranked by Artificial Analysis Index
Best Picks
Start with these models
Quick recommendations from the current benchmark, speed, and pricing data.
Best Overall
Claude Opus 4.8 (Adaptive Reasoning, Max Effort)
by Anthropic
61.4 Highest intelligence score
Best Value
Qwen3.5 4B (Reasoning)
by Alibaba
100 Value score · $0.06 blended
Best for Coding
GPT-5.5 (xhigh)
by OpenAI
59.1 Highest coding index
Fastest Usable
Step 3.7 Flash
by StepFun
392 tok/s Fastest model with solid intelligence
Columns: Model is the AI model name and provider organization. Intelligence is the Artificial Analysis Intelligence Index, a composite reasoning and capability score across the benchmark suite. Value combines quality, speed, and blended token price into a relative value score. Speed is inference throughput in tokens per second, that is, how fast the model generates responses. Context is the maximum context window size, meaning how much text, code, or conversation the model can process at once. Price is the cost per 1 million tokens, shown as input (text you send) / output (text the model generates). Release is when the model was released; newer models may have more capabilities.

| Model | Intelligence | Value | Speed | Context | Price (input / output per 1M tokens) | Release | Compare |
|---|---|---|---|---|---|---|---|
| #1 Mercury 2 by Inception | 32.8 | 48 | 855 tok/s | 128K | $0.25 / $0.75 | Feb 20, 2026 | |
| #2 Granite 4.0 H Small by IBM | 10.8 | 30 | 418 tok/s | N/A | $0.06 / $0.25 | Sep 22, 2025 | |
| #3 Granite 3.3 8B (Non-reasoning) by IBM | 7.0 | 22 | 401 tok/s | N/A | $0.03 / $0.25 | Apr 16, 2025 | |
| #4 Step 3.7 Flash by StepFun | 42.6 | 58 | 392 tok/s | 256K | $0.20 / $1.15 | May 29, 2026 | |
| #5 gpt-oss-120b (low) by OpenAI | 24.5 | 43 | 368 tok/s | 131K | $0.15 / $0.60 | Aug 5, 2025 | |
| #6 gpt-oss-120b (high) by OpenAI | 33.3 | 59 | 367 tok/s | 131K | $0.15 / $0.60 | Aug 5, 2025 | |
| #7 Qwen3.5 2B (Non-reasoning) by Alibaba | 14.7 | 66 | 324 tok/s | N/A | $0.02 / $0.10 | Mar 2, 2026 | |
| #8 Nova Micro by Amazon | 10.3 | 38 | 304 tok/s | N/A | $0.04 / $0.14 | Dec 3, 2024 | |
| #9 Llama 3.1 Nemotron Instruct 70B by NVIDIA | 13.4 | 11 | 285 tok/s | N/A | $1.20 / $1.20 | Oct 15, 2024 | |
| #10 Nemotron 3 Nano Omni 30B A3B Reasoning by NVIDIA | 21.4 | 53 | 285 tok/s | N/A | $0.07 / $0.30 | Apr 29, 2026 | |
| #11 Gemini 3.1 Flash-Lite by Google | 33.5 | 40 | 274 tok/s | 1.0M | $0.25 / $1.50 | Mar 3, 2026 | |
| #12 gpt-oss-20B (low) by OpenAI | 20.8 | 61 | 273 tok/s | 131K | $0.06 / $0.20 | Aug 5, 2025 | |
| #13 gpt-oss-20B (high) by OpenAI | 24.5 | 75 | 266 tok/s | 131K | $0.05 / $0.20 | Aug 5, 2025 | |
| #14 Gemini 2.5 Flash-Lite (Reasoning) by Google | 17.6 | 38 | 265 tok/s | 1.0M | $0.10 / $0.40 | Jun 17, 2025 | |
| #15 Step 3.5 Flash 2603 by StepFun | 38.5 | 90 | 231 tok/s | 262K | $0.10 / $0.30 | Apr 2, 2026 | |
| #16 Gemini 2.5 Flash-Lite (Non-reasoning) by Google | 12.7 | 27 | 230 tok/s | 1.0M | $0.10 / $0.40 | Jun 17, 2025 | |
| #17 Qwen3.5 Omni Flash by Alibaba | 25.9 | 45 | 224 tok/s | N/A | $0.10 / $0.80 | Mar 30, 2026 | |
| #18 Nemotron Nano 12B v2 VL (Non-reasoning) by NVIDIA | 10.1 | 17 | 224 tok/s | N/A | $0.20 / $0.60 | Oct 28, 2025 | |
| #19 Step 3.5 Flash by StepFun | 37.8 | 88 | 220 tok/s | 262K | $0.10 / $0.30 | Feb 2, 2026 | |
| #20 Gemini 2.5 Flash (Reasoning) by Google | 27.0 | 26 | 219 tok/s | 1.0M | $0.30 / $2.50 | May 20, 2025 | |
| #21 o3-mini (high) by OpenAI | 25.2 | 16 | 217 tok/s | 200K | $1.10 / $4.40 | Jan 31, 2025 | |
| #22 Command A+ by Cohere | 37.2 | N/A | 212 tok/s | 256K | N/A / N/A | May 20, 2026 | |
| #23 GPT-5.1 Codex mini (high) by OpenAI | 38.6 | 42 | 211 tok/s | 400K | $0.25 / $2.00 | Nov 13, 2025 | |
| #24 Nova 2.0 Lite (Non-reasoning) by Amazon | 18.0 | 18 | 211 tok/s | N/A | $0.30 / $2.50 | Oct 29, 2025 | |
| #25 Qwen3.5 4B (Non-reasoning) by Alibaba | 22.6 | 83 | 209 tok/s | N/A | $0.03 / $0.15 | Mar 2, 2026 | |
| #26 Gemini 3.5 Flash (medium) by Google | 54.8 | 27 | 206 tok/s | 1.0M | $1.50 / $9.00 | May 19, 2026 | |
| #27 Ministral 3 3B by Mistral | 11.2 | 32 | 203 tok/s | N/A | $0.10 / $0.10 | Dec 2, 2025 | |
| #28 M2.5 by MiniMax | 41.9 | 52 | 203 tok/s | 205K | $0.30 / $1.20 | Feb 12, 2026 | |
| #29 Llama 3.1 Instruct 8B by Meta | 11.8 | 34 | 201 tok/s | N/A | $0.10 / $0.10 | Jul 23, 2024 | |
| #30 o3-mini by OpenAI | 25.9 | 17 | 200 tok/s | 200K | $1.10 / $4.40 | Jan 31, 2025 | |
| #31 Gemini 3.5 Flash (high) by Google | 55.3 | 27 | 198 tok/s | 1.0M | $1.50 / $9.00 | May 19, 2026 | |
| #32 Qwen3.5 4B (Reasoning) by Alibaba | 27.1 | 100 | 197 tok/s | N/A | $0.03 / $0.15 | Mar 2, 2026 | |
| #33 Nova Lite by Amazon | 12.7 | 35 | 192 tok/s | N/A | $0.06 / $0.24 | Dec 3, 2024 | |
| #34 Gemini 3.5 Flash (minimal) by Google | 43.3 | 21 | 188 tok/s | 1.0M | $1.50 / $9.00 | May 19, 2026 | |
| #35 Jamba 1.6 Mini by AI21 Labs | 7.9 | 14 | 186 tok/s | N/A | $0.20 / $0.40 | Mar 6, 2025 | |
| #36 Gemini 2.5 Flash (Non-reasoning) by Google | 20.6 | 20 | 185 tok/s | 1.0M | $0.30 / $2.50 | May 20, 2025 | |
| #37 GPT-5.1 Codex (high) by OpenAI | 43.1 | 21 | 182 tok/s | 400K | $1.25 / $10.00 | Nov 13, 2025 | |
| #38 Gemini 3 Flash Preview (Non-reasoning) by Google | 35.0 | 30 | 181 tok/s | 1.0M | $0.50 / $3.00 | Dec 17, 2025 | |
| #39 M2.1 by MiniMax | 39.4 | 49 | 179 tok/s | 205K | $0.30 / $1.20 | Dec 23, 2025 | |
| #40 Qwen3.7 Max by Alibaba | 56.6 | 26 | 179 tok/s | 1.0M | $2.50 / $7.50 | May 19, 2026 | |
| #41 Gemini 3 Flash Preview (Reasoning) by Google | 46.4 | 40 | 175 tok/s | 1.0M | $0.50 / $3.00 | Dec 17, 2025 | |
| #42 Small 4 (Reasoning) by Mistral | 27.8 | 49 | 172 tok/s | N/A | $0.15 / $0.60 | Mar 16, 2026 | |
| #43 GPT-5 Codex (high) by OpenAI | 44.6 | 22 | 171 tok/s | 400K | $1.25 / $10.00 | Sep 23, 2025 | |
| #44 Grok 4.20 0309 v2 (Reasoning) by xAI | 49.3 | 26 | 171 tok/s | N/A | $2.00 / $6.00 | Apr 7, 2026 | |
| #45 GPT-5 (ChatGPT) by OpenAI | 21.8 | 11 | 169 tok/s | 400K | $1.25 / $10.00 | Aug 7, 2025 | |
| #46 Trinity Large Thinking by Arcee AI | 31.9 | 46 | 169 tok/s | 262K | $0.23 / $0.88 | Apr 1, 2026 | |
| #47 Grok 4.20 0309 (Reasoning) by xAI | 48.5 | 25 | 166 tok/s | N/A | $2.00 / $6.00 | Mar 10, 2026 | |
| #48 Nova 2.0 Lite (low) by Amazon | 24.6 | 24 | 165 tok/s | N/A | $0.30 / $2.50 | Oct 29, 2025 | |
| #49 GPT-5.4 nano (xhigh) by OpenAI | 44.0 | 58 | 165 tok/s | 400K | $0.20 / $1.25 | Mar 17, 2026 | |
| #50 Grok 4.20 0309 (Non-reasoning) by xAI | 29.7 | 15 | 164 tok/s | N/A | $2.00 / $6.00 | Mar 10, 2026 | |
| #51 GPT-5 nano (medium) by OpenAI | 25.9 | 63 | 164 tok/s | 400K | $0.05 / $0.40 | Aug 7, 2025 | |
| #52 Grok 4.20 0309 v2 (Non-reasoning) by xAI | 29.0 | 15 | 162 tok/s | N/A | $2.00 / $6.00 | Apr 7, 2026 | |
| #53 GPT-5.4 mini (medium) by OpenAI | 37.7 | 26 | 161 tok/s | 400K | $0.75 / $4.50 | Mar 17, 2026 | |
| #54 Qwen3.6 35B A3B (Reasoning) by Alibaba | 43.5 | 53 | 160 tok/s | 262K | $0.25 / $1.49 | Apr 16, 2026 | |
| #55 Qwen3.6 35B A3B (Non-reasoning) by Alibaba | 31.5 | 31 | 159 tok/s | 262K | $0.38 / $2.25 | Apr 16, 2026 | |
| #56 Small 4 (Non-reasoning) by Mistral | 18.6 | 33 | 158 tok/s | N/A | $0.15 / $0.60 | Mar 16, 2026 | |
| #57 Small 3.1 by Mistral | 14.5 | 35 | 158 tok/s | N/A | $0.10 / $0.23 | Mar 17, 2025 | |
| #58 GPT-5.4 nano (Non-Reasoning) by OpenAI | 24.4 | 32 | 158 tok/s | 400K | $0.20 / $1.25 | Mar 17, 2026 | |
| #59 Small (Feb '24) by Mistral | 9.0 | 7 | 157 tok/s | N/A | $1.00 / $3.00 | Feb 26, 2024 | |
| #60 GPT-5.4 nano (medium) by OpenAI | 38.1 | 51 | 157 tok/s | 400K | $0.20 / $1.25 | Mar 17, 2026 | |
| #61 GPT-5 nano (minimal) by OpenAI | 13.8 | 34 | 155 tok/s | 400K | $0.05 / $0.40 | Aug 7, 2025 | |
| #62 GPT-5 nano (high) by OpenAI | 26.8 | 65 | 155 tok/s | 400K | $0.05 / $0.40 | Aug 7, 2025 | |
| #63 Small 3 by Mistral | 12.7 | 36 | 154 tok/s | N/A | $0.07 / $0.19 | Jan 30, 2025 | |
| #64 Small (Sep '24) by Mistral | 10.2 | 17 | 153 tok/s | N/A | $0.20 / $0.60 | Sep 17, 2024 | |
| #65 Qwen3.5 122B A10B (Non-reasoning) by Alibaba | 35.9 | 31 | 151 tok/s | 262K | $0.40 / $3.20 | Feb 24, 2026 | |
| #66 o4-mini (high) by OpenAI | 33.1 | 22 | 151 tok/s | 200K | $1.10 / $4.40 | Apr 16, 2025 | |
| #67 GPT-5.4 mini (xhigh) by OpenAI | 48.9 | 34 | 151 tok/s | 400K | $0.75 / $4.50 | Mar 17, 2026 | |
| #68 GPT-5.4 mini (Non-Reasoning) by OpenAI | 23.3 | 16 | 150 tok/s | 400K | $0.75 / $4.50 | Mar 17, 2026 | |
| #69 Nova 2.0 Lite (high) by Amazon | 34.5 | 34 | 150 tok/s | N/A | $0.30 / $2.50 | Oct 29, 2025 | |
| #70 Nemotron 3 Super 120B A12B (Reasoning) by NVIDIA | 36.0 | 51 | 150 tok/s | 1.0M | $0.30 / $0.75 | Mar 11, 2026 | |
| #71 Qwen3 VL 8B Instruct by Alibaba | 14.3 | 23 | 148 tok/s | 256K | $0.18 / $0.70 | Oct 14, 2025 | |
| #72 Qwen3.5 35B A3B (Non-reasoning) by Alibaba | 30.7 | 33 | 144 tok/s | 262K | $0.25 / $2.00 | Feb 24, 2026 | |
| #73 Qwen3.5 122B A10B (Reasoning) by Alibaba | 41.6 | 36 | 144 tok/s | 262K | $0.40 / $3.20 | Feb 24, 2026 | |
| #74 M (Reasoning) by Sarvam | 8.4 | N/A | 142 tok/s | N/A | N/A / N/A | May 23, 2025 | |
| #75 Nova 2.0 Lite (medium) by Amazon | 29.7 | 29 | 141 tok/s | N/A | $0.30 / $2.50 | Oct 29, 2025 | |
| #76 Qwen3 30B A3B 2507 (Reasoning) by Alibaba | 22.4 | 25 | 139 tok/s | N/A | $0.28 / $1.85 | Jul 30, 2025 | |
| #77 GPT-4o (Nov '24) by OpenAI | 17.3 | 7 | 138 tok/s | 128K | $2.50 / $10.00 | Nov 20, 2024 | |
| #78 Qwen3 VL 8B (Reasoning) by Alibaba | 16.7 | 19 | 138 tok/s | N/A | $0.18 / $2.10 | Oct 14, 2025 | |
| #79 Qwen3 Next 80B A3B (Reasoning) by Alibaba | 26.7 | 18 | 136 tok/s | N/A | $0.50 / $6.00 | Sep 11, 2025 | |
| #80 Grok 4.3 (low) by xAI | 43.9 | 32 | 136 tok/s | N/A | $1.25 / $2.50 | Apr 30, 2026 | |
| #81 30B (high) by Sarvam | 12.3 | 51 | 135 tok/s | N/A | $0.03 / $0.11 | Mar 6, 2026 | |
| #82 Medium 3.5 by Mistral | 39.2 | 20 | 134 tok/s | N/A | $1.50 / $7.50 | Apr 29, 2026 | |
| #83 Granite 4.1 8B by IBM | 12.4 | 45 | 134 tok/s | N/A | $0.05 / $0.10 | Apr 29, 2026 | |
| #84 Nova 2.0 Pro Preview (low) by Amazon | 31.9 | 16 | 134 tok/s | N/A | $1.25 / $10.00 | Nov 27, 2025 | |
| #85 Nova 2.0 Pro Preview (Non-reasoning) by Amazon | 23.1 | 11 | 134 tok/s | N/A | $1.25 / $10.00 | Nov 27, 2025 | |
| #86 Nemotron Nano 9B V2 (Non-reasoning) by NVIDIA | 13.2 | 41 | 134 tok/s | 128K | $0.05 / $0.20 | Aug 18, 2025 | |
| #87 Nemotron 3 Nano 30B A3B (Reasoning) by NVIDIA | 24.3 | 71 | 134 tok/s | 256K | $0.06 / $0.22 | Dec 15, 2025 | |
| #88 GPT-3.5 Turbo by OpenAI | 9.0 | 9 | 133 tok/s | 16K | $0.50 / $1.50 | Nov 30, 2022 | |
| #89 Qwen3 Next 80B A3B Instruct by Alibaba | 20.1 | 19 | 132 tok/s | 262K | $0.50 / $2.00 | Sep 11, 2025 | |
| #90 Kimi K2 Thinking by MoonshotAI | 40.9 | 36 | 131 tok/s | 262K | $0.60 / $2.50 | Nov 6, 2025 | |
| #91 MiMo-V2-Flash (Reasoning) by Xiaomi | 39.2 | 91 | 130 tok/s | 262K | $0.10 / $0.30 | Dec 16, 2025 | |
| #92 GPT-4.1 by OpenAI | 26.3 | 13 | 128 tok/s | 1.0M | $2.00 / $8.00 | Apr 14, 2025 | |
| #93 Qwen3.5 35B A3B (Reasoning) by Alibaba | 37.1 | 40 | 127 tok/s | 262K | $0.25 / $2.00 | Feb 24, 2026 | |
| #94 Qwen3 VL 30B A3B (Reasoning) by Alibaba | 19.7 | 31 | 127 tok/s | N/A | $0.20 / $0.75 | Oct 3, 2025 | |
| #95 Small 3.2 by Mistral | 15.1 | 38 | 127 tok/s | N/A | $0.09 / $0.25 | Jun 20, 2025 | |
| #96 Nova 2.0 Pro Preview (medium) by Amazon | 35.7 | 17 | 126 tok/s | N/A | $1.25 / $10.00 | Nov 27, 2025 | |
| #97 Grok 4.3 (high) by xAI | 53.2 | 38 | 125 tok/s | N/A | $1.25 / $2.50 | Apr 30, 2026 | |
| #98 MiMo-V2-Flash (Feb 2026) by Xiaomi | 41.5 | 97 | 125 tok/s | 262K | $0.10 / $0.30 | Dec 16, 2025 | |
| #99 Grok 4.3 (medium) by xAI | 48.8 | 35 | 125 tok/s | N/A | $1.25 / $2.50 | Apr 30, 2026 | |
| #100 Qwen3 VL 30B A3B Instruct by Alibaba | 16.0 | 26 | 124 tok/s | 262K | $0.20 / $0.60 | Oct 3, 2025 | |
Showing 100 of 529 models
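As a rough illustration of how a "Fastest Usable" pick could be derived from the table, the sketch below filters rows by an intelligence floor and then takes the highest throughput. The rows are copied from the first entries of the table; the `Row` type, the `fastest_usable` helper, and the floor of 40 are assumptions made for this example, not the leaderboard's actual selection rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    name: str
    intelligence: float
    speed_tok_s: float


# A handful of rows copied from the table above.
ROWS = [
    Row("Mercury 2", 32.8, 855),
    Row("Granite 4.0 H Small", 10.8, 418),
    Row("Granite 3.3 8B (Non-reasoning)", 7.0, 401),
    Row("Step 3.7 Flash", 42.6, 392),
    Row("gpt-oss-120b (high)", 33.3, 367),
]


def fastest_usable(rows, min_intelligence):
    """Return the fastest row whose intelligence meets the floor, or None."""
    eligible = [r for r in rows if r.intelligence >= min_intelligence]
    return max(eligible, key=lambda r: r.speed_tok_s, default=None)


# min_intelligence=40.0 is an assumed cut-off, not the site's published rule.
pick = fastest_usable(ROWS, min_intelligence=40.0)
print(pick.name if pick else "no model meets the floor")
```

With a floor of 40, Mercury 2 is excluded despite its higher throughput and the sketch selects Step 3.7 Flash; a different floor would give a different pick.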
Understanding the AI Model Leaderboard
This AI model leaderboard helps you compare and choose the best large language models (LLMs) for your needs. We track standardized AI benchmarks, token pricing, inference speed, and model capabilities across major AI providers such as OpenAI, Anthropic, Google, Meta, and DeepSeek.
Core AI Benchmarks Explained
Key Metrics to Consider
How to Choose the Right AI Model for Your Use Case
For Research & Analysis
Prioritize models with high MMLU-Pro (70%+) and GPQA (60%+) scores for complex reasoning tasks, academic research, and technical documentation
For Cost Optimization
Sort by input/output pricing - smaller models often deliver 80% of flagship performance at 10% of the cost for simple tasks
For Math & STEM
Filter by Math Index or AIME 2025 scores (50%+) for quantitative analysis, engineering calculations, and scientific applications
All benchmark scores and pricing data are updated daily from Artificial Analysis to reflect the latest model versions and capabilities. Use the sort filters above to find AI models by intelligence, cost, coding ability, math performance, speed, or release date.
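The benchmark thresholds in the use-case recommendations above can be applied as a simple filter. The sketch below is one possible way to do that; the metric keys, the `meets` helper, and the two candidate models with their scores are made-up placeholders, not leaderboard data.

```python
# Score floors taken from the use-case recommendations above.
THRESHOLDS = {
    "research": {"mmlu_pro": 70.0, "gpqa": 60.0},
    "math": {"aime_2025": 50.0},
}


def meets(use_case, scores):
    """True if every metric required for the use case reaches its floor."""
    required = THRESHOLDS[use_case]
    # A missing metric counts as 0.0, so it fails the check.
    return all(scores.get(metric, 0.0) >= floor for metric, floor in required.items())


# Placeholder scores for two hypothetical models, NOT real benchmark results.
candidates = {
    "example-model-a": {"mmlu_pro": 78.0, "gpqa": 65.0, "aime_2025": 40.0},
    "example-model-b": {"mmlu_pro": 68.0, "gpqa": 72.0, "aime_2025": 90.0},
}

for use_case in THRESHOLDS:
    matches = [name for name, s in candidates.items() if meets(use_case, s)]
    print(use_case, matches)
```

Here `example-model-a` passes only the research filter and `example-model-b` passes only the math filter, which shows why the floors are best checked per use case rather than as a single overall score.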
Frequently Asked Questions
What is MMLU-Pro and why is it the standard AI intelligence benchmark?
MMLU-Pro is a harder, more robust version of the MMLU (Massive Multitask Language Understanding) benchmark and a widely used measure of general model capability. It tests models across 14 academic subjects including mathematics, science, history, law, and philosophy. Scores range from 46% (basic competency) to 87% (near-expert level). Models scoring above 75% demonstrate strong general intelligence suitable for professional applications, while scores below 60% indicate limitations in complex reasoning tasks.
What does GPQA measure and which models score highest?
GPQA (Graduate-level Google-Proof Q&A) tests PhD-level reasoning with questions designed to be "Google-proof" - requiring deep understanding rather than simple fact retrieval. Top models like GPT-5.1 (87.3%), GPT-5 mini (82.8%), and o3 (82.7%) excel at GPQA, making them ideal for research, technical analysis, and complex problem-solving. Models below 50% GPQA struggle with advanced reasoning and may provide superficial answers to complex questions.
What is AIME 2025 and how does it evaluate AI mathematical ability?
AIME 2025 (American Invitational Mathematics Examination) is an elite math competition benchmark that tests advanced problem-solving, algebra, geometry, and number theory. Scores above 80% (like GPT-5 Codex at 98.7% or GPT-5.1 at 94%) indicate exceptional mathematical reasoning suitable for engineering, scientific computing, and quantitative analysis. Models scoring below 50% may struggle with multi-step mathematical problems or require explicit problem breakdown.
How is AI model pricing calculated and what's considered cost-effective?
AI model pricing is measured per 1 million tokens (approximately 750,000 words). Input pricing covers text you send, while output pricing covers generated responses. For example, GPT-5 nano costs $0.05/$0.40 per million tokens (input/output), Llama 3.3 70B costs $0.54/$0.71, and a premium model like GPT-5 costs $1.25/$10. For typical applications with a 3:1 input-to-output ratio, budget models can be 10-20x cheaper than flagship models while maintaining 70-80% performance.
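To make the 3:1 input-to-output comparison concrete, the sketch below computes a blended price as a weighted average of the input and output prices. The `blended_price` helper and its default weights are assumptions for illustration, though the same weighting does reproduce the $0.06 blended figure shown for Qwen3.5 4B (Reasoning) at $0.03 / $0.15.

```python
def blended_price(input_price, output_price, input_parts=3, output_parts=1):
    """Weighted average price per 1M tokens for a given input:output mix."""
    total_parts = input_parts + output_parts
    return (input_parts * input_price + output_parts * output_price) / total_parts


# (input, output) prices in USD per 1M tokens, as quoted in the answer above.
prices = {
    "GPT-5 nano": (0.05, 0.40),
    "Llama 3.3 70B": (0.54, 0.71),
    "GPT-5": (1.25, 10.00),
}

for name, (inp, out) in prices.items():
    print(f"{name}: ${blended_price(inp, out):.4f} per 1M tokens")
```

With these figures the blends come to $0.1375, $0.5825, and $3.4375 respectively, so the GPT-5 blend is 25 times the GPT-5 nano blend and roughly 6 times the Llama 3.3 70B blend. The ratio shifts if your workload has a different input-to-output mix.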
Which AI models are best for coding and programming tasks?
Sort by Coding Index to see top programming models. Our Coding Index combines LiveCodeBench, SciCode, and other coding benchmarks. Top performers include GPT-5.1 (57.5 index), GPT-5 Codex (53.5), and GPT-5 mini (51.4). These models excel at code generation, debugging, refactoring, and explaining complex algorithms. For budget-conscious developers, models with 40+ coding index scores offer excellent value for routine programming tasks.
How often are AI model benchmarks and rankings updated?
Our leaderboard syncs daily with the Artificial Analysis API so that benchmark scores (MMLU-Pro, GPQA, AIME 2025), pricing, and inference speed data reflect the latest model versions. New model releases appear immediately under the "Newest" sort option. Benchmark scores can change when providers release updated versions - for example, GPT-5.1, released in November 2025, achieved 69.7 intelligence compared to GPT-5's 68.5 from August 2025.
What inference speed (tokens/second) do I need for my application?
Inference speed determines how fast models generate responses. For real-time chatbots and interactive applications, target 100+ tokens/second (models like gpt-oss-120B at 340 tok/s). For background processing and batch jobs, 50-100 tok/s is sufficient. Premium reasoning models like GPT-5 (103 tok/s) balance speed and capability. Note that higher inference speed doesn't always mean better quality - slower models often deliver more thoughtful, detailed responses.
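A quick way to translate tokens per second into waiting time is to divide the expected response length by the throughput. The sketch below does that for the speeds mentioned above; the `generation_seconds` helper and the 500-token response length are assumptions, and the estimate ignores time to first token and any hidden reasoning tokens.

```python
def generation_seconds(output_tokens, tokens_per_second):
    """Estimate streaming time for a response, excluding time to first token."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


# 500 output tokens is an assumed response length for illustration.
for speed in (50, 100, 340):
    seconds = generation_seconds(500, speed)
    print(f"{speed} tok/s -> {seconds:.1f} s for 500 tokens")
```

At 50 tok/s the 500-token response takes about 10 seconds, at 100 tok/s about 5 seconds, and at 340 tok/s about 1.5 seconds, which is why the 100+ tok/s range is suggested for interactive use.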
Can I test these AI models for free before committing?
Yes! Try our free AI chat interface to test different models instantly without creating an account. Many providers also offer free tiers: OpenAI (ChatGPT with daily limits), Anthropic (Claude with usage caps), and Google (Gemini free tier). Open-source models like Llama 3.3 are also freely available. Compare performance on your specific use case before upgrading to paid plans.