AI Model Ranking (LLM Leaderboard)
Most Intelligent AI Models
Language models ranked by GPQA reasoning score
Model: AI model name and provider organization.
Price/1M: cost per 1 million tokens, shown as input (text you send) / output (text the model generates).
MMLU-Pro: Massive Multitask Language Understanding (Professional). Tests broad knowledge across 14 subjects including STEM, humanities, and social sciences.
GPQA: Graduate-level Google-Proof Q&A benchmark. Tests PhD-level reasoning and advanced intelligence.
AIME 2025: American Invitational Mathematics Examination 2025. Tests advanced mathematical problem-solving ability.
Release: when the model was released. Newer models may have more capabilities.

| Model | Price/1M (input / output) | MMLU-Pro | GPQA | AIME 2025 | Release | Compare |
|---|---|---|---|---|---|---|
| #1 Anthropic: Claude Sonnet 4.6 by anthropic | $3.00 / $15.00 | - | - | - | Feb 17, 2026 | |
| #2 DeepSeek: DeepSeek V3.2 by deepseek | $0.25 / $0.38 | - | - | - | Dec 1, 2025 | |
| #3 Google: Gemini 3 Flash Preview by google | $0.50 / $3.00 | - | - | - | Dec 17, 2025 | |
| #4 Anthropic: Claude Opus 4.7 by anthropic | $5.00 / $25.00 | - | - | - | Apr 16, 2026 | |
| #5 MoonshotAI: Kimi K2.6 by moonshotai | $0.74 / $4.66 | - | - | - | Apr 20, 2026 | |
| #6 Xiaomi: MiMo-V2-Pro by xiaomi | $1.00 / $3.00 | - | - | - | Mar 18, 2026 | |
| #7 MiniMax: MiniMax M2.5 by minimax | $0.15 / $1.15 | - | - | - | Feb 12, 2026 | |
| #8 MiniMax: MiniMax M2.7 by minimax | $0.30 / $1.20 | - | - | - | Mar 18, 2026 | |
| #9 Anthropic: Claude Opus 4.6 by anthropic | $5.00 / $25.00 | - | - | - | Feb 4, 2026 | |
| #10 xAI: Grok 4.1 Fast by x-ai | $0.20 / $0.50 | - | - | - | Nov 19, 2025 | |
| #11 Google: Gemini 2.5 Flash Lite by google | $0.10 / $0.40 | - | - | - | Jul 22, 2025 | |
| #12 StepFun: Step 3.5 Flash by stepfun | $0.10 / $0.30 | - | - | - | Jan 29, 2026 | |
| #13 OpenAI: GPT-5.4 by openai | $2.50 / $15.00 | - | - | - | Mar 5, 2026 | |
| #14 Google: Gemini 2.5 Flash by google | $0.30 / $2.50 | - | - | - | Jun 17, 2025 | |
| #15 Z.ai: GLM 5.1 by z-ai | $1.05 / $3.50 | - | - | - | Apr 7, 2026 | |
| #16 OpenAI: gpt-oss-120b by openai | $0.04 / $0.19 | - | - | - | Aug 5, 2025 | |
| #17 Google: Gemini 3.1 Pro Preview by google | $2.00 / $12.00 | - | - | - | Feb 19, 2026 | |
| #18 MoonshotAI: Kimi K2.5 by moonshotai | $0.44 / $2.00 | - | - | - | Jan 27, 2026 | |
| #19 Z.ai: GLM 5 by z-ai | $0.60 / $2.08 | - | - | - | Feb 11, 2026 | |
| #20 Qwen: Qwen3.6 Plus by qwen | $0.33 / $1.95 | - | - | - | Apr 2, 2026 | |
| #21 Google: Gemini 3.1 Flash Lite Preview by google | $0.25 / $1.50 | - | - | - | Mar 3, 2026 | |
| #22 Z.ai: GLM 5 Turbo by z-ai | $1.20 / $4.00 | - | - | - | Mar 15, 2026 | |
| #23 Anthropic: Claude Sonnet 4.5 by anthropic | $3.00 / $15.00 | - | - | - | Sep 29, 2025 | |
| #24 Anthropic: Claude Haiku 4.5 by anthropic | $1.00 / $5.00 | - | - | - | Oct 15, 2025 | |
| #25 OpenAI: GPT-4o-mini by openai | $0.15 / $0.60 | - | - | - | Jul 18, 2024 | |
| #26 OpenAI: GPT-5.4 Nano by openai | $0.20 / $1.25 | - | - | - | Mar 17, 2026 | |
| #27 OpenAI: GPT-5 Mini by openai | $0.25 / $2.00 | - | - | - | Aug 7, 2025 | |
| #28 OpenAI: GPT-5.3-Codex by openai | $1.75 / $14.00 | - | - | - | Feb 24, 2026 | |
| #29 Xiaomi: MiMo-V2-Flash by xiaomi | $0.09 / $0.29 | - | - | - | Dec 14, 2025 | |
| #30 OpenAI: GPT-5.4 Mini by openai | $0.75 / $4.50 | - | - | - | Mar 17, 2026 | |
| #31 Mistral: Mistral Nemo by mistralai | $0.01 / $0.03 | - | - | - | Jul 19, 2024 | |
| #32 Google: Gemini 2.0 Flash by google | $0.10 / $0.40 | - | - | - | Feb 5, 2025 | |
| #33 xAI: Grok 4 Fast by x-ai | $0.20 / $0.50 | - | - | - | Sep 19, 2025 | |
| #34 Qwen: Qwen3.5-Flash by qwen | $0.07 / $0.26 | - | - | - | Feb 25, 2026 | |
| #35 Google: Gemma 4 31B by google | $0.13 / $0.38 | - | - | - | Apr 2, 2026 | |
| #36 Google: Gemma 4 26B A4B by google | $0.06 / $0.33 | - | - | - | Apr 3, 2026 | |
| #37 Qwen: Qwen3 235B A22B Instruct 2507 by qwen | $0.07 / $0.10 | - | - | - | Jul 21, 2025 | |
| #38 OpenAI: GPT-4.1 Mini by openai | $0.40 / $1.60 | - | - | - | Apr 14, 2025 | |
| #39 Z.ai: GLM 4.5 Air by z-ai | $0.13 / $0.85 | - | - | - | Jul 25, 2025 | |
| #40 Qwen: Qwen3.5 397B A17B by qwen | $0.39 / $2.34 | - | - | - | Feb 16, 2026 | |
| #41 Google: Gemini 2.5 Pro by google | $1.25 / $10.00 | - | - | - | Jun 17, 2025 | |
| #42 Qwen: Qwen3.5-9B by qwen | $0.10 / $0.15 | - | - | - | Mar 10, 2026 | |
| #43 Google: Gemini 2.5 Flash Lite Preview 09-2025 by google | $0.10 / $0.40 | - | - | - | Sep 25, 2025 | |
| #44 Anthropic: Claude Sonnet 4 by anthropic | $3.00 / $15.00 | - | - | - | May 22, 2025 | |
| #45 DeepSeek: DeepSeek V3 0324 by deepseek | $0.20 / $0.77 | - | - | - | Mar 24, 2025 | |
| #46 DeepSeek: DeepSeek V3.1 by deepseek | $0.15 / $0.75 | - | - | - | Aug 21, 2025 | |
| #47 OpenAI: GPT-5 Chat by openai | $1.25 / $10.00 | - | - | - | Aug 7, 2025 | |
| #48 Z.ai: GLM 4.7 by z-ai | $0.38 / $1.74 | - | - | - | Dec 22, 2025 | |
| #49 Meta: Llama 3.1 8B Instruct by meta-llama | $0.02 / $0.05 | - | - | - | Jul 23, 2024 | |
| #50 OpenAI: gpt-oss-20b by openai | $0.03 / $0.14 | - | - | - | Aug 5, 2025 | |
| #51 Anthropic: Claude Opus 4.5 by anthropic | $5.00 / $25.00 | - | - | - | Nov 24, 2025 | |
| #52 Qwen: Qwen3.5-35B-A3B by qwen | $0.16 / $1.30 | - | - | - | Feb 25, 2026 | |
| #53 OpenAI: GPT-4.1 by openai | $2.00 / $8.00 | - | - | - | Apr 14, 2025 | |
| #54 OpenAI: GPT-5 Nano by openai | $0.05 / $0.40 | - | - | - | Aug 7, 2025 | |
| #55 Meta: Llama 3.1 70B Instruct by meta-llama | $0.40 / $0.40 | - | - | - | Jul 23, 2024 | |
| #56 Xiaomi: MiMo-V2-Omni by xiaomi | $0.40 / $2.00 | - | - | - | Mar 18, 2026 | |
| #57 Google: Gemini 2.0 Flash Lite by google | $0.07 / $0.30 | - | - | - | Feb 25, 2025 | |
| #58 OpenAI: GPT-5.2 by openai | $1.75 / $14.00 | - | - | - | Dec 10, 2025 | |
| #59 Qwen: Qwen3.5 Plus 2026-02-15 by qwen | $0.26 / $1.56 | - | - | - | Feb 16, 2026 | |
| #60 Z.ai: GLM 4.7 Flash by z-ai | $0.06 / $0.40 | - | - | - | Jan 19, 2026 | |
| #61 OpenAI: GPT-5.1 by openai | $1.25 / $10.00 | - | - | - | Nov 13, 2025 | |
| #62 Qwen: Qwen3 VL 235B A22B Instruct by qwen | $0.20 / $0.88 | - | - | - | Sep 23, 2025 | |
| #63 Mistral: Mistral Small 3.2 24B by mistralai | $0.07 / $0.20 | - | - | - | Jun 20, 2025 | |
| #64 OpenAI: GPT-4.1 Nano by openai | $0.10 / $0.40 | - | - | - | Apr 14, 2025 | |
| #65 Xiaomi: MiMo-V2.5-Pro by xiaomi | $1.00 / $3.00 | - | - | - | Apr 22, 2026 | |
| #66 Z.ai: GLM 4.6 by z-ai | $0.39 / $1.90 | - | - | - | Sep 30, 2025 | |
| #67 xAI: Grok 4.20 by x-ai | $2.00 / $6.00 | - | - | - | Mar 31, 2026 | |
| #68 Qwen: Qwen3 Coder Next by qwen | $0.15 / $0.80 | - | - | - | Feb 4, 2026 | |
| #69 Qwen: Qwen3.5-27B by qwen | $0.20 / $1.56 | - | - | - | Feb 25, 2026 | |
| #70 Meta: Llama 3.3 70B Instruct by meta-llama | $0.10 / $0.32 | - | - | - | Dec 6, 2024 | |
| #71 Meta: Llama 4 Maverick by meta-llama | $0.15 / $0.60 | - | - | - | Apr 5, 2025 | |
| #72 Qwen: Qwen3 30B A3B Instruct 2507 by qwen | $0.09 / $0.30 | - | - | - | Jul 29, 2025 | |
| #73 NVIDIA: Nemotron 3 Nano 30B A3B by nvidia | $0.05 / $0.20 | - | - | - | Dec 14, 2025 | |
| #74 Qwen: Qwen3 Coder 480B A35B by qwen | $0.22 / $1.00 | - | - | - | Jul 23, 2025 | |
| #75 OpenAI: GPT-5.1 Chat by openai | $1.25 / $10.00 | - | - | - | Nov 13, 2025 | |
| #76 NVIDIA: Nemotron 3 Super by nvidia | $0.09 / $0.45 | - | - | - | Mar 11, 2026 | |
| #77 Qwen: Qwen3 Next 80B A3B Instruct by qwen | $0.09 / $1.10 | - | - | - | Sep 11, 2025 | |
| #78 Google: Gemma 3 27B by google | $0.08 / $0.16 | - | - | - | Mar 12, 2025 | |
| #79 DeepSeek: DeepSeek V3.1 Terminus by deepseek | $0.21 / $0.79 | - | - | - | Sep 22, 2025 | |
| #80 DeepSeek: DeepSeek V3.2 Exp by deepseek | $0.27 / $0.41 | - | - | - | Sep 29, 2025 | |
| #81 OpenAI: GPT-4o-mini (2024-07-18) by openai | $0.15 / $0.60 | - | - | - | Jul 18, 2024 | |
| #82 DeepSeek: DeepSeek V3 by deepseek-ai | $0.32 / $0.89 | - | - | - | Dec 26, 2024 | |
| #83 Arcee AI: Trinity Large Thinking by arcee-ai | $0.22 / $0.85 | - | - | - | Apr 1, 2026 | |
| #84 Anthropic: Claude 3.5 Haiku by anthropic | $0.80 / $4.00 | - | - | - | Nov 4, 2024 | |
| #85 Meta: Llama 4 Scout by meta-llama | $0.08 / $0.30 | - | - | - | Apr 5, 2025 | |
| #86 MoonshotAI: Kimi K2 0905 by moonshotai | $0.40 / $2.00 | - | - | - | Sep 4, 2025 | |
| #87 OpenAI: GPT-5 by openai | $1.25 / $10.00 | - | - | - | Aug 7, 2025 | |
| #88 Qwen: Qwen-Turbo by qwen | $0.03 / $0.13 | - | - | - | Feb 1, 2025 | |
| #89 Qwen: Qwen3 32B by qwen | $0.08 / $0.24 | - | - | - | Apr 28, 2025 | |
| #90 Anthropic: Claude 3.7 Sonnet by anthropic | $3.00 / $15.00 | - | - | - | Feb 24, 2025 | |
| #91 Qwen: Qwen3.5-122B-A10B by qwen | $0.26 / $2.08 | - | - | - | Feb 25, 2026 | |
| #92 OpenAI: GPT-4o by openai | $2.50 / $10.00 | - | - | - | May 13, 2024 | |
| #93 Google: Gemma 3 12B by google | $0.04 / $0.13 | - | - | - | Mar 13, 2025 | |
| #94 Mistral: Mistral Small 4 by mistralai | $0.15 / $0.60 | - | - | - | Mar 16, 2026 | |
| #95 DeepSeek: R1 0528 by deepseek | $0.50 / $2.15 | - | - | - | May 28, 2025 | |
| #96 Qwen: Qwen3 8B by qwen | $0.05 / $0.40 | - | - | - | Apr 28, 2025 | |
| #97 xAI: Grok Code Fast 1 by x-ai | $0.20 / $1.50 | - | - | - | Aug 26, 2025 | |
| #98 Qwen: Qwen3 VL 8B Instruct by qwen | $0.08 / $0.50 | - | - | - | Oct 14, 2025 | |
| #99 Qwen2.5 72B Instruct by qwen | $0.12 / $0.39 | - | - | - | Sep 19, 2024 | |
| #100 Google: Gemini 3.1 Pro Preview Custom Tools by google | $2.00 / $12.00 | - | - | - | Feb 25, 2026 | |
Showing 100 of 331 models
Understanding the AI Model Leaderboard
This AI model leaderboard helps you compare large language models (LLMs) and choose the best one for your needs. We track standardized AI benchmarks, token pricing, inference speed, and model capabilities across major AI providers such as OpenAI, Anthropic, Google, Meta, and DeepSeek.
Core AI Benchmarks Explained
Key Metrics to Consider
How to Choose the Right AI Model for Your Use Case
For Research & Analysis
Prioritize models with high MMLU-Pro (70%+) and GPQA (60%+) scores for complex reasoning tasks, academic research, and technical documentation.
For Cost Optimization
Sort by input/output pricing. For simple tasks, smaller models often deliver around 80% of flagship performance at around 10% of the cost.
For Math & STEM
Filter by Math Index or AIME 2025 scores (50%+) for quantitative analysis, engineering calculations, and scientific applications.
All benchmark scores and pricing data are updated daily from Artificial Analysis to reflect the latest model versions and capabilities. Use the sort filters above to find AI models by intelligence, cost, coding ability, math performance, speed, or release date.
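The three selection rules above can be sketched as simple filters over leaderboard rows. This is a minimal illustration, not the site's implementation: the `ModelRow` type, the helper names, and the three placeholder rows (including their scores and prices) are invented for the example, and a score of `None` stands in for a benchmark that is not reported.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ModelRow:
    """One leaderboard row (hypothetical structure for illustration)."""
    name: str
    input_price: float                 # USD per 1M input tokens
    output_price: float                # USD per 1M output tokens
    mmlu_pro: Optional[float] = None   # percent, None if not reported
    gpqa: Optional[float] = None       # percent, None if not reported
    aime_2025: Optional[float] = None  # percent, None if not reported


def meets(score: Optional[float], threshold: float) -> bool:
    """A missing score never satisfies a threshold."""
    return score is not None and score >= threshold


def for_research(rows: List[ModelRow]) -> List[ModelRow]:
    """Research & analysis: MMLU-Pro 70%+ and GPQA 60%+."""
    return [r for r in rows if meets(r.mmlu_pro, 70) and meets(r.gpqa, 60)]


def for_math(rows: List[ModelRow]) -> List[ModelRow]:
    """Math & STEM: AIME 2025 50%+."""
    return [r for r in rows if meets(r.aime_2025, 50)]


def by_cost(rows: List[ModelRow]) -> List[ModelRow]:
    """Cost optimization: cheapest input price first, output price as tie-breaker."""
    return sorted(rows, key=lambda r: (r.input_price, r.output_price))


# Placeholder rows with made-up scores and prices, not real leaderboard data.
rows = [
    ModelRow("example-large", 2.00, 8.00, mmlu_pro=82.0, gpqa=71.0, aime_2025=64.0),
    ModelRow("example-small", 0.10, 0.40, mmlu_pro=66.0, gpqa=48.0, aime_2025=55.0),
    ModelRow("example-unscored", 0.05, 0.20),
]

print([r.name for r in for_research(rows)])
print([r.name for r in for_math(rows)])
print([r.name for r in by_cost(rows)])
```

Treating a missing score as a failed threshold is a design choice for this sketch; it keeps unscored models out of quality-based shortlists while still letting them appear in the cost ordering.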
Frequently Asked Questions
What is MMLU-Pro and why is it the standard AI intelligence benchmark?
MMLU-Pro (Massive Multitask Language Understanding - Professional) is a broad-coverage AI benchmark, testing models across 14 academic subjects including mathematics, science, history, law, and ethics. Scores range from 46% (basic competency) to 87% (near-expert level). Models scoring above 75% demonstrate strong general intelligence suitable for professional applications, while scores below 60% indicate limitations in complex reasoning tasks.
What does GPQA measure and which models score highest?
GPQA (Graduate-level Google-Proof Q&A) tests PhD-level reasoning with questions designed to be "Google-proof" - requiring deep understanding rather than simple fact retrieval. Top models like GPT-5.1 (87.3%), GPT-5 mini (82.8%), and o3 (82.7%) excel at GPQA, making them ideal for research, technical analysis, and complex problem-solving. Models below 50% GPQA struggle with advanced reasoning and may provide superficial answers to complex questions.
What is AIME 2025 and how does it evaluate AI mathematical ability?
AIME 2025 (American Invitational Mathematics Examination) is an elite math competition benchmark that tests advanced problem-solving, algebra, geometry, and number theory. Scores above 80% (like GPT-5 Codex at 98.7% or GPT-5.1 at 94%) indicate exceptional mathematical reasoning suitable for engineering, scientific computing, and quantitative analysis. Models scoring below 50% may struggle with multi-step mathematical problems or require explicit problem breakdown.
How is AI model pricing calculated and what's considered cost-effective?
AI model pricing is measured per 1 million tokens (approximately 750,000 words of English text). Input pricing covers text you send, while output pricing covers generated responses. Low-cost models like GPT-5 nano cost $0.05/$0.40 and Llama 3.3 70B costs $0.54/$0.71 per million tokens, while premium models like GPT-5 cost $1.25/$10. For typical applications with a 3:1 input-to-output ratio, that works out to a blended price of about $0.14 (GPT-5 nano) or $0.58 (Llama 3.3 70B) per million tokens versus about $3.44 for GPT-5, so budget models can be roughly 6-25x cheaper than a flagship model while often retaining 70-80% of its performance.
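The pricing arithmetic above can be checked with a few lines of code. The sketch below assumes the 3:1 input-to-output token ratio mentioned in the answer and uses the GPT-5 and GPT-5 nano prices quoted there; the function names `blended_price` and `request_cost` are illustrative, not part of any provider's API.

```python
def blended_price(input_price: float, output_price: float,
                  input_parts: float = 3, output_parts: float = 1) -> float:
    """Average USD per 1M tokens for a workload with the given input:output token ratio."""
    total_parts = input_parts + output_parts
    return (input_price * input_parts + output_price * output_parts) / total_parts


def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> float:
    """USD cost of one request, given prices in USD per 1M tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Prices (USD per 1M tokens, input / output) as quoted in the answer above.
flagship = blended_price(1.25, 10.00)  # GPT-5
budget = blended_price(0.05, 0.40)     # GPT-5 nano
ratio = flagship / budget

print(f"GPT-5 blended:      ${flagship:.4f} per 1M tokens")
print(f"GPT-5 nano blended: ${budget:.4f} per 1M tokens")
print(f"Flagship is about {ratio:.0f}x the budget price")

# A single request with 3,000 input tokens and 1,000 output tokens on GPT-5.
print(f"One GPT-5 request: ${request_cost(3000, 1000, 1.25, 10.00):.5f}")
```

Because output tokens are usually priced several times higher than input tokens, the blended figure shifts noticeably if your workload generates more text than it sends; passing different `input_parts` and `output_parts` values lets you test that.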
Which AI models are best for coding and programming tasks?
Sort by Coding Index to see the top programming models. Our Coding Index combines LiveCodeBench, SciCode, and other coding benchmarks. Top performers include GPT-5.1 (57.5 index), GPT-5 Codex (53.5), and GPT-5 mini (51.4). These models excel at code generation, debugging, refactoring, and explaining complex algorithms. For budget-conscious developers, models with 40+ coding index scores offer excellent value for routine programming tasks.
How often are AI model benchmarks and rankings updated?
Our leaderboard syncs daily with the Artificial Analysis API so that benchmark scores (MMLU-Pro, GPQA, AIME 2025), pricing, and inference speed data reflect the latest model versions. New model releases appear immediately under the "Newest" sort option. Benchmark scores can change when providers release updated versions. For example, GPT-5.1, released in November 2025, achieved an intelligence score of 69.7, compared with 68.5 for GPT-5 from August 2025.
What inference speed (tokens/second) do I need for my application?
Inference speed determines how fast models generate responses. For real-time chatbots and interactive applications, target 100+ tokens/second (models like gpt-oss-120b at 340 tok/s). For background processing and batch jobs, 50-100 tok/s is sufficient. Premium reasoning models like GPT-5 (103 tok/s) balance speed and capability. Note that higher inference speed doesn't always mean better quality: slower models often deliver more thoughtful, detailed responses.
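To relate tokens/second to what a user actually waits for, a rough estimate is response length divided by generation speed. The sketch below uses the two speeds quoted above; the 500-token response length and the `first_token_latency` parameter are assumptions added for illustration, and real-world latency also depends on network and queueing effects that are not modeled here.

```python
def generation_seconds(output_tokens: int, tokens_per_second: float,
                       first_token_latency: float = 0.0) -> float:
    """Rough time to stream a full response: startup delay plus tokens / speed."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return first_token_latency + output_tokens / tokens_per_second


# Speeds quoted in the answer above; a 500-token response is an assumed example size.
for name, speed in [("gpt-oss-120b", 340.0), ("GPT-5", 103.0)]:
    seconds = generation_seconds(500, speed)
    print(f"{name}: about {seconds:.1f} s for a 500-token response")
```

At 50 tok/s the same 500-token response takes about 10 seconds, which is usually acceptable for batch jobs but feels slow in an interactive chat.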
Can I test these AI models for free before committing?
Yes! Try our free AI chat interface to test different models instantly without creating an account. Many providers also offer free tiers: OpenAI (ChatGPT with daily limits), Anthropic (Claude with usage caps), Google (Gemini free tier), and open-source models like Llama 3.3. Compare performance on your specific use case before upgrading to paid plans.