AI Model Ranking (LLM Leaderboard)

Most Expensive AI Models

Language models sorted by token price, scored with the Artificial Analysis Intelligence Index

Best Picks

Start with these models

Quick recommendations from the current benchmark, speed, and pricing data.

Best Overall

Claude Fable 5 (Adaptive Reasoning, Max Effort, Opus 4.8 Fallback)

by Anthropic

64.9 Highest intelligence score

Best Value

HyperNova 60B 2605

by Multiverse Computing

100 Value score · $0.07 blended

Best for Coding

Claude Fable 5 (Adaptive Reasoning, Max Effort, Opus 4.8 Fallback)

by Anthropic

62.0 Highest coding index

Fastest Usable

Step 3.7 Flash

by StepFun

396 tok/s Fastest model with solid intelligence

Model: AI model name and provider organization
Intelligence: Artificial Analysis Intelligence Index, a composite reasoning and capability score across the benchmark suite
Value: Quality, speed, and blended token price combined into a relative value score
Speed: Inference throughput in tokens per second (how fast the model generates responses)
Context: Maximum context window size (how much text, code, or conversation the model can process at once)
Price: Cost per 1 million tokens, input (text you send) / output (text the model generates)
Release: When the model was released; newer models may have more capabilities

Model	Intelligence	Value	Speed	Context	Price (input / output per 1M tokens)	Release	Compare
#1 o1-pro by OpenAI	25.8	1	N/A	200K	$150.00 / $600.00	Mar 19, 2025	Details
#2 GPT-4 by OpenAI	12.8	1	37 tok/s	8K	$30.00 / $60.00	Mar 14, 2023	Details
#3 GPT-5.4 Pro (xhigh) by OpenAI	—	N/A	N/A	1.1M	$30.00 / $180.00	Mar 5, 2026	Details
#4 o3-pro by OpenAI	40.7	4	21 tok/s	200K	$20.00 / $80.00	Jun 10, 2025	Details
#5 o1-preview by OpenAI	23.7	2	N/A	N/A	$16.50 / $66.00	Sep 12, 2024	Details
#6 o1 by OpenAI	30.7	5	131 tok/s	200K	$15.00 / $60.00	Dec 5, 2024	Details
#7 Claude 3 Opus by Anthropic	18.0	2	N/A	N/A	$15.00 / $75.00	Mar 4, 2024	Details
#8 Claude 4.1 Opus (Non-reasoning) by Anthropic	36.0	3	40 tok/s	N/A	$15.00 / $75.00	Aug 5, 2025	Details
#9 Claude 4.1 Opus (Reasoning) by Anthropic	42.0	4	40 tok/s	N/A	$15.00 / $75.00	Aug 5, 2025	Details
#10 Claude 4 Opus (Reasoning) by Anthropic	39.0	4	40 tok/s	N/A	$15.00 / $75.00	May 22, 2025	Details
#11 Claude 4 Opus (Non-reasoning) by Anthropic	33.0	3	39 tok/s	N/A	$15.00 / $75.00	May 22, 2025	Details
#12 Claude Fable 5 (Adaptive Reasoning, Max Effort, Opus 4.8 Fallback) by Anthropic	64.9	8	71 tok/s	1.0M	$10.00 / $50.00	Jun 9, 2026	Details
#13 GPT-4 Turbo by OpenAI	13.7	2	29 tok/s	128K	$10.00 / $30.00	Nov 6, 2023	Details
#14 Grok 4 by xAI	41.5	7	N/A	N/A	$5.50 / $27.50	Jul 10, 2025	Details
#15 GPT-5.5 (medium) by OpenAI	56.7	9	56 tok/s	1.1M	$5.00 / $30.00	Apr 23, 2026	Details
#16 GPT-5.5 (Non-reasoning) by OpenAI	40.9	6	54 tok/s	1.1M	$5.00 / $30.00	Apr 23, 2026	Details
#17 GPT-5.5 (high) by OpenAI	58.9	9	56 tok/s	1.1M	$5.00 / $30.00	Apr 23, 2026	Details
#18 GPT-5.5 (low) by OpenAI	50.8	8	59 tok/s	1.1M	$5.00 / $30.00	Apr 23, 2026	Details
#19 GPT-5.5 Instant (May 2026) by OpenAI	41.8	7	N/A	N/A	$5.00 / $30.00	May 5, 2026	Details
#20 GPT-5.5 (xhigh) by OpenAI	60.2	9	62 tok/s	1.1M	$5.00 / $30.00	Apr 23, 2026	Details
#21 Claude Opus 4.7 (Non-reasoning, High Effort) by Anthropic	51.8	9	44 tok/s	N/A	$5.00 / $25.00	Apr 16, 2026	Details
#22 Claude Opus 4.7 (Adaptive Reasoning, Max Effort) by Anthropic	57.3	9	52 tok/s	1.0M	$5.00 / $25.00	Apr 16, 2026	Details
#23 Claude Opus 4.8 (Adaptive Reasoning, Max Effort) by Anthropic	61.4	10	58 tok/s	1.0M	$5.00 / $25.00	May 28, 2026	Details
#24 GPT-4o (May '24) by OpenAI	14.5	4	107 tok/s	128K	$5.00 / $15.00	May 13, 2024	Details
#25 Claude Opus 4.6 (Adaptive Reasoning, Max Effort) by Anthropic	52.9	9	49 tok/s	N/A	$5.00 / $25.00	Feb 5, 2026	Details
#26 Claude Opus 4.5 (Non-reasoning) by Anthropic	43.1	7	59 tok/s	200K	$5.00 / $25.00	Nov 24, 2025	Details
#27 Claude Opus 4.5 (Reasoning) by Anthropic	49.7	8	54 tok/s	200K	$5.00 / $25.00	Nov 24, 2025	Details
#28 Claude Opus 4.6 (Non-reasoning, High Effort) by Anthropic	46.5	8	47 tok/s	1.0M	$5.00 / $25.00	Feb 5, 2026	Details
#29 Large (Feb '24) by Mistral	9.9	2	N/A	N/A	$4.00 / $12.00	Feb 26, 2024	Details
#30 Grok 3 by xAI	25.2	5	N/A	N/A	$4.00 / $20.00	Feb 19, 2025	Details
#31 Claude Sonnet 4.6 (Adaptive Reasoning, Max Effort) by Anthropic	51.7	11	65 tok/s	1.0M	$3.00 / $15.00	Feb 17, 2026	Details
#32 Claude Sonnet 4.6 (Non-reasoning, High Effort) by Anthropic	44.4	9	48 tok/s	1.0M	$3.00 / $15.00	Feb 17, 2026	Details
#33 Claude Sonnet 4.6 (Non-reasoning, Low Effort) by Anthropic	42.6	9	48 tok/s	1.0M	$3.00 / $15.00	Feb 17, 2026	Details
#34 Claude 3.5 Sonnet (Oct '24) by Anthropic	15.9	3	N/A	N/A	$3.00 / $15.00	Oct 22, 2024	Details
#35 Claude 3.5 Sonnet (June '24) by Anthropic	14.2	3	N/A	N/A	$3.00 / $15.00	Jun 21, 2024	Details
#36 Claude 3 Sonnet by Anthropic	10.3	2	N/A	N/A	$3.00 / $15.00	Mar 4, 2024	Details
#37 Claude 3.7 Sonnet (Non-reasoning) by Anthropic	30.8	7	N/A	N/A	$3.00 / $15.00	Feb 24, 2025	Details
#38 Claude 4 Sonnet (Non-reasoning) by Anthropic	33.0	7	50 tok/s	N/A	$3.00 / $15.00	May 22, 2025	Details
#39 Claude 4 Sonnet (Reasoning) by Anthropic	38.7	8	47 tok/s	N/A	$3.00 / $15.00	May 22, 2025	Details
#40 Claude 4.5 Sonnet (Non-reasoning) by Anthropic	37.1	8	50 tok/s	N/A	$3.00 / $15.00	Sep 29, 2025	Details
#41 Claude 4.5 Sonnet (Reasoning) by Anthropic	43.0	9	54 tok/s	N/A	$3.00 / $15.00	Sep 29, 2025	Details
#42 Command-R+ (Apr '24) by Cohere	8.3	2	N/A	N/A	$3.00 / $15.00	Apr 4, 2024	Details
#43 Llama 3.1 Instruct 405B by Meta	17.4	5	59 tok/s	N/A	$2.75 / $6.50	Jul 23, 2024	Details
#44 Medium by Mistral	9.0	2	58 tok/s	N/A	$2.75 / $8.10	Dec 11, 2023	Details
#45 GPT-5.4 (Non-reasoning) by OpenAI	35.4	9	86 tok/s	1.1M	$2.63 / $15.75	Mar 5, 2026	Details
#46 GPT-5.4 (low) by OpenAI	47.9	13	93 tok/s	1.1M	$2.63 / $15.75	Mar 5, 2026	Details
#47 Nova Premier by Amazon	19.0	4	75 tok/s	N/A	$2.50 / $12.50	Apr 30, 2025	Details
#48 Command A by Cohere	13.5	4	80 tok/s	256K	$2.50 / $10.00	Mar 13, 2025	Details
#49 Qwen3.7 Max by Alibaba	56.6	25	187 tok/s	1.0M	$2.50 / $7.50	May 19, 2026	Details
#50 GPT-4o (Aug '24) by OpenAI	18.6	6	94 tok/s	128K	$2.50 / $10.00	Aug 6, 2024	Details
#51 GPT-4o (Nov '24) by OpenAI	17.3	7	126 tok/s	128K	$2.50 / $10.00	Nov 20, 2024	Details
#52 GPT-5.4 (xhigh) by OpenAI	56.8	20	121 tok/s	1.1M	$2.50 / $15.00	Mar 5, 2026	Details
#53 o3 by OpenAI	38.4	17	122 tok/s	200K	$2.00 / $8.00	Apr 16, 2025	Details
#54 Gemini 3.1 Pro Preview by Google	57.2	23	132 tok/s	1.0M	$2.00 / $12.00	Feb 19, 2026	Details
#55 Magistral Medium 1.2 by Mistral	27.1	9	40 tok/s	N/A	$2.00 / $5.00	Sep 18, 2025	Details
#56 Jamba 1.7 Large by AI21 Labs	10.9	3	56 tok/s	N/A	$2.00 / $8.00	Jul 7, 2025	Details
#57 GPT-4.1 by OpenAI	26.3	12	128 tok/s	1.0M	$2.00 / $8.00	Apr 14, 2025	Details
#58 Gemini 3 Pro Preview (low) by Google	41.3	10	N/A	N/A	$2.00 / $12.00	Nov 18, 2025	Details
#59 Gemini 3 Pro Preview (high) by Google	48.4	12	N/A	N/A	$2.00 / $12.00	Nov 18, 2025	Details
#60 Large 2 (Nov '24) by Mistral	15.1	5	62 tok/s	N/A	$2.00 / $6.00	Nov 18, 2024	Details
#61 Large 2 (Jul '24) by Mistral	13.0	4	N/A	N/A	$2.00 / $6.00	Jul 24, 2024	Details
#62 Pixtral Large by Mistral	14.0	4	60 tok/s	N/A	$2.00 / $6.00	Nov 18, 2024	Details
#63 Grok 4.20 0309 v2 (Reasoning) by xAI	49.3	25	187 tok/s	2.0M	$2.00 / $6.00	Apr 7, 2026	Details
#64 Grok 4.20 0309 v2 (Non-reasoning) by xAI	29.0	15	167 tok/s	N/A	$2.00 / $6.00	Apr 7, 2026	Details
#65 Grok 4.20 0309 (Non-reasoning) by xAI	29.7	15	161 tok/s	N/A	$2.00 / $6.00	Mar 10, 2026	Details
#66 Grok 4.20 0309 (Reasoning) by xAI	48.5	24	173 tok/s	N/A	$2.00 / $6.00	Mar 10, 2026	Details
#67 Jamba 1.5 Large by AI21 Labs	10.7	3	N/A	N/A	$2.00 / $8.00	Aug 22, 2024	Details
#68 Jamba 1.6 Large by AI21 Labs	10.6	3	55 tok/s	N/A	$2.00 / $8.00	Mar 6, 2025	Details
#69 GPT-5.3 Codex (xhigh) by OpenAI	53.6	13	79 tok/s	400K	$1.75 / $14.00	Feb 5, 2026	Details
#70 GPT-5.2 (xhigh) by OpenAI	51.3	13	79 tok/s	400K	$1.75 / $14.00	Dec 11, 2025	Details
#71 GPT-5.2 Codex (xhigh) by OpenAI	49.0	19	128 tok/s	400K	$1.75 / $14.00	Dec 11, 2025	Details
#72 GPT-5.2 (medium) by OpenAI	46.6	11	N/A	400K	$1.75 / $14.00	Dec 11, 2025	Details
#73 GPT-5.2 (Non-reasoning) by OpenAI	33.6	8	64 tok/s	400K	$1.75 / $14.00	Dec 11, 2025	Details
#74 R1 (Jan '25) by DeepSeek	18.8	6	N/A	64K	$1.68 / $4.70	Jan 20, 2025	Details
#75 Qwen3 Max by Alibaba	31.4	9	57 tok/s	262K	$1.66 / $7.22	Sep 23, 2025	Details
#76 V3.1 Terminus (Reasoning) by DeepSeek	33.9	13	N/A	N/A	$1.64 / $2.75	Sep 22, 2025	Details
#77 Qwen2.5 Max by Alibaba	16.3	5	N/A	N/A	$1.60 / $6.40	Jan 28, 2025	Details
#78 Gemini 3.5 Flash (high) by Google	55.3	26	217 tok/s	1.0M	$1.50 / $9.00	May 19, 2026	Details
#79 Gemini 3.5 Flash (minimal) by Google	43.3	21	203 tok/s	1.0M	$1.50 / $9.00	May 19, 2026	Details
#80 Gemini 3.5 Flash (medium) by Google	54.8	26	207 tok/s	1.0M	$1.50 / $9.00	May 19, 2026	Details
#81 Medium 3.5 by Mistral	39.2	12	66 tok/s	N/A	$1.50 / $7.50	Apr 29, 2026	Details
#82 GLM-5.1 (Reasoning) by Z AI	51.4	18	71 tok/s	203K	$1.40 / $4.40	Apr 7, 2026	Details
#83 GLM-5.1 (Non-reasoning) by Z AI	43.8	16	66 tok/s	203K	$1.40 / $4.40	Apr 7, 2026	Details
#84 Llama 3.2 Instruct 90B (Vision) by Meta	11.9	5	48 tok/s	N/A	$1.38 / $1.38	Sep 25, 2024	Details
#85 R1 0528 (May '25) by DeepSeek	27.1	10	N/A	64K	$1.35 / $4.20	May 28, 2025	Details
#86 Qwen3.6 Max Preview by Alibaba	51.8	16	39 tok/s	262K	$1.30 / $7.80	Apr 20, 2026	Details
#87 Gemini 2.5 Pro by Google	34.6	16	132 tok/s	1.0M	$1.25 / $10.00	Jun 5, 2025	Details
#88 Grok 4.3 (Non-reasoning) by xAI	31.0	21	124 tok/s	1.0M	$1.25 / $2.50	Apr 30, 2026	Details
#89 Grok 4.3 (medium) by xAI	48.8	34	141 tok/s	1.0M	$1.25 / $2.50	Apr 30, 2026	Details
#90 Grok 4.3 (high) by xAI	53.2	37	155 tok/s	1.0M	$1.25 / $2.50	Apr 30, 2026	Details
#91 Grok 4.3 (low) by xAI	43.9	29	120 tok/s	1.0M	$1.25 / $2.50	Apr 30, 2026	Details
#92 Nova 2.0 Pro Preview (medium) by Amazon	35.7	17	128 tok/s	N/A	$1.25 / $10.00	Nov 27, 2025	Details
#93 Nova 2.0 Pro Preview (Non-reasoning) by Amazon	23.1	11	127 tok/s	N/A	$1.25 / $10.00	Nov 27, 2025	Details
#94 Nova 2.0 Pro Preview (low) by Amazon	31.9	15	142 tok/s	N/A	$1.25 / $10.00	Nov 27, 2025	Details
#95 Cogito v2.1 (Reasoning) by Deep Cogito	—	12	71 tok/s	N/A	$1.25 / $1.25	Nov 18, 2025	Details
#96 GPT-5 (minimal) by OpenAI	23.9	7	83 tok/s	400K	$1.25 / $10.00	Aug 7, 2025	Details
#97 GPT-5.1 (high) by OpenAI	47.7	20	113 tok/s	400K	$1.25 / $10.00	Nov 13, 2025	Details
#98 GPT-5 (high) by OpenAI	44.6	18	110 tok/s	400K	$1.25 / $10.00	Aug 7, 2025	Details
#99 GPT-5 (ChatGPT) by OpenAI	21.8	10	182 tok/s	400K	$1.25 / $10.00	Aug 7, 2025	Details
#100 GPT-5.1 (Non-reasoning) by OpenAI	27.4	10	98 tok/s	400K	$1.25 / $10.00	Nov 13, 2025	Details

Showing 100 of 359 models
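Each row of the table above is a tab-separated record with the same eight columns. As a minimal sketch of how such a row could be turned into typed fields, the Python below parses one row copied from the table. The function names `parse_optional_number` and `parse_row` and the output field names are illustrative assumptions, not part of any leaderboard API.

```python
import re
from typing import Optional

# One row copied from the table above (columns are tab-separated).
ROW = "#49 Qwen3.7 Max by Alibaba\t56.6\t25\t187 tok/s\t1.0M\t$2.50 / $7.50\tMay 19, 2026\tDetails"


def parse_optional_number(cell: str) -> Optional[float]:
    """Return the leading number in a cell, or None for 'N/A' and dash placeholders."""
    match = re.match(r"\$?(\d+(?:\.\d+)?)", cell.strip())
    return float(match.group(1)) if match else None


def parse_row(line: str) -> dict:
    """Split one leaderboard row into named fields (hypothetical field names)."""
    model, intelligence, value, speed, context, price, release, _ = line.split("\t")
    rank, name, provider = re.match(r"#(\d+) (.+) by (.+)$", model).groups()
    input_price, output_price = (parse_optional_number(p) for p in price.split("/"))
    return {
        "rank": int(rank),
        "name": name,
        "provider": provider,
        "intelligence": parse_optional_number(intelligence),
        "value": parse_optional_number(value),
        "speed_tok_s": parse_optional_number(speed),
        "context": context,  # kept as text, e.g. "1.0M" or "N/A"
        "input_price": input_price,
        "output_price": output_price,
        "release": release,
    }


row = parse_row(ROW)
print(row["name"], row["provider"], row["input_price"], row["output_price"])
```

Missing values ("N/A" or a dash) come back as `None`, so downstream sorting or filtering can skip them explicitly instead of treating them as zero.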

Understanding the AI Model Leaderboard

This AI model leaderboard helps you compare large language models (LLMs) and choose the best one for your needs. We track standardized AI benchmarks, token pricing, inference speed, and model capabilities across major AI providers such as OpenAI, Anthropic, Google, Meta, and DeepSeek.

Core AI Benchmarks Explained

MMLU-Pro: Tests broad knowledge across 14 academic subjects

GPQA: PhD-level reasoning & problem-solving

AIME 2025: Elite mathematical reasoning

Coding Index: LiveCodeBench + SciCode composite

Math Index: AIME + MATH-500 composite

Key Metrics to Consider

Token Pricing: Input vs output cost per 1M tokens

Inference Speed: Tokens/sec for response time

Release Date: Latest techniques & knowledge

Benchmark Scores: 0-100% capability comparison

How to Choose the Right AI Model for Your Use Case

For Research & Analysis

Prioritize models with high MMLU-Pro (70%+) and GPQA (60%+) scores for complex reasoning tasks, academic research, and technical documentation.

For Cost Optimization

Sort by input/output pricing. For simple tasks, smaller models often deliver about 80% of flagship performance at roughly 10% of the cost.

For Math & STEM

Filter by Math Index or AIME 2025 scores (50%+) for quantitative analysis, engineering calculations, and scientific applications.

All benchmark scores and pricing data are updated daily from Artificial Analysis to reflect the latest model versions and capabilities. Use the sort filters above to find AI models by intelligence, cost, coding ability, math performance, speed, or release date.
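The selection advice above amounts to filtering on capability and speed thresholds and then sorting what remains by cost. The Python sketch below illustrates that on five rows copied from the table. The thresholds, the `Model` class, and the `shortlist` function are hypothetical, and the 3:1 input-to-output weighting used for sorting is an assumption about a typical workload rather than the leaderboard's own formula.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    intelligence: float   # Intelligence column
    speed_tok_s: float    # Speed column, tokens per second
    input_price: float    # USD per 1M input tokens
    output_price: float   # USD per 1M output tokens


# Values copied from the table above.
MODELS = [
    Model("Claude Opus 4.8 (Adaptive Reasoning, Max Effort)", 61.4, 58, 5.00, 25.00),
    Model("GPT-5.5 (xhigh)", 60.2, 62, 5.00, 30.00),
    Model("Qwen3.7 Max", 56.6, 187, 2.50, 7.50),
    Model("Gemini 3.5 Flash (high)", 55.3, 217, 1.50, 9.00),
    Model("Grok 4.3 (high)", 53.2, 155, 1.25, 2.50),
]


def workload_price(m: Model) -> float:
    """Price per 1M tokens for an assumed mix of 3 input tokens per output token."""
    return (3 * m.input_price + m.output_price) / 4


def shortlist(models, min_intelligence: float, min_speed: float):
    """Keep models that meet both thresholds, cheapest first."""
    keep = [
        m for m in models
        if m.intelligence >= min_intelligence and m.speed_tok_s >= min_speed
    ]
    return sorted(keep, key=workload_price)


# Hypothetical requirements: interactive use with a reasonably capable model.
picks = shortlist(MODELS, min_intelligence=50.0, min_speed=100.0)
for m in picks:
    print(f"{m.name}: ${workload_price(m):.4f} per 1M tokens")
```

With these example thresholds the two slowest models drop out, and the remaining three are ordered by their weighted price.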

Frequently Asked Questions

What is MMLU-Pro and why is it the standard AI intelligence benchmark?

MMLU-Pro (Massive Multitask Language Understanding Pro) is a broad knowledge and reasoning benchmark that tests models across 14 academic subjects, including mathematics, science, history, law, and philosophy. Scores range from 46% (basic competency) to 87% (near-expert level). Models scoring above 75% demonstrate strong general intelligence suitable for professional applications, while scores below 60% indicate limitations in complex reasoning tasks.

What does GPQA measure and which models score highest?

GPQA (Graduate-level Google-Proof Q&A) tests PhD-level reasoning with questions designed to be "Google-proof" - requiring deep understanding rather than simple fact retrieval. Top models like GPT-5.1 (87.3%), GPT-5 mini (82.8%), and o3 (82.7%) excel at GPQA, making them ideal for research, technical analysis, and complex problem-solving. Models below 50% GPQA struggle with advanced reasoning and may provide superficial answers to complex questions.

What is AIME 2025 and how does it evaluate AI mathematical ability?

AIME 2025 (American Invitational Mathematics Examination) is an elite math competition benchmark that tests advanced problem-solving, algebra, geometry, and number theory. Scores above 80% (like GPT-5 Codex at 98.7% or GPT-5.1 at 94%) indicate exceptional mathematical reasoning suitable for engineering, scientific computing, and quantitative analysis. Models scoring below 50% may struggle with multi-step mathematical problems or require explicit problem breakdown.

How is AI model pricing calculated and what's considered cost-effective?

AI model pricing is measured per 1 million tokens (approximately 750,000 words). Input pricing covers the text you send, while output pricing covers generated responses. Prices are quoted as input/output per million tokens: for example, GPT-5 nano costs $0.05/$0.40, Llama 3.3 70B costs $0.54/$0.71, and a premium model like GPT-5 costs $1.25/$10. For typical applications with a 3:1 input-to-output ratio, budget models can be 10-20x cheaper than flagship models while maintaining 70-80% of their performance.
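To make the 3:1 input-to-output ratio concrete, the sketch below computes a single weighted price per 1 million tokens from the input and output prices quoted in this answer. The function name `blended_price` is illustrative, and treating the blend as a simple weighted average is an assumption; the leaderboard's own "blended" figure may be computed differently.

```python
def blended_price(input_price: float, output_price: float,
                  input_parts: float = 3.0, output_parts: float = 1.0) -> float:
    """Weighted average price per 1M tokens for a given input:output token mix."""
    total = input_parts + output_parts
    return (input_parts * input_price + output_parts * output_price) / total


# Prices (USD per 1M tokens, input / output) quoted in the answer above.
prices = {
    "GPT-5 nano": (0.05, 0.40),
    "Llama 3.3 70B": (0.54, 0.71),
    "GPT-5": (1.25, 10.00),
}

blended = {name: blended_price(i, o) for name, (i, o) in prices.items()}
for name, price in blended.items():
    print(f"{name}: ${price:.4f} per 1M tokens (3:1 blend)")

# Ratio between the premium example and each cheaper one.
for name in ("GPT-5 nano", "Llama 3.3 70B"):
    print(f"GPT-5 costs {blended['GPT-5'] / blended[name]:.1f}x {name}")
```

Because output tokens are usually priced higher than input tokens, changing the mix (for example to 1:1 for generation-heavy workloads) can shift the comparison noticeably, so it is worth passing your own `input_parts` and `output_parts`.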

Which AI models are best for coding and programming tasks?

Sort by Coding Index to see top programming models. Our Coding Index combines LiveCodeBench, SciCode, and other coding benchmarks. Top performers include GPT-5.1 (57.5 index), GPT-5 Codex (53.5), and GPT-5 mini (51.4). These models excel at code generation, debugging, refactoring, and explaining complex algorithms. For budget-conscious developers, models with 40+ coding index scores offer excellent value for routine programming tasks.

How often are AI model benchmarks and rankings updated?

Our leaderboard syncs daily with the Artificial Analysis API to ensure benchmark scores (MMLU-Pro, GPQA, AIME 2025), pricing, and inference speed data reflect the latest model versions. New model releases appear immediately under the "Newest" sort option. Benchmark scores can change when providers release updated versions. For example, GPT-5.1, released in November 2025, achieved an intelligence score of 69.7, compared to 68.5 for GPT-5 from August 2025.

What inference speed (tokens/second) do I need for my application?

Inference speed determines how fast models generate responses. For real-time chatbots and interactive applications, target 100+ tokens/second (for example, gpt-oss-120B at 340 tok/s). For background processing and batch jobs, 50-100 tok/s is sufficient. Premium reasoning models like GPT-5 (103 tok/s) balance speed and capability. Note that higher inference speed doesn't always mean better quality: slower models often deliver more thoughtful, detailed responses.
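A rough way to turn these throughput figures into response times is to divide the expected response length by the tokens-per-second rate. The sketch below does that for the speeds mentioned in this answer. The 500-token response length and the function name `generation_seconds` are assumptions for illustration, and the estimate ignores time to first token, network latency, and any hidden reasoning tokens.

```python
def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Estimated time to stream a response at a steady throughput."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


RESPONSE_TOKENS = 500  # hypothetical response length

# Throughput figures quoted in the answer above (tokens per second).
speeds = {
    "gpt-oss-120B": 340,
    "GPT-5": 103,
    "batch lower bound": 50,
}

for name, tok_s in speeds.items():
    seconds = generation_seconds(RESPONSE_TOKENS, tok_s)
    print(f"{name}: about {seconds:.1f} s for {RESPONSE_TOKENS} tokens")
```

For interactive use, the same arithmetic can be inverted: divide the response length by the longest wait you find acceptable to get the minimum throughput to look for.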

Can I test these AI models for free before committing?

Yes! Try our free AI chat interface to test different models instantly without creating an account. Many providers also offer free tiers: OpenAI (ChatGPT with daily limits), Anthropic (Claude with usage caps), Google (Gemini free tier), and open-source models like Llama 3.3. Compare performance on your specific use case before upgrading to paid plans.
