AI Model Ranking (LLM Leaderboard)

Best AI Coding Models

Language models ranked by Artificial Analysis Index

Best Picks

Start with these models

Quick recommendations from the current benchmark, speed, and pricing data.

Best Overall

Claude Fable 5 (Adaptive Reasoning, Max Effort, Opus 4.8 Fallback)

by Anthropic

Best Value

V4 Flash (Reasoning, Max Effort)

by DeepSeek

Best for Coding

GPT-5.6 Sol (xhigh)

by OpenAI

Fastest Usable

Gemini 3.6 Flash (high)

by Google

Column guide: Intelligence is the Artificial Analysis Intelligence Index, a composite reasoning and capability score across the benchmark suite. Value Score combines quality, speed, and blended token price into a relative score. Speed is inference throughput in tokens per second, i.e. how fast the model generates responses. Context is the maximum context window, i.e. how much text, code, or conversation the model can process at once. Price is the cost per 1 million tokens, input (text you send) / output (text the model generates).

Model (provider · release date)	Intelligence	Value Score	Speed	Context	Price (input/output per 1M tokens)	Compare
#1 GPT-5.6 Sol (xhigh) by OpenAI · Jul 9, 2026	57.7	12	65 tok/s	1.1M	$5.00/$30.00	Details
#2 GPT-5.6 Sol (max) by OpenAI · Jul 9, 2026	58.9	12	67 tok/s	1.1M	$5.00/$30.00	Details
#3 GPT-5.6 Sol (high) by OpenAI · Jul 9, 2026	55.9	11	57 tok/s	1.1M	$5.00/$30.00	Details
#4 GPT-5.6 Terra (max) by OpenAI · Jul 9, 2026	55.0	26	157 tok/s	1.1M	$2.50/$15.00	Details
#5 Claude Fable 5 (Adaptive Reasoning, Max Effort, Opus 4.8 Fallback) by Anthropic · Jun 9, 2026	59.9	9	70 tok/s	1.0M	$10.00/$50.00	Details
#6 GPT-5.6 Sol (medium) by OpenAI · Jul 9, 2026	53.6	11	60 tok/s	1.1M	$5.00/$30.00	Details
#7 K3 by Kimi · Jul 16, 2026	57.1	16	38 tok/s	1.0M	$3.00/$15.00	Details
#8 GPT-5.5 (xhigh) by OpenAI · Apr 23, 2026	54.8	13	90 tok/s	1.1M	$5.00/$30.00	Details
#9 Claude Opus 4.8 (Adaptive Reasoning, Max Effort) by Anthropic · May 28, 2026	55.7	12	63 tok/s	1.0M	$5.00/$25.00	Details
#10 Claude Opus 4.7 (Adaptive Reasoning, Max Effort) by Anthropic · Apr 16, 2026	53.5	11	61 tok/s	1.0M	$5.00/$25.00	Details
#11 Grok 4.5 (high) by SpaceXAI · Jul 8, 2026	53.8	22	77 tok/s	500K	$2.00/$6.00	Details
#12 GPT-5.5 (high) by OpenAI · Apr 23, 2026	53.1	11	77 tok/s	1.1M	$5.00/$30.00	Details
#13 Claude Sonnet 5 (Adaptive Reasoning, Max Effort) by Anthropic · Jun 30, 2026	53.4	21	88 tok/s	1.0M	$2.00/$10.00	Details
#14 GPT-5.5 (medium) by OpenAI · Apr 23, 2026	50.4	10	76 tok/s	1.1M	$5.00/$30.00	Details
#15 GPT-5.6 Luna (max) by OpenAI · Jul 9, 2026	51.2	38	203 tok/s	1.1M	$1.00/$6.00	Details
#16 Muse Spark 1.1 (xhigh) by Meta · Jul 9, 2026	50.6	39	122 tok/s	1.0M	$1.25/$4.25	Details
#17 GPT-5.4 (xhigh) by OpenAI · Mar 5, 2026	51.4	24	152 tok/s	272K	$2.50/$15.00	Details
#18 GPT-5.6 Terra (xhigh) by OpenAI · Jul 9, 2026	51.6	25	131 tok/s	1.1M	$2.50/$15.00	Details
#19 Gemini 3.5 Flash (high) by Google · May 19, 2026	50.2	31	287 tok/s	131K	$1.50/$9.00	Details
#20 GPT-5.6 Sol (low) by OpenAI · Jul 9, 2026	49.4	10	56 tok/s	1.1M	$5.00/$30.00	Details
#21 Gemini 3.6 Flash (high) by Google · Jul 21, 2026	50.1	33	311 tok/s	1.0M	$1.50/$7.50	Details
#22 Gemini 3.1 Pro Preview by Google · Feb 19, 2026	46.5	25	136 tok/s	1.0M	$2.00/$12.00	Details
#23 GLM-5.2 (max) by Z AI · Jun 16, 2026	51.1	39	196 tok/s	1.0M	$1.40/$4.40	Details
#24 GPT-5.6 Luna (xhigh) by OpenAI · Jul 9, 2026	49.1	37	197 tok/s	1.1M	$1.00/$6.00	Details
#25 GPT-5.6 Terra (high) by OpenAI · Jul 9, 2026	49.0	23	131 tok/s	1.1M	$2.50/$15.00	Details
#26 Claude Sonnet 5 (Non-reasoning, High Effort) by Anthropic · Jun 30, 2026	41.7	14	61 tok/s	1.0M	$2.00/$10.00	Details
#27 Qwen3.7 Max by Alibaba · May 19, 2026	46.0	27	207 tok/s	1.0M	$2.50/$7.50	Details
#28 GPT-5.6 Sol (Non-reasoning) by OpenAI · Jul 9, 2026	41.2	8	55 tok/s	1.1M	$5.00/$30.00	Details
#29 GPT-5.6 Terra (medium) by OpenAI · Jul 9, 2026	45.6	20	118 tok/s	1.1M	$2.50/$15.00	Details
#30 GPT-5.6 Luna (high) by OpenAI · Jul 9, 2026	46.1	35	199 tok/s	1.1M	$1.00/$6.00	Details
#31 Claude Sonnet 4.6 (Adaptive Reasoning, Max Effort) by Anthropic · Feb 17, 2026	47.2	13	72 tok/s	1.0M	$3.00/$15.00	Details
#32 Motif 3 (Beta) by Motif Technologies · N/A	44.1	N/A	N/A	N/A	N/A	Details
#33 K2.6 by Kimi · Apr 20, 2026	44.2	23	56 tok/s	262K	$0.95/$4.00	Details
#34 GPT-5.5 (low) by OpenAI · Apr 23, 2026	43.5	9	76 tok/s	1.1M	$5.00/$30.00	Details
#35 K2.7 Code by Kimi · Jun 12, 2026	41.9	22	47 tok/s	262K	$0.95/$4.00	Details
#36 MiMo-V2.5-Pro by Xiaomi · Apr 22, 2026	42.2	39	58 tok/s	1.0M	$0.43/$0.87	Details
#37 KAT Coder Pro V2 by KwaiKAT · Mar 27, 2026	33.7	31	N/A	N/A	$0.30/$1.20	Details
#38 V4 Pro (Reasoning, Max Effort) by DeepSeek · Apr 24, 2026	44.3	41	73 tok/s	1.0M	$0.43/$0.87	Details
#39 Nex-N2-Pro by Nex AGI · Jun 2, 2026	41.0	46	136 tok/s	262K	$0.50/$2.50	Details
#40 Hy3 by Tencent · Jul 6, 2026	41.2	N/A	60 tok/s	262K	N/A	Details
#41 V4 Pro (Reasoning, High Effort) by DeepSeek · Apr 24, 2026	43.1	40	68 tok/s	1.0M	$0.43/$0.87	Details
#42 Muse Spark by Meta · Apr 8, 2026	43.1	N/A	N/A	1.0M	N/A	Details
#43 M3 by MiniMax · Jun 1, 2026	44.4	54	97 tok/s	524K	$0.30/$1.20	Details
#44 GPT-5.6 Terra (low) by OpenAI · Jul 9, 2026	40.5	19	123 tok/s	1.1M	$2.50/$15.00	Details
#45 MiMo-V2.5 by Xiaomi · Apr 22, 2026	37.2	60	64 tok/s	1.0M	$0.14/$0.28	Details
#46 GPT-5.5 (Non-reasoning) by OpenAI · Apr 23, 2026	35.4	7	75 tok/s	1.1M	$5.00/$30.00	Details
#47 V4 Flash (Reasoning, Max Effort) by DeepSeek · Apr 24, 2026	40.3	100	115 tok/s	1.0M	$0.14/$0.28	Details
#48 GPT-5.4 mini (xhigh) by OpenAI · Mar 17, 2026	40.0	35	180 tok/s	400K	$0.75/$4.50	Details
#49 GPT-5.4 nano (xhigh) by OpenAI · Mar 17, 2026	38.2	63	163 tok/s	400K	$0.20/$1.25	Details
#50 Qwen3.7 Plus by Alibaba · Jun 1, 2026	39.0	32	53 tok/s	1.0M	$0.40/$1.60	Details
#51 GLM-5.1 (Reasoning) by Z AI · Apr 7, 2026	40.2	21	84 tok/s	1.0M	$1.40/$4.40	Details
#52 Qwen3.6 Plus by Alibaba · Apr 2, 2026	39.6	25	53 tok/s	1.0M	$0.50/$3.00	Details
#53 Qwen3.6 27B (Reasoning) by Alibaba · Apr 22, 2026	37.1	22	60 tok/s	262K	$0.60/$3.60	Details
#54 M2.7 by MiniMax · Mar 18, 2026	38.1	36	59 tok/s	197K	$0.30/$1.20	Details
#55 JT-4.1 Flash 236B A21B by China Mobile · Jul 9, 2026	38.8	N/A	N/A	N/A	N/A	Details
#56 GPT-5.6 Terra (Non-reasoning) by OpenAI · Jul 9, 2026	34.0	15	118 tok/s	1.1M	$2.50/$15.00	Details
#57 Inkling (xhigh) by Thinking Machines · Jul 15, 2026	40.7	17	N/A	524K	$1.87/$4.68	Details
#58 Claude 4.5 Sonnet (Reasoning) by Anthropic · Sep 29, 2025	36.4	10	67 tok/s	1.0M	$3.00/$15.00	Details
#59 V4 Flash (Reasoning, High Effort) by DeepSeek · Apr 24, 2026	37.5	61	N/A	1.0M	$0.14/$0.28	Details
#60 Grok Build 0.1 0616 by SpaceXAI · Jun 16, 2026	39.8	24	56 tok/s	256K	$1.00/$2.00	Details
#61 GPT-5.6 Luna (medium) by OpenAI · Jul 9, 2026	38.1	29	182 tok/s	1.1M	$1.00/$6.00	Details
#62 MiMo-V2-Flash (Non-reasoning) by Xiaomi · Dec 16, 2025	24.7	N/A	N/A	1.0M	N/A	Details
#63 GPT-5.1 (high) by OpenAI · Nov 13, 2025	36.9	20	110 tok/s	400K	$1.25/$10.00	Details
#64 Gemini 3.5 Flash-Lite by Google · Jul 21, 2026	36.5	45	600 tok/s	1.0M	$0.30/$2.50	Details
#65 Nemotron 3 Ultra 550B A55B (Reasoning) by NVIDIA · Jun 4, 2026	37.8	39	215 tok/s	512K	$0.68/$2.67	Details
#66 Qwen3.5 397B A17B (Reasoning) by Alibaba · Feb 16, 2026	33.7	20	61 tok/s	262K	$0.60/$3.60	Details
#67 Medium 3.5 by Mistral · Apr 29, 2026	29.9	19	135 tok/s	262K	$1.50/$7.50	Details
#68 K2.5 (Reasoning) by Kimi · Jan 27, 2026	35.4	22	50 tok/s	262K	$0.60/$3.00	Details
#69 Gemini 2.5 Pro Preview (Mar '25) by Google · Mar 25, 2025	23.0	N/A	N/A	1.0M	N/A	Details
#70 Qwen3.6 27B (Non-reasoning) by Alibaba · Apr 22, 2026	30.5	18	60 tok/s	262K	$0.60/$3.60	Details
#71 GLM-5.2 (Non-reasoning) by Z AI · Jun 16, 2026	34.1	23	112 tok/s	1.0M	$1.40/$4.40	Details
#72 GLM-4.6 (Reasoning) by Z AI · Sep 30, 2025	28.7	20	46 tok/s	1.0M	$0.55/$2.20	Details
#73 Qwen3.5 122B A10B (Reasoning) by Alibaba · Feb 24, 2026	32.3	35	140 tok/s	262K	$0.40/$3.20	Details
#74 2.0 by LongCat · Jun 29, 2026	33.5	N/A	N/A	N/A	N/A	Details
#75 GLM-4.7 (Reasoning) by Z AI · Dec 22, 2025	33.7	35	117 tok/s	1.0M	$0.60/$2.20	Details
#76 GPT-5.6 Luna (low) by OpenAI · Jul 9, 2026	33.3	25	185 tok/s	1.1M	$1.00/$6.00	Details
#77 V3.2 (Reasoning) by DeepSeek · Dec 1, 2025	32.0	39	N/A	1.0M	$0.28/$0.42	Details
#78 Claude 4.5 Haiku (Reasoning) by Anthropic · Oct 15, 2025	29.6	24	151 tok/s	1.0M	$1.00/$5.00	Details
#79 V3.1 Terminus (Reasoning) by DeepSeek · Sep 22, 2025	30.4	15	N/A	1.0M	$1.64/$2.75	Details
#80 Gemma 4 31B (Reasoning) by Google · Apr 2, 2026	29.4	N/A	36 tok/s	262K	N/A	Details
#81 Qwen3.5 122B A10B (Non-reasoning) by Alibaba · Feb 24, 2026	27.6	30	147 tok/s	262K	$0.40/$3.20	Details
#82 Ring-2.6-1T by InclusionAI · May 8, 2026	30.6	37	129 tok/s	262K	$0.30/$2.50	Details
#83 Grok 4.3 (high) by SpaceXAI · Apr 30, 2026	37.6	34	123 tok/s	500K	$1.25/$2.50	Details
#84 Qwen3.6 35B A3B (Reasoning) by Alibaba · Apr 16, 2026	31.6	48	134 tok/s	262K	$0.25/$1.49	Details
#85 o1 by OpenAI · Dec 5, 2024	23.4	3	N/A	N/A	$15.00/$60.00	Details
#86 Step 3.7 Flash by StepFun · May 29, 2026	30.3	52	401 tok/s	256K	$0.20/$1.15	Details
#87 GPT-5.5 Instant (June 2026) by OpenAI · Jun 25, 2026	28.9	6	N/A	1.1M	$5.00/$30.00	Details
#88 GPT-5.6 Luna (Non-reasoning) by OpenAI · Jul 9, 2026	26.6	20	186 tok/s	1.1M	$1.00/$6.00	Details
#89 Gemma 4 26B A4B (Reasoning) by Google · Apr 2, 2026	25.7	39	N/A	262K	$0.13/$0.40	Details
#90 GPT-5 (high) by OpenAI · Aug 7, 2025	34.7	19	111 tok/s	400K	$1.25/$10.00	Details
#91 Nemotron 3 Super 120B A12B (Reasoning) by NVIDIA · Mar 11, 2026	25.4	46	172 tok/s	262K	$0.25/$0.78	Details
#92 Claude 4 Sonnet (Reasoning) by Anthropic · May 22, 2025	28.9	8	N/A	1.0M	$3.00/$15.00	Details
#93 Qwen3.5 35B A3B (Non-reasoning) by Alibaba · Feb 24, 2026	24.0	33	166 tok/s	262K	$0.25/$2.00	Details
#94 North Mini Code by Cohere · Jun 9, 2026	19.8	N/A	96 tok/s	N/A	N/A	Details
#95 Claude 3.7 Sonnet (Reasoning) by Anthropic · Feb 24, 2025	27.1	N/A	N/A	1.0M	N/A	Details
#96 Qwen3 Coder Next by Alibaba · Feb 3, 2026	21.1	24	96 tok/s	262K	$0.35/$1.20	Details
#97 Grok 4.3 (Non-reasoning) by SpaceXAI · Apr 30, 2026	24.8	17	93 tok/s	500K	$1.25/$2.50	Details
#98 Gemini 3.1 Flash-Lite by Google · Mar 3, 2026	25.0	38	315 tok/s	1.0M	$0.25/$1.50	Details
#99 Nova 2.0 Pro Preview (medium) by Amazon · Nov 27, 2025	21.8	13	134 tok/s	N/A	$1.25/$10.00	Details
#100 o1-preview by OpenAI · Sep 12, 2024	17.0	2	N/A	N/A	$16.50/$66.00	Details

Showing 100 of 578 models

Understanding the AI Model Leaderboard

This AI model leaderboard helps you compare and choose large language models (LLMs) for your needs. We track standardized AI benchmarks, token pricing, inference speed, and model capabilities across major AI providers such as OpenAI, Anthropic, Google, Meta, and DeepSeek.

Core AI Benchmarks Explained

MMLU-Pro: Tests broad knowledge across 14 academic subjects

GPQA: PhD-level reasoning & problem-solving

AIME 2025: Elite mathematical reasoning

Coding Index: LiveCodeBench + SciCode composite

Math Index: AIME + MATH-500 composite
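The composite indices above combine several benchmark scores into one number. A minimal sketch of that idea, assuming an equal-weighted average of component scores already on a 0-100 scale; Artificial Analysis's actual weighting and normalization may differ, and the component scores below are hypothetical.

```python
from statistics import mean


def composite_index(scores: dict[str, float]) -> float:
    """Equal-weighted mean of component benchmark scores on a 0-100 scale.

    Assumption: equal weights and no extra normalization; the real
    Artificial Analysis indices may weight or rescale components differently.
    """
    if not scores:
        raise ValueError("at least one component score is required")
    for name, value in scores.items():
        if not 0.0 <= value <= 100.0:
            raise ValueError(f"{name} score {value} is outside 0-100")
    return mean(scores.values())


# Hypothetical component scores for a single model
coding_index = composite_index({"LiveCodeBench": 62.0, "SciCode": 41.0})
math_index = composite_index({"AIME": 88.0, "MATH-500": 96.0})
print(f"Coding Index: {coding_index:.1f}")  # Coding Index: 51.5
print(f"Math Index: {math_index:.1f}")      # Math Index: 92.0
```

An equal-weighted mean keeps the sketch transparent; a weighted sum would let one benchmark dominate, which is a design decision the index publisher makes, not the reader.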

Key Metrics to Consider

Token Pricing: Input vs. output cost per 1M tokens

Inference Speed: Tokens/sec, which determines response time

Release Date: Newer models tend to include the latest techniques and knowledge

Benchmark Scores: 0-100% capability comparison

How to Choose the Right AI Model for Your Use Case

For Research & Analysis

Prioritize models with high MMLU-Pro (70%+) and GPQA (60%+) scores for complex reasoning tasks, academic research, and technical documentation.

For Cost Optimization

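Sort by input/output pricing. For simple tasks, smaller models can deliver much of a flagship model's capability at a small fraction of the cost: in the table above, V4 Flash (Reasoning, Max Effort) scores 40.3 on the Intelligence Index at $0.14/$0.28, versus 59.9 at $10.00/$50.00 for Claude Fable 5.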
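One way to act on this advice is to rank models by intelligence per blended dollar. The sketch below is an illustration only, not the page's Value Score formula (which also factors in speed and is not published here). It assumes a 3:1 input-to-output token blend, and `points_per_dollar` is a hypothetical metric; the four models and their figures are copied from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    intelligence: float  # Intelligence column from the table
    input_price: float   # USD per 1M input tokens
    output_price: float  # USD per 1M output tokens

    def blended_price(self, input_ratio: float = 3.0, output_ratio: float = 1.0) -> float:
        """Weighted average USD per 1M tokens (assumed 3:1 input-to-output by default)."""
        total = input_ratio + output_ratio
        return (self.input_price * input_ratio + self.output_price * output_ratio) / total

    def points_per_dollar(self) -> float:
        """Hypothetical metric: intelligence points per blended USD per 1M tokens."""
        return self.intelligence / self.blended_price()


# Four rows copied from the table above
models = [
    Model("Claude Fable 5 (Adaptive Reasoning, Max Effort, Opus 4.8 Fallback)", 59.9, 10.00, 50.00),
    Model("GPT-5.6 Sol (xhigh)", 57.7, 5.00, 30.00),
    Model("Gemini 3.6 Flash (high)", 50.1, 1.50, 7.50),
    Model("V4 Flash (Reasoning, Max Effort)", 40.3, 0.14, 0.28),
]

# Highest intelligence-per-dollar first
for m in sorted(models, key=Model.points_per_dollar, reverse=True):
    print(f"{m.name}: ${m.blended_price():.3f}/1M blended, {m.points_per_dollar():.1f} points per $")
```

For these four rows the resulting order (V4 Flash, Gemini 3.6 Flash, GPT-5.6 Sol, Claude Fable 5) happens to match the table's Value Score column (100, 33, 12, 9), but that agreement should not be read as the site's method.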

For Math & STEM

Filter by Math Index or AIME 2025 scores (50%+) for quantitative analysis, engineering calculations, and scientific applications.
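The thresholds in these recommendations can be applied as a simple filter. A minimal sketch, assuming benchmark scores are available as percentages keyed by benchmark name; the threshold table encodes the research and math guidance above (using AIME 2025 for math), and the candidate models and scores are hypothetical.

```python
from collections.abc import Mapping

# Minimum percent scores quoted in the guidance above
USE_CASE_THRESHOLDS: dict[str, dict[str, float]] = {
    "research": {"MMLU-Pro": 70.0, "GPQA": 60.0},
    "math": {"AIME 2025": 50.0},
}


def meets_thresholds(scores: Mapping[str, float], use_case: str) -> bool:
    """True if every benchmark required for the use case meets its minimum.

    A missing benchmark counts as a failure rather than a pass.
    """
    required = USE_CASE_THRESHOLDS[use_case]
    return all(
        benchmark in scores and scores[benchmark] >= minimum
        for benchmark, minimum in required.items()
    )


def shortlist(models: Mapping[str, Mapping[str, float]], use_case: str) -> list[str]:
    """Names of models that qualify for the given use case, in input order."""
    return [name for name, scores in models.items() if meets_thresholds(scores, use_case)]


# Hypothetical scores for illustration only
candidates = {
    "model-a": {"MMLU-Pro": 78.0, "GPQA": 65.0, "AIME 2025": 42.0},
    "model-b": {"MMLU-Pro": 72.0, "GPQA": 55.0, "AIME 2025": 81.0},
}
print(shortlist(candidates, "research"))  # ['model-a']
print(shortlist(candidates, "math"))      # ['model-b']
```

Treating a missing score as a failure is a deliberately conservative choice: a model without a published GPQA result is not silently recommended for research work.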

All benchmark scores and pricing data are updated daily from Artificial Analysis to reflect the latest model versions and capabilities. Use the sort filters above to find AI models by intelligence, cost, coding ability, math performance, speed, or release date.

Frequently Asked Questions

What is MMLU-Pro and why is it widely used as an AI intelligence benchmark?

MMLU-Pro is a harder version of MMLU (Massive Multitask Language Understanding) that tests models across 14 academic subjects, including mathematics, physics, history, law, and philosophy, using multiple-choice questions with up to ten answer options. Scores on this leaderboard range from about 46% (basic competency) to 87% (near-expert level). Models scoring above 75% demonstrate strong general knowledge and reasoning suitable for professional applications, while scores below 60% indicate limitations in complex reasoning tasks.
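An MMLU-Pro score is an accuracy: the share of questions answered correctly, so with ten options a random guesser would expect roughly 10%. A minimal sketch of that calculation, assuming the model's answers have already been extracted as option letters; prompting, answer extraction, and any repeated runs used by Artificial Analysis are not shown, and the graded answers are hypothetical.

```python
from collections import defaultdict


def score(results: list[tuple[str, str, str]]) -> tuple[float, dict[str, float]]:
    """Return overall and per-category accuracy in percent.

    Each result is (category, predicted_letter, correct_letter). Overall accuracy
    is micro-averaged: every question counts once, regardless of category size.
    """
    if not results:
        raise ValueError("no results to score")
    correct_by_cat: dict[str, int] = defaultdict(int)
    total_by_cat: dict[str, int] = defaultdict(int)
    for category, predicted, answer in results:
        total_by_cat[category] += 1
        # Compare letters case-insensitively, ignoring stray whitespace
        if predicted.strip().upper() == answer.strip().upper():
            correct_by_cat[category] += 1
    overall = 100.0 * sum(correct_by_cat.values()) / len(results)
    per_category = {c: 100.0 * correct_by_cat[c] / n for c, n in total_by_cat.items()}
    return overall, per_category


# Hypothetical graded answers: (category, model answer, answer key)
results = [
    ("law", "B", "B"), ("law", "C", "F"),
    ("math", "J", "J"), ("math", "a", "A"), ("math", "D", "E"),
]
overall, per_category = score(results)
print(f"Overall: {overall:.1f}%")                             # Overall: 60.0%
print({c: round(v, 1) for c, v in per_category.items()})  # {'law': 50.0, 'math': 66.7}
```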

What does GPQA measure and which models score highest?

GPQA (Graduate-level Google-Proof Q&A) tests PhD-level reasoning with questions designed to be "Google-proof" - requiring deep understanding rather than simple fact retrieval. Top models like GPT-5.1 (87.3%), GPT-5 mini (82.8%), and o3 (82.7%) excel at GPQA, making them ideal for research, technical analysis, and complex problem-solving. Models below 50% GPQA struggle with advanced reasoning and may provide superficial answers to complex questions.

What is AIME 2025 and how does it evaluate AI mathematical ability?

AIME 2025 (American Invitational Mathematics Examination) is an elite math competition benchmark that tests advanced problem-solving, algebra, geometry, and number theory. Scores above 80% (like GPT-5 Codex at 98.7% or GPT-5.1 at 94%) indicate exceptional mathematical reasoning suitable for engineering, scientific computing, and quantitative analysis. Models scoring below 50% may struggle with multi-step mathematical problems or require explicit problem breakdown.
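Because every AIME answer is an integer from 0 to 999, grading reduces to exact matching of the final answer. The sketch below assumes final answers have already been extracted as strings and averages accuracy over several sampled runs; whether and how Artificial Analysis repeats runs is not stated here, and `parse_answer`, `pass_at_1`, and the answers are hypothetical.

```python
from statistics import mean
from typing import Optional


def parse_answer(text: str) -> Optional[int]:
    """Parse a final AIME answer (an integer from 0 to 999); None if invalid."""
    text = text.strip()
    # isascii() rejects Unicode digits such as "²" that int() cannot parse
    if not (text.isascii() and text.isdigit()):
        return None
    value = int(text)
    return value if 0 <= value <= 999 else None


def pass_at_1(runs: list[list[str]], answers: list[int]) -> float:
    """Exact-match accuracy in percent, averaged across repeated runs.

    runs[r][i] is the model's final answer to problem i on run r.
    """
    per_run = []
    for outputs in runs:
        if len(outputs) != len(answers):
            raise ValueError("each run must answer every problem")
        hits = sum(parse_answer(out) == ans for out, ans in zip(outputs, answers))
        per_run.append(100.0 * hits / len(answers))
    return mean(per_run)


# Hypothetical answer key and two sampled runs
answers = [42, 7, 150, 999]
runs = [["042", "7", "151", "999"], ["42", "8", "150", "1000"]]
print(f"{pass_at_1(runs, answers):.1f}%")  # 62.5%
```

Averaging over runs smooths out sampling noise, which matters on a small exam where each problem is worth several percentage points.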

How is AI model pricing calculated and what's considered cost-effective?

AI model pricing is quoted per 1 million tokens (roughly 750,000 English words). Input pricing covers the text you send, while output pricing covers the generated response. Budget models such as GPT-5 nano cost $0.05/$0.40 per million tokens and Llama 3.3 70B costs $0.54/$0.71, while premium models like GPT-5 cost $1.25/$10. For a typical application with a 3:1 input-to-output ratio, these budget models work out roughly 6x (Llama 3.3 70B) to 25x (GPT-5 nano) cheaper than GPT-5; how much capability they retain depends on the task.
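The per-million pricing above translates into a workload cost by scaling each token count by its price. A minimal sketch using the prices quoted in this answer; the 30M-input / 10M-output monthly workload is a hypothetical example chosen to match a 3:1 ratio, and `workload_cost` is an illustrative name.

```python
def workload_cost(input_tokens: int, output_tokens: int,
                  input_price: float, output_price: float) -> float:
    """USD cost for a workload, given prices quoted per 1M tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Prices quoted above (USD per 1M input tokens, per 1M output tokens)
PRICES = {
    "GPT-5 nano": (0.05, 0.40),
    "Llama 3.3 70B": (0.54, 0.71),
    "GPT-5": (1.25, 10.00),
}

# Hypothetical monthly workload with a 3:1 input-to-output ratio
INPUT_TOKENS, OUTPUT_TOKENS = 30_000_000, 10_000_000

for name, (p_in, p_out) in PRICES.items():
    print(f"{name}: ${workload_cost(INPUT_TOKENS, OUTPUT_TOKENS, p_in, p_out):,.2f}")
# GPT-5 nano: $5.50
# Llama 3.3 70B: $23.30
# GPT-5: $137.50
```

Output tokens dominate GPT-5's bill here ($100 of the $137.50), so applications that generate long responses are more sensitive to the output price than the input price.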

Which AI models are best for coding and programming tasks?

Sort by Coding Index to see the top programming models. Our Coding Index combines LiveCodeBench and SciCode. Top performers include GPT-5.1 (57.5), GPT-5 Codex (53.5), and GPT-5 mini (51.4). These models excel at code generation, debugging, refactoring, and explaining complex algorithms. For budget-conscious developers, models with Coding Index scores of 40+ offer good value for routine programming tasks.

How often are AI model benchmarks and rankings updated?

Our leaderboard syncs daily with the Artificial Analysis API so that benchmark scores (MMLU-Pro, GPQA, AIME 2025), pricing, and inference speed reflect the latest model versions. New model releases appear immediately under the "Newest" sort option. Scores change when providers release updated versions: for example, GPT-5.1 (released November 2025) scored 69.7 on the Intelligence Index versus 68.5 for GPT-5 (August 2025). These figures come from an earlier snapshot and are not directly comparable with the Intelligence values in the table above, where GPT-5.1 (high) shows 36.9 and GPT-5 (high) 34.7.

What inference speed (tokens/second) do I need for my application?

Inference speed determines how quickly a model generates its response. For real-time chatbots and interactive applications, target 100+ tokens/second (for example, gpt-oss-120B at 340 tok/s). For background processing and batch jobs, 50-100 tok/s is usually sufficient. Premium reasoning models like GPT-5 (103 tok/s) balance speed and capability. Speed and quality are separate dimensions: in the table above, Claude Fable 5 has the highest Intelligence score (59.9) at 70 tok/s, while Gemini 3.6 Flash (high) reaches 311 tok/s with a score of 50.1.
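Throughput converts directly into waiting time: streaming N output tokens at S tokens/second takes about N / S seconds, plus the delay before the first token arrives. A minimal sketch using throughput figures from the table; the 800-token answer length is hypothetical, time-to-first-token defaults to zero as a simplifying assumption, and for reasoning models any reasoning tokens generated before the visible answer would add to the total.

```python
def generation_time(output_tokens: int, tokens_per_second: float,
                    time_to_first_token: float = 0.0) -> float:
    """Approximate seconds to stream a response, ignoring network overhead."""
    if tokens_per_second <= 0:
        raise ValueError("throughput must be positive")
    return time_to_first_token + output_tokens / tokens_per_second


# Throughput from the table above; the 800-token answer length is hypothetical
for name, tps in [("Gemini 3.6 Flash (high)", 311), ("GPT-5.6 Sol (xhigh)", 65), ("K3", 38)]:
    print(f"{name}: {generation_time(800, tps):.1f} s")
# Gemini 3.6 Flash (high): 2.6 s
# GPT-5.6 Sol (xhigh): 12.3 s
# K3: 21.1 s
```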

Can I test these AI models for free before committing?

Yes! Try our free AI chat interface to test different models instantly without creating an account. Many providers also offer free tiers: OpenAI (ChatGPT with daily limits), Anthropic (Claude with usage caps), and Google (Gemini free tier), while open-weight models like Llama 3.3 can be downloaded and run yourself. Compare performance on your specific use case before upgrading to paid plans.