Claude 4 Sonnet (Reasoning) vs Claude 4.5 Sonnet (Reasoning)

Comparing 2 AI models · 6 benchmarks · Anthropic

Most Affordable: Claude 4 Sonnet (Reasoning), $3.00/1M
Highest Intelligence: Claude 4.5 Sonnet (Reasoning), 83.4% GPQA
Best for Coding: Claude 4.5 Sonnet (Reasoning), 37.1 Coding Index
Price Difference: 1.0x input cost range

Composite Indices: Intelligence, Coding, Math
Academic and industry benchmarks: 6 tests

Metric	Claude 4 Sonnet (Reasoning)	Claude 4.5 Sonnet (Reasoning)
Pricing Per 1M tokens
Input Cost	$3.00/1M	$3.00/1M
Output Cost	$15.00/1M	$15.00/1M
Blended Cost 3:1 input/output ratio	$6.00/1M	$6.00/1M
Specifications
Organization Model creator	Anthropic	Anthropic
Release Date Launch date	May 22, 2025	Sep 29, 2025
Performance & Speed
Throughput Output speed	69.9 tok/s	80.6 tok/s
Time to First Token (TTFT) Initial response delay	753ms	1666ms
Latency Time to first answer token	29381ms	26482ms
Composite Indices
Intelligence Index Overall reasoning capability	38.4	42.4
Coding Index Programming ability	33.2	37.1
Math Index Mathematical reasoning	74.3	88.0
Standard Benchmarks
GPQA Graduate-level reasoning	77.7%	83.4%
MMLU Pro Advanced knowledge	84.2%	87.5%
HLE Humanity's Last Exam	9.6%	17.3%
LiveCodeBench Real-world coding tasks	65.5%	71.4%
MATH 500 Mathematical problems	99.1%	—
AIME 2025 Advanced math competition	74.3%	88.0%
AIME (Original) Math olympiad problems	77.3%	—
SciCode Scientific code generation	40.0%	44.7%
LCR Long-context reasoning	64.7%	65.7%
IFBench Instruction-following	54.7%	57.3%
TAU-bench v2 Tool use & agentic tasks	64.6%	78.1%
TerminalBench Hard CLI command generation	29.8%	33.3%
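
The speed rows above can be combined into a rough end-to-end estimate. The sketch below is a minimal illustration, assuming total response time is the "time to first answer token" latency plus answer length divided by a constant output throughput; the function name `estimated_seconds` and the 500-token answer length are hypothetical, not part of the source data.

```python
def estimated_seconds(latency_ms: float, tokens_per_second: float,
                      answer_tokens: int) -> float:
    """Rough response time: latency to the first answer token plus streaming time.

    Assumes throughput stays constant for the whole answer (a simplification).
    """
    return latency_ms / 1000.0 + answer_tokens / tokens_per_second

# Latency (ms) and throughput (tok/s) copied from the table.
models = {
    "Claude 4 Sonnet (Reasoning)": (29381, 69.9),
    "Claude 4.5 Sonnet (Reasoning)": (26482, 80.6),
}

ANSWER_TOKENS = 500  # hypothetical answer length for illustration

for name, (latency_ms, throughput) in models.items():
    seconds = estimated_seconds(latency_ms, throughput, ANSWER_TOKENS)
    print(f"{name}: about {seconds:.1f} s for {ANSWER_TOKENS} answer tokens")
```

Under these assumptions the newer model finishes sooner despite its higher TTFT, because both its latency to the first answer token and its throughput are better.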

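Claude 4 Sonnet (Reasoning) is listed as the most affordable at $3.00/1M input tokens, but Claude 4.5 Sonnet (Reasoning) is priced identically ($3.00/1M input, $15.00/1M output), so cost does not separate the two for high-volume or cost-conscious projects.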
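
The blended cost figure in the table can be reproduced as a weighted average of the input and output prices at the stated 3:1 input/output ratio. This is a minimal sketch; the helper name `blended_cost` is hypothetical, and the weighting formula is an assumption inferred from the table's "3:1 input/output ratio" label.

```python
def blended_cost(input_price: float, output_price: float,
                 input_parts: float = 3.0, output_parts: float = 1.0) -> float:
    """Weighted average price per 1M tokens for a given input:output token mix."""
    total_parts = input_parts + output_parts
    return (input_price * input_parts + output_price * output_parts) / total_parts

# Prices from the table, in USD per 1M tokens (identical for both models).
print(blended_cost(3.00, 15.00))  # prints 6.0
```

The result matches the $6.00/1M listed for both models.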

Claude 4.5 Sonnet (Reasoning) leads in reasoning with an 83.4% GPQA score, compared with 77.7% for Claude 4 Sonnet (Reasoning), making it the stronger choice for complex analytical tasks and problem-solving.
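
To see where the gap between the two models is largest, the benchmark scores can be compared as percentage-point differences. The sketch below uses a subset of scores copied from the table; the helper name `point_gains` is hypothetical.

```python
# Scores in percent, copied from the table:
# (Claude 4 Sonnet (Reasoning), Claude 4.5 Sonnet (Reasoning)).
scores = {
    "GPQA": (77.7, 83.4),
    "MMLU Pro": (84.2, 87.5),
    "HLE": (9.6, 17.3),
    "LiveCodeBench": (65.5, 71.4),
    "AIME 2025": (74.3, 88.0),
    "TAU-bench v2": (64.6, 78.1),
}

def point_gains(pairs: dict) -> dict:
    """Percentage-point gain of the newer model over the older one."""
    return {name: round(new - old, 1) for name, (old, new) in pairs.items()}

gains = point_gains(scores)

# List benchmarks from largest to smallest gain.
for name, gain in sorted(gains.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: +{gain} points")
```

On these figures the largest gains are on AIME 2025 and TAU-bench v2, while the GPQA gain is 5.7 points.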

Claude 4.5 Sonnet (Reasoning) achieves a 37.1