Model Comparison

Claude 3.7 Sonnet (Non-reasoning) vs. Grok 3

Comparing 2 AI models · 12 benchmarks · Anthropic, xAI

Recommended Pick

Grok 3 · 11 metric wins

Strongest on: Value, Reasoning, Math

Best Value

Grok 3

100.0 value score

47.9 reasoning score at $8.00/1M blended price

Lowest Price

Claude 3.7 Sonnet (Non-reasoning)

$3.00/1M input price

Best Reasoning

Grok 3

47.9 reasoning score

Blends available reasoning benchmarks

Best for Coding

Claude 3.7 Sonnet (Non-reasoning)

26.7 coding index

[Chart: Composite Indices. Higher is better; speed and price are normalized.]

[Chart: Standard Benchmarks. Only benchmarks with data are shown.]

Differences That Matter

Best value

Grok 3 has the strongest quality-to-price mix, scoring 100.0 out of 100 on value against 99.5 for Claude 3.7 Sonnet (Non-reasoning).

Price gap

Claude 3.7 Sonnet (Non-reasoning) costs $3.00 per 1M input tokens against $4.00 for Grok 3, which makes it 25% cheaper (Grok 3 costs about 1.3x as much).

Reasoning gap

Grok 3 leads Claude 3.7 Sonnet (Non-reasoning) by about 12 points on reasoning (47.9 vs. 35.8).

Coding gap

Claude 3.7 Sonnet (Non-reasoning) leads Grok 3 by 6.9 points on coding.

Top-pick rationale

Grok 3 wins 11 measurable categories, including Value, Reasoning, Math, and GPQA.
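
The win count can be checked against the published figures. The sketch below is a minimal tally, assuming the count covers only the composite indices and standard benchmarks (not the pricing rows) and that higher is better on each; the names `SCORES` and `tally_wins` are hypothetical and exist only for this illustration.

```python
# Scores copied from the Full Comparison table as
# (Claude 3.7 Sonnet (Non-reasoning), Grok 3). Higher is better on every row.
# Assumption: pricing rows are excluded from the win count.
SCORES = {
    "Value": (99.5, 100.0),
    "Reasoning": (35.8, 47.9),
    "Intelligence": (30.8, 25.2),
    "Coding": (26.7, 19.8),
    "Math": (21.0, 58.0),
    "GPQA": (65.6, 69.3),
    "MMLU Pro": (80.3, 79.9),
    "HLE": (4.8, 5.1),
    "LiveCodeBench": (39.4, 42.5),
    "MATH 500": (85.0, 87.0),
    "AIME 2025": (21.0, 58.0),
    "AIME (Original)": (22.3, 33.0),
    "SciCode": (37.6, 36.8),
    "LCR": (48.3, 54.7),
    "IFBench": (44.0, 46.9),
    "TAU-bench v2": (50.0, 48.8),
    "TerminalBench Hard": (21.2, 11.4),
}


def tally_wins(scores):
    """Return (claude_wins, grok_wins) as lists of metric names; ties are skipped."""
    claude_wins, grok_wins = [], []
    for metric, (claude, grok) in scores.items():
        if claude > grok:
            claude_wins.append(metric)
        elif grok > claude:
            grok_wins.append(metric)
    return claude_wins, grok_wins


claude_wins, grok_wins = tally_wins(SCORES)
print(f"Grok 3: {len(grok_wins)} wins -> {grok_wins}")
print(f"Claude 3.7 Sonnet (Non-reasoning): {len(claude_wins)} wins -> {claude_wins}")
```

Under that assumption the tally gives Grok 3 11 wins and Claude 3.7 Sonnet (Non-reasoning) 6, with Claude ahead on Intelligence, Coding, MMLU Pro, SciCode, TAU-bench v2, and TerminalBench Hard.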

Full Comparison

Metric	Claude 3.7 Sonnet (Non-reasoning) (Anthropic)	Grok 3 (xAI, Top Pick)
Pricing per 1M tokens
Input Cost	$3.00/1M	$4.00/1M
Output Cost	$15.00/1M	$20.00/1M
Blended (3:1)	$6.00/1M	$8.00/1M
Specifications
Organization	Anthropic	xAI
Release Date	Feb 24, 2025	Feb 19, 2025
Performance & Speed
Throughput	—	—
TTFT	—	—
Latency	—	—
Composite Indices
Value Score	99.5	100.0
Reasoning Score	35.8	47.9
Intelligence	30.8	25.2
Coding	26.7	19.8
Math	21.0	58.0
Standard Benchmarks
GPQA	65.6%	69.3%
MMLU Pro	80.3%	79.9%
HLE	4.8%	5.1%
LiveCodeBench	39.4%	42.5%
MATH 500	85.0%	87.0%
AIME 2025	21.0%	58.0%
AIME (Original)	22.3%	33.0%
SciCode	37.6%	36.8%
LCR	48.3%	54.7%
IFBench	44.0%	46.9%
TAU-bench v2	50.0%	48.8%
TerminalBench Hard	21.2%	11.4%
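
The Blended (3:1) row in the pricing section is consistent with a weighted average of three parts input price to one part output price. The sketch below reproduces that arithmetic and the input-price gap; reading "3:1" as an input-to-output token ratio is an assumption that happens to match the listed figures, and `blended_price` and `PRICES` are hypothetical names used only here.

```python
def blended_price(input_price, output_price, input_parts=3, output_parts=1):
    """Weighted average price per 1M tokens.

    Assumption: "3:1" means 3 parts input tokens to 1 part output tokens.
    """
    total_parts = input_parts + output_parts
    return (input_parts * input_price + output_parts * output_price) / total_parts


# (input $/1M, output $/1M), copied from the pricing rows of the table.
PRICES = {
    "Claude 3.7 Sonnet (Non-reasoning)": (3.00, 15.00),
    "Grok 3": (4.00, 20.00),
}

blended = {model: blended_price(inp, out) for model, (inp, out) in PRICES.items()}
for model, price in blended.items():
    print(f"{model}: ${price:.2f}/1M blended")

claude_input = PRICES["Claude 3.7 Sonnet (Non-reasoning)"][0]
grok_input = PRICES["Grok 3"][0]
input_ratio = grok_input / claude_input       # how many times more Grok 3 costs on input
input_saving = 1 - claude_input / grok_input  # fraction saved by choosing Claude on input
print(f"Input price ratio: {input_ratio:.2f}x, saving: {input_saving:.0%}")
```

With the listed prices this gives $6.00/1M and $8.00/1M blended, matching the table, and an input-price ratio of roughly 1.33x, which is a 25% saving on input tokens.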