DeepSeek V3.1 (Non-reasoning) vs Grok 4

Comparing 2 AI models · 6 benchmarks · DeepSeek, xAI

Most Affordable: DeepSeek V3.1 (Non-reasoning), $0.56/1M input tokens

Highest Intelligence: Grok 4, 87.7% GPQA

Best for Coding: Grok 4, 55.1 Coding Index

Price Difference: 5.4x (input cost range)

Composite Indices

Intelligence, Coding, Math

Academic and industry benchmarks

6 tests

No clear wins

Metric	DeepSeek V3.1 (Non-reasoning) (DeepSeek)	Grok 4 (xAI)
Pricing Per 1M tokens
Input Cost	$0.56/1M	$3.00/1M
Output Cost	$1.66/1M	$15.00/1M
Blended Cost 3:1 input/output ratio	$0.83/1M	$6.00/1M
Specifications
Organization Model creator	DeepSeek	xAI
Release Date Launch date	Aug 21, 2025	Jul 10, 2025
Performance & Speed
Throughput Output speed	—	37.2 tok/s
Time to First Token (TTFT) Initial response delay	—	9172ms
Latency Time to first answer token	—	9172ms
Composite Indices
Intelligence Index Overall reasoning capability	44.8	65.3
Coding Index Programming ability	39.0	55.1
Math Index Mathematical reasoning	49.7	92.7
Standard Benchmarks
GPQA Graduate-level reasoning	73.5%	87.7%
MMLU Pro Advanced knowledge	83.3%	86.6%
HLE Humanity's Last Exam	6.3%	23.9%
LiveCodeBench Real-world coding tasks	57.7%	81.9%
MATH 500 Mathematical problems	—	99.0%
AIME 2025 Advanced math competition	49.7%	92.7%
AIME (Original) Math olympiad problems	—	94.3%
SciCode Scientific code generation	36.7%	45.7%
LCR Long-context reasoning	45.0%	68.0%
IFBench Instruction-following	37.8%	53.7%
TAU-bench v2 Tool use & agentic tasks	34.8%	74.9%
TerminalBench Hard CLI command generation	22.7%	37.6%
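
The blended cost and price difference figures above can be reproduced from the listed input and output prices. The following is a minimal sketch, assuming the blended cost is a weighted average at the stated 3:1 input/output ratio and the price difference is the ratio of the two input prices. The function name `blended_cost` and its parameters are illustrative, not part of any published API.

```python
def blended_cost(input_price: float, output_price: float,
                 input_weight: float = 3.0, output_weight: float = 1.0) -> float:
    """Weighted average price per 1M tokens (assumed 3:1 input/output mix)."""
    total_weight = input_weight + output_weight
    return (input_weight * input_price + output_weight * output_price) / total_weight


# Prices per 1M tokens, taken from the pricing rows of the table.
deepseek_blended = blended_cost(0.56, 1.66)   # about 0.835, listed as $0.83
grok_blended = blended_cost(3.00, 15.00)      # 6.00, listed as $6.00

# Input price ratio behind the "5.4x" price difference.
input_price_ratio = 3.00 / 0.56               # about 5.36

print(f"DeepSeek V3.1 blended: ${deepseek_blended:.3f}/1M")
print(f"Grok 4 blended: ${grok_blended:.2f}/1M")
print(f"Input price ratio: {input_price_ratio:.1f}x")
```

On these numbers the blended-cost gap (roughly 7x) is wider than the input-only gap, because Grok 4's output price is about 9x that of DeepSeek V3.1.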