Qwen3 235B A22B 2507 Instruct vs Qwen3 235B A22B 2507 (Reasoning)
Comparing 2 AI models · 6 benchmarks · Alibaba
Most Affordable
Qwen3 235B A22B 2507 Instruct
$0.70/1M
Highest Intelligence
Qwen3 235B A22B 2507 (Reasoning)
79.0% GPQA
Best for Coding
Qwen3 235B A22B 2507 (Reasoning)
44.6 Coding Index
Price Difference
1.0x
input cost range
Composite Indices
Intelligence, Coding, Math
Standard Benchmarks
Academic and industry benchmarks
Benchmark Winners
6 tests
Qwen3 235B A22B 2507 Instruct
0
No clear wins
Qwen3 235B A22B 2507 (Reasoning)
6
- GPQA
- MMLU Pro
- HLE
- LiveCodeBench
- MATH 500
- AIME 2025
| Metric | Qwen3 235B A22B 2507 Instruct | Qwen3 235B A22B 2507 (Reasoning) |
|---|---|---|
| Pricing (per 1M tokens) | | |
| Input Cost | $0.70/1M | $0.70/1M |
| Output Cost | $2.80/1M | $8.40/1M |
| Blended Cost (3:1 input/output ratio) | $1.22/1M | $2.63/1M |
| Specifications | | |
| Organization (model creator) | Alibaba | Alibaba |
| Release Date | Jul 21, 2025 | Jul 25, 2025 |
| Performance & Speed | | |
| Throughput (output speed) | 39.3 tok/s | 82.6 tok/s |
| Time to First Token (TTFT, initial response delay) | 1254 ms | 1136 ms |
| Latency (time to first answer token) | 1254 ms | 25354 ms |
| Composite Indices | | |
| Intelligence Index (overall reasoning capability) | 45.3 | 57.5 |
| Coding Index (programming ability) | 34.2 | 44.6 |
| Math Index (mathematical reasoning) | 71.7 | 91.0 |
| Standard Benchmarks | | |
| GPQA (graduate-level reasoning) | 75.3% | 79.0% |
| MMLU Pro (advanced knowledge) | 82.8% | 84.3% |
| HLE (Humanity's Last Exam) | 10.6% | 15.0% |
| LiveCodeBench (real-world coding tasks) | 52.4% | 78.8% |
| MATH 500 (mathematical problems) | 98.0% | 98.4% |
| AIME 2025 (advanced math competition) | 71.7% | 91.0% |
| AIME (Original) (math competition problems) | 71.7% | 94.0% |
| SciCode (scientific code generation) | 36.0% | 42.4% |
| LCR (long-context reasoning) | 31.2% | 67.0% |
| IFBench (instruction following) | 46.1% | 51.2% |
| TAU-bench v2 (tool use & agentic tasks) | 33.3% | 53.2% |
| TerminalBench Hard (CLI command generation) | 14.2% | 12.8% |
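The blended cost row can be reproduced from the input and output prices. The sketch below assumes that "3:1 input/output ratio" means a weighted average of three parts input price to one part output price; the function name `blended_price` is hypothetical and not taken from the source page.

```python
def blended_price(input_price: float, output_price: float,
                  input_parts: int = 3, output_parts: int = 1) -> float:
    """Weighted average price per 1M tokens.

    Assumption: a 3:1 ratio means 3 input tokens for every 1 output token.
    """
    total_parts = input_parts + output_parts
    return (input_parts * input_price + output_parts * output_price) / total_parts


# Prices per 1M tokens, copied from the table above.
instruct = blended_price(0.70, 2.80)   # about 1.225, listed as $1.22/1M
reasoning = blended_price(0.70, 8.40)  # about 2.625, listed as $2.63/1M

print(f"Instruct:  ${instruct:.3f}/1M")
print(f"Reasoning: ${reasoning:.3f}/1M")
```

Under this assumption both listed blended prices are matched to within rounding, which suggests the gap between the two models comes entirely from the output price.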
Key Takeaways
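Qwen3 235B A22B 2507 Instruct offers the best value: both models charge $0.70/1M for input, but Instruct's output costs $2.80/1M against $8.40/1M for the Reasoning variant, making it the better fit for high-volume applications and cost-conscious projects.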
Qwen3 235B A22B 2507 (Reasoning) leads in reasoning capabilities with a