Llama Nemotron Super 49B v1.5 (Reasoning) vs gpt-oss-120B (high)
Comparing 2 AI models · 6 benchmarks · NVIDIA, OpenAI
- Most Affordable: Llama Nemotron Super 49B v1.5 (Reasoning), $0.10 per 1M input tokens
- Highest Intelligence: gpt-oss-120B (high), 78.2% GPQA
- Best for Coding: gpt-oss-120B (high), 49.6 Coding Index
- Price Difference: 1.5x across the input cost range ($0.10 vs. $0.15 per 1M tokens)
Benchmark Winners (6 tests)
Llama Nemotron Super 49B v1.5 (Reasoning): 2 wins
- MMLU Pro
- MATH 500 (no gpt-oss-120B score reported)
gpt-oss-120B (high): 4 wins
- GPQA
- HLE
- LiveCodeBench
- AIME 2025
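The win counts above can be reproduced from the six headline scores listed in the comparison table. The following is a minimal sketch, not the page's actual method: the `scores` dictionary, the short model keys, and the `tally_wins` helper are hypothetical names, and it assumes that a benchmark with only one reported score (MATH 500) counts as a win for the model that reported it.

```python
from collections import Counter

# Headline benchmark scores in percent, copied from the comparison table.
# None marks a score the page does not report.
scores = {
    "GPQA": {"Nemotron": 74.8, "gpt-oss": 78.2},
    "MMLU Pro": {"Nemotron": 81.4, "gpt-oss": 80.8},
    "HLE": {"Nemotron": 6.8, "gpt-oss": 18.5},
    "LiveCodeBench": {"Nemotron": 73.7, "gpt-oss": 87.8},
    "MATH 500": {"Nemotron": 98.3, "gpt-oss": None},
    "AIME 2025": {"Nemotron": 76.7, "gpt-oss": 93.4},
}


def tally_wins(scores):
    """Count, per model, the benchmarks on which it has the highest reported score."""
    wins = Counter()
    for results in scores.values():
        # Assumption: models without a reported score are ignored for that benchmark.
        reported = {model: s for model, s in results.items() if s is not None}
        wins[max(reported, key=reported.get)] += 1
    return wins


wins = tally_wins(scores)
print(dict(wins))
```

Under a stricter rule that skips benchmarks lacking a score for either model, MATH 500 would drop out and the tally would be 1 to 4.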
| Metric | Llama Nemotron Super 49B v1.5 (Reasoning) | gpt-oss-120B (high) |
|---|---|---|
| Pricing (per 1M tokens) | | |
| Input Cost | $0.10/1M | $0.15/1M |
| Output Cost | $0.40/1M | $0.60/1M |
| Blended Cost (3:1 input/output ratio) | $0.18/1M | $0.26/1M |
| Specifications | | |
| Organization (model creator) | NVIDIA | OpenAI |
| Release Date | Jul 25, 2025 | Aug 5, 2025 |
| Performance & Speed | | |
| Throughput (output speed) | 75.8 tok/s | 332.1 tok/s |
| Time to First Token (TTFT, initial response delay) | 223 ms | 576 ms |
| Latency (time to first answer token) | 26,596 ms | 6,599 ms |
| Composite Indices | | |
| Intelligence Index (overall reasoning capability) | 45.2 | 60.5 |
| Coding Index (programming ability) | 37.8 | 49.6 |
| Math Index (mathematical reasoning) | 76.7 | 93.4 |
| Standard Benchmarks | ||
| GPQA (graduate-level reasoning) | 74.8% | 78.2% |
| MMLU Pro (advanced knowledge) | 81.4% | 80.8% |
| HLE (Humanity's Last Exam) | 6.8% | 18.5% |
| LiveCodeBench (real-world coding tasks) | 73.7% | 87.8% |
| MATH 500 (mathematical problems) | 98.3% | n/a |
| AIME 2025 (advanced math competition) | 76.7% | 93.4% |
| AIME (Original) (math olympiad problems) | 86.0% | n/a |
| SciCode (scientific code generation) | 34.8% | 38.9% |
| LCR (long-context reasoning) | 34.0% | 50.7% |
| IFBench (instruction following) | 37.0% | 69.0% |
| TAU-bench v2 (tool use and agentic tasks) | 28.1% | 65.8% |
| TerminalBench Hard (CLI command generation) | 5.0% | 22.0% |
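The blended cost row in the table is consistent with a weighted average of the input and output prices at a 3:1 input-to-output token ratio. The sketch below shows that arithmetic under that assumption; `blended_cost` is a hypothetical helper, and rounding half-up to the cent is a guess at how the page rounds.

```python
from decimal import Decimal, ROUND_HALF_UP


def blended_cost(input_price, output_price, input_parts=3, output_parts=1):
    """Weighted average price per 1M tokens for a given input:output token ratio."""
    total_parts = input_parts + output_parts
    blended = (input_price * input_parts + output_price * output_parts) / total_parts
    # Assumption: the page rounds half-up to two decimal places.
    return blended.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Prices in USD per 1M tokens, taken from the table.
nemotron = blended_cost(Decimal("0.10"), Decimal("0.40"))
gpt_oss = blended_cost(Decimal("0.15"), Decimal("0.60"))
print(nemotron, gpt_oss)
```

`Decimal` is used instead of floats so that the unrounded value for the first model, exactly 0.175, rounds predictably rather than depending on binary floating-point representation.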
Key Takeaways
Llama Nemotron Super 49B v1.5 (Reasoning) is the cheaper model at $0.10 per 1M input tokens and $0.40 per 1M output tokens; gpt-oss-120B (high) costs 1.5x as much on both. That makes it a fit for high-volume, cost-sensitive workloads.
gpt-oss-120B (high) leads on GPQA, 78.2% to 74.8%, and on the Intelligence Index, 60.5 to 45.2, indicating stronger performance on complex analytical tasks.
gpt-oss-120B (high) scores 49.6 on the Coding Index against 37.8 for Llama Nemotron Super 49B v1.5 (Reasoning), making it the stronger of the two for software development and code generation.
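To put the price gap in absolute terms, the per-token prices can be applied to a concrete workload. The sketch below is illustrative only: the monthly volume of 500M input tokens and 100M output tokens is a made-up example, and `PRICES` and `monthly_cost` are hypothetical names. Only the per-1M-token prices come from the comparison table.

```python
from decimal import Decimal

# (input price, output price) in USD per 1M tokens, from the pricing table.
PRICES = {
    "Llama Nemotron Super 49B v1.5 (Reasoning)": (Decimal("0.10"), Decimal("0.40")),
    "gpt-oss-120B (high)": (Decimal("0.15"), Decimal("0.60")),
}


def monthly_cost(input_millions, output_millions, prices):
    """Cost in USD for a workload measured in millions of tokens."""
    input_price, output_price = prices
    return input_millions * input_price + output_millions * output_price


# Hypothetical workload: 500M input tokens and 100M output tokens per month.
costs = {name: monthly_cost(500, 100, p) for name, p in PRICES.items()}
for name, cost in costs.items():
    print(f"{name}: ${cost}")
```

Because both prices differ by the same 1.5x factor, the cost ratio between the two models stays at 1.5x regardless of the input/output mix; only the absolute difference changes with volume.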