MiMo-V2-Flash (Non-reasoning) vs gpt-oss-120B (high)
Comparing 2 AI models · 5 benchmarks · Xiaomi, OpenAI
Benchmark Winners
MiMo-V2-Flash (Non-reasoning)
- None of the five headline benchmarks (in the full table it leads only on TAU-bench v2 and TerminalBench Hard)
gpt-oss-120B (high)
- GPQA
- MMLU Pro
- HLE
- LiveCodeBench
- AIME 2025
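Each winner list comes from comparing the two score columns benchmark by benchmark. Below is a minimal Python sketch. It assumes the headline scores are copied by hand from the table, that a higher score is better on every benchmark, and that a tie gives no model the win. `winners` is a hypothetical helper written for this illustration, not part of any published tool.

```python
# Model names as used on this page.
MIMO = "MiMo-V2-Flash (Non-reasoning)"
GPT_OSS = "gpt-oss-120B (high)"

# Headline benchmark scores (%) copied by hand from the comparison table.
SCORES = {
    "GPQA": {MIMO: 65.6, GPT_OSS: 78.2},
    "MMLU Pro": {MIMO: 74.4, GPT_OSS: 80.8},
    "HLE": {MIMO: 8.0, GPT_OSS: 18.5},
    "LiveCodeBench": {MIMO: 40.2, GPT_OSS: 87.8},
    "AIME 2025": {MIMO: 67.7, GPT_OSS: 93.4},
}


def winners(scores: dict[str, dict[str, float]]) -> dict[str, list[str]]:
    """Return, for each model, the benchmarks where it has the single highest score."""
    wins: dict[str, list[str]] = {}
    # Register every model first so models without wins still appear.
    for per_model in scores.values():
        for model in per_model:
            wins.setdefault(model, [])
    for bench, per_model in scores.items():
        best = max(per_model.values())
        leaders = [m for m, s in per_model.items() if s == best]
        if len(leaders) == 1:  # a tie gives no model the win
            wins[leaders[0]].append(bench)
    return wins


if __name__ == "__main__":
    for model, benches in winners(SCORES).items():
        print(f"{model}: {', '.join(benches) if benches else 'No clear wins'}")
```

Running it reports no wins for MiMo-V2-Flash and all five headline benchmarks for gpt-oss-120B. To see MiMo-V2-Flash's leads, add the TAU-bench v2 and TerminalBench Hard rows to `SCORES`.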
| Metric | MiMo-V2-Flash (Non-reasoning) | gpt-oss-120B (high) |
|---|---|---|
| Pricing (USD per 1M tokens) | | |
| Input price | $0.10 | $0.15 |
| Output price | $0.30 | $0.60 |
| Blended price (3:1 input:output) | $0.15 | $0.26 |
| Specifications | | |
| Model creator | Xiaomi | OpenAI |
| Release date | Dec 16, 2025 | Aug 5, 2025 |
| Performance & speed | | |
| Output speed (tokens/s) | 95.7 | 345.1 |
| Time to first token (TTFT) | 1,460 ms | 418 ms |
| Latency (time to first answer token, after any reasoning) | 1,460 ms | 6,214 ms |
| Composite indices (higher is better) | | |
| Intelligence Index (overall reasoning) | 30.1 | 32.9 |
| Coding Index (programming) | 24.7 | 27.6 |
| Math Index (mathematical reasoning) | 67.7 | 93.4 |
| Standard benchmarks (score, %) | | |
| GPQA (graduate-level science reasoning) | 65.6% | 78.2% |
| MMLU Pro (multi-subject knowledge and reasoning) | 74.4% | 80.8% |
| HLE (Humanity's Last Exam) | 8.0% | 18.5% |
| LiveCodeBench (contest-style coding problems) | 40.2% | 87.8% |
| MATH 500 (math problems) | n/a | n/a |
| AIME 2025 (math competition) | 67.7% | 93.4% |
| AIME (Original) (math competition problems) | n/a | n/a |
| SciCode (scientific code generation) | 25.9% | 38.9% |
| LCR (long-context reasoning) | 31.3% | 50.7% |
| IFBench (instruction following) | 39.9% | 69.0% |
| TAU-bench v2 (tool use and agentic tasks) | 83.9% | 65.8% |
| Terminal-Bench Hard (agentic tasks in a terminal) | 24.1% | 22.0% |
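The listed blended prices match a simple weighted average of 3 parts input price to 1 part output price: (3 × 0.10 + 0.30) / 4 = $0.15 and (3 × 0.15 + 0.60) / 4 = $0.2625 ≈ $0.26. The Python sketch below reproduces that calculation and adds a rough per-request time estimate. The timing model (latency to the first answer token, plus answer tokens divided by output speed) is a simplifying assumption made for this illustration, not the page's methodology. `ModelStats`, `blended_price`, `request_cost`, and `est_response_seconds` are hypothetical names, and real prices and speeds vary by provider and load.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelStats:
    input_price: float        # USD per 1M input tokens
    output_price: float       # USD per 1M output tokens
    output_speed: float       # output tokens per second
    answer_latency_ms: float  # time to first answer token, in ms


# Figures copied from the comparison table.
MODELS = {
    "MiMo-V2-Flash (Non-reasoning)": ModelStats(0.10, 0.30, 95.7, 1460),
    "gpt-oss-120B (high)": ModelStats(0.15, 0.60, 345.1, 6214),
}


def blended_price(m: ModelStats, input_ratio: int = 3, output_ratio: int = 1) -> float:
    """Weighted-average USD price per 1M tokens for a given input:output token ratio."""
    total = input_ratio + output_ratio
    return (input_ratio * m.input_price + output_ratio * m.output_price) / total


def request_cost(m: ModelStats, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request at the listed per-1M-token prices."""
    return (input_tokens * m.input_price + output_tokens * m.output_price) / 1_000_000


def est_response_seconds(m: ModelStats, answer_tokens: int) -> float:
    """Rough wall-clock estimate (assumption): wait for the first answer token,
    then stream the answer at the listed output speed."""
    return m.answer_latency_ms / 1000 + answer_tokens / m.output_speed


if __name__ == "__main__":
    for name, m in MODELS.items():
        print(
            f"{name}: blended ${blended_price(m):.4f}/1M, "
            f"~{est_response_seconds(m, 500):.1f} s for a 500-token answer"
        )
```

Under this simplified model, a 500-token answer takes roughly 6.7 s on MiMo-V2-Flash and 7.7 s on gpt-oss-120B. MiMo-V2-Flash's head start on latency outweighs gpt-oss-120B's higher output speed for short answers, and the advantage reverses once answers grow to several hundred tokens or more.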
Key Takeaways
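MiMo-V2-Flash (Non-reasoning) is the cheaper model ($0.10/1M input, $0.30/1M output, $0.15/1M blended) and reaches its first answer token sooner (1,460 ms vs 6,214 ms). That makes it a good fit for high-volume, cost-sensitive applications. It also leads on TAU-bench v2 tool use (83.9% vs 65.8%).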
gpt-oss-120B (high) leads on reasoning-heavy benchmarks, including GPQA (78.2% vs 65.6%) and HLE (18.5% vs 8.0%). That makes it the stronger choice for complex analytical tasks and problem-solving.
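gpt-oss-120B (high) also has the higher Coding Index (27.6 vs 24.7) and a wide lead on LiveCodeBench (87.8% vs 40.2%), so it is the stronger option for most code-generation work. The exception is Terminal-Bench Hard, where MiMo-V2-Flash scores slightly higher (24.1% vs 22.0%).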
Context-window sizes are not reported for either model here, so confirm each model's limit before relying on it for lengthy documents or extended conversations.