Nemotron Nano 12B 2 VL: Pricing, Context Window & Benchmarks

NVIDIA Nemotron Nano 2 VL is a 12-billion-parameter open multimodal reasoning model designed for video understanding and document intelligence. It introduces a hybrid Transformer-Mamba architecture that combines transformer-level accuracy with Mamba’s memory-efficient sequence modeling for significantly higher throughput and lower latency. The model accepts text and multi-image documents as input and produces natural-language output. It is trained on high-quality, NVIDIA-curated synthetic datasets optimized for optical character recognition, chart reasoning, and multimodal comprehension. Nemotron Nano 2 VL achieves leading results on OCRBench v2 and averages about 74 across MMMU, MathVista, AI2D, OCRBench, OCR-Reasoning, ChartQA, DocVQA, and Video-MME, surpassing prior open VL baselines. With Efficient Video Sampling (EVS), it handles long-form videos while reducing inference cost. Open weights, training data, and fine-tuning recipes are released under a permissive NVIDIA open license, with deployment supported across NeMo, NIM, and major inference runtimes.

Input Price: $0.20/1M tokens
Output Price: $0.60/1M tokens
Intelligence: 10.1
Coding: 5.9

What you can do with Nemotron Nano 12B 2 VL

Everyday Q&A and clear explanations

Writing help (emails, posts, summaries)

Idea generation and brainstorming

Learning support with step-by-step guidance

Benchmark Highlights (5 tests with reported scores; Math 500 not available)

Benchmark	Score
GPQA	43.9%
MMLU Pro	64.9%
LiveCodeBench	34.5%
Math 500	N/A
AIME 2025	26.7%
HLE	4.5%

Metric	Value
Provider	NVIDIA
Context Window	131,072 tokens
Input Price	$0.20/1M tokens
Output Price	$0.60/1M tokens
Release Date	Oct 28, 2025
Modalities	image, text, video
Capabilities	Vision
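
The listed prices and context window can be turned into a rough per-request estimate. The sketch below is only illustrative: the helper names `estimate_cost_usd` and `fits_context` are hypothetical, billing is assumed to be strictly linear per token with no minimums or discounts, and the context window is assumed to cover input and output tokens combined. None of those assumptions is confirmed by the listing.

```python
from decimal import Decimal

# Values copied from the table above.
INPUT_PRICE_PER_M = Decimal("0.20")   # USD per 1M input tokens
OUTPUT_PRICE_PER_M = Decimal("0.60")  # USD per 1M output tokens
CONTEXT_WINDOW = 131_072              # tokens


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Hypothetical helper: linear per-token cost (an assumption)."""
    million = Decimal(1_000_000)
    total = INPUT_PRICE_PER_M * input_tokens + OUTPUT_PRICE_PER_M * output_tokens
    return total / million


def fits_context(input_tokens: int, output_tokens: int) -> bool:
    """Hypothetical helper: assumes input and output share one window."""
    return input_tokens + output_tokens <= CONTEXT_WINDOW


if __name__ == "__main__":
    # Example: a 100,000-token prompt with a 2,000-token answer.
    cost = estimate_cost_usd(100_000, 2_000)
    print(f"Estimated cost: ${cost:.4f}")                     # Estimated cost: $0.0212
    print("Fits in context:", fits_context(100_000, 2_000))   # Fits in context: True
```

The listing does not say how images or video frames are converted into tokens, so token counts for multimodal inputs would have to come from the serving provider's own usage reporting.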