Qwen logo

Qwen3 VL 30B A3B Instruct

30B

by Qwen

Qwen3-VL-30B-A3B-Instruct is a multimodal model that unifies strong text generation with visual understanding for images and videos. Its Instruct variant optimizes instruction-following for general multimodal tasks. It excels in perception of real-world/synthetic categories, 2D/3D spatial grounding, and long-form visual comprehension, achieving competitive multimodal benchmark results. For agentic use, it handles multi-image multi-turn instructions, video timeline alignments, GUI automation, and visual coding from sketches to debugged UI. Text performance matches flagship Qwen3 models, suiting document AI, OCR, UI assistance, spatial tasks, and agent research.

Chat with Qwen3 VL 30B A3B Instruct

Capabilities

Vision

Pricing

Input Tokens
Per 1M tokens
Free
Output Tokens
Per 1M tokens
Free
Image Processing
Per 1M tokens
$0.00/1M tokens

Supported Modalities

Input

text
image

Output

text

Performance Benchmarks

Intelligence Index
Overall intelligence score
38.5
Coding Index
Programming capability
28.0
Math Index
Mathematical reasoning
72.3
GPQA
Graduate-level questions
69.5%
MMLU Pro
Multitask language understanding
76.4%
HLE
Human-like evaluation
6.4%
LiveCodeBench
Real-world coding tasks
47.6%
AIME 2025
Advanced mathematics
72.3%

Specifications

Context Length
262K tokens
Provider
Qwen
Throughput
94.884 tokens/s
Released
Oct 6, 2025
Model ID
qwen/qwen3-vl-30b-a3b-instruct

Ready to try it?

Start chatting with Qwen3 VL 30B A3B Instruct right now. No credit card required.

Start Chatting

More from Qwen

View all models