Qwen2.5 VL 32B Instruct
32Bby Qwen
Qwen2.5-VL-32B is a multimodal vision-language model fine-tuned through reinforcement learning for enhanced mathematical reasoning, structured outputs, and visual problem-solving capabilities. It excels at visual analysis tasks, including object recognition, textual interpretation within images, and precise event localization in extended videos. Qwen2.5-VL-32B demonstrates state-of-the-art performance across multimodal benchmarks such as MMMU, MathVista, and VideoMME, while maintaining strong reasoning and clarity in text-based tasks like MMLU, mathematical problem-solving, and code generation.
Capabilities
Pricing
Supported Modalities
Input
Output
Specifications
- Context Length
- 16K tokens
- Provider
- Qwen
- Released
- Mar 24, 2025
- Model ID
- qwen/qwen2.5-vl-32b-instruct
Ready to try it?
Start chatting with Qwen2.5 VL 32B Instruct right now. No credit card required.
Start ChattingMore from Qwen
View all modelsCompare Models
Select a model to compare with Qwen2.5 VL 32B Instruct