Z.AI logo

GLM 4.6V

by Z.AI

GLM-4.6V is a large multimodal model designed for high-fidelity visual understanding and long-context reasoning across images, documents, and mixed media. It supports up to 128K tokens, processes complex page layouts and charts directly as visual inputs, and integrates native multimodal function calling to connect perception with downstream tool execution. The model also enables interleaved image-text generation and UI reconstruction workflows, including screenshot-to-HTML synthesis and iterative visual editing.

Chat with GLM 4.6V

Capabilities

Vision

Pricing

Input Tokens
Per 1M tokens
Free
Output Tokens
Per 1M tokens
Free
Image Processing
Per 1M tokens
$0.00/1M tokens

Supported Modalities

Input

image
text
video

Output

text

Specifications

Context Length
131K tokens
Provider
Z.AI
Released
Dec 8, 2025
Model ID
z-ai/glm-4.6v

Ready to try it?

Start chatting with GLM 4.6V right now. No credit card required.

Start Chatting

More from Z.AI

View all models