Llama 3.2 90B Vision Instruct

90B

by Meta

The Llama 90B Vision model is a top-tier, 90-billion-parameter multimodal model designed for the most challenging visual reasoning and language tasks. It offers unparalleled accuracy in image captioning, visual question answering, and advanced image-text comprehension. Pre-trained on vast multimodal datasets and fine-tuned with human feedback, the Llama 90B Vision is engineered to handle the most demanding image-based AI tasks. This model is perfect for industries requiring cutting-edge multimodal AI capabilities, particularly those dealing with complex, real-time visual and textual analysis. Click here for the [original model card](https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/MODEL_CARD_VISION.md). Usage of this model is subject to [Meta's Acceptable Use Policy](https://www.llama.com/llama3/use-policy/).

Chat with Llama 3.2 90B Vision Instruct

Capabilities

Vision

Pricing

Input Tokens

Per 1M tokens

Free

Output Tokens

Per 1M tokens

Free

Image Processing

Per 1M tokens

$505.80/1M tokens

Supported Modalities

Input

text

image

Output

text

Specifications

Context Length: 33K tokens
Provider: Meta
Released: Sep 25, 2024
Model ID: meta-llama/llama-3.2-90b-vision-instruct

Ready to try it?

Start chatting with Llama 3.2 90B Vision Instruct right now. No credit card required.

Start Chatting

More from Meta

View all models

Llama 3.2 90B Vision Instruct

Capabilities

Pricing

Supported Modalities

Input

Output

Specifications

Ready to try it?

More from Meta

Compare Models