Available AI Models

AI Models on LLMBase

Compare models available in LLMBase Chat, through the Inference API, or both.
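As a rough illustration of that comparison, the sketch below copies a handful of cards from this page into a small Python structure and ranks the API-available ones by published speed. The `ModelCard` class, its field names, and both helper functions are hypothetical and not part of any LLMBase API. Reading "K" as 1,000 tokens and "M" as 1,000,000 tokens is also an assumption, so the converted context sizes are approximate.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ModelCard:
    """Hypothetical record holding a few fields copied from a card on this page."""
    name: str
    context: str                 # label as printed, e.g. "1.0M" or "262K"
    speed_tok_s: Optional[int]   # None where the card shows N/A
    availability: Tuple[str, ...]


def context_tokens(label: str) -> int:
    """Convert a label such as '262K' or '1.0M' to an approximate token count.

    Assumes K = 1,000 and M = 1,000,000; the page does not define the units.
    """
    scale = {"K": 1_000, "M": 1_000_000}[label[-1]]
    return round(float(label[:-1]) * scale)


# Values copied from cards listed on this page.
CARDS = [
    ModelCard("GLM 5.2", "1.0M", 156, ("Chat", "Inference API")),
    ModelCard("Kimi K2.7 Code", "262K", 43, ("Chat", "Inference API")),
    ModelCard("gpt-oss-120b", "131K", 303, ("Chat", "Inference API")),
    ModelCard("Claude Opus 4.8", "1.0M", 66, ("Chat",)),
    ModelCard("Nemo", "60K", None, ("Chat", "Inference API")),
]


def api_models_by_speed(cards):
    """Return API-available cards that publish a speed, fastest first."""
    usable = [
        c for c in cards
        if "Inference API" in c.availability and c.speed_tok_s is not None
    ]
    return sorted(usable, key=lambda c: c.speed_tok_s, reverse=True)


for card in api_models_by_speed(CARDS):
    print(f"{card.name}: {card.speed_tok_s} tok/s, ~{context_tokens(card.context):,} tokens")
```

Cards without a published speed (shown as N/A) are left out of the ranking rather than treated as zero, so a missing measurement is not read as a slow model.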

Popular AI Models

Models you can use directly in LLMBase Chat.

GLM 5.2

EU-Hosted

Z.ai

GLM 5.2 is a large-scale reasoning model from Z.ai. It supports text input and output with a 1M-token context window, and is suited for long-horizon agent workflows, project-level software engineering,...

Context: 1.0M
Speed: 156 tok/s
Input: Text
Output: Text
Reasoning: Yes
Availability: Chat, Inference API

Kimi K2.7 Code

EU-Hosted

MoonshotAI

Kimi K2.7 Code is a coding-focused model in Moonshot AI's Kimi K2 family, built to complete end-to-end programming tasks reliably over long contexts. It uses a native multimodal mixture-of-experts...

Context: 262K
Speed: 43 tok/s
Input: Text
Output: Text
Reasoning: Yes
Availability: Chat, Inference API

M3

EU-Hosted

MiniMax

MiniMax-M3 is a multimodal foundation model from MiniMax. It supports text, image, and video inputs with text output, a 1M-token context window, and is suited for long-horizon agentic work, coding,...

Context: 1.0M
Speed: 105 tok/s
Input: Text, Image
Output: Text
Reasoning: Yes
Availability: Chat, Inference API

Qwen3.6 35B A3B

EU-Hosted

Qwen

Qwen3.6-35B-A3B is an open-weight multimodal model from Alibaba Cloud with 35 billion total parameters and 3 billion active parameters per token. It uses a hybrid sparse mixture-of-experts architecture combining Gated...

Context: 262K
Speed: 124 tok/s
Input: Text, Image
Output: Text
Reasoning: Yes
Availability: Chat, Inference API

gpt-oss-120b

EU-Hosted

OpenAI

Open-weight MoE reasoning model for high-throughput agentic and general-purpose production workloads.

Context: 131K
Speed: 303 tok/s
Input: Text
Output: Text
Reasoning: No
Availability: Chat, Inference API

gpt-oss-20b

EU-Hosted

OpenAI

Open-weight compact MoE reasoning model for lower-latency agentic and general-purpose workloads.

Context: 131K
Speed: 267 tok/s
Input: Text
Output: Text
Reasoning: No
Availability: Chat, Inference API

Nemo

EU-Hosted

Mistral

Efficient Mistral open model for multilingual chat, coding assistance, and low-cost text generation.

Context: 60K
Speed: N/A
Input: Text
Output: Text
Reasoning: No
Availability: Chat, Inference API

V4 Flash

EU-Hosted

DeepSeek

Efficiency-focused DeepSeek V4 MoE model for high-throughput coding, reasoning, and agent workflows.

Context: 1.0M
Speed: 118 tok/s
Input: Text
Output: Text
Reasoning: Yes
Availability: Chat, Inference API

Gemma 4 26B A4B

EU-Hosted

Google

Instruction-tuned Gemma 4 MoE model from Google DeepMind with long-context multimodal input and efficient sparse activation.

Context: 262K
Speed: 133 tok/s
Input: Text, Image
Output: Text
Reasoning: No
Availability: Chat, Inference API

Gemma 4 31B

EU-Hosted

Google

Lower-latency Gemma 4 31B serving profile for efficient long-context reasoning workloads.

Context: 262K
Speed: 48 tok/s
Input: Text
Output: Text
Reasoning: No
Availability: Chat, Inference API

Kimi K2.6

EU-Hosted

MoonshotAI

Native multimodal Kimi model for long-horizon coding, autonomous execution, and agentic orchestration.

Context: 262K
Speed: 132 tok/s
Input: Text, Image
Output: Text
Reasoning: Yes
Availability: Chat, Inference API

Llama 4 Maverick

EU-Hosted

Qwen3.5 397B A17B

EU-Hosted

Qwen

Large multimodal MoE model with strong reasoning, code generation, and agent-style task performance.

Context: 262K
Speed: 60 tok/s
Input: Text, Image
Output: Text
Reasoning: No
Availability: Chat, Inference API

Qwen3 Coder 480B A35B

EU-Hosted

Qwen

Large sparse MoE coding model optimized for long-horizon code generation, agent loops, and development workflows.

Context: 262K
Speed: N/A
Input: Text
Output: Text
Reasoning: No
Availability: Chat, Inference API

Qwen3.5-9B

EU-Hosted

Qwen

Efficient multimodal Qwen3.5 model for reasoning, coding, and visual understanding in a compact footprint.

Context: 256K
Speed: 69 tok/s
Input: Text, Image
Output: Text
Reasoning: No
Availability: Chat, Inference API

Qwen3-VL-30B-A3B-Instruct

EU-Hosted

Qwen

Compact Qwen3 vision-language MoE model for OCR, visual understanding, multilingual chat, and tool-assisted multimodal workflows.

Context: 262K
Speed: 121 tok/s
Input: Text, Image
Output: Text
Reasoning: No
Availability: Chat, Inference API

Qwen3.5-122B-A10B

EU-Hosted

Qwen

High-end multimodal MoE model with strong text and visual reasoning, coding, and agent capabilities.

Context: 262K
Speed: 147 tok/s
Input: Text, Image
Output: Text
Reasoning: No
Availability: Chat, Inference API

Top Proprietary AI Models

Frontier closed models.

Claude Fable 5

Anthropic

Claude Fable 5 is a Mythos-class model from Anthropic, built for autonomous knowledge work and coding. It supports text, image, and file inputs with text output, with reasoning support and...

Context: 1.0M
Speed: 70 tok/s
Input: Text, Image, File
Output: Text
Reasoning: Yes
Availability: Chat only

GPT-5.6 Sol

NEW

OpenAI

GPT-5.6 Sol is the flagship model in OpenAI's GPT-5.6 series. It is suited for complex reasoning, coding, and agentic workflows, and is particularly strong at command-line and multi-step coding tasks...

Context: 1.1M
Speed: 73 tok/s
Input: Image, Text, File
Output: Text
Reasoning: Yes
Availability: Chat only

Claude Opus 4.8

Anthropic

Claude Opus 4.8 is Anthropic's most capable generally available model in the Opus family. It supports text, image, and file inputs with text output, with reasoning support and a 1M-token...

Context: 1.0M
Speed: 66 tok/s
Input: Text, Image, File
Output: Text
Reasoning: Yes
Availability: Chat only

GPT-5.6 Terra

NEW

OpenAI

GPT-5.6 Terra is a balanced model in OpenAI's GPT-5.6 series, positioned between the flagship Sol tier and the cost-efficient Luna tier. It is suited for everyday coding, reasoning, and agentic...

Context: 1.1M
Speed: 174 tok/s
Input: Image, Text, File
Output: Text
Reasoning: Yes
Availability: Chat only

GPT-5.5

OpenAI

GPT-5.5 is OpenAI’s frontier model designed for complex professional workloads, building on GPT-5.4 with stronger reasoning, higher reliability, and improved token efficiency on hard tasks. It features a 1M+ token...

Context: 1.1M
Speed: 79 tok/s
Input: Image, Text, File
Output: Text
Reasoning: Yes
Availability: Chat only

Grok 4.5

NEW

xAI

Grok 4.5 is SpaceXAI's smartest model with frontier performance on coding, knowledge work, and STEM.

Context: 500K
Speed: 143 tok/s
Input: Text, Image, File
Output: Text
Reasoning: Yes
Availability: Chat only

Claude Sonnet 5

Anthropic

Sonnet 5 is Anthropic's most capable Sonnet-class model, with frontier performance across coding, agents, and professional work. It supports adaptive thinking with selectable reasoning effort levels (low, medium, high, max,...

Context: 1.0M
Speed: 128 tok/s
Input: Text, Image, File
Output: Text
Reasoning: Yes
Availability: Chat only

GPT-5.5 Pro

OpenAI

GPT-5.5 Pro is OpenAI’s high-capability model optimized for deep reasoning and accuracy on complex, high-stakes workloads. It features a 1M+ token context window (922K input, 128K output) with support for...

Context: 1.1M
Speed: 110 tok/s
Input: Image, Text, File
Output: Text
Reasoning: Yes
Availability: Chat only

GPT-5.4 Image 2

OpenAI

GPT-5.4 Image 2 enables rich multimodal workflows, allowing users to seamlessly move between reasoning, coding, and...

Context: 272K
Speed: 155 tok/s
Input: Image, Text, File
Output: Image, Text
Reasoning: Yes
Availability: Chat only

GPT-5.6 Luna

NEW

OpenAI

GPT-5.6 Luna is a fast, cost-efficient model in OpenAI's GPT-5.6 series. It is suited for high-volume, latency-sensitive tasks such as chat, classification, and lightweight agentic workflows, providing capable reasoning for...

Context: 1.1M
Speed: 264 tok/s
Input: Image, Text, File
Output: Text
Reasoning: Yes
Availability: Chat only

Gemini 3.5 Flash

Google

Gemini 3.5 Flash is Google's high-efficiency multimodal model, bringing near-Pro level coding and reasoning at Flash-tier cost and speed. It is highly optimized for coding proficiency and parallel agentic execution...

Context: 1.0M
Speed: 259 tok/s
Input: Text, Image, File
Output: Text
Reasoning: Yes
Availability: Chat only

Gemini 3.1 Pro Preview

Google

Gemini 3.1 Pro Preview is Google’s frontier reasoning model, delivering enhanced software engineering performance, improved agentic reliability, and more efficient token usage across complex workflows. Building on the multimodal foundation...

Context: 1.0M
Speed: 137 tok/s
Input: Image, Text, File
Output: Text
Reasoning: Yes
Availability: Chat only

Qwen3.7 Max

Qwen

Qwen3.7-Max is the flagship model in Alibaba's Qwen3.7 series. It supports text input and output and is designed for agent-centric workloads, with particular strengths in coding, office and productivity tasks,...

Context: 1.0M
Speed: 203 tok/s
Input: Text, File
Output: Text
Reasoning: Yes
Availability: Chat only

GPT-5.3-Codex

OpenAI

GPT-5.3-Codex is OpenAI’s most advanced agentic coding model, combining the frontier software engineering performance of GPT-5.2-Codex with the broader reasoning and professional knowledge capabilities of GPT-5.2. It achieves state-of-the-art results...

Context: 400K
Speed: 196 tok/s
Input: Text, Image, File
Output: Text
Reasoning: Yes
Availability: Chat only

GPT-5.4 Mini

OpenAI

GPT-5.4 mini brings the core capabilities of GPT-5.4 to a faster, more efficient model optimized for high-throughput workloads. It supports text and image inputs with strong performance across reasoning, coding,...

Context: 400K
Speed: 176 tok/s
Input: Image, Text, File
Output: Text
Reasoning: Yes
Availability: Chat only

Qwen3.6 Max Preview

Qwen

Qwen3.6-Max-Preview is a proprietary frontier model from Alibaba Cloud built on a sparse mixture-of-experts architecture with approximately 1 trillion total parameters. It is optimized for agentic coding, tool use, and...

Context: 262K
Speed: 38 tok/s
Input: Text, File
Output: Text
Reasoning: Yes
Availability: Chat only

Qwen3.6 Plus

Qwen

Qwen 3.6 Plus builds on a hybrid architecture that combines efficient linear attention with sparse mixture-of-experts routing, enabling strong scalability and high-performance inference. Compared to the 3.5 series, it delivers...

Context: 1.0M
Speed: 53 tok/s
Input: Text, Image, File
Output: Text
Reasoning: Yes
Availability: Chat only

Nano Banana Pro (Gemini 3 Pro Image)

Google

Nano Banana Pro is Google’s most advanced image-generation and editing model, built on Gemini 3 Pro. It extends the original Nano Banana with significantly improved multimodal reasoning, real-world grounding, and...

Context: 66K
Speed: 141 tok/s
Input: Image, Text, File
Output: Image, Text
Reasoning: Yes
Availability: Chat only

Qwen3.7 Plus

Qwen

Qwen3.7-Plus is a cost-effective model in Alibaba's Qwen3.7 series. It supports text and image input with text output, building on the series' text capabilities with a comprehensive upgrade to its...

Context: 1.0M
Speed: 188 tok/s
Input: Text, Image, File
Output: Text
Reasoning: Yes
Availability: Chat only

GPT-5.4 Nano

OpenAI

GPT-5.4 nano is the most lightweight and cost-efficient variant of the GPT-5.4 family, optimized for speed-critical and high-volume tasks. It supports text and image inputs and is designed for low-latency...

Context: 400K
Speed: 168 tok/s
Input: Image, Text, File
Output: Text
Reasoning: Yes
Availability: Chat only

M2.7

MiniMax

MiniMax-M2.7 is a next-generation large language model designed for autonomous, real-world productivity and continuous improvement. Built to actively participate in its own evolution, M2.7 integrates advanced agentic capabilities through multi-agent...

Context: 205K
Speed: 54 tok/s
Input: Text, File
Output: Text
Reasoning: Yes
Availability: Chat only

Gemini 3.1 Flash Lite

Google

Gemini 3.1 Flash Lite is Google’s GA high-efficiency multimodal model optimized for low-latency, high-volume workloads. It supports text, image, video, audio, and PDF inputs, and is designed for lightweight agentic...

Context: 1.0M
Speed: 305 tok/s
Input: Text, Image, File
Output: Text
Reasoning: Yes
Availability: Chat only

Qwen3.6 27B

Qwen

Qwen3.6 27B is a dense 27-billion-parameter language model from the Qwen Team at Alibaba, released in April 2026. It features hybrid multimodal capabilities — accepting text, image, and video inputs...

Context: 262K
Speed: 59 tok/s
Input: Text, Image, File
Output: Text
Reasoning: Yes
Availability: Chat only

Nano Banana 2 (Gemini 3.1 Flash Image)

Google

Gemini 3.1 Flash Image, a.k.a. "Nano Banana 2," is Google’s latest state of the art image generation and editing model, delivering Pro-level visual quality at Flash speed. It combines advanced...

Context: 131K
Speed: 215 tok/s
Input: Image, Text, File
Output: Image, Text
Reasoning: Yes
Availability: Chat only

Top Coding AI Models

Models tuned for code and developer workflows.

GPT-5.6 Sol

NEW

OpenAI

Context: 1.1M
Speed: 73 tok/s
Input: Image, Text, File
Output: Text
Reasoning: Yes
Availability: Chat only

GPT-5.6 Terra

NEW

OpenAI

GPT-5.6 Terra is a balanced model in OpenAI's GPT-5.6 series, positioned between the flagship Sol tier and the cost-efficient Luna tier. It is suited for everyday coding, reasoning, and agentic...

Context: 1.1M
Speed: 174 tok/s
Input: Image, Text, File
Output: Text
Reasoning: Yes
Availability: Chat only

Claude Fable 5

Anthropic

Claude Fable 5 is a Mythos-class model from Anthropic, built for autonomous knowledge work and coding. It supports text, image, and file inputs with text output, with reasoning support and...

Context: 1.0M
Speed: 70 tok/s
Input: Text, Image, File
Output: Text
Reasoning: Yes
Availability: Chat only

GPT-5.5

OpenAI

Context: 1.1M
Speed: 79 tok/s
Input: Image, Text, File
Output: Text
Reasoning: Yes
Availability: Chat only

Claude Opus 4.8

Anthropic

Claude Opus 4.8 is Anthropic's most capable generally available model in the Opus family. It supports text, image, and file inputs with text output, with reasoning support and a 1M-token...

Context: 1.0M
Speed: 66 tok/s
Input: Text, Image, File
Output: Text
Reasoning: Yes
Availability: Chat only

Grok 4.5

NEW

xAI

Grok 4.5 is SpaceXAI's smartest model with frontier performance on coding, knowledge work, and STEM.

Context: 500K
Speed: 143 tok/s
Input: Text, Image, File
Output: Text
Reasoning: Yes
Availability: Chat only

GPT-5.5 Pro

OpenAI

Context: 1.1M
Speed: 110 tok/s
Input: Image, Text, File
Output: Text
Reasoning: Yes
Availability: Chat only

Claude Sonnet 5

Anthropic

Context: 1.0M
Speed: 128 tok/s
Input: Text, Image, File
Output: Text
Reasoning: Yes
Availability: Chat only

GPT-5.6 Luna

NEW

OpenAI

Context: 1.1M
Speed: 264 tok/s
Input: Image, Text, File
Output: Text
Reasoning: Yes
Availability: Chat only

GPT-5.4 Image 2

OpenAI

GPT-5.4 Image 2 enables rich multimodal workflows, allowing users to seamlessly move between reasoning, coding, and...

Context: 272K
Speed: 155 tok/s
Input: Image, Text, File
Output: Image, Text
Reasoning: Yes
Availability: Chat only

Gemini 3.5 Flash

Google

Context: 1.0M
Speed: 259 tok/s
Input: Text, Image, File
Output: Text
Reasoning: Yes
Availability: Chat only

Gemini 3.1 Pro Preview

Google

Context: 1.0M
Speed: 137 tok/s
Input: Image, Text, File
Output: Text
Reasoning: Yes
Availability: Chat only

GLM 5.2

EU-Hosted

Z.ai

Context: 1.0M
Speed: 156 tok/s
Input: Text
Output: Text
Reasoning: Yes
Availability: Chat, Inference API

Qwen3.7 Max

Qwen

Context: 1.0M
Speed: 203 tok/s
Input: Text, File
Output: Text
Reasoning: Yes
Availability: Chat only

Kimi K2.6

EU-Hosted

MoonshotAI

Kimi K2.6 is Moonshot AI's next-generation multimodal model, designed for long-horizon coding, coding-driven UI/UX generation, and multi-agent orchestration. It handles complex end-to-end coding tasks across Python, Rust, and Go, and...

Context: 262K
Speed: 132 tok/s
Input: Text, Image
Output: Text
Reasoning: Yes
Availability: Chat, Inference API

Kimi K2.7 Code

EU-Hosted

MoonshotAI

Context: 262K
Speed: 43 tok/s
Input: Text
Output: Text
Reasoning: Yes
Availability: Chat, Inference API

V4 Pro

EU-Hosted

DeepSeek

DeepSeek V4 Pro is a large-scale Mixture-of-Experts model from DeepSeek with 1.6T total parameters and 49B activated parameters, supporting a 1M-token context window. It is designed for advanced reasoning, coding,...

Context: 1.0M
Speed: 67 tok/s
Input: Text
Output: Text
Reasoning: Yes
Availability: Chat, Inference API

M3

EU-Hosted

MiniMax

Context: 1.0M
Speed: 105 tok/s
Input: Text, Image
Output: Text
Reasoning: Yes
Availability: Chat, Inference API

GPT-5.4 Mini

OpenAI

Context: 400K
Speed: 176 tok/s
Input: Image, Text, File
Output: Text
Reasoning: Yes
Availability: Chat only

GPT-5.4 Nano

OpenAI

Context: 400K
Speed: 168 tok/s
Input: Image, Text, File
Output: Text
Reasoning: Yes
Availability: Chat only

Qwen3.7 Plus

Qwen

Qwen3.7-Plus is a cost-effective model in Alibaba's Qwen3.7 series. It supports text and image input with text output, building on the series' text capabilities with a comprehensive upgrade to its...

Context: 1.0M
Speed: 188 tok/s
Input: Text, Image, File
Output: Text
Reasoning: Yes
Availability: Chat only

Qwen3.6 Plus

Qwen

Context: 1.0M
Speed: 53 tok/s
Input: Text, Image, File
Output: Text
Reasoning: Yes
Availability: Chat only

Qwen3.6 27B

Qwen

Context: 262K
Speed: 59 tok/s
Input: Text, Image, File
Output: Text
Reasoning: Yes
Availability: Chat only

M2.7

MiniMax

Context: 205K
Speed: 54 tok/s
Input: Text, File
Output: Text
Reasoning: Yes
Availability: Chat only

Top Math AI Models

Math and reasoning specialists.

GPT-5.5 Pro

OpenAI

Context: 1.1M
Speed: 110 tok/s
Input: Image, Text, File
Output: Text
Reasoning: Yes
Availability: Chat only

GPT-5.3-Codex

OpenAI

Context: 400K
Speed: 196 tok/s
Input: Text, Image, File
Output: Text
Reasoning: Yes
Availability: Chat only

Gemini 3.1 Flash Lite

Google

Context: 1.0M
Speed: 305 tok/s
Input: Text, Image, File
Output: Text
Reasoning: Yes
Availability: Chat only

V4 Pro

EU-Hosted

DeepSeek

Context: 1.0M
Speed: 67 tok/s
Input: Text
Output: Text
Reasoning: Yes
Availability: Chat, Inference API

Nano Banana Pro (Gemini 3 Pro Image)

Google

Context: 66K
Speed: 141 tok/s
Input: Image, Text, File
Output: Image, Text
Reasoning: Yes
Availability: Chat only

GLM 5.2

EU-Hosted

Z.ai

Context: 1.0M
Speed: 156 tok/s
Input: Text
Output: Text
Reasoning: Yes
Availability: Chat, Inference API

Kimi K2.6

EU-Hosted

MoonshotAI

Context: 262K
Speed: 132 tok/s
Input: Text, Image
Output: Text
Reasoning: Yes
Availability: Chat, Inference API

gpt-oss-120b

EU-Hosted

OpenAI

gpt-oss-120b is an open-weight, 117B-parameter Mixture-of-Experts (MoE) language model from OpenAI designed for high-reasoning, agentic, and general-purpose production use cases. It activates 5.1B parameters per forward pass and is optimized...

Context: 131K
Speed: 303 tok/s
Input: Text
Output: Text
Reasoning: No
Availability: Chat, Inference API

Grok 4.5

NEW

xAI

Grok 4.5 is SpaceXAI's smartest model with frontier performance on coding, knowledge work, and STEM.

Context: 500K
Speed: 143 tok/s
Input: Text, Image, File
Output: Text
Reasoning: Yes
Availability: Chat only

GPT-5.6 Luna

NEW

OpenAI

Context: 1.1M
Speed: 264 tok/s
Input: Image, Text, File
Output: Text
Reasoning: Yes
Availability: Chat only

Claude Opus 4.8

Anthropic

Claude Opus 4.8 is Anthropic's most capable generally available model in the Opus family. It supports text, image, and file inputs with text output, with reasoning support and a 1M-token...

Context: 1.0M
Speed: 66 tok/s
Input: Text, Image, File
Output: Text
Reasoning: Yes
Availability: Chat only

Qwen3.7 Plus

Qwen

Qwen3.7-Plus is a cost-effective model in Alibaba's Qwen3.7 series. It supports text and image input with text output, building on the series' text capabilities with a comprehensive upgrade to its...

Context: 1.0M
Speed: 188 tok/s
Input: Text, Image, File
Output: Text
Reasoning: Yes
Availability: Chat only

GPT-5.4 Mini

OpenAI

Context: 400K
Speed: 176 tok/s
Input: Image, Text, File
Output: Text
Reasoning: Yes
Availability: Chat only

gpt-oss-20b

EU-Hosted

OpenAI

gpt-oss-20b is an open-weight 21B parameter model released by OpenAI under the Apache 2.0 license. It uses a Mixture-of-Experts (MoE) architecture with 3.6B active parameters per forward pass, optimized for...

Context: 131K
Speed: 267 tok/s
Input: Text
Output: Text
Reasoning: No
Availability: Chat, Inference API

Claude Sonnet 5

Anthropic

Context: 1.0M
Speed: 128 tok/s
Input: Text, Image, File
Output: Text
Reasoning: Yes
Availability: Chat only

GPT-5.4 Nano

OpenAI

Context: 400K
Speed: 168 tok/s
Input: Image, Text, File
Output: Text
Reasoning: Yes
Availability: Chat only

M2.7

MiniMax

Context: 205K
Speed: 54 tok/s
Input: Text, File
Output: Text
Reasoning: Yes
Availability: Chat only

Qwen3 VL 30B A3B Instruct

EU-Hosted

Qwen

Qwen3-VL-30B-A3B-Instruct is a multimodal model that unifies strong text generation with visual understanding for images and videos. Its Instruct variant optimizes instruction-following for general multimodal tasks. It excels in perception...

Context: 262K
Speed: 121 tok/s
Input: Text, Image
Output: Text
Reasoning: No
Availability: Chat, Inference API

Qwen3 Max Thinking

Qwen

Qwen3-Max-Thinking is the flagship reasoning model in the Qwen3 series, designed for high-stakes cognitive tasks that require deep, multi-step reasoning. By significantly scaling model capacity and reinforcement learning compute, it...

Context: 262K
Speed: 55 tok/s
Input: Text, File
Output: Text
Reasoning: Yes
Availability: Chat only

Llama 4 Maverick

EU-Hosted

Gemini 3.5 Flash

Google

Context: 1.0M
Speed: 259 tok/s
Input: Text, Image, File
Output: Text
Reasoning: Yes
Availability: Chat only

M3

EU-Hosted

MiniMax

Context: 1.0M
Speed: 105 tok/s
Input: Text, Image
Output: Text
Reasoning: Yes
Availability: Chat, Inference API

Kimi K2.7 Code

EU-Hosted

MoonshotAI

Context: 262K
Speed: 43 tok/s
Input: Text
Output: Text
Reasoning: Yes
Availability: Chat, Inference API

GPT-5.3 Chat

OpenAI

GPT-5.3 Chat is an update to ChatGPT's most-used model that makes everyday conversations smoother, more useful, and more directly helpful. It delivers more accurate answers with better contextualization and significantly...

Context: 128K
Speed: 162 tok/s
Input: Text, Image, File
Output: Text
Reasoning: No
Availability: Chat only

Fast AI Models

Lowest-cost, lowest-latency options.

Gemini 3.1 Flash Lite

Google

Context: 1.0M
Speed: 305 tok/s
Input: Text, Image, File
Output: Text
Reasoning: Yes
Availability: Chat only

gpt-oss-120b

EU-Hosted

OpenAI

Context: 131K
Speed: 303 tok/s
Input: Text
Output: Text
Reasoning: No
Availability: Chat, Inference API

Llama 4 Maverick

EU-Hosted

gpt-oss-20b

EU-Hosted

OpenAI

Context: 131K
Speed: 267 tok/s
Input: Text
Output: Text
Reasoning: No
Availability: Chat, Inference API

GPT-5.6 Luna

NEW

OpenAI

Context: 1.1M
Speed: 264 tok/s
Input: Image, Text, File
Output: Text
Reasoning: Yes
Availability: Chat only

Gemini 3.5 Flash

Google

Context: 1.0M
Speed: 259 tok/s
Input: Text, Image, File
Output: Text
Reasoning: Yes
Availability: Chat only

Qwen3.5-Flash

Qwen

The Qwen3.5 native vision-language Flash models are built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model, achieving higher inference efficiency. Compared to the...

Context: 1.0M
Speed: 252 tok/s
Input: Text, Image, File
Output: Text
Reasoning: Yes
Availability: Chat only

Nano Banana 2 (Gemini 3.1 Flash Image)

Google

Gemini 3.1 Flash Image, a.k.a. "Nano Banana 2," is Google’s latest state of the art image generation and editing model, delivering Pro-level visual quality at Flash speed. It combines advanced...

Context: 131K
Speed: 215 tok/s
Input: Image, Text, File
Output: Image, Text
Reasoning: Yes
Availability: Chat only

Qwen3.7 Max

Qwen

Context: 1.0M
Speed: 203 tok/s
Input: Text, File
Output: Text
Reasoning: Yes
Availability: Chat only

GPT-5.3-Codex

OpenAI

Context: 400K
Speed: 196 tok/s
Input: Text, Image, File
Output: Text
Reasoning: Yes
Availability: Chat only

GPT Audio Mini

OpenAI

A cost-efficient version of GPT Audio. The new snapshot features an upgraded decoder for more natural sounding voices and maintains better voice consistency. Input is priced at $0.60 per million...

Context: 128K
Speed: 194 tok/s
Input: Text, File
Output: Text, Audio
Reasoning: No
Availability: Chat only

Qwen3.7 Plus

Qwen

Qwen3.7-Plus is a cost-effective model in Alibaba's Qwen3.7 series. It supports text and image input with text output, building on the series' text capabilities with a comprehensive upgrade to its...

Context: 1.0M
Speed: 188 tok/s
Input: Text, Image, File
Output: Text
Reasoning: Yes
Availability: Chat only

Grok 4.5

NEW

xAI

Grok 4.5 is SpaceXAI's smartest model with frontier performance on coding, knowledge work, and STEM.

Context: 500K
Speed: 143 tok/s
Input: Text, Image, File
Output: Text
Reasoning: Yes
Availability: Chat only

Mistral: Mistral Small 4

Mistral AI

Mistral Small 4 is the next major release in the Mistral Small family, unifying the capabilities of several flagship Mistral models into a single system. It combines strong reasoning from...

Context: 262K
Speed: 178 tok/s
Input: Text, Image, File
Output: Text
Reasoning: Yes
Availability: Chat only

GPT-5.4 Mini

OpenAI

Context: 400K
Speed: 176 tok/s
Input: Image, Text, File
Output: Text
Reasoning: Yes
Availability: Chat only

GPT-5.6 Terra

NEW

OpenAI

GPT-5.6 Terra is a balanced model in OpenAI's GPT-5.6 series, positioned between the flagship Sol tier and the cost-efficient Luna tier. It is suited for everyday coding, reasoning, and agentic...

Context: 1.1M
Speed: 174 tok/s
Input: Image, Text, File
Output: Text
Reasoning: Yes
Availability: Chat only

Qwen3.5-35B-A3B

Qwen

The Qwen3.5 Series 35B-A3B is a native vision-language model designed with a hybrid architecture that integrates linear attention mechanisms and a sparse mixture-of-experts model, achieving higher inference efficiency. Its overall...

Context: 262K
Speed: 172 tok/s
Input: Text, Image, File
Output: Text
Reasoning: Yes
Availability: Chat only

GPT-5.4 Nano

OpenAI

Context: 400K
Speed: 168 tok/s
Input: Image, Text, File
Output: Text
Reasoning: Yes
Availability: Chat only

GPT-5.3 Chat

OpenAI

Context: 128K
Speed: 162 tok/s
Input: Text, Image, File
Output: Text
Reasoning: No
Availability: Chat only

GLM 5.2

EU-Hosted

Z.ai

Context: 1.0M
Speed: 156 tok/s
Input: Text
Output: Text
Reasoning: Yes
Availability: Chat, Inference API

GPT-5.4 Image 2

OpenAI

GPT-5.4 Image 2 enables rich multimodal workflows, allowing users to seamlessly move between reasoning, coding, and...

Context: 272K
Speed: 155 tok/s
Input: Image, Text, File
Output: Image, Text
Reasoning: Yes
Availability: Chat only

Qwen3.5-122B-A10B

EU-Hosted

Qwen

The Qwen3.5 122B-A10B native vision-language model is built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model, achieving higher inference efficiency. In terms of...

Context: 262K
Speed: 147 tok/s
Input: Text, Image
Output: Text
Reasoning: No
Availability: Chat, Inference API

Nano Banana Pro (Gemini 3 Pro Image)

Google

Context: 66K
Speed: 141 tok/s
Input: Image, Text, File
Output: Image, Text
Reasoning: Yes
Availability: Chat only

Gemini 3.1 Pro Preview

Google

Context: 1.0M
Speed: 137 tok/s
Input: Image, Text, File
Output: Text
Reasoning: Yes
Availability: Chat only

Top Audio AI Models

Models with voice and audio output capabilities.

GPT Audio

OpenAI

The gpt-audio model is OpenAI's first generally available audio model. The new snapshot features an upgraded decoder for more natural sounding voices and maintains better voice consistency. Audio is priced...

Context: 128K
Speed: N/A
Input: Text, File
Output: Text, Audio
Reasoning: No
Availability: Chat only

GPT Audio Mini

OpenAI

A cost-efficient version of GPT Audio. The new snapshot features an upgraded decoder for more natural sounding voices and maintains better voice consistency. Input is priced at $0.60 per million...

Context: 128K
Speed: 194 tok/s
Input: Text, File
Output: Text, Audio
Reasoning: No
Availability: Chat only

By Plan: Free

Models available on the Free plan.

Mistral: Mistral Medium 3.5

Mistral AI

Mistral Medium 3.5 is a dense 128B instruction-following model from Mistral AI. It supports text and image inputs with text output, and is designed for agentic workflows, coding, and complex...

Context: 262K
Speed: 93 tok/s
Input: Text, Image, File
Output: Text
Reasoning: Yes
Availability: Chat only

Mistral: Mistral Small 4

Mistral AI

Mistral Small 4 is the next major release in the Mistral Small family, unifying the capabilities of several flagship Mistral models into a single system. It combines strong reasoning from...

Context: 262K
Speed: 178 tok/s
Input: Text, Image, File
Output: Text
Reasoning: Yes
Availability: Chat only

Mistral Nemo

EU-Hosted

Mistral

A 12B parameter model with a 128k token context length built by Mistral in collaboration with NVIDIA. The model is multilingual, supporting English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese,...

Context: 60K
Speed: N/A
Input: Text
Output: Text
Reasoning: No
Availability: Chat, Inference API

By Plan: Starter

Models that require at least Starter.

Nano Banana 2 Lite (Gemini 3.1 Flash Lite Image)

Google

Nano Banana 2 Lite (Gemini 3.1 Flash Lite Image) is Google's fastest, most cost-efficient Gemini image model, built for high-velocity developer pipelines and rapid-fire visual exploration. It delivers text-to-image generation...

Context: 66K
Speed: N/A
Input: Image, Text, File
Output: Image, Text
Reasoning: Yes
Availability: Chat only

M2.7

MiniMax

Context: 205K
Speed: 54 tok/s
Input: Text, File
Output: Text
Reasoning: Yes
Availability: Chat only

GPT-5.4 Nano

OpenAI

Context: 400K
Speed: 168 tok/s
Input: Image, Text, File
Output: Text
Reasoning: Yes
Availability: Chat only

GPT-5.4 Mini

OpenAI

Context: 400K
Speed: 176 tok/s
Input: Image, Text, File
Output: Text
Reasoning: Yes
Availability: Chat only

Qwen3.5-27B

Qwen

The Qwen3.5 27B native vision-language Dense model incorporates a linear attention mechanism, delivering fast response times while balancing inference speed and performance. Its overall capabilities are comparable to those of...

Context: 262K
Speed: 86 tok/s
Input: Text, Image, File
Output: Text
Reasoning: Yes
Availability: Chat only

Qwen3.5-122B-A10B

EU-Hosted

Qwen

Context: 262K
Speed: 147 tok/s
Input: Text, Image
Output: Text
Reasoning: No
Availability: Chat, Inference API

GPT-5.3-Codex

OpenAI

Context: 400K
Speed: 196 tok/s
Input: Text, Image, File
Output: Text
Reasoning: Yes
Availability: Chat only

Qwen3 Max Thinking

Qwen

Context262K

Speed55 tok/s

InputText, File

OutputText

ReasoningYes

AvailabilityChat only

Use in Chat

GPT Audio

OpenAI

Context128K

SpeedN/A

InputText, File

OutputText, Audio

ReasoningNo

AvailabilityChat only

Use in Chat

GPT Audio Mini

OpenAI

A cost-efficient version of GPT Audio. The new snapshot features an upgraded decoder for more natural sounding voices and maintains better voice consistency. Input is priced at $0.60 per million...

Context128K

Speed194 tok/s

InputText, File

OutputText, Audio

ReasoningNo

AvailabilityChat only

Use in Chat

gpt-oss-120b

EU-Hosted

OpenAI

Context131K

Speed303 tok/s

InputText

OutputText

ReasoningNo

AvailabilityChat, Inference API

Use in Chat Use via API

By Plan: Pro+

Models for Pro, Expert and Max plans.

GPT-5.6 Luna Pro

NEW

OpenAI

Learn more in OpenAI's docs: https://developers.openai.com/api/docs/guides/reasoning#reasoning-mode

Context1.1M

SpeedN/A

InputImage, Text, File

OutputText

ReasoningYes

AvailabilityChat only

Use in Chat

GPT-5.6 Luna

NEW

OpenAI

Context1.1M

Speed264 tok/s

InputImage, Text, File

OutputText

ReasoningYes

AvailabilityChat only

Use in Chat

GPT-5.6 Terra Pro

NEW

OpenAI

Learn more in OpenAI's docs: https://developers.openai.com/api/docs/guides/reasoning#reasoning-mode

Context1.1M

SpeedN/A

InputImage, Text, File

OutputText

ReasoningYes

AvailabilityChat only

Use in Chat

GPT-5.6 Terra

NEW

OpenAI

GPT-5.6 Terra is a balanced model in OpenAI's GPT-5.6 series, positioned between the flagship Sol tier and the cost-efficient Luna tier. It is suited for everyday coding, reasoning, and agentic...

Context1.1M

Speed174 tok/s

InputImage, Text, File

OutputText

ReasoningYes

AvailabilityChat only

Use in Chat

GPT-5.6 Sol Pro

NEW

OpenAI

Learn more in OpenAI's docs: https://developers.openai.com/api/docs/guides/reasoning#reasoning-mode

Context1.1M

SpeedN/A

InputImage, Text, File

OutputText

ReasoningYes

AvailabilityChat only

Use in Chat

GPT-5.6 Sol

NEW

OpenAI

Context1.1M

Speed73 tok/s

InputImage, Text, File

OutputText

ReasoningYes

AvailabilityChat only

Use in Chat

Claude Sonnet 5

Anthropic

Context1.0M

Speed128 tok/s

InputText, Image, File

OutputText

ReasoningYes

AvailabilityChat only

Use in Chat

Nano Banana 2 (Gemini 3.1 Flash Image)

Google

Gemini 3.1 Flash Image, a.k.a. "Nano Banana 2," is Google’s latest state-of-the-art image generation and editing model, delivering Pro-level visual quality at Flash speed. It combines advanced...

Context131K

Speed215 tok/s

InputImage, Text, File

OutputImage, Text

ReasoningYes

AvailabilityChat only

Use in Chat

Nano Banana Pro (Gemini 3 Pro Image)

Google

Context66K

Speed141 tok/s

InputImage, Text, File

OutputImage, Text

ReasoningYes

AvailabilityChat only

Use in Chat

GLM 5.2

EU-Hosted

Z.ai

Context1.0M

Speed156 tok/s

InputText

OutputText

ReasoningYes

AvailabilityChat, Inference API

Use in Chat Use via API

Kimi K2.7 Code

EU-Hosted

MoonshotAI

Context262K

Speed43 tok/s

InputText

OutputText

ReasoningYes

AvailabilityChat, Inference API

Use in Chat Use via API

Claude Fable 5

Anthropic

Claude Fable 5 is a Mythos-class model from Anthropic, built for autonomous knowledge work and coding. It supports text, image, and file inputs with text output, with reasoning support and...

Context1.0M

Speed70 tok/s

InputText, Image, File

OutputText

ReasoningYes

AvailabilityChat only

Use in Chat

Qwen3.7 Plus

Qwen

Qwen3.7-Plus is a cost-effective model in Alibaba's Qwen3.7 series. It supports text and image input with text output, building on the series' text capabilities with a comprehensive upgrade to its...

Context1.0M

Speed188 tok/s

InputText, Image, File

OutputText

ReasoningYes

AvailabilityChat only

Use in Chat

M3

EU-Hosted

MiniMax

Context1.0M

Speed105 tok/s

InputText, Image

OutputText

ReasoningYes

AvailabilityChat, Inference API

Use in Chat Use via API

Claude Opus 4.8 (Fast)

Anthropic

Fast-mode variant of Opus 4.8: identical capabilities with higher output speed at 2x pricing relative to regular Opus 4.8. Learn more in Anthropic's docs: https://platform.claude.com/docs/en/build-with-claude/fast-mode

Context1.0M

SpeedN/A

InputText, Image, File

OutputText

ReasoningYes

AvailabilityChat only

Use in Chat

Claude Opus 4.8

Anthropic

Claude Opus 4.8 is Anthropic's most capable generally available model in the Opus family. It supports text, image, and file inputs with text output, with reasoning support and a 1M-token...

Context1.0M

Speed66 tok/s

InputText, Image, File

OutputText

ReasoningYes

AvailabilityChat only

Use in Chat

Qwen3.7 Max

Qwen

Context1.0M

Speed203 tok/s

InputText, File

OutputText

ReasoningYes

AvailabilityChat only

Use in Chat

Gemini 3.5 Flash

Google

Context1.0M

Speed259 tok/s

InputText, Image, File

OutputText

ReasoningYes

AvailabilityChat only

Use in Chat

Gemini 3.1 Flash Lite

Google

Context1.0M

Speed305 tok/s

InputText, Image, File

OutputText

ReasoningYes

AvailabilityChat only

Use in Chat

Qwen3.5 Plus 2026-04-20

Qwen

Qwen3.5 Plus (April 2026) is a large-scale multimodal language model from Alibaba. It accepts text, image, and video input and produces text output, with a 1M token context window. This...

Context1.0M

SpeedN/A

InputText, Image, File

OutputText

ReasoningYes

AvailabilityChat only

Use in Chat

Qwen3.6 Flash

Qwen

Qwen3.6 Flash is a fast, efficient language model from Alibaba's Qwen 3.6 series. It supports text, image, and video input with a 1M token context window. Tiered pricing kicks in...

Context1.0M

SpeedN/A

InputText, Image, File

OutputText

ReasoningYes

AvailabilityChat only

Use in Chat

Qwen3.6 35B A3B

EU-Hosted

Qwen

Context262K

Speed124 tok/s

InputText, Image

OutputText

ReasoningYes

AvailabilityChat, Inference API

Use in Chat Use via API

Qwen3.6 Max Preview

Qwen

Context262K

Speed38 tok/s

InputText, File

OutputText

ReasoningYes

AvailabilityChat only

Use in Chat

GPT-5.5 Pro

OpenAI

Context1.1M

Speed110 tok/s

InputImage, Text, File

OutputText

ReasoningYes

AvailabilityChat only

Use in Chat

GPT-5.5

OpenAI

Context1.1M

Speed79 tok/s

InputImage, Text, File

OutputText

ReasoningYes

AvailabilityChat only

Use in Chat

Top Image Generation AI Models

Models that generate images from text prompts.

Nano Banana 2 Lite (Gemini 3.1 Flash Lite Image)

Google

Context66K

SpeedN/A

InputImage, Text

OutputImage, Text

ReasoningYes

AvailabilityChat only

Use in Chat

Nano Banana 2 (Gemini 3.1 Flash Image)

Google

Gemini 3.1 Flash Image, a.k.a. "Nano Banana 2," is Google’s latest state-of-the-art image generation and editing model, delivering Pro-level visual quality at Flash speed. It combines advanced...

Context131K

Speed215 tok/s

InputImage, Text

OutputImage, Text

ReasoningYes

AvailabilityChat only

Use in Chat

Nano Banana Pro (Gemini 3 Pro Image)

Google

Context66K

Speed141 tok/s

InputImage, Text

OutputImage, Text

ReasoningYes

AvailabilityChat only

Use in Chat

GPT-5.4 Image 2

OpenAI

GPT-5.4 Image 2 enables rich multimodal workflows, allowing users to seamlessly move between reasoning, coding, and...

Context272K

Speed155 tok/s

InputImage, Text, File

OutputImage, Text

ReasoningYes

AvailabilityChat only

Use in Chat

FLUX 2 Klein

EU-Hosted

Black Forest Labs

Fast Black Forest Labs image generation and editing model.

ContextN/A

SpeedN/A

InputText, Image

OutputImage

ReasoningNo

AvailabilityChat, Inference API

Use in Chat Use via API

AI chat subscription

Turn model research into daily AI work.

Use 40+ models, web search, files, and EU-hosted options in one paid chat workspace.

Inference credits

Build with EU-hosted open-source models.

OpenAI-compatible API for GLM, Kimi, DeepSeek and more. Add credits inside the dashboard.

How to Choose the Right AI Model

A practical guide to picking the best LLM for your use case.

Match the model to the task

General-purpose models handle most tasks well. For specialized work, coding models and math models often outperform generalists on their respective benchmarks while costing less per token.

Consider context window size

If you work with long documents, codebases, or multi-turn conversations, context window matters. Models range from 8K to over 1M tokens. Larger windows let you process entire books or repositories in a single prompt, but they increase cost and latency.
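As a rough illustration of this trade-off, the sketch below checks which models are likely to fit a prompt of a given size. The context sizes are the rounded values listed in the catalog above. The 4-characters-per-token ratio is only a common rule of thumb for English text and is an assumption here, as are the function names and the default output reserve; real counts depend on each model's tokenizer.

```python
# Rounded context sizes (in tokens) as listed in the catalog above.
CONTEXT_TOKENS = {
    "Mistral Nemo": 60_000,
    "gpt-oss-120b": 131_000,
    "Qwen3.6 35B A3B": 262_000,
    "GLM 5.2": 1_000_000,
}

# Assumption: rough heuristic for English text, not a tokenizer.
CHARS_PER_TOKEN = 4


def estimate_tokens(num_chars: int) -> int:
    """Estimate the token count of a text from its character count."""
    return -(-num_chars // CHARS_PER_TOKEN)  # ceiling division


def models_that_fit(num_chars: int, reserve_for_output: int = 4_000) -> list[str]:
    """Return models whose context holds the prompt plus an output reserve."""
    needed = estimate_tokens(num_chars) + reserve_for_output
    return [name for name, ctx in CONTEXT_TOKENS.items() if ctx >= needed]


if __name__ == "__main__":
    # A 600,000-character document is roughly 150K tokens under this heuristic.
    print(models_that_fit(600_000))
```

Treat the result as a first filter only: a prompt that barely fits still pays the cost and latency of a nearly full window.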

Balance cost, speed, and quality

Frontier models deliver the highest benchmark scores but cost more per token and respond more slowly. Smaller, faster models can handle routine tasks at a fraction of the cost and with lower latency, which often makes them the better choice for high-volume applications.
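To make the speed side of this concrete, the sketch below turns the listed output speeds into an approximate generation time for a response of a given length. The speeds are the tok/s values shown in the catalog above; the function names are hypothetical, and the estimate deliberately ignores time to first token, queueing, and any hidden reasoning tokens, so it is a lower bound rather than a measurement.

```python
# Output speeds (tokens per second) as listed in the catalog above.
SPEED_TOK_PER_S = {
    "gpt-oss-120b": 303,
    "GLM 5.2": 156,
    "Claude Opus 4.8": 66,
    "Kimi K2.7 Code": 43,
}


def generation_seconds(output_tokens: int, tok_per_s: float) -> float:
    """Approximate streaming time for a response, excluding startup latency."""
    if tok_per_s <= 0:
        raise ValueError("tok_per_s must be positive")
    return output_tokens / tok_per_s


def rank_by_speed(output_tokens: int) -> list[tuple[str, float]]:
    """Return (model, seconds) pairs sorted from fastest to slowest."""
    times = [
        (name, generation_seconds(output_tokens, speed))
        for name, speed in SPEED_TOK_PER_S.items()
    ]
    return sorted(times, key=lambda pair: pair[1])


if __name__ == "__main__":
    for name, seconds in rank_by_speed(2_000):
        print(f"{name}: about {seconds:.1f} s for 2,000 output tokens")
```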

Open source vs. proprietary

Open-source models let you self-host, fine-tune, and inspect weights. Proprietary models often lead on benchmarks and offer managed APIs with built-in safety features. Many teams use both: proprietary for peak performance, open source for cost control and customization.

Check for multimodal capabilities

Some models accept images, audio, or files alongside text. If your workflow involves analyzing screenshots, diagrams, or audio transcriptions, filter for models with vision or audio input support. Models with structured output and function calling are essential for building agents and tool-using applications.
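One way to apply this kind of filter is sketched below, using a handful of entries copied from the catalog above. The `ModelEntry` type and `filter_models` helper are hypothetical names introduced for illustration; they do not describe an LLMBase API, and the catalog does not list structured-output or function-calling support, so those are not modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelEntry:
    name: str
    inputs: frozenset
    outputs: frozenset
    api: bool  # True when availability is listed as "Chat, Inference API"


# A few entries transcribed from the catalog above.
CATALOG = [
    ModelEntry("GLM 5.2", frozenset({"Text"}), frozenset({"Text"}), True),
    ModelEntry("M3", frozenset({"Text", "Image"}), frozenset({"Text"}), True),
    ModelEntry("GPT Audio", frozenset({"Text", "File"}), frozenset({"Text", "Audio"}), False),
    ModelEntry("FLUX 2 Klein", frozenset({"Text", "Image"}), frozenset({"Image"}), True),
    ModelEntry("Claude Opus 4.8", frozenset({"Text", "Image", "File"}), frozenset({"Text"}), False),
]


def filter_models(catalog, needs_input=(), needs_output=(), api_only=False):
    """Keep models that support every requested input and output modality."""
    want_in, want_out = set(needs_input), set(needs_output)
    return [
        m for m in catalog
        if want_in <= m.inputs
        and want_out <= m.outputs
        and (m.api or not api_only)
    ]


if __name__ == "__main__":
    for model in filter_models(CATALOG, needs_input={"Image"}, api_only=True):
        print(model.name)
```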

Use benchmarks as a starting point

Scores like GPQA, MMLU Pro, and HLE measure academic knowledge and reasoning. LiveCodeBench and SciCode test practical coding ability. MATH 500 and AIME evaluate mathematical problem-solving. No single benchmark tells the full story - compare scores across categories relevant to your use case, then test with your own prompts.

Model catalog, pricing, speed, and benchmark scores are updated regularly. Chat and API access varies by model, plan, and tier.