UI-TARS 7B

by ByteDance

UI-TARS-1.5 is a multimodal vision-language agent optimized for GUI-based environments, including desktop interfaces, web browsers, mobile systems, and games. Built by ByteDance, it builds upon the UI-TARS framework with reinforcement learning-based reasoning, enabling robust action planning and execution across virtual interfaces. This model achieves state-of-the-art results on a range of interactive and grounding benchmarks, including OSworld, WebVoyager, AndroidWorld, and ScreenSpot. It also demonstrates perfect task completion across diverse Poki games and outperforms prior models in Minecraft agent tasks. UI-TARS-1.5 supports thought decomposition during inference and shows strong scaling across variants, with the 1.5 version notably exceeding the performance of earlier 72B and 7B checkpoints.

Chat with UI-TARS 7B

Capabilities

Vision

Pricing

Input Tokens

Per 1M tokens

Free

Output Tokens

Per 1M tokens

Free

Image Processing

Per 1M tokens

$0.00/1M tokens

Supported Modalities

Input

image

text

Output

text

Specifications

Context Length: 128K tokens
Provider: ByteDance
Released: Jul 22, 2025
Model ID: bytedance/ui-tars-1.5-7b

Ready to try it?

Start chatting with UI-TARS 7B right now. No credit card required.

Start Chatting

More from ByteDance

View all models

UI-TARS 7B

Capabilities

Pricing

Supported Modalities

Input

Output

Specifications

Ready to try it?

More from ByteDance

Compare Models