Models for agents
Understand which LLMBase models are available to OpenAI-compatible agents and which models to start with.
Updated
Only inference models are accessible via agents; browse the current list in inference models.
Agent keys expose the agent-compatible models from the LLMBase inference model
registry. GET https://llmbase.ai/api/v1/agents/models proxies the inference
API’s GET https://api.llmbase.ai/v1/agents/models view, so LLMBase does not
maintain a separate static allowlist for OpenClaw, Hermes, or other agents.
The response uses the OpenAI models.list() shape and includes LLMBase metadata
such as pricing, supported_features, and prompt_cache_supported when
available. If a model is not returned by this endpoint, it cannot be used with a
chat agent key. Chat completions for unavailable agent models return 403 with
model_not_available_for_agents.
The agent model view includes only chat/tool-capable models from LLMBase’s cost-controlled inference catalog. Image, embedding, rerank, classification, and non-agent models are not returned.
How model routing works
Agents always send a stable LLMBase model ID, such as
deepseek/deepseek-v4-pro. They do not configure routing, fallback models, or
Gateway route names.
At request time, LLMBase inspects the OpenAI-compatible request body and chooses the internal route that matches the requested capabilities, such as streaming, tool calls, JSON output, structured outputs, logprobs, or image/file inputs. The request is then sent through LLMBase-managed Cloudflare AI Gateway routing. That internal route can contain several attempts so LLMBase can keep a model available when a route is temporarily unavailable or does not support a requested feature.
Those internal dynamic routes are not public model IDs. They are operational
routing graphs. Public model listing and chat responses continue to use the
LLMBase model IDs returned by GET /api/v1/agents/models, and internal route
details are intentionally not exposed to agent clients.
Use a llmbase_chat_... key when an external agent should consume the user’s
chat subscription. Use a llmbase_... inference API key when your application
needs direct OpenAI-compatible inference billing, prompt-cache pricing, or the
curated inference model list.
Recommended models for agents
Always call GET /api/v1/agents/models for the current model list. The list is
generated dynamically from the inference registry, so available models can
change as LLMBase adds or removes eligible models.
Good starting points:
| Model | Best fit | Why |
|---|---|---|
deepseek/deepseek-v4-flash | Default OpenClaw/Hermes agent model | Strong cost/performance, long context, tool-capable, good for many agent steps |
z-ai/glm-5.1 | Coding and agent orchestration | Strong coding and tool-use behavior for development workflows |
qwen/qwen3-coder | Repository work and structured coding tasks | Coding-focused model family, useful for code generation and tool loops |
deepseek/deepseek-v3.2 | Reasoning-heavy workflows | Good balance when a task needs more reasoning than a fast default model |
deepseek/deepseek-v4-pro | Long-context flagship tasks | Use for difficult tasks where quality matters more than quota efficiency |
openai/gpt-oss-120b | Open-weight reasoning | Useful when you prefer open-weight models for reasoning workloads |
For most users, start with deepseek/deepseek-v4-flash. Move to
z-ai/glm-5.1 or qwen/qwen3-coder for code-heavy workflows. Reserve
deepseek/deepseek-v4-pro for difficult long-context tasks because larger
models consume the included budget faster.