LLMBase | Docs

Models for agents

Understand which LLMBase models are available to OpenAI-compatible agents and which models to start with.

Updated


Only inference models are accessible via agents; browse the current list in inference models.

Agent keys expose the agent-compatible models from the LLMBase inference model registry. GET https://llmbase.ai/api/v1/agents/models proxies the inference API’s GET https://api.llmbase.ai/v1/agents/models view, so LLMBase does not maintain a separate static allowlist for OpenClaw, Hermes, or other agents.

The response uses the OpenAI models.list() shape and includes LLMBase metadata such as pricing, supported_features, and prompt_cache_supported when available. If a model is not returned by this endpoint, it cannot be used with a chat agent key. Chat completions for unavailable agent models return 403 with model_not_available_for_agents.

The agent model view includes only chat/tool-capable models from LLMBase’s cost-controlled inference catalog. Image, embedding, rerank, classification, and non-agent models are not returned.

How model routing works

Agents always send a stable LLMBase model ID, such as deepseek/deepseek-v4-pro. They do not configure routing, fallback models, or Gateway route names.

At request time, LLMBase inspects the OpenAI-compatible request body and chooses the internal route that matches the requested capabilities, such as streaming, tool calls, JSON output, structured outputs, logprobs, or image/file inputs. The request is then sent through LLMBase-managed Cloudflare AI Gateway routing. That internal route can contain several attempts so LLMBase can keep a model available when a route is temporarily unavailable or does not support a requested feature.

Those internal dynamic routes are not public model IDs. They are operational routing graphs. Public model listing and chat responses continue to use the LLMBase model IDs returned by GET /api/v1/agents/models, and internal route details are intentionally not exposed to agent clients.

Use a llmbase_chat_... key when an external agent should consume the user’s chat subscription. Use a llmbase_... inference API key when your application needs direct OpenAI-compatible inference billing, prompt-cache pricing, or the curated inference model list.

Always call GET /api/v1/agents/models for the current model list. The list is generated dynamically from the inference registry, so available models can change as LLMBase adds or removes eligible models.

Good starting points:

ModelBest fitWhy
deepseek/deepseek-v4-flashDefault OpenClaw/Hermes agent modelStrong cost/performance, long context, tool-capable, good for many agent steps
z-ai/glm-5.1Coding and agent orchestrationStrong coding and tool-use behavior for development workflows
qwen/qwen3-coderRepository work and structured coding tasksCoding-focused model family, useful for code generation and tool loops
deepseek/deepseek-v3.2Reasoning-heavy workflowsGood balance when a task needs more reasoning than a fast default model
deepseek/deepseek-v4-proLong-context flagship tasksUse for difficult tasks where quality matters more than quota efficiency
openai/gpt-oss-120bOpen-weight reasoningUseful when you prefer open-weight models for reasoning workloads

For most users, start with deepseek/deepseek-v4-flash. Move to z-ai/glm-5.1 or qwen/qwen3-coder for code-heavy workflows. Reserve deepseek/deepseek-v4-pro for difficult long-context tasks because larger models consume the included budget faster.