
Model discovery

List models, inspect metadata, and filter the LLMBase Inference API catalog programmatically.

Updated June 15, 2026

LLMBase routes your request automatically through the managed inference network. You reference models by their unified ID and do not need to configure routing yourself.

The inference model list is intentionally curated for direct API usage. The agent model list is derived from the same registry but returns only the chat/tool-capable models that are safe for subscription-backed agents. Use the direct inference API when you need OpenAI-compatible inference with per-token billing, prompt-cache pricing, and predictable API costs. Use Agent integrations when you want an OpenAI-compatible agent to draw on a Pro chat subscription.

Choosing a model

For production systems, choose by capability first and price second:

Workload	What to inspect
Agent loop	`supported_features` includes `tools`; check `pricing.input_cache_read`
JSON extraction	`supported_features` includes `json_mode` or `structured_outputs`
Reasoning traces	`supported_features` includes `reasoning`; `supported_parameters` includes `reasoning_effort`
Ranking or confidence	`supported_features` includes `logprobs`
Vision or OCR	`input_modalities` includes `image` or `file`
Long documents	`context_length`, `max_output_length`, and `pricing.prompt`
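The boolean rows of the table above can be applied as filters over metadata entries. The sketch below assumes entries shaped like the `GET /v1/models?metadata=true` response described under Model metadata and capabilities; `WORKLOAD_CHECKS` and `modelsForWorkload` are hypothetical names, not part of any LLMBase SDK. The "Long documents" row is left out because it is a numeric comparison rather than a yes/no capability.

```javascript
// Hypothetical helpers (not part of any LLMBase SDK) that mirror the
// "Choosing a model" table. Each check receives one metadata entry.
const has = (list, value) => Array.isArray(list) && list.includes(value);

const WORKLOAD_CHECKS = {
  agentLoop: (m) => has(m.supported_features, "tools"),
  jsonExtraction: (m) =>
    has(m.supported_features, "json_mode") ||
    has(m.supported_features, "structured_outputs"),
  reasoningTraces: (m) =>
    has(m.supported_features, "reasoning") &&
    has(m.supported_parameters, "reasoning_effort"),
  rankingOrConfidence: (m) => has(m.supported_features, "logprobs"),
  visionOrOcr: (m) =>
    has(m.input_modalities, "image") || has(m.input_modalities, "file"),
};

// Returns the IDs of the models that pass the check for one workload.
function modelsForWorkload(models, workload) {
  if (!Object.prototype.hasOwnProperty.call(WORKLOAD_CHECKS, workload)) {
    throw new Error(`Unknown workload: ${workload}`);
  }
  const check = WORKLOAD_CHECKS[workload];
  return models.filter((m) => check(m)).map((m) => m.id);
}

// Illustrative entries only; real ones come from /v1/models?metadata=true.
const sample = [
  { id: "provider/a", supported_features: ["tools", "json_mode"], input_modalities: ["text"] },
  { id: "provider/b", supported_features: ["logprobs"], input_modalities: ["text", "image"] },
];
console.log(modelsForWorkload(sample, "agentLoop")); // [ 'provider/a' ]
```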

If your client requests a capability that the selected model does not advertise, LLMBase returns a 400 error instead of running an incompatible request.

Capability-specific guides:

Tools for function-calling models
Structured outputs for JSON mode and JSON Schema
Prompt caching for cache-read pricing
Reasoning for thinking traces and effort controls
Fallback for temporary availability handling

List models

GET /v1/models returns every model available through the direct inference API, in the OpenAI models format. This endpoint does not require account authorization, but send your normal Bearer header anyway: the header lets Cloudflare recognize the request as API traffic and skip browser-style challenge handling.

curl https://api.llmbase.ai/v1/models \
  -H "Authorization: Bearer $LLMBASE_API_KEY"

Response

{
  "object": "list",
  "data": [
    {
      "id": "provider/model-id",
      "object": "model",
      "created": 1700000000,
      "owned_by": "provider",
      "name": "Provider: Model",
      "description": "Model description.",
      "context_length": 1048576
    }
  ]
}
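Because the response follows the OpenAI list shape, a client can index `data` by `id` to look up fields such as `context_length` without rescanning the array. A minimal sketch using the sample response above; `indexModels` is a hypothetical helper, not an SDK function.

```javascript
// Sample body in the shape shown above (values are placeholders).
const body = {
  object: "list",
  data: [
    {
      id: "provider/model-id",
      object: "model",
      created: 1700000000,
      owned_by: "provider",
      name: "Provider: Model",
      description: "Model description.",
      context_length: 1048576,
    },
  ],
};

// Hypothetical helper: builds a Map from model ID to its list entry.
function indexModels(listBody) {
  if (listBody.object !== "list" || !Array.isArray(listBody.data)) {
    throw new Error("Unexpected /v1/models response shape");
  }
  return new Map(listBody.data.map((model) => [model.id, model]));
}

const byId = indexModels(body);
console.log(byId.get("provider/model-id")?.context_length); // 1048576
```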

With the OpenAI SDK

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.llmbase.ai/v1",
  apiKey: process.env.LLMBASE_API_KEY,
});

const models = await client.models.list();
for (const model of models.data) {
  console.log(model.id);
}

Agent-compatible models

Agents can use only models from the inference catalog, and only its chat/tool-capable subset; browse the current list on the Models page.

OpenAI-compatible agents that use a llmbase_chat_... key should list models from the agent API, not from the direct inference API:

curl https://llmbase.ai/api/v1/agents/models \
  -H "Authorization: Bearer $LLMBASE_CHAT_AGENT_KEY"

This endpoint is backed by the same model registry as GET /v1/models but returns only the chat/tool-capable models available to subscription-backed agents. Because the list is generated dynamically from that registry, agents pick up newly eligible models automatically, and LLMBase does not maintain a separate static allowlist for OpenClaw or Hermes.

Models that are not returned by /api/v1/agents/models cannot be used with llmbase_chat_... keys. Use a llmbase_... key and https://api.llmbase.ai/v1 when you need direct inference billing or a model outside the agent-compatible list.
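The key prefix decides which catalog applies. A minimal sketch, assuming only the prefixes and URLs stated on this page (`llmbase_chat_...` keys use the agent list, other `llmbase_...` keys use the direct inference list); `modelsUrlForKey` and `canUseModel` are hypothetical helpers, and the example key is a placeholder.

```javascript
// Hypothetical helpers (not an LLMBase SDK API) based on the key prefixes
// and URLs documented on this page.
const AGENT_MODELS_URL = "https://llmbase.ai/api/v1/agents/models";
const INFERENCE_MODELS_URL = "https://api.llmbase.ai/v1/models";

// Picks the model list that applies to a key.
function modelsUrlForKey(apiKey) {
  // Check the longer prefix first: "llmbase_chat_" also starts with "llmbase_".
  if (apiKey.startsWith("llmbase_chat_")) return AGENT_MODELS_URL;
  if (apiKey.startsWith("llmbase_")) return INFERENCE_MODELS_URL;
  throw new Error("Unrecognized LLMBase key prefix");
}

// A model is usable with a key only if that key's model list returned it.
function canUseModel(modelId, listedModelIds) {
  return listedModelIds.includes(modelId);
}

console.log(modelsUrlForKey("llmbase_chat_example"));
// https://llmbase.ai/api/v1/agents/models
```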

Model metadata and capabilities

Use the OpenAI-compatible GET /v1/models endpoint for SDK model discovery. When your application needs richer production metadata, such as context length, maximum output length, pricing, modalities, supported sampling parameters, and certified features, add metadata=true to the same endpoint:

curl "https://api.llmbase.ai/v1/models?metadata=true" \
  -H "Authorization: Bearer $LLMBASE_API_KEY"

GET /v1/model-metadata is kept as a compatibility alias for the same rich metadata format. New clients should prefer /v1/models?metadata=true so model discovery and metadata use one endpoint family.

Each entry includes:

Field	Description
`id`	Stable LLMBase model ID used in API requests
`context_length`	Maximum context window in tokens, shared by input and output
`max_output_length`	Maximum generated tokens for one response
`input_modalities` / `output_modalities`	Supported input and output modes, such as `text` and `image` (inputs may also include `file`)
`pricing.prompt` / `pricing.completion`	USD per input or output token
`pricing.input_cache_read`	USD per cached input token; present only for models that support prompt-cache reads
`supported_parameters`	Parameters such as `temperature`, `top_p`, `max_tokens`, `logprobs`, `top_logprobs`, or `reasoning_effort`
`supported_features`	Higher-level features such as `tools`, `json_mode`, `structured_outputs`, `reasoning`, or `logprobs`
`lifecycle`	Optional deprecation and scheduled removal metadata, including `removal_date` and `replacement_ids`
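Two fields in this table are easy to combine incorrectly: `context_length` covers input and output together, while `max_output_length` caps a single response. The sketch below derives a prompt budget from both and lists models with scheduled removals. It assumes numeric token counts, an ISO 8601 `removal_date` string, and that a response never exceeds `max_output_length`; `promptBudget` and `scheduledRemovals` are hypothetical names.

```javascript
// Hypothetical helpers (not part of any LLMBase SDK) over metadata entries.

// Largest prompt, in tokens, that still leaves room for the requested output.
// Assumes a response never exceeds max_output_length when that field is present.
function promptBudget(model, requestedMaxTokens) {
  const outputCap = model.max_output_length ?? requestedMaxTokens;
  const reserved = Math.min(requestedMaxTokens, outputCap);
  return Math.max(0, model.context_length - reserved);
}

// Models with a scheduled removal, soonest first, with any suggested replacements.
function scheduledRemovals(models) {
  return models
    .filter((m) => m.lifecycle?.removal_date)
    .map((m) => ({
      id: m.id,
      removalDate: m.lifecycle.removal_date,
      replacements: m.lifecycle.replacement_ids ?? [],
    }))
    .sort((a, b) => Date.parse(a.removalDate) - Date.parse(b.removalDate));
}

// Illustrative entries only; the removal date here is made up.
const catalog = [
  { id: "provider/new", context_length: 131072, max_output_length: 8192 },
  {
    id: "provider/old",
    context_length: 32768,
    max_output_length: 4096,
    lifecycle: { removal_date: "2026-09-01", replacement_ids: ["provider/new"] },
  },
];

console.log(promptBudget(catalog[0], 16000)); // 122880
console.log(scheduledRemovals(catalog)[0].replacements); // [ 'provider/new' ]
```

Reserving `min(max_tokens, max_output_length)` rather than the raw `max_tokens` keeps the budget from shrinking for output the model could never produce.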

Before sending advanced options like response_format, tools, logprobs, or top_logprobs, choose a model that advertises the matching capability. If a request asks for a feature that the selected model does not support, LLMBase returns an OpenAI-style 400 error instead of running an incompatible request.
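A client can run a similar check before sending a request, so incompatible options fail locally with a clear message. A minimal sketch; the mapping from request fields to advertised capabilities is an assumption (for example, that `response_format` of type `json_schema` needs `structured_outputs`, type `json_object` needs `json_mode`, and `logprobs`, `top_logprobs`, and `reasoning_effort` must appear in `supported_parameters`). `missingCapabilities` is a hypothetical helper, and the server-side check remains authoritative.

```javascript
// Hypothetical pre-flight check (not an LLMBase SDK API). The mapping from
// request fields to advertised capabilities is an assumption.
function missingCapabilities(model, request) {
  const features = new Set(model.supported_features ?? []);
  const params = new Set(model.supported_parameters ?? []);
  const missing = [];

  if (request.tools?.length && !features.has("tools")) missing.push("tools");

  const formatType = request.response_format?.type;
  if (formatType === "json_object" && !features.has("json_mode")) {
    missing.push("json_mode");
  }
  if (formatType === "json_schema" && !features.has("structured_outputs")) {
    missing.push("structured_outputs");
  }

  // Any of these parameters being present counts as requesting it.
  for (const name of ["logprobs", "top_logprobs", "reasoning_effort"]) {
    if (request[name] !== undefined && !params.has(name)) missing.push(name);
  }
  return missing;
}

const toolOnlyModel = {
  id: "provider/a",
  supported_features: ["tools"],
  supported_parameters: ["temperature", "max_tokens"],
};
const problems = missingCapabilities(toolOnlyModel, {
  response_format: { type: "json_schema" },
  logprobs: true,
});
if (problems.length > 0) {
  console.log(`Choose another model; missing: ${problems.join(", ")}`);
}
// Choose another model; missing: structured_outputs, logprobs
```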

Filter models programmatically

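const res = await fetch("https://api.llmbase.ai/v1/models?metadata=true", {
  headers: { Authorization: `Bearer ${process.env.LLMBASE_API_KEY}` },
});
if (!res.ok) {
  throw new Error(`Model metadata request failed: ${res.status}`);
}

const { data } = await res.json();

// Keep models that support tool calling and publish a cache-read price.
// `!= null` keeps a model whose cache-read price is 0.
const toolModels = data.filter(
  (model) =>
    model.supported_features?.includes("tools") &&
    model.pricing?.input_cache_read != null
);

console.log(toolModels.map((model) => model.id));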
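To choose among the matching models, you can then sort by cache-read price. A self-contained sketch with made-up entries; it assumes `pricing` values are USD per token and may arrive as numbers or numeric strings. `cacheReadPrice` and `byCacheReadPrice` are hypothetical helpers.

```javascript
// Hypothetical helpers (not part of any LLMBase SDK). Pricing values are
// treated as USD per token and may be numbers or numeric strings.
function cacheReadPrice(model) {
  const raw = model.pricing?.input_cache_read;
  if (raw === undefined || raw === null || raw === "") return NaN;
  return Number(raw);
}

// Keeps models with a usable cache-read price, cheapest first.
function byCacheReadPrice(models) {
  return models
    .filter((m) => Number.isFinite(cacheReadPrice(m)))
    .sort((a, b) => cacheReadPrice(a) - cacheReadPrice(b));
}

// Illustrative entries only; prices are made up.
const candidates = [
  { id: "provider/a", pricing: { input_cache_read: "0.0000003" } },
  { id: "provider/b", pricing: { input_cache_read: 0.0000001 } },
  { id: "provider/c", pricing: {} },
];
console.log(byCacheReadPrice(candidates).map((m) => m.id)); // [ 'provider/b', 'provider/a' ]
```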

For agent-capable models only, add the agents filter:

curl "https://api.llmbase.ai/v1/models?filter=agents&metadata=true" \
  -H "Authorization: Bearer $LLMBASE_API_KEY"
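In code, the same query can be built with the standard `URL` API instead of string concatenation. A small sketch; `modelsUrl` and its option names are hypothetical and simply mirror the `filter=agents` and `metadata=true` query parameters shown above.

```javascript
// Hypothetical helper (not part of any LLMBase SDK) that builds the model list
// URL with the query parameters shown on this page.
function modelsUrl({ metadata = false, agentsOnly = false } = {}) {
  const url = new URL("https://api.llmbase.ai/v1/models");
  if (agentsOnly) url.searchParams.set("filter", "agents");
  if (metadata) url.searchParams.set("metadata", "true");
  return url.toString();
}

console.log(modelsUrl({ metadata: true, agentsOnly: true }));
// https://api.llmbase.ai/v1/models?filter=agents&metadata=true
```

Pass the result to `fetch` with the same `Authorization` header as the curl example.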

This is useful for agents and SaaS products where model lists should update automatically as LLMBase adds new eligible models.
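One way to keep a product's model list current without fetching on every request is a small time-based cache around the list call. A sketch, assuming a refresh interval you choose yourself (this page does not specify one); `ModelListCache`, its `load` callback, and the injectable clock are hypothetical.

```javascript
// Hypothetical cache (not part of any LLMBase SDK). Stores whatever `load()`
// returns (for example, a promise of the parsed list) and reloads after `ttlMs`.
class ModelListCache {
  constructor(load, ttlMs, now = () => Date.now()) {
    this.load = load;
    this.ttlMs = ttlMs;
    this.now = now;
    this.value = undefined;
    this.fetchedAt = -Infinity;
  }

  get() {
    if (this.now() - this.fetchedAt >= this.ttlMs) {
      const result = this.load();
      this.value = result;
      this.fetchedAt = this.now();
      // If the loader returns a promise that rejects, forget it so the next get() retries.
      if (result && typeof result.then === "function") {
        result.then(undefined, () => {
          if (this.value === result) this.fetchedAt = -Infinity;
        });
      }
    }
    return this.value;
  }
}

// Example wiring (nothing is fetched until get() is called); the 10-minute
// interval is an arbitrary choice, not an LLMBase recommendation.
const agentModels = new ModelListCache(
  () =>
    fetch("https://api.llmbase.ai/v1/models?filter=agents&metadata=true", {
      headers: { Authorization: `Bearer ${process.env.LLMBASE_API_KEY}` },
    }).then((res) => {
      if (!res.ok) throw new Error(`Model list request failed: ${res.status}`);
      return res.json();
    }),
  10 * 60 * 1000
);
```

Callers then use `await agentModels.get()` wherever they need the list; concurrent callers within the interval share one in-flight request.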

Prompt-cache pricing

Some inference models support prompt-cache reads. When a response reports cached prompt tokens in usage.prompt_tokens_details.cached_tokens, LLMBase bills those cached input tokens at that model’s cache-read price instead of the normal input-token price.
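As a worked example of that billing rule, the sketch below estimates a request's cost from the `usage` object and a model's `pricing`. It assumes, as in OpenAI's usage format, that `prompt_tokens` already includes the cached tokens, and it falls back to the normal input price when a model has no `input_cache_read`; the Prompt caching guide has the authoritative formula. `estimateCostUsd` is a hypothetical helper and the prices are made up.

```javascript
// Hypothetical helper (not part of any LLMBase SDK). Assumes prompt_tokens
// includes cached_tokens (OpenAI usage convention) and prices are USD per token.
function estimateCostUsd(usage, pricing) {
  const cached = usage.prompt_tokens_details?.cached_tokens ?? 0;
  const uncached = usage.prompt_tokens - cached;
  const inputPrice = Number(pricing.prompt);
  const cacheReadPrice =
    pricing.input_cache_read != null ? Number(pricing.input_cache_read) : inputPrice;
  return (
    uncached * inputPrice +
    cached * cacheReadPrice +
    usage.completion_tokens * Number(pricing.completion)
  );
}

// Illustrative prices only; read real values from /v1/models?metadata=true.
const cost = estimateCostUsd(
  { prompt_tokens: 1000, completion_tokens: 100, prompt_tokens_details: { cached_tokens: 800 } },
  { prompt: 0.000002, completion: 0.000008, input_cache_read: 0.0000005 }
);
console.log(cost.toFixed(6)); // 0.001600
```

Here 200 uncached input tokens, 800 cached tokens, and 100 output tokens each contribute $0.0004, $0.0004, and $0.0008.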

Models with cache-read pricing include pricing.input_cache_read in the response and show a cache-read value in the live Models table. See Prompt caching for request examples, prompt layout advice, and the cost formula.

Unsupported model families

The direct inference API is intentionally curated for chat and multimodal inference. Embedding models, rerankers, native image-generation and audio-generation models, and native classifiers are not exposed through POST /v1/chat/completions unless they are represented as a supported chat model in /v1/models.

For those workloads, use the matching LLMBase product surface when available or a native API designed for that model family.