Inference

Fallback

How smart fallback works for temporary model availability issues.

Updated March 16, 2026

POST /v1/chat/completions uses smart fallback by default. If the requested model is temporarily unavailable, LLMBase may serve the request with an eligible fallback model instead of returning an availability error.

Fallback is capability-safe and price-aware. Requests with images, tools, or response_format fall back only to models that support the same required modalities and features. When fallback applies, LLMBase bills the cheaper of the requested model and the served fallback model for the actual token usage.

The initial smart fallback rollout covers /v1/chat/completions.

Response headers

Responses include fallback metadata in HTTP headers:

Header	Description
`x-llmbase-requested-model`	Model ID sent in the request
`x-llmbase-served-model`	Model ID that generated the response
`x-llmbase-fallback-applied`	`true` when fallback served the request, otherwise `false`
`x-llmbase-fallback-reason`	Present when fallback applies
`x-llmbase-fallback-chain`	Comma-separated model chain when fallback applies

Disable fallback

To require the requested model and hard-fail instead of falling back, send:

X-LLMBase-Fallback: off

Use this for evaluations, reproducibility checks, and workloads where a model substitution is worse than a retryable error.

Model selection

Fallback does not make unsupported feature combinations valid. Before sending images, tools, structured output, logprobs, or reasoning controls, choose a model that advertises the matching capability in /v1/models?metadata=true.