LLMBase | Docs

Fallback

How smart fallback works for temporary model availability issues.


POST /v1/chat/completions uses smart fallback by default. If the requested model is temporarily unavailable, LLMBase may serve the request with an eligible fallback model instead of returning an availability error.

Fallback is capability-safe and price-aware. A request that includes images, tools, or response_format falls back only to models that support the same required modalities and features. When fallback applies, LLMBase bills the actual token usage at the rate of whichever model is cheaper: the requested model or the served fallback model.
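The billing rule can be illustrated with a small sketch. The `Rates` type, the per-million-token pricing unit, and the choice to compare the two models on the total cost of this request's usage are assumptions made for illustration; the page does not specify how "cheaper" is determined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Hypothetical per-million-token prices in USD (not from the LLMBase docs)."""
    input_per_mtok: float
    output_per_mtok: float


def usage_cost(rates: Rates, input_tokens: int, output_tokens: int) -> float:
    """Cost of the given token usage at one model's rates."""
    return (input_tokens * rates.input_per_mtok
            + output_tokens * rates.output_per_mtok) / 1_000_000


def billed_cost(requested: Rates, served: Rates,
                input_tokens: int, output_tokens: int) -> float:
    """Bill the actual usage at whichever model's rates come out cheaper.

    Assumption: "cheaper" is judged on the total cost of this request's usage.
    """
    return min(usage_cost(requested, input_tokens, output_tokens),
               usage_cost(served, input_tokens, output_tokens))
```

Under this reading, a fallback to a more expensive model never costs more than the requested model would have, and a fallback to a cheaper model is billed at the cheaper rate.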

The initial smart fallback rollout covers /v1/chat/completions.

Response headers

Responses include fallback metadata in HTTP headers:

| Header | Description |
| x-llmbase-requested-model | Model ID sent in the request |
| x-llmbase-served-model | Model ID that generated the response |
| x-llmbase-fallback-applied | true when fallback served the request, otherwise false |
| x-llmbase-fallback-reason | Present when fallback applies |
| x-llmbase-fallback-chain | Comma-separated model chain when fallback applies |
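A minimal sketch of reading these headers on the client side. The `FallbackInfo` type and `parse_fallback_headers` function are hypothetical helpers, not part of any LLMBase SDK; only the header names come from the table above.

```python
from dataclasses import dataclass
from typing import List, Mapping, Optional


@dataclass(frozen=True)
class FallbackInfo:
    requested_model: Optional[str]
    served_model: Optional[str]
    applied: bool
    reason: Optional[str]
    chain: List[str]


def parse_fallback_headers(headers: Mapping[str, str]) -> FallbackInfo:
    """Extract fallback metadata from a response's HTTP headers."""
    # HTTP header names are case-insensitive, so normalize them first.
    h = {k.lower(): v for k, v in headers.items()}
    chain_raw = h.get("x-llmbase-fallback-chain", "")
    return FallbackInfo(
        requested_model=h.get("x-llmbase-requested-model"),
        served_model=h.get("x-llmbase-served-model"),
        applied=h.get("x-llmbase-fallback-applied", "false").strip().lower() == "true",
        reason=h.get("x-llmbase-fallback-reason"),
        # The chain header is comma-separated; it is absent when no fallback applied.
        chain=[m.strip() for m in chain_raw.split(",") if m.strip()],
    )
```

Logging `applied` and `served_model` alongside each response makes it possible to tell later which outputs came from a substituted model.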

Disable fallback

To require the requested model and hard-fail instead of falling back, send:

X-LLMBase-Fallback: off

Use this for evaluations, reproducibility checks, and workloads where a model substitution is worse than a retryable error.
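As a sketch, an evaluation harness might build its requests like this. The base URL, the Bearer-token authorization scheme, and the `build_chat_request` helper are assumptions for illustration; only the endpoint path and the X-LLMBase-Fallback header come from this page.

```python
import json
import urllib.request

BASE_URL = "https://api.llmbase.example"  # hypothetical host, replace with the real one


def build_chat_request(api_key: str, model: str, messages: list,
                       allow_fallback: bool = True) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/chat/completions request."""
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",  # assumed auth scheme
    }
    if not allow_fallback:
        # Require the requested model; the API hard-fails instead of substituting.
        headers["X-LLMBase-Fallback"] = "off"
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=body,
        headers=headers,
        method="POST",
    )
```

The returned object can be passed to `urllib.request.urlopen`. With fallback off, the caller should be prepared to catch the availability error and retry.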

Model selection

Fallback does not make unsupported feature combinations valid. Before sending images, tools, structured output, logprobs, or reasoning controls, choose a model that advertises the matching capability in /v1/models?metadata=true.
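A capability check before sending a request might look like the sketch below. The response schema of /v1/models?metadata=true is not documented on this page, so the shape assumed here (a list of entries, each with an "id" and a "capabilities" list of strings) and the capability names are hypothetical; adapt the field names to the real metadata.

```python
from typing import Iterable, List, Mapping, Set


def models_supporting(models: Iterable[Mapping], required: Set[str]) -> List[str]:
    """Return IDs of models whose advertised capabilities cover `required`.

    Assumes each entry looks like {"id": "...", "capabilities": ["...", ...]},
    which is an illustrative shape, not the documented schema.
    """
    return [
        m["id"]
        for m in models
        if required <= set(m.get("capabilities", []))
    ]
```

For example, a request that sends images and tools would filter on both capabilities and pick its model from the resulting list.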