Fallback
How smart fallback works for temporary model availability issues.
POST /v1/chat/completions uses smart fallback by default. If the requested
model is temporarily unavailable, LLMBase may serve the request with an eligible
fallback model instead of returning an availability error.
Fallback is capability-safe and price-aware. Requests with images, tools, or
response_format fall back only to models that support the same required
modalities and features. When fallback applies, LLMBase bills the cheaper of
the requested model and the served fallback model for the actual token usage.
The initial smart fallback rollout covers /v1/chat/completions.
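The billing rule can be sketched as a small function. This is a minimal sketch, not LLMBase's billing code: the `Price` class, its per-million-token fields, and the prices in the usage note are hypothetical. It also assumes the comparison is made on the total cost of the request's actual input and output tokens. The text above does not say whether input and output rates are compared separately.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """Hypothetical per-million-token prices in USD (not real LLMBase rates)."""
    input_per_m: float
    output_per_m: float

    def cost(self, input_tokens: int, output_tokens: int) -> float:
        # Price one request's actual token usage at this model's rates.
        return (input_tokens * self.input_per_m
                + output_tokens * self.output_per_m) / 1_000_000


def billed_cost(requested: Price, served: Price,
                input_tokens: int, output_tokens: int,
                fallback_applied: bool) -> float:
    """Sketch of the rule: with fallback, bill the cheaper model for the same usage."""
    served_cost = served.cost(input_tokens, output_tokens)
    if not fallback_applied:
        # Without fallback the served model is the requested model.
        return served_cost
    return min(requested.cost(input_tokens, output_tokens), served_cost)
```

For example, with a requested model at 3.0/15.0 and a fallback at 1.0/5.0 per million input/output tokens, 10,000 input and 2,000 output tokens cost 0.06 on the requested model and 0.02 on the fallback. Under this reading, the request is billed 0.02. If the fallback were the more expensive model, the requested model's 0.02 would apply instead.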
Response headers
Responses include fallback metadata in HTTP headers:
| Header | Description |
|---|---|
| x-llmbase-requested-model | Model ID sent in the request |
| x-llmbase-served-model | Model ID that generated the response |
| x-llmbase-fallback-applied | true when fallback served the request, otherwise false |
| x-llmbase-fallback-reason | Present when fallback applies |
| x-llmbase-fallback-chain | Comma-separated model chain when fallback applies |
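A client can collect these headers into one record for logging or metrics. The sketch below is an assumption-labeled illustration. `FallbackInfo` and `parse_fallback_headers` are hypothetical names, and the possible values of the reason header and the ordering of the chain are not documented here, so they are kept as raw strings in the order received. Header names are matched case-insensitively, as HTTP requires.

```python
from dataclasses import dataclass, field
from typing import List, Mapping, Optional


@dataclass
class FallbackInfo:
    """Hypothetical client-side view of the fallback response headers."""
    requested_model: Optional[str]
    served_model: Optional[str]
    applied: bool
    reason: Optional[str] = None  # raw value; possible reasons are not documented here
    chain: List[str] = field(default_factory=list)  # kept in the order received


def parse_fallback_headers(headers: Mapping[str, str]) -> FallbackInfo:
    # HTTP header names are case-insensitive, so normalize keys first.
    h = {k.lower(): v for k, v in headers.items()}
    chain_raw = h.get("x-llmbase-fallback-chain", "")
    return FallbackInfo(
        requested_model=h.get("x-llmbase-requested-model"),
        served_model=h.get("x-llmbase-served-model"),
        # Treat anything other than "true" (including a missing header) as no fallback.
        applied=h.get("x-llmbase-fallback-applied", "false").strip().lower() == "true",
        reason=h.get("x-llmbase-fallback-reason"),
        chain=[m.strip() for m in chain_raw.split(",") if m.strip()],
    )
```

With `urllib.request.urlopen`, pass `response.headers`, which provides `.items()`. With the `requests` library, `response.headers` works the same way.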
Disable fallback
To require the requested model and fail with an error instead of falling back, send this request header:
X-LLMBase-Fallback: off
Use this for evaluations, reproducibility checks, and workloads where a model substitution is worse than a retryable error.
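A strict request might be built as follows with the Python standard library. This is a sketch under several assumptions. The base URL `https://api.llmbase.example` is a placeholder, bearer-token authentication is assumed, and the `model`/`messages` body assumes an OpenAI-style chat format. Only the `X-LLMBase-Fallback: off` header comes from this page. The helper `check_served_model` is a hypothetical defensive check for evaluation harnesses. It compares the served-model header against the requested ID.

```python
import json
import urllib.request

API_BASE = "https://api.llmbase.example"  # hypothetical placeholder; use your real endpoint


def build_strict_request(model: str, messages: list, api_key: str) -> urllib.request.Request:
    """Build a chat completion request that opts out of smart fallback."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/v1/chat/completions",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
            "X-LLMBase-Fallback": "off",  # hard-fail instead of substituting a model
        },
    )


def check_served_model(requested: str, headers) -> None:
    """Raise if the response reports a different served model (defensive check)."""
    h = {k.lower(): v for k, v in headers.items()}
    served = h.get("x-llmbase-served-model")
    if served is not None and served != requested:
        raise RuntimeError(f"expected model {requested!r}, got {served!r}")
```

To send the request, call `urllib.request.urlopen(req)`. For error statuses it raises `urllib.error.HTTPError`, which is where an availability failure would surface as a retryable error rather than a substituted model.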
Model selection
Fallback does not make unsupported feature combinations valid: it works around
temporary model unavailability, not around missing capabilities. Before sending
images, tools, structured output, logprobs, or reasoning controls, choose a
model that advertises the matching capability in
/v1/models?metadata=true.
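A pre-flight capability check might look like the sketch below. The shape of the metadata response is not documented on this page, so everything here is hypothetical: each model entry is assumed to have an `"id"` and a `"capabilities"` list, and the capability labels are placeholders. Adapt the key and label names to what `/v1/models?metadata=true` actually returns. The caller states the required capabilities explicitly rather than inferring them from the request body.

```python
from typing import Iterable, List, Mapping, Set

# Hypothetical capability labels; replace with the names the metadata endpoint uses.
IMAGES, TOOLS, STRUCTURED_OUTPUT, LOGPROBS, REASONING = (
    "images", "tools", "structured_output", "logprobs", "reasoning",
)


def missing_capabilities(model: Mapping, required: Iterable[str]) -> Set[str]:
    """Required capabilities that a model entry does not advertise.

    Assumes a hypothetical "capabilities" list on each metadata entry.
    """
    advertised = set(model.get("capabilities", []))
    return set(required) - advertised


def models_supporting(models: Iterable[Mapping], required: Iterable[str]) -> List[str]:
    """IDs of models that advertise every required capability."""
    req = set(required)
    return [m["id"] for m in models if not missing_capabilities(m, req)]


def check_model(models: Iterable[Mapping], model_id: str, required: Iterable[str]) -> None:
    """Fail before sending if the chosen model lacks a required capability."""
    for m in models:
        if m["id"] == model_id:
            missing = missing_capabilities(m, required)
            if missing:
                raise ValueError(f"{model_id!r} does not advertise {sorted(missing)}")
            return
    raise LookupError(f"model {model_id!r} not found in metadata")
```

Checking on the client side turns an unsupported combination into an immediate, descriptive error before any tokens are spent. Fallback will not repair such a request.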