Reasoning
Use reasoning_effort and read reasoning traces when models support them.
Models that advertise reasoning support can return the model's thinking trace in
the reasoning_content field of the assistant message. When the selected model
returns a trace, LLMBase also includes it under a compatibility alias named
reasoning.
Use reasoning_effort for portable effort control when the selected model lists
that parameter in supported_parameters.
Effort control
{
  "model": "deepseek/deepseek-v4-pro",
  "messages": [
    { "role": "user", "content": "Compare these two migration plans." }
  ],
  "reasoning_effort": "high"
}
Common effort values are low, medium, and high. Support is
model-dependent, so check metadata before requiring the field in production.
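A minimal sketch of that check in Python, assuming you have already fetched the model's supported_parameters list from its metadata. build_chat_request is a hypothetical helper, not part of any LLMBase SDK; it adds reasoning_effort only when the model lists it and can optionally fail fast when effort control is mandatory.

```python
from typing import Any, Optional


def build_chat_request(
    model: str,
    supported_parameters: list[str],
    messages: list[dict[str, str]],
    effort: Optional[str] = None,
    require_effort: bool = False,
) -> dict[str, Any]:
    """Build a chat request body, adding reasoning_effort only when the
    model's supported_parameters metadata lists it (hypothetical helper)."""
    body: dict[str, Any] = {"model": model, "messages": messages}
    if effort is None:
        return body
    if "reasoning_effort" in supported_parameters:
        body["reasoning_effort"] = effort
    elif require_effort:
        # Fail fast instead of silently sending a request without effort control.
        raise ValueError(f"{model} does not list reasoning_effort")
    return body


# Usage with the request from this page; the parameter list is illustrative.
request = build_chat_request(
    "deepseek/deepseek-v4-pro",
    ["messages", "reasoning_effort"],
    [{"role": "user", "content": "Compare these two migration plans."}],
    effort="high",
)
```

The effort value is passed through unchanged rather than checked against low, medium, and high, since those are only the common values and individual models may accept others.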
Thinking flags
Some model families also expose thinking through chat template flags. LLMBase
forwards the safe, portable flags below from extra_body.chat_template_kwargs
to supported model routes:
{
  "model": "google/gemma-4-26b-a4b-it",
  "messages": [
    { "role": "user", "content": "Solve 17 * 23 and show the final answer." }
  ],
  "extra_body": {
    "chat_template_kwargs": {
      "enable_thinking": true,
      "thinking": true,
      "preserve_thinking": true
    }
  }
}
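The sketch below sets these flags on an existing request body in Python. with_thinking_flags is a hypothetical helper. This page does not say what LLMBase does with other chat_template_kwargs keys, so rejecting anything outside the three listed flags client-side is a precaution, not documented behavior.

```python
import copy
from typing import Any

# The three flags this page lists as forwarded by LLMBase.
PORTABLE_THINKING_FLAGS = {"enable_thinking", "thinking", "preserve_thinking"}


def with_thinking_flags(body: dict[str, Any], **flags: bool) -> dict[str, Any]:
    """Return a copy of body with thinking flags set under
    extra_body.chat_template_kwargs (hypothetical helper)."""
    unknown = set(flags) - PORTABLE_THINKING_FLAGS
    if unknown:
        # Precaution only: the page does not document how other keys are handled.
        raise ValueError(f"non-portable flags: {sorted(unknown)}")
    out = copy.deepcopy(body)  # leave the caller's dict untouched
    kwargs = out.setdefault("extra_body", {}).setdefault("chat_template_kwargs", {})
    kwargs.update(flags)
    return out
```

Copying the body keeps a base request reusable across models that do and do not use template flags.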
Response shape
Typical non-streaming responses include the final answer in content and the
reasoning trace separately:
{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "17 * 23 = 391.",
        "reasoning_content": "Compute 17 * 20 = 340 and 17 * 3 = 51, then add them.",
        "reasoning": "Compute 17 * 20 = 340 and 17 * 3 = 51, then add them."
      }
    }
  ]
}
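A minimal Python sketch for reading this shape, assuming the response has already been decoded from JSON into a dict. split_answer_and_reasoning is a hypothetical helper. It prefers reasoning_content and falls back to the reasoning alias, returning None for the trace when the model sent neither.

```python
from typing import Any, Optional


def split_answer_and_reasoning(response: dict[str, Any]) -> tuple[str, Optional[str]]:
    """Return (final answer, reasoning trace) from the first choice."""
    message = response["choices"][0]["message"]
    content = message.get("content") or ""
    # Prefer reasoning_content; fall back to the compatibility alias.
    trace = message.get("reasoning_content") or message.get("reasoning")
    return content, trace
```

Reading reasoning_content first keeps the code working whether or not the alias is present.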
Reasoning support is model-dependent. If your request relies on reasoning, choose
a model whose supported_features metadata includes "reasoning" and whose
supported_parameters includes reasoning_effort.
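As a rough sketch, that selection rule can be applied to a list of model metadata entries in Python. The "id" field name and the catalog entries below are hypothetical placeholders; only supported_features and supported_parameters come from this page.

```python
from typing import Any


def reasoning_models(catalog: list[dict[str, Any]]) -> list[str]:
    """Return ids of models that advertise reasoning and accept reasoning_effort."""
    return [
        m["id"]
        for m in catalog
        if "reasoning" in m.get("supported_features", [])
        and "reasoning_effort" in m.get("supported_parameters", [])
    ]


# Hypothetical catalog entries for illustration only.
catalog = [
    {
        "id": "example/reasoning-model",
        "supported_features": ["reasoning"],
        "supported_parameters": ["reasoning_effort", "temperature"],
    },
    {
        "id": "example/features-only",
        "supported_features": ["reasoning"],
        "supported_parameters": ["temperature"],
    },
    {"id": "example/plain-model"},
]
```

Missing metadata fields are treated as empty lists, so entries without reasoning metadata are skipped rather than raising.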