Reasoning

Use reasoning_effort and read reasoning traces when models support them.

Updated March 16, 2026

Models that advertise reasoning can expose their thinking trace as reasoning_content on the assistant message. When the selected model returns it, LLMBase also includes the trace under a compatibility alias named reasoning.

Use reasoning_effort for portable effort control when the selected model lists that parameter in supported_parameters.

Effort control

{
  "model": "deepseek/deepseek-v4-pro",
  "messages": [
    { "role": "user", "content": "Compare these two migration plans." }
  ],
  "reasoning_effort": "high"
}

Common effort values are low, medium, and high. Support is model-dependent, so check that the model's metadata lists reasoning_effort in supported_parameters before requiring the field in production.
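The metadata check described above can be sketched as a small helper that adds reasoning_effort only when the model lists it. This is a minimal sketch, not LLMBase client code: the function name build_request and the metadata dictionary shape (an "id" key next to "supported_parameters") are assumptions made for illustration.

```python
from typing import Any, Optional


def build_request(
    model_meta: dict[str, Any],
    messages: list[dict[str, str]],
    effort: Optional[str] = None,
) -> dict[str, Any]:
    """Build a chat payload, adding reasoning_effort only if the model lists it.

    `model_meta` is a hypothetical metadata record; its "id" key is an
    assumption of this sketch, while "supported_parameters" follows the
    field name used in this page.
    """
    payload: dict[str, Any] = {"model": model_meta["id"], "messages": messages}
    supported = model_meta.get("supported_parameters", [])
    if effort is not None and "reasoning_effort" in supported:
        payload["reasoning_effort"] = effort
    return payload
```

Dropping the field silently keeps one code path working across models. A production service that depends on high effort may prefer to raise an error instead.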

Thinking flags

Some model families also expose thinking through chat template flags. LLMBase forwards the safe, portable flags below from extra_body.chat_template_kwargs to supported model routes:

{
  "model": "google/gemma-4-26b-a4b-it",
  "messages": [
    { "role": "user", "content": "Solve 17 * 23 and show the final answer." }
  ],
  "extra_body": {
    "chat_template_kwargs": {
      "enable_thinking": true,
      "thinking": true,
      "preserve_thinking": true
    }
  }
}
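A request like the one above can be assembled with a helper that accepts only the three flags shown. This is a sketch under stated assumptions: with_thinking_flags is a hypothetical name, and treating those three flags as the complete forwarded set is an assumption based on the example rather than a documented guarantee.

```python
from typing import Any

# The three flags shown in the request above. Treating them as the full
# forwarded set is an assumption of this sketch.
PORTABLE_FLAGS = ("enable_thinking", "thinking", "preserve_thinking")


def with_thinking_flags(payload: dict[str, Any], **flags: bool) -> dict[str, Any]:
    """Return a copy of `payload` with flags under extra_body.chat_template_kwargs."""
    unknown = set(flags) - set(PORTABLE_FLAGS)
    if unknown:
        raise ValueError(f"unsupported chat template flags: {sorted(unknown)}")

    # Copy each nested level so the caller's payload is left untouched.
    result = dict(payload)
    extra_body = dict(result.get("extra_body", {}))
    template_kwargs = dict(extra_body.get("chat_template_kwargs", {}))
    template_kwargs.update(flags)
    extra_body["chat_template_kwargs"] = template_kwargs
    result["extra_body"] = extra_body
    return result
```

Rejecting unknown flags locally surfaces typos before the request is sent.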

Response shape

Typical non-streaming responses include the final answer in content and the reasoning trace separately:

{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "17 * 23 = 391.",
        "reasoning_content": "Compute 17 * 20 = 340 and 17 * 3 = 51, then add them.",
        "reasoning": "Compute 17 * 20 = 340 and 17 * 3 = 51, then add them."
      }
    }
  ]
}
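Reading a response of this shape might look like the sketch below, which prefers reasoning_content and falls back to the reasoning alias. The function name split_answer is hypothetical, and the code assumes the response has already been parsed into a Python dictionary.

```python
from typing import Any, Optional


def split_answer(response: dict[str, Any]) -> tuple[str, Optional[str]]:
    """Return (final_answer, reasoning_trace) from a non-streaming response.

    The trace is None when the model returned neither reasoning field.
    """
    message = response["choices"][0]["message"]
    trace = message.get("reasoning_content") or message.get("reasoning")
    return message.get("content") or "", trace
```

Treating the trace as optional keeps the same code usable with models that do not return reasoning.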

Reasoning support is model-dependent. If your request depends on reasoning, choose a model whose metadata includes supported_features: ["reasoning"] and supported_parameters containing reasoning_effort.
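The selection rule above could be applied to a list of model metadata records as follows. This is an illustrative sketch: the catalog structure and its "id" key are assumptions, and only the supported_features and supported_parameters field names come from this page.

```python
from typing import Any


def reasoning_models(catalog: list[dict[str, Any]]) -> list[str]:
    """Return ids of models that list both the reasoning feature and reasoning_effort."""
    return [
        meta["id"]  # "id" is an assumed key for the model name
        for meta in catalog
        if "reasoning" in meta.get("supported_features", [])
        and "reasoning_effort" in meta.get("supported_parameters", [])
    ]
```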