Streaming

Stream chat completion tokens with Server-Sent Events.

Set "stream": true in a POST /v1/chat/completions request to receive tokens as they are generated, delivered as Server-Sent Events (SSE). The data field of each event contains a JSON-encoded chat.completion.chunk object.

curl

curl -N https://api.llmbase.ai/v1/chat/completions \
  -H "Authorization: Bearer $LLMBASE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek/deepseek-v4-flash",
    "messages": [{ "role": "user", "content": "Count to 5." }],
    "stream": true
  }'

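The response body is a sequence of data: {...} events separated by blank lines and terminated by data: [DONE]. In the example below, the intermediate content chunks are omitted for brevity: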

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1741000000,"model":"deepseek/deepseek-v4-flash","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1741000000,"model":"deepseek/deepseek-v4-flash","choices":[{"index":0,"delta":{"content":"1"},"finish_reason":null}]}

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1741000000,"model":"deepseek/deepseek-v4-flash","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"usage":{"prompt_tokens":10,"completion_tokens":9,"total_tokens":19}}

data: [DONE]
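If you are not using an SDK, the client has to parse this body itself: read each data: line, stop at the data: [DONE] sentinel, and concatenate the delta.content values. The TypeScript sketch below does this for a fully received body string. It assumes one data: line per event, as in the example above (the SSE format also allows multi-line data fields and comment lines, which this sketch ignores). The Chunk type, parseSseBody, and collectText are illustrative names, not part of any LLMBase library.

```typescript
// Subset of chat.completion.chunk fields that this sketch reads.
interface Chunk {
  choices: {
    index: number;
    delta: { role?: string; content?: string | null };
    finish_reason: string | null;
  }[];
  usage?: { prompt_tokens: number; completion_tokens: number; total_tokens: number } | null;
}

// Parses a complete SSE body into chunk objects, stopping at "data: [DONE]".
function parseSseBody(body: string): Chunk[] {
  const chunks: Chunk[] = [];
  for (const rawLine of body.split("\n")) {
    // trim() also removes a trailing "\r" if the body uses CRLF line endings.
    const line = rawLine.trim();
    // Skip the blank separator lines and anything that is not a data field.
    if (!line.startsWith("data:")) continue;
    const payload = line.slice("data:".length).trim();
    if (payload === "[DONE]") break;
    chunks.push(JSON.parse(payload) as Chunk);
  }
  return chunks;
}

// Concatenates the content deltas of the first choice into the full reply.
function collectText(chunks: Chunk[]): string {
  return chunks.map((c) => c.choices[0]?.delta.content ?? "").join("");
}
```

A real client receives the body in pieces, so it should also buffer any partial line left over between network reads before parsing; the OpenAI SDK used below takes care of this.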

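Every stream reports token usage in the usage field of its final chunk, the one sent immediately before data: [DONE], with prompt_tokens, completion_tokens, and total_tokens counts.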
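To use these counts, a client typically folds chunks into a running result as they arrive. This TypeScript sketch does that for the first choice: it appends delta.content, records finish_reason once it becomes non-null, and keeps usage when a chunk carries it. StreamAccumulator and its fields are hypothetical names, and the StreamChunk type lists only the fields shown in the example stream above.

```typescript
interface Usage {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
}

// Only the chunk fields this sketch reads.
interface StreamChunk {
  choices: {
    index: number;
    delta: { content?: string | null };
    finish_reason: string | null;
  }[];
  usage?: Usage | null;
}

// Accumulates reply text, finish reason, and token usage across chunks.
class StreamAccumulator {
  text = "";
  finishReason: string | null = null;
  usage: Usage | null = null;

  push(chunk: StreamChunk): void {
    const choice = chunk.choices[0];
    if (choice) {
      this.text += choice.delta.content ?? "";
      // finish_reason stays null until the stream is ending.
      if (choice.finish_reason !== null) this.finishReason = choice.finish_reason;
    }
    // Usage arrives on the final chunk; keep it whenever it is present.
    if (chunk.usage) this.usage = chunk.usage;
  }
}
```

Because the sketch reads only these fields, it should also accept the chunks yielded by the OpenAI SDK example below, for instance by calling acc.push(chunk) inside its for await loop.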

OpenAI SDK

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.llmbase.ai/v1",
  apiKey: process.env.LLMBASE_API_KEY,
});

const stream = await client.chat.completions.create({
  model: "deepseek/deepseek-v4-flash",
  messages: [{ role: "user", content: "Write a short poem." }],
  stream: true,
});

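for await (const chunk of stream) {
  // Each chunk carries an incremental delta; print new text as it arrives.
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
  // The final chunk also carries token usage.
  if (chunk.usage) console.log("\nUsage:", chunk.usage);
}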

Apart from stream: true, a streaming request takes the same body as a non-streaming chat completion. You can combine stream: true with tools, response_format, prompt_cache_key, and other supported parameters, provided the selected model advertises the corresponding capabilities in /v1/models?metadata=true.
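When streaming with tools, a tool call may arrive split across several chunks rather than in one piece. The TypeScript sketch below merges such fragments. It assumes LLMBase follows the OpenAI chat.completion.chunk convention, where each delta.tool_calls entry carries an index plus optional id, function.name, and a partial function.arguments string; this page does not confirm that shape. ToolCallDelta, ToolCall, and mergeToolCallDeltas are hypothetical names.

```typescript
// Assumed OpenAI-style tool-call fragment (not confirmed for LLMBase).
interface ToolCallDelta {
  index: number;
  id?: string;
  type?: string;
  function?: { name?: string; arguments?: string };
}

interface ToolCall {
  id: string;
  name: string;
  arguments: string; // JSON text, valid only once all fragments are merged
}

// Merges streamed tool-call fragments, grouping them by their index.
function mergeToolCallDeltas(deltas: ToolCallDelta[]): ToolCall[] {
  const calls = new Map<number, ToolCall>();
  for (const d of deltas) {
    const call = calls.get(d.index) ?? { id: "", name: "", arguments: "" };
    if (d.id) call.id = d.id;
    // Assumes the function name arrives whole in a single fragment.
    if (d.function?.name) call.name = d.function.name;
    // Argument text is streamed in pieces, so append it in arrival order.
    if (d.function?.arguments) call.arguments += d.function.arguments;
    calls.set(d.index, call);
  }
  return Array.from(calls.entries())
    .sort((a, b) => a[0] - b[0])
    .map(([, call]) => call);
}
```

In a streaming loop you would collect each chunk's chunk.choices[0]?.delta?.tool_calls ?? [] into one list and call JSON.parse on each merged arguments string only after the stream ends, since the partial fragments are not valid JSON on their own.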