Quickstart
Make your first LLMBase API call in under two minutes.
LLMBase exposes an OpenAI-compatible chat inference API at
https://api.llmbase.ai. OpenAI SDK chat clients work with LLMBase by changing
the base URL, API key, and model ID.
OpenAI compatibility
The inference API follows the OpenAI chat-completions and models formats for the endpoints below:
- POST /v1/chat/completions
- GET /v1/models
- Authorization: Bearer <LLMBASE_API_KEY>
- OpenAI SDK baseURL: https://api.llmbase.ai/v1
LLMBase also exposes LLMBase-specific endpoints for prepaid balance and richer model metadata. It does not implement every vendor-specific endpoint; see Chat completions for supported request parameters.
Migrating an existing OpenAI-compatible chat client? Start with OpenAI compatibility for the exact base URL, field differences, and common error fixes.
Inference API keys use the llmbase_... prefix and can be configured for
prepaid credits or subscription-backed inference budgets. They are separate
from llmbase_chat_... chat-agent keys, which use a Pro chat subscription at
https://llmbase.ai/api/v1/agents.
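Because both key types begin with llmbase_, a client that accepts either should test the longer chat prefix first. The sketch below assumes the prefixes are literally llmbase_ and llmbase_chat_ as written above; classify_key and the returned labels are illustrative names, not part of any LLMBase SDK.

```python
def classify_key(key: str) -> str:
    """Guess the key type from its prefix (prefixes assumed from the text above)."""
    # Check the chat prefix first: "llmbase_chat_" also starts with "llmbase_".
    if key.startswith("llmbase_chat_"):
        return "chat-agent"
    if key.startswith("llmbase_"):
        return "inference"
    return "unknown"


# Endpoints quoted in this page, keyed by the labels returned above.
ENDPOINTS = {
    "inference": "https://api.llmbase.ai/v1",
    "chat-agent": "https://llmbase.ai/api/v1/agents",
}
```

A startup check such as classify_key(os.environ["LLMBASE_API_KEY"]) can catch a chat-agent key that was pasted into an inference client by mistake.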
Base URL
https://api.llmbase.ai
Your first request
curl
curl https://api.llmbase.ai/v1/chat/completions \
  -H "Authorization: Bearer $LLMBASE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek/deepseek-v4-flash",
    "messages": [
      { "role": "user", "content": "Hello! What can you do?" }
    ]
  }'
Node.js — OpenAI SDK
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.llmbase.ai/v1",
  apiKey: process.env.LLMBASE_API_KEY,
});

const response = await client.chat.completions.create({
  model: "deepseek/deepseek-v4-flash",
  messages: [{ role: "user", content: "Hello! What can you do?" }],
});

console.log(response.choices[0].message.content);
Python — OpenAI SDK
import os

from openai import OpenAI

# Point the OpenAI SDK at LLMBase and read the key from the environment.
client = OpenAI(
    base_url="https://api.llmbase.ai/v1",
    api_key=os.environ["LLMBASE_API_KEY"],
)

response = client.chat.completions.create(
    model="deepseek/deepseek-v4-flash",
    messages=[{"role": "user", "content": "Hello! What can you do?"}],
)

print(response.choices[0].message.content)
Streaming
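Set "stream": true in the request body to receive tokens as they are
generated, delivered as Server-Sent Events. With the Node.js OpenAI SDK,
pass stream: true and iterate over the result: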
const stream = await client.chat.completions.create({
  model: "deepseek/deepseek-v4-flash",
  messages: [{ role: "user", content: "Write a haiku about inference." }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}
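A Python counterpart to the Node.js loop above might look like the following sketch. It assumes client is the OpenAI client built in the Python example earlier on this page; iter_text and stream_reply are illustrative helper names, not SDK functions.

```python
from typing import Iterable, Iterator


def iter_text(stream: Iterable) -> Iterator[str]:
    """Yield the text fragments carried by OpenAI-style stream chunks."""
    for chunk in stream:
        # Some chunks may carry no choices or an empty delta; skip those.
        if not chunk.choices:
            continue
        text = chunk.choices[0].delta.content
        if text:
            yield text


def stream_reply(client, prompt: str,
                 model: str = "deepseek/deepseek-v4-flash") -> str:
    """Print tokens as they arrive and return the assembled reply."""
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    parts = []
    for text in iter_text(stream):
        print(text, end="", flush=True)
        parts.append(text)
    print()
    return "".join(parts)
```

Calling stream_reply(client, "Write a haiku about inference.") prints the reply incrementally and also returns the full text, which is convenient for logging.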
Cost-aware first setup
Before running a prepaid or overflow-backed batch job or agent loop, check your prepaid balance and choose a model with the capabilities you need:
curl https://api.llmbase.ai/v1/balance \
  -H "Authorization: Bearer $LLMBASE_API_KEY"

curl https://api.llmbase.ai/v1/model-metadata \
  -H "Authorization: Bearer $LLMBASE_API_KEY"
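The same two checks can be scripted from Python using only the standard library. This is a minimal sketch: build_get and fetch_json are hypothetical helper names, and because this page does not describe the response bodies, the code returns the decoded JSON without assuming any field names.

```python
import json
import urllib.request

BASE_URL = "https://api.llmbase.ai"


def build_get(path: str, api_key: str) -> urllib.request.Request:
    """Build an authenticated GET request for an LLMBase endpoint path."""
    return urllib.request.Request(
        BASE_URL + path,
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def fetch_json(path: str, api_key: str, timeout: float = 30.0):
    """Send the request and decode the JSON body (shape not assumed here)."""
    with urllib.request.urlopen(build_get(path, api_key), timeout=timeout) as resp:
        return json.load(resp)
```

For example, fetch_json("/v1/balance", os.environ["LLMBASE_API_KEY"]) and fetch_json("/v1/model-metadata", os.environ["LLMBASE_API_KEY"]) mirror the two curl commands; inspect the returned objects before a batch job starts.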
Use prompt_cache_key for repeated long prompts and set max_tokens on
user-facing requests so spend stays predictable.
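As an illustration, a request body that sets both fields could be assembled as below. Deriving prompt_cache_key from a hash of the long system prompt is an assumption made for this sketch, not a documented LLMBase requirement; any stable string shared by requests with the same prefix would serve. build_payload is a hypothetical helper name.

```python
import hashlib


def build_payload(model: str, system_prompt: str, user_message: str,
                  max_tokens: int = 256) -> dict:
    """Build a chat-completions body with a spend cap and a stable cache key."""
    # Assumption: the same long system prompt should map to the same key,
    # so the key is derived from a hash of that prompt.
    digest = hashlib.sha256(system_prompt.encode("utf-8")).hexdigest()
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message},
        ],
        "max_tokens": max_tokens,  # caps output tokens per request
        "prompt_cache_key": "sys-" + digest[:16],
    }
```

The resulting dict can be serialized with json.dumps and sent as the -d body of the curl request shown under Your first request.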
Model features are explicit in metadata. Before using tool calls, structured
outputs, multimodal input, logprobs, or reasoning traces, choose a model whose
supported_features and supported_parameters include the fields your request
needs.
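A capability check along these lines can be sketched as a filter over the metadata. The record layout is an assumption: each model is taken to be a dict with an id plus supported_features and supported_parameters lists of strings, and the feature and model names in the usage note are placeholders. Confirm the real response shape before relying on it.

```python
from typing import Iterable, List


def models_supporting(models: Iterable[dict],
                      features: Iterable[str] = (),
                      parameters: Iterable[str] = ()) -> List[str]:
    """Return IDs of models that list every required feature and parameter."""
    need_features = set(features)
    need_parameters = set(parameters)
    return [
        m["id"]
        for m in models
        # Subset test: every required name must appear in the model's lists.
        if need_features <= set(m.get("supported_features", []))
        and need_parameters <= set(m.get("supported_parameters", []))
    ]
```

With hypothetical records such as {"id": "example/model-a", "supported_features": ["tools"], "supported_parameters": ["max_tokens"]}, calling models_supporting(records, features=["tools"]) keeps only the models that advertise tool calls.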
Next steps
- Authentication — learn how API keys work
- OpenAI compatibility — migrate from OpenAI-compatible clients
- Models — browse available models
- Chat completions — prompt caching, streaming, tools, and the full parameter reference
- Agent integrations — use a Pro chat subscription from OpenClaw, Hermes, or another OpenAI-compatible agent