LLMBase | Docs

Quickstart

Make your first LLMBase API call in under two minutes.

LLMBase exposes an OpenAI-compatible chat inference API at https://api.llmbase.ai. OpenAI SDK chat clients work with LLMBase by changing the base URL, API key, and model ID.

OpenAI compatibility

The inference API follows the OpenAI chat-completions and models formats for the endpoints below:

  • POST /v1/chat/completions
  • GET /v1/models
  • Authorization: Bearer <LLMBASE_API_KEY>
  • OpenAI SDK baseURL: https://api.llmbase.ai/v1
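
As a rough illustration of the list above, the sketch below builds an authenticated GET /v1/models request with the Python standard library and reads model IDs from the response. It assumes the response follows the OpenAI models format (a top-level "data" array of objects with an "id" field); the helper names are hypothetical and not part of LLMBase.

```python
import json
import urllib.request

BASE_URL = "https://api.llmbase.ai/v1"  # SDK base URL from the list above


def build_models_request(api_key: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated GET /v1/models request."""
    return urllib.request.Request(
        f"{BASE_URL}/models",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def extract_model_ids(body: dict) -> list[str]:
    """Read IDs from an OpenAI-style body: {"object": "list", "data": [...]}.

    The body shape is an assumption based on the OpenAI models format.
    """
    return [entry["id"] for entry in body.get("data", [])]


def list_model_ids(api_key: str) -> list[str]:
    """Send the request and return the model IDs (performs a network call)."""
    with urllib.request.urlopen(build_models_request(api_key), timeout=30) as resp:
        return extract_model_ids(json.load(resp))
```

Calling list_model_ids(os.environ["LLMBASE_API_KEY"]) after importing os would perform the request; the two smaller helpers can be exercised without network access.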

LLMBase also exposes LLMBase-specific endpoints for prepaid balance and richer model metadata. It does not implement every vendor-specific endpoint; see Chat completions for supported request parameters.

Migrating an existing OpenAI-compatible chat client? Start with OpenAI compatibility for the exact base URL, field differences, and common error fixes.

Inference API keys use the llmbase_... prefix and can be configured for prepaid credits or subscription-backed inference budgets. They are separate from llmbase_chat_... chat-agent keys, which draw on a Pro chat subscription and are used at https://llmbase.ai/api/v1/agents.
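
Because both key types share the llmbase_ prefix, a client can check which kind it was given before sending an inference request. The helper below is a hypothetical sketch, not an LLMBase API; it assumes the two prefixes described above are the only distinguishing feature and that no inference key begins with llmbase_chat_.

```python
def key_kind(api_key: str) -> str:
    """Classify a key by the prefixes described above (hypothetical helper)."""
    # Check the longer prefix first: "llmbase_chat_" also starts with "llmbase_".
    if api_key.startswith("llmbase_chat_"):
        return "chat-agent"
    if api_key.startswith("llmbase_"):
        return "inference"
    return "unknown"


def require_inference_key(api_key: str) -> str:
    """Return the key unchanged, or raise if it does not look like an inference key."""
    kind = key_kind(api_key)
    if kind != "inference":
        raise ValueError(f"expected an inference key, got a key of kind: {kind}")
    return api_key
```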

Base URL

https://api.llmbase.ai

Your first request

curl

curl https://api.llmbase.ai/v1/chat/completions \
  -H "Authorization: Bearer $LLMBASE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek/deepseek-v4-flash",
    "messages": [
      { "role": "user", "content": "Hello! What can you do?" }
    ]
  }'

Node.js — OpenAI SDK

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.llmbase.ai/v1",
  apiKey: process.env.LLMBASE_API_KEY,
});

const response = await client.chat.completions.create({
  model: "deepseek/deepseek-v4-flash",
  messages: [{ role: "user", content: "Hello! What can you do?" }],
});

console.log(response.choices[0].message.content);

Python — OpenAI SDK

import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.llmbase.ai/v1",
    api_key=os.environ["LLMBASE_API_KEY"],
)

response = client.chat.completions.create(
    model="deepseek/deepseek-v4-flash",
    messages=[{"role": "user", "content": "Hello! What can you do?"}],
)

print(response.choices[0].message.content)

Streaming

Add "stream": true to receive tokens as they are generated using Server-Sent Events.

const stream = await client.chat.completions.create({
  model: "deepseek/deepseek-v4-flash",
  messages: [{ role: "user", content: "Write a haiku about inference." }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}
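
For clients that do not use an SDK, the stream can be read directly. The sketch below parses Server-Sent Events lines in Python and yields the text deltas. It assumes LLMBase emits OpenAI-style chunks ("data: {json}" lines with choices[0].delta.content) and ends the stream with "data: [DONE]"; both are assumptions drawn from the OpenAI streaming convention, and the function name is hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield content deltas from OpenAI-style SSE lines (already decoded to str)."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # assumed end-of-stream sentinel
            break
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if not choices:
            continue  # e.g. a chunk that carries no choices
        text = (choices[0].get("delta") or {}).get("content")
        if text:
            yield text
```

The function takes any iterable of text lines, so it can be fed from an HTTP response decoded line by line or from a recorded stream in tests.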

Cost-aware first setup

Before running a prepaid or overflow-backed batch job or agent loop, check your prepaid balance and choose a model with the capabilities you need:

curl https://api.llmbase.ai/v1/balance \
  -H "Authorization: Bearer $LLMBASE_API_KEY"

curl https://api.llmbase.ai/v1/model-metadata \
  -H "Authorization: Bearer $LLMBASE_API_KEY"

Use prompt_cache_key for repeated long prompts and set max_tokens on user-facing requests so spend stays predictable.
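
A minimal sketch of a request body that applies both settings follows. It assumes prompt_cache_key is a top-level field of the chat-completions body, as in the OpenAI format; check Chat completions for the supported parameters. The builder function and the example key value are hypothetical.

```python
def build_chat_payload(model, messages, max_tokens, prompt_cache_key=None):
    """Build a chat-completions body with an explicit output cap.

    max_tokens is required here so every request has a bounded output size.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be a positive integer")
    payload = {"model": model, "messages": messages, "max_tokens": max_tokens}
    if prompt_cache_key is not None:
        # Assumed to be a top-level body field, following the OpenAI format.
        payload["prompt_cache_key"] = prompt_cache_key
    return payload


payload = build_chat_payload(
    model="deepseek/deepseek-v4-flash",
    messages=[{"role": "user", "content": "Hello! What can you do?"}],
    max_tokens=256,
    prompt_cache_key="support-bot-v1",  # hypothetical key for a shared long prompt
)
```

The resulting dictionary can be serialized with json.dumps and sent as the body of POST /v1/chat/completions.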

Model features are explicit in metadata. Before using tool calls, structured outputs, multimodal input, logprobs, or reasoning traces, choose a model whose supported_features and supported_parameters include the fields your request needs.
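
The selection step can be sketched as a filter over the metadata entries. The function below assumes each entry is a dictionary with an "id" plus "supported_features" and "supported_parameters" lists; the field names come from the text above, but the surrounding response shape and the function name are assumptions.

```python
def models_supporting(models, features=(), parameters=()):
    """Return IDs of models whose metadata covers every requested feature and parameter.

    `models` is a list of metadata dictionaries; the entry shape is assumed.
    """
    wanted_features = set(features)
    wanted_parameters = set(parameters)
    return [
        entry["id"]
        for entry in models
        # Subset test: everything requested must appear in the model's lists.
        if wanted_features <= set(entry.get("supported_features", []))
        and wanted_parameters <= set(entry.get("supported_parameters", []))
    ]
```

Entries that omit either list are treated as supporting nothing, so they are only returned when no requirement of that kind is given.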

Next steps