OpenAI compatibility

Migrate OpenAI-compatible chat clients to LLMBase and understand the supported compatibility surface.

LLMBase is OpenAI-compatible for chat inference. In practice, that means most applications using the OpenAI SDK for chat.completions.create() can move to LLMBase by changing the base URL, API key, and model ID.

Migration checklist

| From | Change |
| --- | --- |
| OpenAI | Set baseURL (base_url in the Python SDK) to https://api.llmbase.ai/v1, use an llmbase_... inference key, and choose an LLMBase model ID |
| Generic OpenAI-compatible client | Replace its base URL with https://api.llmbase.ai/v1, remove vendor-specific routing fields, and use an llmbase_... inference key |

JavaScript

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.llmbase.ai/v1",
  apiKey: process.env.LLMBASE_API_KEY,
});

const response = await client.chat.completions.create({
  model: "deepseek/deepseek-v4-flash",
  messages: [{ role: "user", content: "Write a short deployment checklist." }],
});

console.log(response.choices[0]?.message.content);

Python

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.llmbase.ai/v1",
    api_key=os.environ["LLMBASE_API_KEY"],
)

response = client.chat.completions.create(
    model="deepseek/deepseek-v4-flash",
    messages=[{"role": "user", "content": "Write a short deployment checklist."}],
)

print(response.choices[0].message.content)

Compatible endpoints

| Endpoint | Purpose |
| --- | --- |
| POST /v1/chat/completions | Chat, streaming, tool calling, structured outputs, multimodal input on supported models |
| GET /v1/models | OpenAI SDK model discovery |
| GET /v1/models?metadata=true | Detailed model metadata with pricing, modalities, and capabilities |
| GET /v1/models?filter=agents&metadata=true | Detailed metadata filtered to agent-compatible models |
| GET /v1/balance | LLMBase prepaid inference balance |

LLMBase focuses on chat inference for curated open-source models. It does not try to mirror every vendor-specific endpoint. Use the direct API for LLMBase-hosted chat inference with prepaid or subscription-backed inference keys; use the LLMBase Chat app or agent API when you want chat-subscription agent features instead.
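
The discovery and balance endpoints listed above are plain authenticated GET requests, so they can be called without the OpenAI SDK. The following is a minimal sketch using only the Python standard library. The helper names build_get and fetch_json are hypothetical, and the shape of the JSON responses is not specified here, so the sketch only decodes the body without assuming any fields.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.llmbase.ai/v1"


def build_get(path, api_key, **query):
    """Build (but do not send) an authenticated GET request.

    `path` is relative to BASE_URL, for example "/models" or "/balance".
    Query parameters are passed as keyword arguments.
    """
    url = f"{BASE_URL}{path}"
    if query:
        url += "?" + urllib.parse.urlencode(query)
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def fetch_json(request, timeout=30):
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

For example, fetch_json(build_get("/models", key, filter="agents", metadata="true")) would request the agent-filtered metadata listing, and fetch_json(build_get("/balance", key)) the prepaid balance, where key holds an llmbase_... inference key.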

Supported request features

The direct inference API supports the portable chat features most production clients need:

| Feature | Request field |
| --- | --- |
| Streaming | stream: true |
| Function tools | tools, tool_choice |
| JSON mode | response_format: { "type": "json_object" } |
| JSON Schema output | response_format: { "type": "json_schema", ... } |
| Log probabilities | logprobs, top_logprobs |
| Reasoning output | reasoning_effort, extra_body.chat_template_kwargs on reasoning-capable models |
| Prompt caching | prompt_cache_key |
| Sampling controls | temperature, top_p, top_k, min_p, repetition_penalty, penalties, stop, seed |

Advanced features are model-dependent. When your app requires tools, response_format, reasoning output, or logprobs, inspect supported_features and supported_parameters from /v1/models?metadata=true before choosing a model.
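
One way to apply this check is to filter the metadata listing before picking a model. The sketch below assumes each model entry is a dictionary with an id plus supported_features and supported_parameters given as lists of strings; that shape, the helper name models_supporting, and the sample feature names are assumptions for illustration, so verify them against a real /v1/models?metadata=true response.

```python
def models_supporting(models, features=(), parameters=()):
    """Return the IDs of models that advertise every requested capability.

    `models` is a list of model-metadata dictionaries (assumed shape).
    Missing or null capability lists are treated as empty.
    """
    wanted_features = set(features)
    wanted_parameters = set(parameters)
    matches = []
    for model in models:
        advertised_features = set(model.get("supported_features") or [])
        advertised_parameters = set(model.get("supported_parameters") or [])
        if wanted_features <= advertised_features and wanted_parameters <= advertised_parameters:
            matches.append(model["id"])
    return matches


# Hypothetical sample shaped like the assumed metadata entries.
sample = [
    {
        "id": "example/model-a",
        "supported_features": ["tools", "json_mode"],
        "supported_parameters": ["tools", "response_format"],
    },
    {
        "id": "example/model-b",
        "supported_features": ["reasoning"],
        "supported_parameters": ["reasoning_effort"],
    },
]

print(models_supporting(sample, features=["tools"]))  # ['example/model-a']
```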

Models that advertise reasoning can return reasoning_content in the OpenAI-compatible assistant message. Use reasoning_effort where supported for portable effort control. LLMBase also forwards the common template flags enable_thinking, thinking, and preserve_thinking from extra_body.chat_template_kwargs for model families that use those names.
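
As a rough sketch of how these options fit together with the OpenAI Python SDK, the helpers below build the extra keyword arguments for chat.completions.create() and read reasoning_content back defensively. The helper names are hypothetical, the accepted reasoning_effort values depend on the model, and reasoning_content is read with a fallback because not every model returns it.

```python
def reasoning_kwargs(effort=None, enable_thinking=None):
    """Build optional keyword arguments for client.chat.completions.create().

    `effort` maps to reasoning_effort. `enable_thinking` is forwarded as a
    template flag under extra_body.chat_template_kwargs.
    """
    kwargs = {}
    if effort is not None:
        kwargs["reasoning_effort"] = effort
    if enable_thinking is not None:
        kwargs["extra_body"] = {
            "chat_template_kwargs": {"enable_thinking": enable_thinking}
        }
    return kwargs


def read_reasoning(message):
    """Return reasoning_content from an assistant message, or None if absent."""
    return getattr(message, "reasoning_content", None)
```

A call could then look like client.chat.completions.create(model=model_id, messages=messages, **reasoning_kwargs(effort="low")), followed by read_reasoning(response.choices[0].message).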

Differences from broad gateway APIs

Some OpenAI-compatible gateways expose marketplace routing controls because they span many vendors. LLMBase direct inference uses the LLMBase model registry, so requests should choose one LLMBase model ID instead.

Remove vendor-specific routing fields before sending direct LLMBase requests:

| Field | LLMBase equivalent |
| --- | --- |
| provider | Choose the LLMBase model ID |
| models / route | Use one model ID; LLMBase handles eligible failover internally |
| plugins | Use LLMBase-supported request fields such as tools or response_format |
| debug | Not part of the public direct inference contract |
| service_tier | Not part of the public direct inference contract |
| top-level cache_control | Use prompt_cache_key |

If your application depends on another gateway’s marketplace catalog or vendor-specific routing preferences, keep that integration separate. If you need LLMBase models with direct inference billing, use https://api.llmbase.ai/v1.
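
When the same application code has to target both a gateway and LLMBase, the field cleanup described in this section can be done in one place. The sketch below is one possible approach rather than an official helper: to_llmbase_payload and GATEWAY_ONLY_FIELDS are hypothetical names, and the function only drops the top-level fields listed in the table, leaving nested message content untouched.

```python
import copy

# Top-level request fields from the table above that direct inference does not accept.
GATEWAY_ONLY_FIELDS = (
    "provider",
    "models",
    "route",
    "plugins",
    "debug",
    "service_tier",
    "cache_control",
)


def to_llmbase_payload(payload, model_id, prompt_cache_key=None):
    """Return a copy of a gateway-style request body suitable for LLMBase.

    Drops gateway-only top-level fields, pins a single model ID, and
    optionally sets prompt_cache_key. The input dictionary is not modified.
    """
    cleaned = {
        key: copy.deepcopy(value)
        for key, value in payload.items()
        if key not in GATEWAY_ONLY_FIELDS
    }
    cleaned["model"] = model_id
    if prompt_cache_key is not None:
        cleaned["prompt_cache_key"] = prompt_cache_key
    return cleaned
```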

Differences from native model APIs

Native model APIs often expose model-family-specific endpoints. LLMBase currently documents the direct chat inference surface, not every native vendor API.

Use LLMBase when you want:

  • one curated model ID namespace
  • prepaid or subscription-backed inference keys
  • OpenAI-compatible chat requests
  • prompt-cache billing when responses report cached tokens
  • dynamic model metadata with supported capabilities

Use a native model API when your workflow depends on model families outside LLMBase chat inference, such as custom image generation endpoints, native webhooks, private deployment controls, embeddings, reranking, or speech APIs.

Cost controls

For predictable spend:

  1. Use GET /v1/balance before starting long prepaid or overflow-backed jobs.
  2. Pick a model from /v1/models?metadata=true and read its pricing.prompt, pricing.completion, and optional pricing.input_cache_read.
  3. Set max_tokens for user-facing requests.
  4. Use a stable prompt_cache_key for repeated long system prompts, agent conversations, or workspace-level prompts.
  5. Keep large static instructions at the beginning of messages and changing user input later, so prompt caches can reuse the prefix.
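
Steps 3 to 5 can be sketched as a small request builder. This is an illustration under assumptions, not a prescribed layout: build_chat_payload is a hypothetical helper, the default max_tokens of 512 is arbitrary, and the returned dictionary is a raw JSON body for POST /v1/chat/completions.

```python
def build_chat_payload(model, static_instructions, user_input, cache_key, max_tokens=512):
    """Build a chat request body that caps output and keeps the prefix stable.

    The long, unchanging instructions go first so a prompt cache can reuse
    them; the changing user input comes last.
    """
    return {
        "model": model,
        "max_tokens": max_tokens,        # step 3: bound completion length
        "prompt_cache_key": cache_key,   # step 4: stable key across requests
        "messages": [
            {"role": "system", "content": static_instructions},  # step 5: static prefix
            {"role": "user", "content": user_input},              # step 5: variable suffix
        ],
    }
```

Reusing the same cache_key and the same static_instructions text across calls keeps the shared prefix identical, which is what allows cached tokens to be reported.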

Common mistakes

| Symptom | Check |
| --- | --- |
| 401 invalid_api_key | Use an llmbase_... inference key, not another service’s key, an OpenAI key, or an llmbase_chat_... key |
| 402 insufficient_quota | Add prepaid inference credits, check the subscription-backed key budget or spend cap, or use the Chat app/agent API with a Pro subscription |
| 404 Model not found | Use an ID returned by GET /v1/models |
| 400 invalid_request_error for tools or JSON | Choose a model that advertises the needed supported_features |
| SDK gets a Cloudflare HTML page | Send Authorization: Bearer $LLMBASE_API_KEY on model discovery and completions requests |
| Request works elsewhere but not against the direct API | Remove vendor-specific fields such as provider, models, route, and plugins |
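
For client-side logging, the status codes in the table above can be mapped to the corresponding check. The following is a minimal sketch with hypothetical names (TROUBLESHOOTING_HINTS, hint_for_status); the hint text paraphrases the table and the fallback message is an assumption.

```python
# HTTP status code -> first thing to check, paraphrased from the table above.
TROUBLESHOOTING_HINTS = {
    400: "Choose a model that advertises the needed supported_features.",
    401: "Use an llmbase_... inference key.",
    402: "Add prepaid inference credits or check the key budget or spend cap.",
    404: "Use a model ID returned by GET /v1/models.",
}


def hint_for_status(status_code):
    """Return a troubleshooting hint for an HTTP status code."""
    return TROUBLESHOOTING_HINTS.get(
        status_code,
        "No specific hint; inspect the response body.",
    )


print(hint_for_status(404))  # Use a model ID returned by GET /v1/models.
```

With the OpenAI Python SDK, the status code is available as status_code on openai.APIStatusError, so an except block could pass it straight to hint_for_status.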