OpenAI compatibility
Migrate OpenAI-compatible chat clients to LLMBase and understand the supported compatibility surface.
LLMBase is OpenAI-compatible for chat inference. In practice, that means most
applications using the OpenAI SDK for chat.completions.create() can move to
LLMBase by changing the base URL, API key, and model ID.
Migration checklist
| From | Change |
|---|---|
| OpenAI | Set baseURL to https://api.llmbase.ai/v1, use an llmbase_... inference key, and choose an LLMBase model ID |
| Generic OpenAI-compatible client | Replace its base URL with https://api.llmbase.ai/v1, remove vendor-specific routing fields, and use an llmbase_... inference key |
JavaScript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.llmbase.ai/v1",
  apiKey: process.env.LLMBASE_API_KEY,
});

const response = await client.chat.completions.create({
  model: "deepseek/deepseek-v4-flash",
  messages: [{ role: "user", content: "Write a short deployment checklist." }],
});

console.log(response.choices[0]?.message.content);
Python
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.llmbase.ai/v1",
    api_key=os.environ["LLMBASE_API_KEY"],
)

response = client.chat.completions.create(
    model="deepseek/deepseek-v4-flash",
    messages=[{"role": "user", "content": "Write a short deployment checklist."}],
)

print(response.choices[0].message.content)
Compatible endpoints
| Endpoint | Purpose |
|---|---|
| POST /v1/chat/completions | Chat, streaming, tool calling, structured outputs, multimodal input on supported models |
| GET /v1/models | OpenAI SDK model discovery |
| GET /v1/models?metadata=true | Detailed model metadata with pricing, modalities, and capabilities |
| GET /v1/models?filter=agents&metadata=true | Detailed metadata filtered to agent-compatible models |
| GET /v1/balance | LLMBase prepaid inference balance |
LLMBase focuses on chat inference for curated open-source models. It does not try to mirror every vendor-specific endpoint. Use the direct API for LLMBase-hosted chat inference with prepaid or subscription-backed inference keys; use the LLMBase Chat app or agent API when you want chat-subscription agent features instead.
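The LLMBase-specific endpoints in the table (the metadata query and the balance check) have no dedicated OpenAI SDK method, so a plain HTTP request is one way to reach them. The following is a minimal sketch using only the Python standard library; the helper name `build_get` is hypothetical, and the shape of the JSON each endpoint returns is not specified on this page, so the sketch stops at building the authenticated request.

```python
import urllib.request

BASE_URL = "https://api.llmbase.ai/v1"


def build_get(path: str, api_key: str) -> urllib.request.Request:
    """Build an authenticated GET request for a path under /v1.

    `path` is the part after /v1, for example "/balance" or
    "/models?metadata=true". `build_get` is an illustrative helper,
    not part of any LLMBase SDK.
    """
    return urllib.request.Request(
        BASE_URL + path,
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


# Usage (performs a network call, so it is left commented out here):
#   import json, os
#   req = build_get("/balance", os.environ["LLMBASE_API_KEY"])
#   with urllib.request.urlopen(req, timeout=30) as resp:
#       print(json.load(resp))
```

Keeping request construction separate from the network call makes it easy to check the URL and headers in a unit test before spending any balance.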
Supported request features
The direct inference API supports the portable chat features most production clients need:
| Feature | Request field |
|---|---|
| Streaming | stream: true |
| Function tools | tools, tool_choice |
| JSON mode | response_format: { "type": "json_object" } |
| JSON Schema output | response_format: { "type": "json_schema", ... } |
| Log probabilities | logprobs, top_logprobs |
| Reasoning output | reasoning_effort, extra_body.chat_template_kwargs on reasoning-capable models |
| Prompt caching | prompt_cache_key |
| Sampling controls | temperature, top_p, top_k, min_p, repetition_penalty, penalties, stop, seed |
Advanced features are model-dependent. When your app requires tools,
response_format, reasoning output, or logprobs, inspect supported_features and
supported_parameters from /v1/models?metadata=true before choosing a
model.
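The capability check described above can be sketched as a small filter over the metadata response. This example assumes the response uses the OpenAI-style list envelope (`{"data": [...]}` with an `id` per model) and that `supported_features` and `supported_parameters` are lists of strings; the feature names used in the sample data are placeholders, so check the real metadata for the exact values.

```python
from typing import Any, Iterable


def models_supporting(
    metadata: dict[str, Any],
    required_features: Iterable[str],
    required_parameters: Iterable[str] = (),
) -> list[str]:
    """Return IDs of models that advertise every required feature and parameter.

    Assumes an OpenAI-style envelope: {"data": [{"id": ..., ...}, ...]}.
    """
    features = set(required_features)
    parameters = set(required_parameters)
    matches = []
    for model in metadata.get("data", []):
        has_features = features <= set(model.get("supported_features") or [])
        has_parameters = parameters <= set(model.get("supported_parameters") or [])
        if has_features and has_parameters:
            matches.append(model["id"])
    return matches


# Hypothetical sample shaped like the assumed metadata response.
sample = {
    "data": [
        {
            "id": "example/model-a",
            "supported_features": ["tools", "json_mode"],
            "supported_parameters": ["temperature", "logprobs"],
        },
        {
            "id": "example/model-b",
            "supported_features": ["json_mode"],
            "supported_parameters": ["temperature"],
        },
    ]
}
candidates = models_supporting(sample, ["tools"], ["logprobs"])
```

Running a check like this at startup, rather than per request, avoids discovering a missing capability through a 400 response in production.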
Models that advertise reasoning can return reasoning_content in the
OpenAI-compatible assistant message. Use reasoning_effort where supported for
portable effort control. LLMBase also forwards the common template flags
enable_thinking, thinking, and preserve_thinking from
extra_body.chat_template_kwargs for model families that use those names.
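As a rough illustration of the two controls above, the sketch below builds the extra keyword arguments for `client.chat.completions.create()` and reads `reasoning_content` defensively from the returned message. The helper names are hypothetical, the accepted values for `reasoning_effort` are not listed on this page (the example value "low" is an assumption borrowed from OpenAI's API), and only the `enable_thinking` flag is shown.

```python
from typing import Any, Optional


def reasoning_kwargs(
    effort: Optional[str] = None,
    enable_thinking: Optional[bool] = None,
) -> dict[str, Any]:
    """Build optional reasoning arguments for chat.completions.create().

    Only the fields that were explicitly requested are included, so the
    result is safe to spread into a call for models without reasoning.
    """
    kwargs: dict[str, Any] = {}
    if effort is not None:
        kwargs["reasoning_effort"] = effort
    if enable_thinking is not None:
        # The OpenAI Python SDK merges extra_body into the JSON request body.
        kwargs["extra_body"] = {
            "chat_template_kwargs": {"enable_thinking": enable_thinking}
        }
    return kwargs


def split_reasoning(message: Any) -> tuple[Optional[str], Optional[str]]:
    """Return (reasoning_content, content); either may be None."""
    return (
        getattr(message, "reasoning_content", None),
        getattr(message, "content", None),
    )


# Usage with an existing client (network call, shown as a comment):
#   response = client.chat.completions.create(
#       model=model_id, messages=messages, **reasoning_kwargs(effort="low")
#   )
#   reasoning, answer = split_reasoning(response.choices[0].message)
```

Using `getattr` with a default keeps the same code path working for models that do not return `reasoning_content` at all.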
Differences from broad gateway APIs
Some OpenAI-compatible gateways expose marketplace routing controls because they span many vendors. LLMBase direct inference uses the LLMBase model registry, so requests should choose one LLMBase model ID instead.
Remove vendor-specific routing fields before sending direct LLMBase requests:
| Field | LLMBase equivalent |
|---|---|
| provider | Choose the LLMBase model ID |
| models / route | Use one model ID; LLMBase handles eligible failover internally |
| plugins | Use LLMBase-supported request fields such as tools or response_format |
| debug | Not part of the public direct inference contract |
| service_tier | Not part of the public direct inference contract |
| top-level cache_control | Use prompt_cache_key |
If your application depends on another gateway’s marketplace catalog or
vendor-specific routing preferences, keep that integration separate. If you need
LLMBase models with direct inference billing, use https://api.llmbase.ai/v1.
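When a payload is shared with another gateway integration, the routing fields from the table above can be stripped just before the request is sent to LLMBase. This is a minimal sketch with a hypothetical helper name; it removes only top-level keys and leaves any nested `cache_control` inside messages untouched, since the table refers to the top-level field.

```python
from typing import Any

# Top-level fields listed in the table above as gateway-specific.
GATEWAY_ONLY_FIELDS = (
    "provider",
    "models",
    "route",
    "plugins",
    "debug",
    "service_tier",
    "cache_control",
)


def strip_gateway_fields(payload: dict[str, Any]) -> tuple[dict[str, Any], list[str]]:
    """Return a copy of the payload without gateway-only fields.

    The second element lists the removed field names so callers can log
    what was dropped. The input dict is not modified.
    """
    removed = [name for name in GATEWAY_ONLY_FIELDS if name in payload]
    cleaned = {k: v for k, v in payload.items() if k not in GATEWAY_ONLY_FIELDS}
    return cleaned, removed


original = {
    "model": "deepseek/deepseek-v4-flash",
    "messages": [{"role": "user", "content": "Hello"}],
    "provider": {"order": ["example"]},
    "route": "fallback",
}
cleaned, removed = strip_gateway_fields(original)
```

Returning the removed names makes it straightforward to log a warning once during migration instead of silently changing behavior.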
Differences from native model APIs
Native model APIs often expose model-family-specific endpoints. LLMBase currently documents the direct chat inference surface, not every native vendor API.
Use LLMBase when you want:
- one curated model ID namespace
- prepaid or subscription-backed inference keys
- OpenAI-compatible chat requests
- prompt-cache billing when responses report cached tokens
- dynamic model metadata with supported capabilities
Use a native model API when your workflow depends on capabilities outside LLMBase chat inference, such as custom image generation endpoints, native webhooks, private deployment controls, embeddings, reranking, or speech APIs.
Cost controls
For predictable spend:
- Use GET /v1/balance before starting long prepaid or overflow-backed jobs.
- Pick a model from /v1/models?metadata=true and read its pricing.prompt, pricing.completion, and optional pricing.input_cache_read.
- Set max_tokens for user-facing requests.
- Use a stable prompt_cache_key for repeated long system prompts, agent conversations, or workspace-level prompts.
- Keep large static instructions at the beginning of messages and changing user input later, so prompt caches can reuse the prefix.
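Several of the points above can be combined in one request builder: a `max_tokens` cap, a stable `prompt_cache_key`, and static instructions placed before the changing user input. The sketch below is one possible approach, not a prescribed scheme. Deriving the key from a hash of the system prompt plus a `workspace_id` is an assumption made for illustration, and both the helper name and the default cap of 512 tokens are hypothetical.

```python
import hashlib
from typing import Any


def build_chat_request(
    model: str,
    system_prompt: str,
    user_input: str,
    max_tokens: int = 512,
    workspace_id: str = "default",
) -> dict[str, Any]:
    """Build a chat request body that is friendly to prompt caching.

    The cache key depends only on the workspace and the static system
    prompt, so it stays stable across different user inputs.
    """
    digest = hashlib.sha256(system_prompt.encode("utf-8")).hexdigest()[:16]
    return {
        "model": model,
        "max_tokens": max_tokens,
        "prompt_cache_key": f"{workspace_id}:{digest}",
        "messages": [
            # Static instructions first so the cached prefix can be reused.
            {"role": "system", "content": system_prompt},
            # Changing input last.
            {"role": "user", "content": user_input},
        ],
    }


first = build_chat_request("deepseek/deepseek-v4-flash", "You are a release assistant.", "Draft notes for v1.")
second = build_chat_request("deepseek/deepseek-v4-flash", "You are a release assistant.", "Draft notes for v2.")
```

The returned dictionary is a raw JSON body. With the OpenAI Python SDK it can usually be passed as `client.chat.completions.create(**body)`; if the installed SDK version does not accept `prompt_cache_key` as a named argument, moving that field into `extra_body` is a common workaround.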
Common mistakes
| Symptom | Check |
|---|---|
| 401 invalid_api_key | Use an llmbase_... inference key, not another service’s key, an OpenAI key, or a llmbase_chat_... key |
| 402 insufficient_quota | Add prepaid inference credits, check the subscription-backed key budget or spend cap, or use the Chat app/agent API with a Pro subscription |
| 404 Model not found | Use an ID returned by GET /v1/models |
| 400 invalid_request_error for tools or JSON | Choose a model that advertises the needed supported_features |
| SDK gets a Cloudflare HTML page | Send Authorization: Bearer $LLMBASE_API_KEY on model discovery and completions requests |
| Request works elsewhere but not direct | Remove vendor-specific fields such as provider, models, route, and plugins |
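The table above can also be turned into a small lookup that surfaces a hint in logs when a request fails. This is only a sketch: the function name is hypothetical, the hints are condensed from the table, and detecting the Cloudflare page through a `text/html` content type is an assumption about how that response is served.

```python
from typing import Optional

# Hints condensed from the "Common mistakes" table, keyed by HTTP status.
_HINTS = {
    400: "Choose a model that advertises the needed supported_features, "
         "and remove vendor-specific fields such as provider, models, route, and plugins.",
    401: "Use an llmbase_... inference key, not an OpenAI key or a llmbase_chat_... key.",
    402: "Add prepaid inference credits or check the key budget or spend cap.",
    404: "Use a model ID returned by GET /v1/models.",
}


def troubleshooting_hint(
    status_code: int, content_type: str = "application/json"
) -> Optional[str]:
    """Return a short troubleshooting hint, or None if nothing matches."""
    if "text/html" in content_type.lower():
        # An HTML body instead of JSON suggests the request never reached the API.
        return "Send Authorization: Bearer $LLMBASE_API_KEY on model discovery and completions requests."
    return _HINTS.get(status_code)
```

With the OpenAI Python SDK, the status code is available as `status_code` on `openai.APIStatusError`, so the hint can be attached to the exception message when logging.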