OpenAI compatibility

Migrate OpenAI-compatible chat clients to LLMBase and understand the supported compatibility surface.

Updated March 16, 2026

LLMBase is OpenAI-compatible for chat inference. In practice, most applications that call `chat.completions.create()` through the OpenAI SDK can move to LLMBase by changing three things: the base URL, the API key, and the model ID.

Migration checklist

From	Change
OpenAI	Set `baseURL` to `https://api.llmbase.ai/v1`, use an `llmbase_...` inference key, and choose an LLMBase model ID
Generic OpenAI-compatible client	Replace its base URL with `https://api.llmbase.ai/v1`, remove vendor-specific routing fields, and use an `llmbase_...` inference key

JavaScript

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.llmbase.ai/v1",
  apiKey: process.env.LLMBASE_API_KEY,
});

const response = await client.chat.completions.create({
  model: "deepseek/deepseek-v4-flash",
  messages: [{ role: "user", content: "Write a short deployment checklist." }],
});

console.log(response.choices[0]?.message.content);

Python

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.llmbase.ai/v1",
    api_key=os.environ["LLMBASE_API_KEY"],
)

response = client.chat.completions.create(
    model="deepseek/deepseek-v4-flash",
    messages=[{"role": "user", "content": "Write a short deployment checklist."}],
)

print(response.choices[0].message.content)

Compatible endpoints

Endpoint	Purpose
`POST /v1/chat/completions`	Chat, streaming, tool calling, structured outputs, multimodal input on supported models
`GET /v1/models`	OpenAI SDK model discovery
`GET /v1/models?metadata=true`	Detailed model metadata with pricing, modalities, and capabilities
`GET /v1/models?filter=agents&metadata=true`	Detailed metadata filtered to agent-compatible models
`GET /v1/balance`	LLMBase prepaid inference balance

LLMBase focuses on chat inference for curated open-source models. It does not try to mirror every vendor-specific endpoint. Use the direct API for LLMBase-hosted chat inference with prepaid or subscription-backed inference keys; use the LLMBase Chat app or agent API when you want chat-subscription agent features instead.

Supported request features

The direct inference API supports the portable chat features most production clients need:

Feature	Request field
Streaming	`stream: true`
Function tools	`tools`, `tool_choice`
JSON mode	`response_format: { "type": "json_object" }`
JSON Schema output	`response_format: { "type": "json_schema", ... }`
Log probabilities	`logprobs`, `top_logprobs`
Reasoning output	`reasoning_effort`, `extra_body.chat_template_kwargs` on reasoning-capable models
Prompt caching	`prompt_cache_key`
Sampling controls	`temperature`, `top_p`, `top_k`, `min_p`, `repetition_penalty`, penalties, `stop`, `seed`

Advanced features are model-dependent. When your app requires `tools`, `response_format`, reasoning output, or `logprobs`, inspect `supported_features` and `supported_parameters` from `/v1/models?metadata=true` before choosing a model.
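The check described above can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions, not the documented response contract: it assumes the metadata response follows the OpenAI list shape (a top-level `data` array whose entries carry `id`) and that `supported_features` is a list of strings on each entry. The helper names and the feature strings used in the usage note are hypothetical; confirm the real values against an actual response.

```python
import json
import urllib.request

BASE_URL = "https://api.llmbase.ai/v1"


def build_models_request(api_key: str) -> urllib.request.Request:
    """Build an authenticated GET request for detailed model metadata."""
    return urllib.request.Request(
        f"{BASE_URL}/models?metadata=true",
        headers={"Authorization": f"Bearer {api_key}"},
    )


def models_supporting(payload: dict, required: set) -> list:
    """Return IDs of models whose supported_features include every required feature.

    Assumed payload shape (not confirmed by the docs):
    {"data": [{"id": "...", "supported_features": ["...", ...]}, ...]}
    """
    matches = []
    for model in payload.get("data", []):
        features = set(model.get("supported_features") or [])
        if required <= features:  # subset test: every required feature is present
            matches.append(model["id"])
    return matches


def fetch_models(api_key: str) -> dict:
    """Fetch and decode the metadata listing (performs a network call)."""
    with urllib.request.urlopen(build_models_request(api_key), timeout=30) as resp:
        return json.load(resp)
```

A caller would run something like `models_supporting(fetch_models(os.environ["LLMBASE_API_KEY"]), {"tools"})` and pick a model ID from the result, where `"tools"` stands in for whatever feature name the metadata actually reports.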

Models that advertise reasoning can return `reasoning_content` in the OpenAI-compatible assistant message. Use `reasoning_effort` where supported for portable effort control. LLMBase also forwards the common template flags `enable_thinking`, `thinking`, and `preserve_thinking` from `extra_body.chat_template_kwargs` for model families that use those names.
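As a rough sketch of how those fields fit together with the OpenAI Python SDK: the helpers below build the keyword arguments for `client.chat.completions.create()` and read `reasoning_content` back from the assistant message. The helper names are hypothetical, and the `"low"` effort value is an assumption borrowed from OpenAI's own convention; check the model metadata for the values a given model accepts.

```python
from typing import Optional, Tuple


def build_reasoning_request(
    model: str,
    prompt: str,
    effort: str = "low",  # assumed value; not confirmed for every model
    enable_thinking: Optional[bool] = None,
) -> dict:
    """Build keyword arguments for client.chat.completions.create()."""
    kwargs = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "reasoning_effort": effort,
    }
    if enable_thinking is not None:
        # The OpenAI Python SDK merges extra_body into the JSON request body,
        # so chat_template_kwargs is sent as a top-level request field.
        kwargs["extra_body"] = {
            "chat_template_kwargs": {"enable_thinking": enable_thinking}
        }
    return kwargs


def split_reasoning(message) -> Tuple[Optional[str], Optional[str]]:
    """Return (reasoning_content, content) from an assistant message object."""
    # reasoning_content may be absent, so read it defensively.
    reasoning = getattr(message, "reasoning_content", None)
    content = getattr(message, "content", None)
    return reasoning, content
```

With a client configured as in the Python example earlier on this page, the call would look like `response = client.chat.completions.create(**build_reasoning_request("deepseek/deepseek-v4-flash", "Explain the tradeoff."))`, followed by `split_reasoning(response.choices[0].message)`.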

Differences from broad gateway APIs

Some OpenAI-compatible gateways expose marketplace routing controls because they span many vendors. LLMBase direct inference uses the LLMBase model registry, so requests should choose one LLMBase model ID instead.

Remove vendor-specific routing fields before sending direct LLMBase requests:

Field	LLMBase equivalent
`provider`	Choose the LLMBase model ID
`models` / `route`	Use one model ID; LLMBase handles eligible failover internally
`plugins`	Use LLMBase-supported request fields such as `tools` or `response_format`
`debug`	Not part of the public direct inference contract
`service_tier`	Not part of the public direct inference contract
top-level `cache_control`	Use `prompt_cache_key`

If your application depends on another gateway’s marketplace catalog or vendor-specific routing preferences, keep that integration separate. If you need LLMBase models with direct inference billing, use https://api.llmbase.ai/v1.
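When a request body is shared between a gateway integration and direct LLMBase inference, one option is to strip the gateway-only fields just before sending. The sketch below is illustrative: the helper name is hypothetical, and it removes only the top-level fields listed in the table above, leaving nested content untouched.

```python
from typing import List, Tuple

# Top-level fields from the table above that are not part of the
# direct LLMBase inference contract.
GATEWAY_ONLY_FIELDS = (
    "provider",
    "models",
    "route",
    "plugins",
    "debug",
    "service_tier",
    "cache_control",
)


def strip_gateway_fields(payload: dict) -> Tuple[dict, List[str]]:
    """Return a copy of payload without gateway-only fields, plus the names removed."""
    cleaned = {k: v for k, v in payload.items() if k not in GATEWAY_ONLY_FIELDS}
    removed = [k for k in payload if k in GATEWAY_ONLY_FIELDS]
    return cleaned, removed
```

Logging the `removed` list during migration makes it easier to spot callers that still rely on routing behavior LLMBase does not expose, for example a `cache_control` field that should become a `prompt_cache_key`.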

Differences from native model APIs

Native model APIs often expose model-family-specific endpoints. LLMBase currently documents the direct chat inference surface, not every native vendor API.

Use LLMBase when you want:

one curated model ID namespace
prepaid or subscription-backed inference keys
OpenAI-compatible chat requests
prompt-cache billing when responses report cached tokens
dynamic model metadata with supported capabilities

Use a native model API when your workflow depends on model families outside LLMBase chat inference, such as custom image generation endpoints, native webhooks, private deployment controls, embeddings, reranking, or speech APIs.

Cost controls

For predictable spend:

Use `GET /v1/balance` before starting long prepaid or overflow-backed jobs.
Pick a model from `/v1/models?metadata=true` and read its `pricing.prompt`, `pricing.completion`, and optional `pricing.input_cache_read`.
Set `max_tokens` for user-facing requests.
Use a stable `prompt_cache_key` for repeated long system prompts, agent conversations, or workspace-level prompts.
Keep large static instructions at the beginning of `messages` and changing user input later, so prompt caches can reuse the prefix.
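The last three points can be combined in one request builder. This is a minimal sketch, assuming the OpenAI Python SDK: the helper name and the default `max_tokens` value are hypothetical, and `prompt_cache_key` is passed through `extra_body` because some SDK versions may not accept it as a named argument.

```python
def build_cached_request(
    model: str,
    static_instructions: str,
    user_input: str,
    cache_key: str,
    max_tokens: int = 512,  # illustrative cap; tune per use case
) -> dict:
    """Build keyword arguments for client.chat.completions.create()."""
    return {
        "model": model,
        "messages": [
            # Static prefix first so repeated requests share the same leading tokens.
            {"role": "system", "content": static_instructions},
            # Changing input last so it does not invalidate the cached prefix.
            {"role": "user", "content": user_input},
        ],
        "max_tokens": max_tokens,
        # extra_body fields are merged into the JSON request body by the SDK.
        "extra_body": {"prompt_cache_key": cache_key},
    }
```

Reusing the same `cache_key` and `static_instructions` across calls, for example one key per workspace, keeps the prefix stable; only `user_input` should vary between requests.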

Common mistakes

Symptom	Check
`401 invalid_api_key`	Use an `llmbase_...` inference key, not an OpenAI key, another service’s key, or an `llmbase_chat_...` key
`402 insufficient_quota`	Add prepaid inference credits, check the subscription-backed key budget or spend cap, or use the Chat app/agent API with a Pro subscription
`404 Model not found`	Use an ID returned by `GET /v1/models`
`400 invalid_request_error` for tools or JSON	Choose a model that advertises the needed `supported_features`
SDK gets a Cloudflare HTML page	Send `Authorization: Bearer $LLMBASE_API_KEY` on model discovery and completions requests
Request works elsewhere but not direct	Remove vendor-specific fields such as `provider`, `models`, `route`, and `plugins`