Custom LLM lets you connect Rapida to a self-hosted or third-party chat endpoint. The UI stores provider credentials and model options; integration-api resolves the custom-llm provider and routes requests through the selected API compatibility.
Provider identifier: custom-llm
Where it is configured
Create the provider credential
In the Rapida dashboard, open Integrations > Models, choose Custom LLM, and create a credential.
| Credential field | Required | Value |
|---|---|---|
| apiCompatibility | Yes | API shape to call. See compatibility values below. |
| baseUrl | Yes | Base URL for the model server. |
| headers | No | Header map sent with every request, for example {"Authorization":"Bearer sk_..."}. |

The backend also accepts snake_case keys: api_compatibility and base_url.

Choose the custom model in the assistant
Open the assistant model settings, choose Custom LLM, and enter the model ID. The UI stores both model.id and model.name for custom model values.

Compatibility values
| apiCompatibility value | Endpoint shape | Runtime status |
|---|---|---|
| openai_chat_completions | OpenAI Chat Completions compatible, usually /v1/chat/completions | Supported |
| openai_compatible | OpenAI-compatible chat servers such as vLLM, Ollama, LM Studio, or TGI | Supported |
| openai_responses | OpenAI Responses compatible, usually /v1/responses | Supported |
| anthropic_messages | Anthropic Messages, usually /v1/messages | Credential option exists; chat execution is not implemented yet |
| gemini_generate_content | Gemini generateContent | Credential option exists; chat execution is not implemented yet |
If apiCompatibility is omitted, integration-api defaults to openai_chat_completions.
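The table and default above can be read as a small lookup. The following is a minimal Python sketch of that reading, not the integration-api implementation: the function name `resolve_compatibility` and the choice of exceptions are hypothetical, and only the compatibility values, their status, and the default come from this page.

```python
# Runtime status per apiCompatibility value, copied from the table above.
# True means chat execution is supported; False means credential option only.
COMPATIBILITY_SUPPORTED = {
    "openai_chat_completions": True,
    "openai_compatible": True,
    "openai_responses": True,
    "anthropic_messages": False,
    "gemini_generate_content": False,
}

DEFAULT_COMPATIBILITY = "openai_chat_completions"


def resolve_compatibility(credential: dict) -> str:
    """Return the compatibility value to use, applying the documented default.

    Hypothetical helper: the error types are illustrative assumptions.
    """
    value = credential.get("apiCompatibility", DEFAULT_COMPATIBILITY)
    if value not in COMPATIBILITY_SUPPORTED:
        raise ValueError(f"unknown apiCompatibility: {value!r}")
    if not COMPATIBILITY_SUPPORTED[value]:
        raise NotImplementedError(f"chat execution is not implemented for {value!r}")
    return value
```

With this sketch, an empty credential resolves to openai_chat_completions, while anthropic_messages is rejected at call time even though it is a valid credential option.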
Credential arguments
| Field | Required | Type | Description |
|---|---|---|---|
| apiCompatibility | Yes | string | Selects the request/stream implementation. Use one of the compatibility values above. |
| baseUrl | Yes | string | Base URL passed to the HTTP client. For OpenAI-compatible servers, include the API root expected by the server, for example http://localhost:8000/v1. |
| headers | No | object of string values | Request headers. Use this for Authorization, provider routing headers, tenant IDs, or local server keys. |
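To make the three fields concrete, here is an illustrative credential built as a Python dictionary and printed as JSON. The field names come from the table above; the header name `X-Tenant-Id`, its value, and the truncated key are placeholders chosen for this sketch, not values defined by Rapida.

```python
import json

# Hypothetical credential for a local OpenAI-compatible server.
credential = {
    "apiCompatibility": "openai_compatible",
    # API root expected by the server, as in the baseUrl row above.
    "baseUrl": "http://localhost:8000/v1",
    "headers": {
        "Authorization": "Bearer sk_...",  # placeholder, not a real key
        "X-Tenant-Id": "tenant-123",       # hypothetical routing header
    },
}

# Every header value must be a string ("object of string values").
print(json.dumps(credential, indent=2))
```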
Model arguments
| Option key | Required | Description |
|---|---|---|
| model.id | Yes | UI model selector value. For custom models, this can be any non-empty model/deployment ID. |
| model.name | Yes | Model ID sent to the provider. Usually the same value as model.id. |
| model.parameters | No | JSON object containing extra provider options. Merged into the request after direct model.* keys. |
Request parameters can also be set as direct model.* metadata keys. For example, model.temperature becomes temperature in the provider request.
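The merge order described above can be sketched in Python as follows. This is an illustration under assumptions, not the backend code: `build_request_options` is a hypothetical name, model.parameters is treated as an already-parsed object, and letting model.parameters win on a key conflict is one reading of "merged after".

```python
def build_request_options(options: dict) -> dict:
    """Flatten model.* option keys into provider request fields (sketch)."""
    reserved = {"model.id", "model.name", "model.parameters"}
    request = {"model": options["model.name"]}

    # Direct model.* keys first: model.temperature becomes temperature.
    for key, value in options.items():
        if key.startswith("model.") and key not in reserved:
            request[key[len("model."):]] = value

    # model.parameters is merged afterwards. Assumption: on a key conflict,
    # the later merge overrides the direct model.* value.
    request.update(options.get("model.parameters") or {})
    return request


options = {
    "model.id": "my-model",
    "model.name": "my-model",
    "model.temperature": 0.2,
    "model.parameters": {"top_k": 40},
}
print(build_request_options(options))
```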
Supported request parameters
For openai_chat_completions and openai_compatible, these keys are mapped to first-class Chat Completions fields when possible:
| Parameter | Type | Description |
|---|---|---|
| model.name | string | Model name sent as model. |
| temperature or model.temperature | number | Sampling temperature. |
| top_p or model.top_p | number | Nucleus sampling value. |
| max_tokens or model.max_tokens | number | Maximum output tokens. |
| max_completion_tokens or model.max_completion_tokens | number | Maximum completion tokens. |
| frequency_penalty or model.frequency_penalty | number | Frequency penalty. |
| presence_penalty or model.presence_penalty | number | Presence penalty. |
| seed or model.seed | number | Deterministic sampling seed. |
| stop or model.stop | string[] | Stop sequences. |
| tool_choice or model.tool_choice | string | auto, required, or none. Applied only when tools are present. |
| response_format or model.response_format | object | {"type":"json_object"}, {"type":"text"}, or {"type":"json_schema", ...}. |
| reasoning_effort or model.reasoning_effort | string | Reasoning effort for compatible models. |
| user or model.user | string | End-user identifier. |
| metadata or model.metadata | object of string values | Provider metadata. |
| top_logprobs or model.top_logprobs | number | Number of top log probabilities. |
Use model.parameters for provider-specific fields such as top_k or chat_template_kwargs.
For openai_responses, these keys are mapped when possible:
| Parameter | Type | Description |
|---|---|---|
| model.name | string | Model name sent as model. |
| temperature / model.temperature | number | Sampling temperature. |
| top_p / model.top_p | number | Nucleus sampling value. |
| max_output_tokens, max_completion_tokens, or max_tokens | number | Maximum output tokens. |
| store / model.store | boolean | Whether the provider should store the response. Defaults to false. |
| tool_choice / model.tool_choice | string | auto, required, or none. Applied only when tools are present. |
| response_format / model.response_format | object | Maps to Responses text format. |
| reasoning_effort / model.reasoning_effort | string | Reasoning effort. |
| service_tier / model.service_tier | string | Provider service tier. |
| user / model.user | string | End-user identifier. |
| metadata / model.metadata | object of string values | Provider metadata. |
| top_logprobs / model.top_logprobs | number | Number of top log probabilities. |
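Two rows in the table above behave differently from the Chat Completions mapping: the token limit accepts three alternative keys, and store has a default. The Python sketch below illustrates one possible reading. The name `responses_options` is hypothetical, the precedence among the three token keys follows the order they are listed in (an assumption), and accepting model.* forms of the token keys is also an assumption.

```python
def responses_options(options: dict) -> dict:
    """Map a few option keys to Responses style request fields (sketch)."""

    def pick(*names):
        # Return the first non-None value among the names, checking the
        # bare key and then its model.* form for each name.
        for name in names:
            for key in (name, f"model.{name}"):
                if options.get(key) is not None:
                    return options[key]
        return None

    out = {"model": options["model.name"]}

    # Assumed precedence: max_output_tokens, max_completion_tokens, max_tokens.
    limit = pick("max_output_tokens", "max_completion_tokens", "max_tokens")
    if limit is not None:
        out["max_output_tokens"] = limit

    # store defaults to false when not provided.
    store = pick("store")
    out["store"] = False if store is None else bool(store)
    return out


print(responses_options({"model.name": "m", "max_tokens": 256}))
```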
Examples
OpenAI-compatible server
Credential:
OpenAI Responses compatible server
Credential:
Backend mapping
integration-api resolves custom-llm in api/integration-api/internal/caller/caller.go. Chat and streaming requests route through api/integration-api/internal/caller/custom_llm.
The credential parser validates:
- baseUrl is present and non-empty.
- apiCompatibility is a non-empty string when provided.
- headers is a string map when provided.
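The three checks above can be sketched in Python as follows. This is not the Go parser from integration-api: `parse_credential` and its error messages are hypothetical, and accepting the snake_case keys here mirrors the note in the credential section rather than any confirmed lookup order.

```python
def parse_credential(raw: dict) -> dict:
    """Validate a custom-llm credential following the three checks above (sketch)."""
    # Accept camelCase or snake_case keys (camelCase first: assumption).
    base_url = raw.get("baseUrl", raw.get("base_url"))
    if not isinstance(base_url, str) or not base_url.strip():
        raise ValueError("baseUrl must be present and non-empty")

    compat = raw.get("apiCompatibility", raw.get("api_compatibility"))
    if compat is not None and (not isinstance(compat, str) or not compat):
        raise ValueError("apiCompatibility must be a non-empty string when provided")

    headers = raw.get("headers")
    if headers is not None:
        is_string_map = isinstance(headers, dict) and all(
            isinstance(k, str) and isinstance(v, str) for k, v in headers.items()
        )
        if not is_string_map:
            raise ValueError("headers must be a string map when provided")

    return {"baseUrl": base_url, "apiCompatibility": compat, "headers": headers or {}}


print(parse_credential({"base_url": "http://localhost:8000/v1"}))
```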