Skip to main content

Documentation Index

Fetch the complete documentation index at: https://doc.rapida.ai/llms.txt

Use this file to discover all available pages before exploring further.

Custom LLM lets you connect Rapida to a self-hosted or third-party chat endpoint. The UI stores provider credentials and model options; integration-api resolves the custom-llm provider and routes requests through the selected API compatibility. Provider identifier: custom-llm

Where it is configured

1

Create the provider credential

In the Rapida dashboard, open Integrations > Models, choose Custom LLM, and create a credential.
Credential fieldRequiredValue
apiCompatibilityYesAPI shape to call. See compatibility values below.
baseUrlYesBase URL for the model server.
headersNoHeader map sent with every request, for example {"Authorization":"Bearer sk_..."}.
The backend also accepts snake case keys: api_compatibility and base_url.
2

Choose the custom model in the assistant

Open the assistant model settings, choose Custom LLM, and enter the model ID. The UI stores both model.id and model.name for custom model values.
3

Add model parameters

Use Model Parameters for provider-specific JSON. These parameters are merged into the request after standard model.* keys.

Compatibility values

apiCompatibility valueEndpoint shapeRuntime status
openai_chat_completionsOpenAI Chat Completions compatible, usually /v1/chat/completionsSupported
openai_compatibleOpenAI-compatible chat servers such as vLLM, Ollama, LM Studio, or TGISupported
openai_responsesOpenAI Responses compatible, usually /v1/responsesSupported
anthropic_messagesAnthropic Messages, usually /v1/messagesCredential option exists, chat execution is not implemented yet
gemini_generate_contentGemini generateContentCredential option exists, chat execution is not implemented yet
If apiCompatibility is omitted, integration-api defaults to openai_chat_completions.

Credential arguments

FieldRequiredTypeDescription
apiCompatibilityYesstringSelects the request/stream implementation. Use one of the compatibility values above.
baseUrlYesstringBase URL passed to the HTTP client. For OpenAI-compatible servers, include the API root expected by the server, for example http://localhost:8000/v1.
headersNoobject of string valuesRequest headers. Use this for Authorization, provider routing headers, tenant IDs, or local server keys.
Example credential value:
{
  "apiCompatibility": "openai_chat_completions",
  "baseUrl": "https://llm.example.com/v1",
  "headers": {
    "Authorization": "Bearer sk_example"
  }
}

Model arguments

Option keyRequiredDescription
model.idYesUI model selector value. For custom models, this can be any non-empty model/deployment ID.
model.nameYesModel ID sent to the provider. Usually the same value as model.id.
model.parametersNoJSON object containing extra provider options. Merged into the request after direct model.* keys.
Example model parameters:
{
  "temperature": 0.7,
  "max_tokens": 512,
  "top_p": 0.9
}
You can also pass direct model.* metadata keys. For example, model.temperature becomes temperature in the provider request.

Supported request parameters

For openai_chat_completions and openai_compatible, these keys are mapped to first-class Chat Completions fields when possible:
ParameterTypeDescription
model.namestringModel name sent as model.
temperature or model.temperaturenumberSampling temperature.
top_p or model.top_pnumberNucleus sampling value.
max_tokens or model.max_tokensnumberMaximum output tokens.
max_completion_tokens or model.max_completion_tokensnumberMaximum completion tokens.
frequency_penalty or model.frequency_penaltynumberFrequency penalty.
presence_penalty or model.presence_penaltynumberPresence penalty.
seed or model.seednumberDeterministic sampling seed.
stop or model.stopstring[]Stop sequences.
tool_choice or model.tool_choicestringauto, required, or none. Applied only when tools are present.
response_format or model.response_formatobject{"type":"json_object"}, {"type":"text"}, or {"type":"json_schema", ...}.
reasoning_effort or model.reasoning_effortstringReasoning effort for compatible models.
user or model.userstringEnd-user identifier.
metadata or model.metadataobject of string valuesProvider metadata.
top_logprobs or model.top_logprobsnumberNumber of top log probabilities.
Unknown parameters are passed as extra request fields. Use model.parameters for provider-specific fields such as top_k or chat_template_kwargs. For openai_responses, these keys are mapped when possible:
ParameterTypeDescription
model.namestringModel name sent as model.
temperature / model.temperaturenumberSampling temperature.
top_p / model.top_pnumberNucleus sampling value.
max_output_tokens, max_completion_tokens, or max_tokensnumberMaximum output tokens.
store / model.storebooleanWhether the provider should store the response. Defaults to false.
tool_choice / model.tool_choicestringauto, required, or none. Applied only when tools are present.
response_format / model.response_formatobjectMaps to Responses text format.
reasoning_effort / model.reasoning_effortstringReasoning effort.
service_tier / model.service_tierstringProvider service tier.
user / model.userstringEnd-user identifier.
metadata / model.metadataobject of string valuesProvider metadata.
top_logprobs / model.top_logprobsnumberNumber of top log probabilities.

Examples

OpenAI-compatible server

Credential:
{
  "apiCompatibility": "openai_compatible",
  "baseUrl": "http://localhost:8000/v1",
  "headers": {
    "Authorization": "Bearer local-dev-key"
  }
}
Assistant model:
{
  "model.id": "meta-llama/Llama-3.1-8B-Instruct",
  "model.name": "meta-llama/Llama-3.1-8B-Instruct",
  "model.parameters": {
    "temperature": 0.4,
    "max_tokens": 400,
    "top_k": 40
  }
}

OpenAI Responses compatible server

Credential:
{
  "apiCompatibility": "openai_responses",
  "baseUrl": "https://api.openai.com/v1",
  "headers": {
    "Authorization": "Bearer sk_example"
  }
}
Assistant model:
{
  "model.id": "gpt-4.1-mini",
  "model.name": "gpt-4.1-mini",
  "model.parameters": {
    "max_output_tokens": 512,
    "reasoning_effort": "low",
    "store": false
  }
}

Backend mapping

integration-api resolves custom-llm in api/integration-api/internal/caller/caller.go. Chat and streaming requests route through api/integration-api/internal/caller/custom_llm. The credential parser validates:
  • baseUrl is present and non-empty.
  • apiCompatibility is a non-empty string when provided.
  • headers is a string map when provided.

LLM overview

Caller interfaces and model options.

Custom STT

Configure custom speech-to-text.