Custom STT lets you connect Rapida to any WebSocket-based transcription service without writing a new Go transformer. The UI stores provider credentials and assistant options; assistant-api reads them through the custom-stt transformer and maps audio packets and provider responses with WebSocket DSL rules.
Provider identifier: custom-stt
Create the provider credential
In the Rapida dashboard, open Integrations > Models, choose Custom STT, and create a credential.
| Credential field | Required | Value |
|---|---|---|
| apiCompatibility | Yes | websocket_v1 |
| baseUrl | Yes | WebSocket URL for your STT service, for example wss://stt.example.com/v1/listen |
| headers | No | Header map sent on the WebSocket handshake, for example {"Authorization":"Bearer sk_..."} |
The backend also accepts the snake_case keys api_compatibility and base_url.
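The fallback between the two key styles can be sketched in Go as follows. This is an illustration under assumptions, not assistant-api code: credentialValue is a hypothetical helper, the camelCase key is assumed to be checked first, and an empty value is treated as missing.

```go
package main

import "fmt"

// credentialValue returns the first non-empty value found under any of the
// given keys. Hypothetical helper: it only illustrates the camelCase and
// snake_case fallback, not Rapida's actual credential loader.
func credentialValue(cred map[string]string, keys ...string) (string, bool) {
	for _, k := range keys {
		if v, ok := cred[k]; ok && v != "" {
			return v, true
		}
	}
	return "", false
}

func main() {
	// A credential saved with snake_case keys.
	cred := map[string]string{
		"api_compatibility": "websocket_v1",
		"base_url":          "wss://stt.example.com/v1/listen",
	}
	compat, _ := credentialValue(cred, "apiCompatibility", "api_compatibility")
	baseURL, ok := credentialValue(cred, "baseUrl", "base_url")
	fmt.Println(compat, baseURL, ok)
}
```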
Select Custom STT on the assistant
Open the assistant voice settings and select Custom STT as the speech-to-text provider.
Fill the STT arguments
Set the model, language, audio format, query parameters, request rules, and response rules. The argument reference below maps one-to-one to the UI fields.
STT arguments
The UI stores these keys under the microphone. metadata prefix, but the transformer reads the unprefixed option keys shown below.
| Option key | Required | Default | Description |
|---|---|---|---|
| listen.model | No | Empty | Provider model identifier. Reference it as model ($var) in query parameters or config.model ($path) in request rules. |
| listen.language | No | Empty | Provider language code. Reference it as language ($var) in query parameters or config.language ($path) in request rules. |
| listen.audio.encoding | Yes | LINEAR16 | Audio encoding sent to the provider. Supported values: LINEAR16, MuLaw8. |
| listen.audio.sample_rate | Yes | 16000 | Audio sample rate sent to the provider. Supported UI values: 8000, 16000, 22050, 24000, 32000, 44100, 48000. |
| listen.ws.query_params | No | {} | Flat JSON object appended to baseUrl as query parameters. Values can use DSL expressions. |
| listen.ws.request_rules | Yes | See example below | Ordered JSON array that tells Rapida what to send for each outbound packet. Must contain at least one audio rule. |
| listen.ws.response_rules | Yes | None | Ordered JSON array that tells Rapida how to parse provider WebSocket frames into transcripts or errors. |
Query parameters
Use listen.ws.query_params when your provider expects configuration in the WebSocket URL.
Supported variables:
| Variable | Source |
|---|---|
| model | listen.model |
| language | listen.language |
| encoding | listen.audio.encoding |
| sample_rate | listen.audio.sample_rate |
{
"language" : { "$var" : "language" },
"model" : { "$var" : "model" },
"encoding" : { "$var" : "encoding" },
"sample_rate" : {
"$cast" : "number" ,
"value" : { "$var" : "sample_rate" }
}
}
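A template like the one above could be rendered into query-string values as sketched below. This is a hypothetical Go illustration, not the assistant-api implementation: renderValue and renderQuery are invented names, only the string and number casts are handled, and rendering null as an empty value is an assumption.

```go
package main

import (
	"fmt"
	"strconv"
)

// renderValue resolves one query-parameter value. Primitives pass through;
// {"$var": name} reads a variable; {"$cast": type, "value": expr} casts.
func renderValue(v any, vars map[string]any) (any, error) {
	if _, isArr := v.([]any); isArr {
		return nil, fmt.Errorf("arrays are not allowed in query params")
	}
	obj, isObj := v.(map[string]any)
	if !isObj {
		return v, nil // string, number, boolean, or null literal
	}
	if name, ok := obj["$var"].(string); ok && len(obj) == 1 {
		val, found := vars[name]
		if !found {
			return nil, fmt.Errorf("unknown variable %q", name)
		}
		return val, nil
	}
	if typ, ok := obj["$cast"].(string); ok && len(obj) == 2 {
		inner, err := renderValue(obj["value"], vars)
		if err != nil {
			return nil, err
		}
		switch typ {
		case "string":
			return fmt.Sprint(inner), nil
		case "number":
			n, err := strconv.ParseFloat(fmt.Sprint(inner), 64)
			if err != nil {
				return nil, fmt.Errorf("cannot cast %v to number", inner)
			}
			return n, nil
		}
		return nil, fmt.Errorf("cast %q is not covered by this sketch", typ)
	}
	// Any other object is not a DSL expression and is rejected.
	return nil, fmt.Errorf("nested objects are not allowed in query params")
}

// renderQuery renders every template entry to a query-string value.
func renderQuery(tmpl, vars map[string]any) (map[string]string, error) {
	out := make(map[string]string, len(tmpl))
	for key, raw := range tmpl {
		val, err := renderValue(raw, vars)
		if err != nil {
			return nil, fmt.Errorf("%s: %w", key, err)
		}
		switch x := val.(type) {
		case nil:
			out[key] = "" // assumption: null renders as an empty value
		case float64:
			out[key] = strconv.FormatFloat(x, 'f', -1, 64)
		default:
			out[key] = fmt.Sprint(x)
		}
	}
	return out, nil
}

func main() {
	tmpl := map[string]any{
		"language": map[string]any{"$var": "language"},
		"sample_rate": map[string]any{
			"$cast": "number",
			"value": map[string]any{"$var": "sample_rate"},
		},
	}
	vars := map[string]any{"model": "model-a", "language": "en-US", "encoding": "LINEAR16", "sample_rate": 16000}
	q, err := renderQuery(tmpl, vars)
	fmt.Println(q, err)
}
```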
Request rules
Request rules are evaluated for normalized packets produced by Rapida.
| Packet | When it is sent | Available paths |
|---|---|---|
| turn_change | A new turn/context starts | packet.kind, packet.context_id, config.model, config.language, config.audio.encoding, config.audio.sample_rate |
| audio | Audio is ready to stream | packet.kind, packet.context_id, packet.audio.bytes, packet.audio.base64, config.* |
| interrupt | User interruption is detected | packet.kind, packet.context_id, config.* |
Supported outbound frame types are binary, json, and text.
packet.audio.bytes contains the raw audio bytes and is intended for binary frames. Use packet.audio.base64 when the provider expects audio inside a JSON payload.
Binary audio stream
[
{
"when" : { "packet" : "audio" },
"send" : {
"frame" : "binary" ,
"body" : { "$path" : "packet.audio.bytes" }
}
}
]
JSON audio payload
[
{
"when" : { "packet" : "audio" },
"send" : {
"frame" : "json" ,
"body" : {
"audio" : { "$path" : "packet.audio.base64" },
"encoding" : { "$path" : "config.audio.encoding" },
"sample_rate" : {
"$cast" : "number" ,
"value" : { "$path" : "config.audio.sample_rate" }
}
}
}
}
]
Response rules
Response rules parse provider WebSocket frames into Rapida transcript packets.
Supported inbound frame types:
| Frame | Use when |
|---|---|
| json | The provider returns JSON messages. |
| text | The provider returns plain transcript text. |
Supported emit keys:
| Emit key | Type | Description |
|---|---|---|
| script | string | Transcript text. When non-empty, Rapida emits an STT transcript packet. |
| confidence | number | Optional transcript confidence. |
| language | string | Optional language code for the transcript. |
| interim | boolean | true for partial transcripts, false for final transcripts. |
| error | any | Provider error value. When present, Rapida treats the frame as an error. |
[
{
"when" : { "frame" : "json" , "path" : "type" , "equals" : "partial" },
"emit" : {
"script" : { "$path" : "text" },
"confidence" : {
"$cast" : "number" ,
"value" : { "$path" : "confidence" }
},
"language" : { "$path" : "language" },
"interim" : true
}
},
{
"when" : { "frame" : "json" , "path" : "type" , "equals" : "final" },
"emit" : {
"script" : { "$path" : "text" },
"confidence" : {
"$cast" : "number" ,
"value" : { "$path" : "confidence" }
},
"language" : { "$path" : "language" },
"interim" : false
}
},
{
"when" : { "frame" : "json" , "path" : "type" , "equals" : "error" },
"emit" : {
"error" : { "$path" : "error.message" }
}
}
]
STT DSL design
Custom STT uses a JSON-template DSL with three sections: query parameters, request rules, and response rules. The DSL is intentionally small. It does not run scripts, call functions, concatenate strings, perform regex matching, or read environment variables.
STT frame support
| Frame type | Outbound request | Inbound response | Notes |
|---|---|---|---|
| binary | Yes | No | Use for raw provider audio input. |
| json | Yes | Yes | Use for provider control packets and structured transcripts. |
| text | Yes | Yes | Use for providers that accept or return plain text frames. |
Inbound parsing rules:
WebSocket message type 2 is treated as binary, but STT response rules do not support binary response frames.
Non-binary messages are parsed as JSON when they contain exactly one valid JSON value.
Non-JSON messages are treated as text.
For STT, JSON string primitives and non-object JSON values are treated as text; JSON response rules operate on JSON objects.
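The classification above can be sketched in Go with encoding/json. classifyFrame is a hypothetical name, and the message-type constants follow the WebSocket convention of 1 for text and 2 for binary. The sketch rejects trailing data after the first JSON value, so only a message holding exactly one JSON object is classified as json.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"io"
)

const (
	textMessage   = 1 // WebSocket text message type
	binaryMessage = 2 // WebSocket binary message type
)

// classifyFrame returns "binary", "json", or "text" plus the decoded object
// for json frames. Hypothetical helper for illustration only.
func classifyFrame(msgType int, data []byte) (string, map[string]any) {
	if msgType == binaryMessage {
		return "binary", nil // STT response rules cannot match these
	}
	dec := json.NewDecoder(bytes.NewReader(data))
	var v any
	if err := dec.Decode(&v); err != nil {
		return "text", nil // not JSON at all
	}
	// Anything after the first value means the message is not exactly one JSON value.
	if _, err := dec.Token(); !errors.Is(err, io.EOF) {
		return "text", nil
	}
	if obj, ok := v.(map[string]any); ok {
		return "json", obj
	}
	return "text", nil // JSON strings, numbers, arrays, and null are text
}

func main() {
	for _, msg := range []string{`{"type":"final","text":"hi"}`, `"hello"`, `hello world`, `{"a":1} {"b":2}`} {
		kind, _ := classifyFrame(textMessage, []byte(msg))
		fmt.Println(kind, msg)
	}
}
```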
STT operators
Every operator object must contain only the operator key and its required fields; for example, a $cast object contains exactly $cast and value.
| Operator | Where supported | Description |
|---|---|---|
| $var | Query parameters | Reads model, language, encoding, or sample_rate. |
| $path | Request rules, response rules | Reads a dot path from the request scope or a JSON response frame. |
| $cast | Query parameters, request rules, response rules | Casts a value to string, number, or boolean. |
| $frame | Response rules | Reads the full current text response frame. |
Unsupported STT operators:
$decode is not supported for STT.
$frame: "binary" and $frame: "json" are not supported for STT emit rules.
Cast behavior
| Cast | Behavior |
|---|---|
| string | Converts strings, bytes, numbers, booleans, and null to string form. |
| number | Converts JSON numbers, numeric values, or numeric strings to an integer or float. |
| boolean | Converts booleans, boolean strings, and numeric values. JSON numbers are accepted only as 0 or 1; typed numeric values treat zero as false and any non-zero value as true. |
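The cast table can be approximated in Go as follows. This is a sketch under stated assumptions, not Rapida's code: JSON numbers are modelled as float64 (what encoding/json produces), typed numeric values as int and int64, boolean strings are parsed with strconv.ParseBool (which also accepts spellings such as "1" and "T"), and null is rendered as the string "null".

```go
package main

import (
	"fmt"
	"strconv"
)

// castString converts primitives and bytes to their string form.
func castString(v any) (string, error) {
	switch x := v.(type) {
	case nil:
		return "null", nil // assumption: null renders as "null"
	case string:
		return x, nil
	case []byte:
		return string(x), nil
	case bool:
		return strconv.FormatBool(x), nil
	case float64:
		return strconv.FormatFloat(x, 'f', -1, 64), nil
	case int:
		return strconv.Itoa(x), nil
	case int64:
		return strconv.FormatInt(x, 10), nil
	}
	return "", fmt.Errorf("cannot cast %T to string", v)
}

// castNumber returns an int64 for integer strings and a float64 otherwise.
func castNumber(v any) (any, error) {
	switch x := v.(type) {
	case float64, int, int64:
		return x, nil
	case string:
		if i, err := strconv.ParseInt(x, 10, 64); err == nil {
			return i, nil
		}
		f, err := strconv.ParseFloat(x, 64)
		if err != nil {
			return nil, fmt.Errorf("%q is not numeric", x)
		}
		return f, nil
	}
	return nil, fmt.Errorf("cannot cast %T to number", v)
}

// castBoolean accepts JSON numbers only as 0 or 1, but any typed integer.
func castBoolean(v any) (bool, error) {
	switch x := v.(type) {
	case bool:
		return x, nil
	case string:
		return strconv.ParseBool(x)
	case float64: // JSON number
		if x == 0 || x == 1 {
			return x == 1, nil
		}
		return false, fmt.Errorf("JSON number %v is not 0 or 1", x)
	case int:
		return x != 0, nil
	case int64:
		return x != 0, nil
	}
	return false, fmt.Errorf("cannot cast %T to boolean", v)
}

func main() {
	n, _ := castNumber("16000")
	b, _ := castBoolean(float64(1))
	s, _ := castString(true)
	fmt.Println(n, b, s)
}
```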
JSON path behavior
$path uses dot-separated paths.
{ "$path" : "packet.audio.base64" }
Objects are traversed by key. Arrays are traversed by numeric index.
{ "$path" : "results.0.transcript" }
Limits:
Keys containing a literal dot are not addressable.
Request rules can only read from config and packet.
Response rules can use $path only with JSON response frames.
A missing path in when.path means the rule does not match.
A missing path in emit or send.body is an error.
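A minimal Go sketch of this path walk over decoded JSON. lookupPath is a hypothetical helper; it reports ok=false for any missing segment, which the rules above treat as "no match" in when.path and as an error in emit or send.body. Because the path is split on every dot, a key such as "a.b" can never be reached, consistent with the first limit above.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// lookupPath walks a dot-separated path: object keys select map entries and
// numeric segments index arrays.
func lookupPath(root any, path string) (any, bool) {
	cur := root
	for _, seg := range strings.Split(path, ".") {
		switch node := cur.(type) {
		case map[string]any:
			next, ok := node[seg]
			if !ok {
				return nil, false
			}
			cur = next
		case []any:
			i, err := strconv.Atoi(seg)
			if err != nil || i < 0 || i >= len(node) {
				return nil, false
			}
			cur = node[i]
		default:
			return nil, false // cannot descend into a primitive
		}
	}
	return cur, true
}

func main() {
	frame := map[string]any{
		"results": []any{map[string]any{"transcript": "hello"}},
	}
	v, ok := lookupPath(frame, "results.0.transcript")
	fmt.Println(v, ok)
}
```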
Query parameter rules
listen.ws.query_params must be a flat JSON object. Each value must resolve to a primitive value: string, number, boolean, or null. Nested objects and arrays are rejected unless the object is a DSL expression.
{
"language" : { "$var" : "language" },
"model" : { "$var" : "model" },
"sample_rate" : {
"$cast" : "number" ,
"value" : { "$var" : "sample_rate" }
}
}
The rendered query parameters are appended to baseUrl. Existing query parameters in baseUrl are preserved unless the same key is rendered by listen.ws.query_params.
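One way to express this merge in Go is with net/url, sketched below under the assumption that rendered values are already strings. buildURL is a hypothetical name. url.Values.Set replaces an existing key, which gives the override behavior described above; note that Encode also sorts the keys and re-escapes the query string.

```go
package main

import (
	"fmt"
	"net/url"
)

// buildURL appends rendered query parameters to baseURL, keeping existing
// parameters unless a rendered key replaces them.
func buildURL(baseURL string, rendered map[string]string) (string, error) {
	u, err := url.Parse(baseURL)
	if err != nil {
		return "", err
	}
	q := u.Query()
	for k, v := range rendered {
		q.Set(k, v) // Set replaces any existing value for the same key
	}
	u.RawQuery = q.Encode()
	return u.String(), nil
}

func main() {
	s, err := buildURL("wss://stt.example.com/v1/listen?model=old&region=eu",
		map[string]string{"model": "model-a", "sample_rate": "16000"})
	fmt.Println(s, err)
}
```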
Request rule shape
listen.ws.request_rules is an ordered JSON array. Every matching rule is sent, so one packet can produce multiple WebSocket messages.
| Field | Required | Description |
|---|---|---|
| when.packet | Yes | turn_change, audio, or interrupt. |
| send.frame | Yes | binary, json, or text. |
| send.body | Yes | Static value or DSL expression tree. |
Body validation depends on send.frame:
| send.frame | send.body must resolve to |
|---|---|
| binary | Bytes or string. |
| json | A valid JSON value. Byte arrays are not valid JSON bodies. |
| text | A value convertible to string. |
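The following Go sketch illustrates both behaviors: every rule whose when.packet matches produces a message, in rule order, and each body is checked against its frame type. requestRule, encodeBody, and messagesFor are hypothetical names, bodies are assumed to be already evaluated, and the text case simply uses fmt.Sprint, which is more permissive than a real check might be.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// requestRule is a hypothetical, already-evaluated request rule.
type requestRule struct {
	Packet string // when.packet
	Frame  string // send.frame
	Body   any    // rendered send.body
}

// outbound is one WebSocket message to send.
type outbound struct {
	Frame   string
	Payload []byte
}

// encodeBody applies the per-frame body checks from the table above.
func encodeBody(frame string, body any) ([]byte, error) {
	switch frame {
	case "binary":
		switch b := body.(type) {
		case []byte:
			return b, nil
		case string:
			return []byte(b), nil
		}
		return nil, fmt.Errorf("binary body must be bytes or string, got %T", body)
	case "json":
		if _, isBytes := body.([]byte); isBytes {
			return nil, fmt.Errorf("byte arrays are not valid JSON bodies")
		}
		return json.Marshal(body)
	case "text":
		// Sketch simplification: any value is accepted via fmt.Sprint.
		return []byte(fmt.Sprint(body)), nil
	}
	return nil, fmt.Errorf("unsupported frame %q", frame)
}

// messagesFor returns one message per matching rule, preserving rule order.
func messagesFor(kind string, rules []requestRule) ([]outbound, error) {
	var out []outbound
	for _, r := range rules {
		if r.Packet != kind {
			continue
		}
		payload, err := encodeBody(r.Frame, r.Body)
		if err != nil {
			return nil, err
		}
		out = append(out, outbound{Frame: r.Frame, Payload: payload})
	}
	return out, nil
}

func main() {
	rules := []requestRule{
		{Packet: "audio", Frame: "binary", Body: []byte{0x00, 0x01}},
		{Packet: "audio", Frame: "json", Body: map[string]any{"audio": "AAE="}},
		{Packet: "interrupt", Frame: "json", Body: map[string]any{"type": "flush"}},
	}
	msgs, err := messagesFor("audio", rules)
	fmt.Println(len(msgs), err)
}
```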
STT request scope
The request scope is the data available to $path in request rules.
{
"config" : {
"model" : "model-a" ,
"language" : "en-US" ,
"audio" : {
"encoding" : "LINEAR16" ,
"sample_rate" : 16000
}
},
"packet" : {
"kind" : "audio" ,
"context_id" : "ctx_123" ,
"audio" : {
"bytes" : "<raw bytes>" ,
"base64" : "AAE="
}
}
}
packet.audio exists only for audio packets.
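A hypothetical Go sketch of how this scope could be assembled. buildRequestScope is an invented name, and standard padded base64 is an assumption based on the AAE= example above, which is the standard encoding of the two bytes 0x00 0x01.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// buildRequestScope assembles the data that $path can read in request rules.
// packet.audio is only added for audio packets.
func buildRequestScope(model, language, encoding string, sampleRate int,
	kind, contextID string, audio []byte) map[string]any {
	packet := map[string]any{"kind": kind, "context_id": contextID}
	if kind == "audio" {
		packet["audio"] = map[string]any{
			"bytes":  audio,                                   // for binary frames
			"base64": base64.StdEncoding.EncodeToString(audio), // for JSON payloads
		}
	}
	return map[string]any{
		"config": map[string]any{
			"model":    model,
			"language": language,
			"audio": map[string]any{
				"encoding":    encoding,
				"sample_rate": sampleRate,
			},
		},
		"packet": packet,
	}
}

func main() {
	scope := buildRequestScope("model-a", "en-US", "LINEAR16", 16000, "audio", "ctx_123", []byte{0x00, 0x01})
	audio := scope["packet"].(map[string]any)["audio"].(map[string]any)
	fmt.Println(audio["base64"])
}
```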
Response rule shape
listen.ws.response_rules is an ordered JSON array. The first matching rule is evaluated; later rules are skipped for that frame.
| Field | Required | Description |
|---|---|---|
| when.frame | Yes | json or text. |
| when.path | No | Dot path inside a JSON frame. Must be paired with when.equals. |
| when.equals | No | Primitive value compared against the value at when.path, or against the full text frame. |
| emit | Yes | Object containing supported STT emit keys. |
Match behavior:
| Frame | Match behavior |
|---|---|
| json | If when.path and when.equals are omitted, matches any JSON object. If provided, both fields are required, and the value at when.path must equal when.equals exactly. |
| text | If when.equals is omitted, matches any text frame. If provided, the full text frame must equal when.equals exactly. when.path is not allowed. |
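A minimal Go sketch of first-match evaluation. The whenClause type, getPath, and firstMatch are hypothetical; getPath only follows object keys, and a nil Equals stands for an omitted when.equals, so this sketch cannot compare against an explicit JSON null.

```go
package main

import (
	"fmt"
	"strings"
)

// whenClause is a hypothetical decoded form of a response rule's "when".
type whenClause struct {
	Frame  string // "json" or "text"
	Path   string // json frames only; paired with Equals
	Equals any    // nil means "omitted" in this sketch
}

// getPath follows object keys only; array indexing is left out for brevity.
func getPath(obj map[string]any, path string) (any, bool) {
	var cur any = obj
	for _, seg := range strings.Split(path, ".") {
		m, ok := cur.(map[string]any)
		if !ok {
			return nil, false
		}
		if cur, ok = m[seg]; !ok {
			return nil, false
		}
	}
	return cur, true
}

// firstMatch returns the index of the first matching rule, or -1 when the
// frame should be ignored. Later rules are never consulted once one matches.
func firstMatch(rules []whenClause, kind string, obj map[string]any, text string) int {
	for i, w := range rules {
		if w.Frame != kind {
			continue
		}
		switch kind {
		case "json":
			if w.Path == "" && w.Equals == nil {
				return i // matches any JSON object
			}
			if v, ok := getPath(obj, w.Path); ok && v == w.Equals {
				return i // a missing path simply does not match
			}
		case "text":
			if w.Equals == nil || w.Equals == text {
				return i
			}
		}
	}
	return -1
}

func main() {
	rules := []whenClause{
		{Frame: "json", Path: "result.final", Equals: false},
		{Frame: "json", Path: "result.final", Equals: true},
	}
	frame := map[string]any{"result": map[string]any{"final": true, "transcript": "hi"}}
	fmt.Println(firstMatch(rules, "json", frame, "")) // 1
}
```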
STT emit keys
| Emit key | Type after evaluation | Effect |
|---|---|---|
| script | string | Transcript text. Empty transcripts are ignored. |
| confidence | number | Transcript confidence. Defaults to 0 when omitted. |
| language | string | Transcript language. Falls back to listen.language when omitted. |
| interim | boolean | true emits an interim transcript; false emits a completed transcript. |
| error | string | Emits an STT error instead of a transcript. |
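A hypothetical Go sketch of how an evaluated emit object could map to these effects. transcript and applyEmit are invented names, an error key takes precedence over script, and an omitted interim defaulting to false is an assumption.

```go
package main

import "fmt"

// transcript is a hypothetical stand-in for Rapida's STT transcript packet.
type transcript struct {
	Script     string
	Confidence float64
	Language   string
	Interim    bool
}

// applyEmit returns an error, a transcript, or (nil, nil) when nothing
// should be emitted.
func applyEmit(emit map[string]any, defaultLanguage string) (*transcript, error) {
	if e, ok := emit["error"]; ok {
		return nil, fmt.Errorf("stt provider error: %v", e)
	}
	script, _ := emit["script"].(string)
	if script == "" {
		return nil, nil // empty transcripts are ignored
	}
	// Confidence stays 0 unless provided; language falls back to listen.language.
	t := &transcript{Script: script, Language: defaultLanguage}
	if c, ok := emit["confidence"].(float64); ok {
		t.Confidence = c
	}
	if lang, ok := emit["language"].(string); ok {
		t.Language = lang
	}
	t.Interim, _ = emit["interim"].(bool) // assumption: omitted means final
	return t, nil
}

func main() {
	t, err := applyEmit(map[string]any{"script": "hello", "interim": true}, "en-US")
	fmt.Printf("%+v %v\n", *t, err)
}
```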
Plain text transcript response
Use this for providers that return transcript chunks as raw text frames.
[
{
"when" : { "frame" : "text" },
"emit" : {
"script" : { "$frame" : "text" },
"interim" : false
}
}
]
Nested JSON transcript response
[
{
"when" : { "frame" : "json" , "path" : "result.final" , "equals" : false },
"emit" : {
"script" : { "$path" : "result.transcript" },
"interim" : true
}
},
{
"when" : { "frame" : "json" , "path" : "result.final" , "equals" : true },
"emit" : {
"script" : { "$path" : "result.transcript" },
"confidence" : {
"$cast" : "number" ,
"value" : { "$path" : "result.confidence" }
},
"language" : { "$path" : "result.language" },
"interim" : false
}
}
]
Start, audio, and interrupt recipe
Use this pattern when the provider expects a session-start message, binary audio frames, and a flush message on interruption.
[
{
"when" : { "packet" : "turn_change" },
"send" : {
"frame" : "json" ,
"body" : {
"type" : "start" ,
"language" : { "$path" : "config.language" },
"sample_rate" : {
"$cast" : "number" ,
"value" : { "$path" : "config.audio.sample_rate" }
}
}
}
},
{
"when" : { "packet" : "audio" },
"send" : {
"frame" : "binary" ,
"body" : { "$path" : "packet.audio.bytes" }
}
},
{
"when" : { "packet" : "interrupt" },
"send" : {
"frame" : "json" ,
"body" : { "type" : "flush" }
}
}
]
Runtime behavior
The connection URL is built from baseUrl and listen.ws.query_params.
Headers are copied from the credential and are not templated.
Audio is resampled from Rapida’s internal audio format to listen.audio.encoding and listen.audio.sample_rate before request rules are evaluated.
turn_change and audio packets open the WebSocket connection if needed.
interrupt rules are sent only when a connection is already active.
If no response rule matches an inbound frame, the frame is ignored.
If a response emits error, Rapida emits an STT error packet.
If a response emits non-empty script, Rapida emits a transcript packet and conversation event.
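The connection rules above can be modelled as a small state machine, sketched below in Go. session, dial, and handle are hypothetical names; dial only records that a connection would be opened to the rendered URL with the credential headers, and the sent slice stands in for evaluating request rules.

```go
package main

import "fmt"

// session is a hypothetical model of the connection lifecycle.
type session struct {
	connected bool
	dials     int
	sent      []string
}

// dial stands in for opening the WebSocket connection.
func (s *session) dial() {
	s.connected = true
	s.dials++
}

// handle dials on demand for turn_change and audio packets, and drops
// interrupt packets when no connection is active.
func (s *session) handle(kind string) {
	switch kind {
	case "turn_change", "audio":
		if !s.connected {
			s.dial()
		}
	case "interrupt":
		if !s.connected {
			return
		}
	default:
		return
	}
	s.sent = append(s.sent, kind) // request rules for this packet would run here
}

func main() {
	s := &session{}
	for _, k := range []string{"interrupt", "turn_change", "audio", "interrupt"} {
		s.handle(k)
	}
	fmt.Println(s.dials, s.sent)
}
```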
Current STT limits
No regex, contains, starts-with, greater-than, or compound match conditions.
No string interpolation or concatenation.
No fallback values inside expressions.
No dynamic headers or dynamic WebSocket path segments.
No $decode.
No binary response handling for STT.
No $frame: "json" selector in emit rules.
Backend mapping
assistant-api resolves custom-stt in api/assistant-api/internal/transformer/transformer.go, then dispatches to the WebSocket v1 implementation in api/assistant-api/internal/transformer/custom/stt_websocket_v1.
The WebSocket v1 transformer validates:
baseUrl is present in the credential.
listen.audio.encoding is not empty.
listen.audio.sample_rate is positive.
listen.ws.request_rules contains at least one audio packet rule.
listen.ws.response_rules contains at least one rule.
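A hypothetical Go sketch of these checks. sttConfig and validate are invented names, the options are assumed to be parsed already, and collecting every failure with errors.Join (Go 1.20 or later) is a design choice of the sketch; the real transformer may stop at the first error.

```go
package main

import (
	"errors"
	"fmt"
)

// sttConfig is a hypothetical, already-parsed view of the credential and
// assistant options; field names are illustrative only.
type sttConfig struct {
	BaseURL        string
	Encoding       string
	SampleRate     int
	RequestPackets []string // when.packet of each request rule
	ResponseRules  int      // number of response rules
}

// validate reproduces the checks listed above and reports all failures.
func validate(c sttConfig) error {
	var errs []error
	if c.BaseURL == "" {
		errs = append(errs, errors.New("baseUrl is required"))
	}
	if c.Encoding == "" {
		errs = append(errs, errors.New("listen.audio.encoding must not be empty"))
	}
	if c.SampleRate <= 0 {
		errs = append(errs, errors.New("listen.audio.sample_rate must be positive"))
	}
	hasAudio := false
	for _, p := range c.RequestPackets {
		if p == "audio" {
			hasAudio = true
		}
	}
	if !hasAudio {
		errs = append(errs, errors.New("listen.ws.request_rules needs an audio rule"))
	}
	if c.ResponseRules == 0 {
		errs = append(errs, errors.New("listen.ws.response_rules needs at least one rule"))
	}
	return errors.Join(errs...) // nil when errs is empty
}

func main() {
	err := validate(sttConfig{
		BaseURL:        "wss://stt.example.com/v1/listen",
		Encoding:       "LINEAR16",
		SampleRate:     16000,
		RequestPackets: []string{"turn_change"},
		ResponseRules:  1,
	})
	fmt.Println(err)
}
```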