Custom TTS lets you connect Rapida to any WebSocket-based speech synthesis service without writing a new Go transformer. The UI stores provider credentials and assistant options; assistant-api reads them through the custom-tts transformer and maps LLM text packets and provider audio frames with WebSocket DSL rules.
Provider identifier: custom-tts
In the Rapida dashboard, open Integrations > Models, choose Custom TTS, and create a credential.
| Credential field | Required | Value |
| --- | --- | --- |
| apiCompatibility | Yes | websocket_v1 |
| baseUrl | Yes | WebSocket URL for your TTS service, for example wss://tts.example.com/v1/speak |
| headers | No | Header map sent on the WebSocket handshake, for example {"Authorization":"Bearer sk_..."} |
The backend also accepts the snake_case keys api_compatibility and base_url in place of apiCompatibility and baseUrl.
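The credential fields and their snake_case aliases can be illustrated with a small normalizer. This is a sketch only, not Rapida's implementation: the function name normalize_credential and the error messages are hypothetical, and it assumes a camelCase key takes precedence when both spellings are present.

```python
# Hypothetical alias table: camelCase field -> accepted snake_case alias.
ALIASES = {"apiCompatibility": "api_compatibility", "baseUrl": "base_url"}


def normalize_credential(raw: dict) -> dict:
    """Return the credential keyed by its camelCase field names."""
    cred = {}
    for camel, snake in ALIASES.items():
        # Assumption: the camelCase key wins if both spellings are set.
        value = raw.get(camel, raw.get(snake))
        if not value:
            raise ValueError(f"missing required credential field: {camel}")
        cred[camel] = value
    if cred["apiCompatibility"] != "websocket_v1":
        raise ValueError("apiCompatibility must be websocket_v1")
    # headers is optional; default to an empty map.
    cred["headers"] = dict(raw.get("headers") or {})
    return cred


example = normalize_credential({
    "api_compatibility": "websocket_v1",
    "base_url": "wss://tts.example.com/v1/speak",
    "headers": {"Authorization": "Bearer sk_..."},
})
```

Here example["baseUrl"] holds the WebSocket URL regardless of which key spelling was supplied.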
2
Select Custom TTS on the assistant
Open the assistant voice settings and select Custom TTS as the text-to-speech provider.
3
Fill the TTS arguments
Set the voice, model, language, audio format, query parameters, request rules, and response rules. The argument reference below maps one-to-one to the UI fields.
Supported outbound frame types are binary, json, and text.
Add an interrupt rule if your provider needs an explicit cancel/clear message. Without it, queued provider audio can continue after the user starts speaking.
Custom TTS uses a JSON-template DSL with three sections: query parameters, request rules, and response rules. The DSL is intentionally small. It does not run scripts, call functions, concatenate strings, perform regex matching, or read environment variables.
string: Converts strings, bytes, numbers, booleans, and null to string form.
number: Converts JSON numbers, numeric values, or numeric strings to an integer or float.
boolean: Converts booleans, boolean strings, and numeric values. JSON numbers are accepted as 0 or 1; typed numeric values use zero as false and non-zero as true.
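The number and boolean conversions described above can be approximated as follows. This is a rough sketch under stated assumptions, not the transformer's code: the function names are hypothetical, the accepted boolean strings are assumed to be "true" and "false" (case-insensitive), and the from_json flag stands in for the distinction between JSON numbers and typed numeric values.

```python
def to_number(value):
    """Coerce a numeric value or numeric string to int or float."""
    if isinstance(value, bool):
        # Assumption: booleans are not treated as numbers.
        raise ValueError("boolean is not a numeric value")
    if isinstance(value, (int, float)):
        return value
    if isinstance(value, str):
        text = value.strip()
        try:
            return int(text)
        except ValueError:
            return float(text)  # raises ValueError for non-numeric text
    raise ValueError(f"cannot convert {type(value).__name__} to number")


def to_boolean(value, from_json=False):
    """Coerce booleans, boolean strings, and numbers to bool."""
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        lowered = value.strip().lower()
        if lowered in ("true", "false"):  # assumed set of boolean strings
            return lowered == "true"
        raise ValueError(f"not a boolean string: {value!r}")
    if isinstance(value, (int, float)):
        if from_json:
            # JSON numbers are accepted only as 0 or 1.
            if value not in (0, 1):
                raise ValueError("JSON number must be 0 or 1")
            return value == 1
        # Typed numeric values: zero is false, non-zero is true.
        return value != 0
    raise ValueError(f"cannot convert {type(value).__name__} to boolean")
```

For instance, to_number("16000") yields an integer, while to_boolean(2) is true for a typed value but is rejected when from_json=True.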
speak.ws.query_params must be a flat JSON object. Each value must resolve to a primitive value: string, number, boolean, or null. Nested objects and arrays are rejected unless the object is a DSL expression.
The rendered query parameters are appended to baseUrl. Existing query parameters in baseUrl are preserved unless the same key is rendered by speak.ws.query_params.
TTS request rules can read packet.text, but text is not a supported query parameter variable.
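The merge behavior for query parameters can be sketched with the standard library. This is an illustration, not Rapida's code: build_connection_url is a hypothetical name, the input is assumed to be the already-rendered flat map (DSL expressions resolved), and rendering booleans as true/false and null as an empty string are assumptions.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def render_primitive(value) -> str:
    """Render a primitive for the query string (formatting is assumed)."""
    if value is None:
        return ""  # assumption: null becomes an empty value
    if isinstance(value, bool):
        return "true" if value else "false"  # assumption
    return str(value)


def build_connection_url(base_url: str, query_params: dict) -> str:
    # Reject nested objects and arrays: the map must be flat.
    for key, value in query_params.items():
        if isinstance(value, (dict, list)):
            raise ValueError(f"query parameter {key!r} must be a primitive")
    parts = urlsplit(base_url)
    # Start from the parameters already present in baseUrl...
    merged = dict(parse_qsl(parts.query, keep_blank_values=True))
    # ...then let rendered parameters override keys with the same name.
    for key, value in query_params.items():
        merged[key] = render_primitive(value)
    return urlunsplit(parts._replace(query=urlencode(merged)))


url = build_connection_url(
    "wss://tts.example.com/v1/speak?model=a&region=us",
    {"model": "b", "sample_rate": 16000},
)
# url == "wss://tts.example.com/v1/speak?model=b&region=us&sample_rate=16000"
```

The region parameter from baseUrl is preserved, while model is replaced because the same key is rendered by the query parameter map. The sketch collapses repeated keys in baseUrl into one value, which may differ from the real behavior.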
The connection URL is built from baseUrl and speak.ws.query_params.
Headers are copied from the credential and are not templated.
The transformer opens a connection per active message/context. A new context closes the previous connection.
text packets open the WebSocket connection if needed.
done and interrupt request rules are optional. If no rule exists for that packet, nothing is sent.
On interruption, Rapida sends the optional interrupt rule first, then closes the connection.
Audio returned by the provider is interpreted as speak.audio.encoding and speak.audio.sample_rate, then resampled to Rapida’s internal audio format when needed.
If no response rule matches an inbound frame, the frame is ignored.
If a response emits error, Rapida emits a TTS error packet.
If a response emits done, Rapida closes the connection and emits a TTS end packet.
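The response-handling and interruption rules above can be modeled as a tiny state machine. This is a simplified sketch with hypothetical names (Session, handle_response, interrupt, and the packet labels); it takes the outcome of response-rule matching as input rather than implementing the DSL, and it assumes an error does not by itself close the connection.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    is_open: bool = True
    sent: list = field(default_factory=list)     # frames sent to the provider
    emitted: list = field(default_factory=list)  # packets emitted by Rapida


def handle_response(session: Session, event, payload=None) -> None:
    """event is None when no response rule matched the inbound frame."""
    if event is None:
        return  # unmatched frames are ignored
    if event == "audio":
        session.emitted.append(("tts_audio", payload))
    elif event == "error":
        session.emitted.append(("tts_error", payload))
    elif event == "done":
        session.is_open = False
        session.emitted.append(("tts_end", None))


def interrupt(session: Session, interrupt_message=None) -> None:
    """Send the optional interrupt rule first, then close the connection."""
    if interrupt_message is not None:
        session.sent.append(interrupt_message)
    session.is_open = False


session = Session()
handle_response(session, None)               # ignored
handle_response(session, "audio", b"\x00\x01")
handle_response(session, "done")             # closes and emits the end packet
```

After these calls the session is closed and its emitted list holds one audio packet followed by the end packet.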
assistant-api resolves custom-tts in api/assistant-api/internal/transformer/transformer.go, then dispatches to the WebSocket v1 implementation in api/assistant-api/internal/transformer/custom/tts_websocket_v1.
The WebSocket v1 transformer validates:
baseUrl is present in the credential.
speak.voice.id is present.
speak.audio.encoding is not empty.
speak.audio.sample_rate is positive.
speak.ws.request_rules contains at least one text packet rule.
speak.ws.response_rules contains at least one rule.
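These checks can be mirrored in a pre-flight validator before saving an assistant. The following is a sketch under assumptions, not the Go implementation: validate_custom_tts is a hypothetical name, options are assumed to be a flat dict keyed by the dotted argument names, and the "packet" key used to identify a text rule is a guess at the rule shape.

```python
def validate_custom_tts(credential: dict, options: dict) -> list:
    """Return a list of validation errors; an empty list means valid."""
    errors = []
    if not credential.get("baseUrl"):
        errors.append("baseUrl is missing from the credential")
    if not options.get("speak.voice.id"):
        errors.append("speak.voice.id is missing")
    if not options.get("speak.audio.encoding"):
        errors.append("speak.audio.encoding is empty")
    rate = options.get("speak.audio.sample_rate")
    is_number = isinstance(rate, (int, float)) and not isinstance(rate, bool)
    if not is_number or rate <= 0:
        errors.append("speak.audio.sample_rate must be positive")
    request_rules = options.get("speak.ws.request_rules") or []
    # Assumption: each rule is a dict whose "packet" key names its packet type.
    if not any(rule.get("packet") == "text" for rule in request_rules):
        errors.append("speak.ws.request_rules needs a text packet rule")
    if not options.get("speak.ws.response_rules"):
        errors.append("speak.ws.response_rules needs at least one rule")
    return errors


problems = validate_custom_tts(
    {"baseUrl": "wss://tts.example.com/v1/speak"},
    {
        "speak.voice.id": "voice-1",
        "speak.audio.encoding": "linear16",
        "speak.audio.sample_rate": 0,
        "speak.ws.request_rules": [{"packet": "done"}],
        "speak.ws.response_rules": [{}],
    },
)
```

In this example problems contains two entries, one for the non-positive sample rate and one for the missing text packet rule.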