Replicate
Orate supports Replicate's speech and transcription models.
Replicate makes machine learning as easy to use as any other type of software. It has a library of hundreds of open-source models that you can run with a few lines of code. And if you’re building your own machine learning models, Replicate makes it easy to deploy them at scale. Orate supports any Replicate model.
Setup
The Replicate provider is available by default in Orate. To import it, you can use the following code:
Configuration
The Replicate provider looks for the REPLICATE_API_TOKEN environment variable. This variable is required for the provider to work. Add the following to your .env file:
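The entry takes the form REPLICATE_API_TOKEN=<your token>, where the value is a placeholder for your own token. Since the provider cannot work without it, it can help to fail fast at startup when the variable is missing. The sketch below is a minimal example of such a check; requireReplicateToken is a hypothetical helper written for illustration, not part of Orate or Replicate.

```typescript
// Hypothetical helper (not part of Orate): fail fast when the token is missing.
// Expected .env entry (placeholder value): REPLICATE_API_TOKEN=<your token>
const requireReplicateToken = (
  env: Record<string, string | undefined>
): string => {
  const token = env["REPLICATE_API_TOKEN"];
  if (!token || token.trim() === "") {
    throw new Error("REPLICATE_API_TOKEN is not set");
  }
  return token;
};
```

In a Node.js application this could be called once at startup as requireReplicateToken(process.env), assuming the .env file has already been loaded into the environment.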
Usage
The Replicate provider exposes a single interface for all of Replicate's speech and transcription models.
Text to Speech
The Replicate provider has a tts function that creates a text-to-speech synthesis function from any of Replicate's TTS models. Unlike other providers, it requires both an inputTransformer and an outputTransformer to be passed to the tts function.
This is because every model on Replicate has a different input and output format. The sections below cover how to create these transformers. For example, the following code creates a text-to-speech synthesis function using the jaaari/kokoro-82m model.
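To illustrate why both transformers are needed, the sketch below models the flow in isolation: the prompt is converted to the model's input format, the model is run, and the raw output is converted back. It does not use Orate's actual API. RunModel and createSpeechFunction are hypothetical names, and the assumption that the prompt arrives as a string is ours.

```typescript
// Hypothetical stand-in for "run this Replicate model with this input".
type RunModel<I, O> = (input: I) => Promise<O>;

// Illustrative only: composes the two transformers around a model call.
// This is not Orate's implementation, just the shape of the idea.
const createSpeechFunction =
  <I, O, R>(
    run: RunModel<I, O>,
    inputTransformer: (prompt: string) => I,
    outputTransformer: (output: O) => R | Promise<R>
  ) =>
  async (prompt: string): Promise<R> => {
    const modelInput = inputTransformer(prompt); // Orate format -> model format
    const modelOutput = await run(modelInput); // model-specific call
    return outputTransformer(modelOutput); // model format -> Orate format
  };
```

Because the model call sits between two caller-supplied functions, the same wrapper can serve models with entirely different input and output shapes.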
Input Transformer
The inputTransformer is a function that takes the Orate input and transforms it into the format that the model expects. In this instance, jaaari/kokoro-82m expects an object with a text property. Let's create an input transformer for this:
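A minimal sketch of such a transformer follows. It assumes that Orate passes the prompt as a plain string. The KokoroInput type is a name introduced here and lists only the text property mentioned above; the model may accept further optional fields.

```typescript
// Assumed model input shape: only the `text` property is documented above.
type KokoroInput = { text: string };

// Assumption: the Orate input for text-to-speech is the prompt string.
const inputTransformer = (prompt: string): KokoroInput => ({ text: prompt });
```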
Output Transformer
The outputTransformer is a function that takes the model output and transforms it into the format that Orate expects. In this instance, jaaari/kokoro-82m returns a stream of audio data. Let's create an output transformer for this:
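One way to handle a stream is to read every chunk and concatenate the bytes. The sketch below assumes the stream can be consumed as an async iterable of Uint8Array chunks and returns the raw bytes. The exact return type Orate expects is not stated here, so wrapping the bytes in a file object is left out.

```typescript
// Assumption: the model output is an async iterable of byte chunks.
const outputTransformer = async (
  output: AsyncIterable<Uint8Array>
): Promise<Uint8Array> => {
  const chunks: Uint8Array[] = [];
  let total = 0;

  // Read the whole stream, remembering the total byte count.
  for await (const chunk of output) {
    chunks.push(chunk);
    total += chunk.length;
  }

  // Copy every chunk into one contiguous buffer.
  const audio = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    audio.set(chunk, offset);
    offset += chunk.length;
  }
  return audio;
};
```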
Speech to Text
The Replicate provider has an stt function that creates a speech-to-text transcription function from any Replicate speech-to-text model.
As with tts, the stt function requires both an inputTransformer and an outputTransformer. The sections below cover how to create these transformers.
For example, the following code creates a speech-to-text transcription function using the vaibhavs10/incredibly-fast-whisper model.
Input Transformer
The inputTransformer is a function that takes the Orate input and transforms it into the format that the model expects. In this instance, vaibhavs10/incredibly-fast-whisper expects an object with an input.audio property, and the value of that property must be a hosted URL. Let's create an input transformer for this:
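Because the model needs a hosted URL, the transformer has to upload the audio somewhere before building the input. The sketch below takes the upload step as a parameter. The upload function is hypothetical and stands in for whatever storage you use. We also assume that the transformer returns the model's input object itself, so input.audio corresponds to the audio key.

```typescript
// Assumed model input shape: `audio` holds a publicly reachable URL.
type WhisperInput = { audio: string };

// `upload` is a hypothetical function supplied by the caller that stores
// the audio somewhere public and resolves to its URL.
const createInputTransformer =
  <A>(upload: (audio: A) => Promise<string>) =>
  async (audio: A): Promise<WhisperInput> => {
    const url = await upload(audio);
    // Guard against uploaders that return a local path or an empty string.
    if (!/^https?:\/\//.test(url)) {
      throw new Error("Expected a hosted http(s) URL, got: " + url);
    }
    return { audio: url };
  };
```

Keeping the upload step injectable leaves the transformer independent of any particular storage service.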
Output Transformer
The outputTransformer is a function that takes the model output and transforms it into the format that Orate expects. In this instance, vaibhavs10/incredibly-fast-whisper returns a string. Let's create an output transformer for this:
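A minimal sketch follows. The output is described above as a string, so that case is returned unchanged. As a defensive assumption of ours, the sketch also accepts an object carrying the transcript in a text field, and rejects anything else rather than guessing.

```typescript
// Accepts the raw model output and returns the transcript text.
const outputTransformer = (output: unknown): string => {
  // Case described above: the output is already the transcript.
  if (typeof output === "string") {
    return output;
  }
  // Defensive assumption: some outputs wrap the transcript in a `text` field.
  if (typeof output === "object" && output !== null && "text" in output) {
    const text = (output as { text: unknown }).text;
    if (typeof text === "string") {
      return text;
    }
  }
  throw new Error("Unexpected transcription output format");
};
```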
Speech Isolation
The Replicate provider has an isl function that creates a speech isolation function from Replicate's speech isolation models.
Input Transformer
The inputTransformer is a function that takes the Orate input and transforms it into the format that the model expects. In this instance, cjwbw/audiosep expects an object with an audio_file property, which must be a hosted URL, and a text property that describes the sound to isolate. Let's create an input transformer for this:
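The sketch below builds that object from an uploaded audio file and a fixed description. Both parameters are assumptions made for illustration: upload is a hypothetical caller-supplied function that returns a hosted URL, and description is the text naming the sound to isolate.

```typescript
// Assumed model input shape, based on the two properties described above.
type AudioSepInput = { audio_file: string; text: string };

// `upload` (hypothetical) stores the audio and resolves to its hosted URL.
// `description` names the sound to isolate, for example "human speech".
const createInputTransformer =
  <A>(upload: (audio: A) => Promise<string>, description: string) =>
  async (audio: A): Promise<AudioSepInput> => ({
    audio_file: await upload(audio),
    text: description,
  });
```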
Output Transformer
The outputTransformer is a function that takes the model output and transforms it into the format that Orate expects. In this instance, cjwbw/audiosep returns a stream of audio data. Let's create an output transformer for this: