AI Voice Nodes
AI Voice Nodes convert text to speech (TTS), clone voices from audio samples, and transform speech between voices (STS, Speech-to-Speech). They support three input types: a script (text), a reference voice (audio for cloning), and a performance (audio/video for speech transformation and dubbing).
Inputs & Outputs
| Port | Direction | Type | Description |
|---|---|---|---|
| input | In | Text | Script/text from Text Node: the words to speak |
| reference | In | Audio | Voice sample from Audio Node: the voice to clone |
| performance | In | Audio/Video | Source audio or video for speech-to-speech transformation or dubbing |
| output | Out | Audio | Generated speech audio |
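The port table can be read as a small type-checking rule: each input port accepts only certain media types. The sketch below is a hypothetical model of that rule, not the product's API; the names `Port`, `AI_VOICE_PORTS`, and `can_connect` are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Port:
    name: str
    direction: str      # "in" or "out"
    accepts: frozenset  # media types the port takes (or produces, for outputs)


# Hypothetical mirror of the Inputs & Outputs table; not a real API.
AI_VOICE_PORTS = {
    "input": Port("input", "in", frozenset({"text"})),
    "reference": Port("reference", "in", frozenset({"audio"})),
    "performance": Port("performance", "in", frozenset({"audio", "video"})),
    "output": Port("output", "out", frozenset({"audio"})),
}


def can_connect(port_name: str, media_type: str) -> bool:
    """Return True if media_type may be wired into the named input port."""
    port = AI_VOICE_PORTS.get(port_name)
    return port is not None and port.direction == "in" and media_type in port.accepts
```

For example, `can_connect("performance", "video")` is true, while `can_connect("reference", "video")` is false because the reference port takes audio only.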
Inspector Controls
- Voice Selection: Dropdown to choose from preset voices or use a cloned voice (when Audio Node is connected as reference).
- Generation Mode: TTS (Text-to-Speech from script) or STS (Speech-to-Speech from performance audio).
- Language: Target language for generation.
- Speed: Adjust speech speed (slower for narration, faster for energetic content).
Generation Modes
Text-to-Speech (TTS)
Connect a Text Node with the script. The AI Voice Node generates speech using the selected voice or cloned voice.
- Best for: narration, voiceovers, audiobooks, accessibility
Speech-to-Speech (STS)
Connect an Audio or Video Node as performance. The AI Voice Node transforms the speech into a different voice.
- Best for: dubbing, voice acting, translating spoken content
Voice Cloning
Connect an Audio Node as reference (the voice sample) and a Text Node as the script. The AI Voice Node generates speech in the cloned voice.
- Best for: brand voices, character consistency, personalized content
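The three workflows above differ only in which inputs are connected. As a rough illustration, the sketch below maps connected inputs to a workflow label. The function `infer_workflow` is hypothetical, and the precedence it applies (a connected performance input wins over a script) is an assumption; in the product, the Generation Mode control in the inspector makes this choice.

```python
def infer_workflow(has_script: bool, has_reference: bool, has_performance: bool) -> str:
    """Pick a workflow label from which input ports are connected."""
    # Assumption: a performance input takes priority over a script.
    if has_performance:
        return "STS"
    # A script plus a reference sample means speech in the cloned voice.
    if has_script and has_reference:
        return "voice cloning"
    # A script alone uses a preset voice.
    if has_script:
        return "TTS"
    raise ValueError("connect a script or a performance input first")
```

A reference sample on its own raises an error here, since cloning still needs a script to speak.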
How to Use
- Add an AI Voice Node to the canvas
- Connect a Text Node (your script) to the input port
- (Optional) Connect an Audio Node to the reference port for voice cloning
- (Optional) Connect an Audio/Video Node to the performance port for STS
- Select voice or let it use the cloned reference
- Click Generate
- Download the resulting audio file
Workflow Examples
Narrated Video: Text Node ("Welcome to our documentary about ocean life…") → AI Voice Node (generates narration) + Text Node → Scene Node (generates matching visuals with audio enabled)
Video Dubbing: Scene Node (original video in English) → AI Voice Node (performance port: transforms speech to French)
Character Voice: Audio Node (10s sample of a voice) → AI Voice Node (reference port) + Text Node (character dialogue) → generates dialogue in the cloned voice
Tips
- For voice cloning, provide 10-30 seconds of clean speech: no background noise, music, or multiple speakers
- STS quality depends on the input audio quality: a clear, well-recorded source produces better results
- Use TTS when you have a script, STS when you have existing audio to transform
- For video dubbing, connect the video to the performance port so the AI can match lip movements
- Keep scripts under 500 words per generation for best quality
- Test with short samples before generating long narrations
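One way to follow the 500-word guideline is to split a long script at sentence boundaries before generating each part. The helper below is a minimal sketch under that assumption; `split_script` is a hypothetical name, and the sentence detection is a simple punctuation rule, not anything the product provides.

```python
import re


def split_script(script: str, max_words: int = 500) -> list[str]:
    """Split a script into segments, breaking only at sentence ends.

    Segments hold no more than max_words words, except that a single
    sentence longer than max_words is kept whole rather than cut.
    """
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", script.strip())
    segments, current, count = [], [], 0
    for sentence in sentences:
        words = sentence.split()
        if not words:
            continue
        # Start a new segment if this sentence would push past the limit.
        if current and count + len(words) > max_words:
            segments.append(" ".join(current))
            current, count = [], 0
        current.extend(words)
        count += len(words)
    if current:
        segments.append(" ".join(current))
    return segments
```

Each returned segment can then go into its own Text Node, and a short first segment works as the test sample suggested above.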
Troubleshooting
- Voice quality poor: Check that the reference audio sample is clean (no noise, single speaker, 10-30s).
- Wrong language: Make sure the Language setting matches your script. Some voices may not support all languages.
- Generation too slow: Long scripts take longer. Split into shorter segments if needed.
- Audio clipping: Reduce the speed setting or break text into shorter paragraphs.
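For the first item above, the length of a reference sample can be checked before uploading. The sketch below assumes the sample is an uncompressed WAV file and uses Python's standard `wave` module; the function names are illustrative only, and the check covers duration, not noise or speaker count.

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def reference_length_ok(seconds: float, min_s: float = 10.0, max_s: float = 30.0) -> bool:
    """Check a duration against the 10-30 second guideline for cloning samples."""
    return min_s <= seconds <= max_s
```

For example, `reference_length_ok(wav_duration_seconds("sample.wav"))` returns false for a 5-second clip, where `sample.wav` is a placeholder path.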
See Also
- Audio Nodes: Record or upload audio for voice cloning
- Text Nodes: Generate scripts for voice generation
- Scenes: Create videos with AI-generated narration