AI Voice Nodes

AI Voice Nodes convert text to speech (TTS), clone voices from audio samples, and transform speech from one voice to another (STS, Speech-to-Speech). They support three input types: a script (text), a reference voice (audio for cloning), and a performance (audio or video for speech transformation and dubbing).

Inputs & Outputs

Port         Direction  Type         Description
input        In         Text         Script/text from a Text Node: the words to speak
reference    In         Audio        Voice sample from an Audio Node: the voice to clone
performance  In         Audio/Video  Source audio or video for speech-to-speech transformation or dubbing
output       Out        Audio        Generated speech audio

Inspector Controls

  1. Voice Selection: Dropdown to choose from preset voices or use a cloned voice (available when an Audio Node is connected as reference).
  2. Generation Mode: TTS (Text-to-Speech from a script) or STS (Speech-to-Speech from performance audio).
  3. Language: Target language for generation.
  4. Speed: Adjust speech speed (slower for narration, faster for energetic content).

Generation Modes

Text-to-Speech (TTS)

Connect a Text Node with the script. The AI Voice Node generates speech using the selected voice or cloned voice.

  • Best for: narration, voiceovers, audiobooks, accessibility

Speech-to-Speech (STS)

Connect an Audio or Video Node as performance. The AI Voice Node transforms the speech into a different voice.

  • Best for: dubbing, voice acting, translating spoken content

Voice Cloning

Connect an Audio Node as reference (the voice sample) and a Text Node as script. The AI Voice Node generates speech in the cloned voice.

  • Best for: brand voices, character consistency, personalized content
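
The three setups above differ only in which ports are connected. The sketch below is a hypothetical model of that relationship, not the product's API: `VoiceNodeInputs` and `describe_mode` are invented names, and the rule that a connected performance takes precedence over a script is an assumption, since in the app the mode is chosen with the Generation Mode control.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VoiceNodeInputs:
    """Hypothetical record of what is connected to an AI Voice Node."""
    script: Optional[str] = None       # text on the input port
    reference: Optional[str] = None    # voice sample on the reference port
    performance: Optional[str] = None  # audio/video on the performance port


def describe_mode(inputs: VoiceNodeInputs) -> str:
    """Name the generation setup that the connected ports make possible."""
    # Assumption: a connected performance means speech-to-speech.
    if inputs.performance is not None:
        return "STS"
    # A script plus a reference sample corresponds to voice cloning.
    if inputs.script and inputs.reference is not None:
        return "TTS with cloned voice"
    # A script alone uses one of the preset voices.
    if inputs.script:
        return "TTS with preset voice"
    raise ValueError("connect a script or a performance input")
```

For example, `describe_mode(VoiceNodeInputs(script="Hello", reference="sample.wav"))` returns `"TTS with cloned voice"`.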

How to Use

  1. Add an AI Voice Node to the canvas
  2. Connect a Text Node (your script) to the input port
  3. (Optional) Connect an Audio Node to the reference port for voice cloning
  4. (Optional) Connect an Audio/Video Node to the performance port for STS
  5. Select a voice, or let the node use the cloned reference
  6. Click Generate
  7. Download the resulting audio file

Workflow Examples

Narrated Video: Text Node ("Welcome to our documentary about ocean life...") → AI Voice Node (generates narration) + Text Node → Scene Node (generates matching visuals with audio enabled)

Video Dubbing: Scene Node (original video in English) → AI Voice Node (performance port, transforms speech to French)

Character Voice: Audio Node (10s sample of a voice) → AI Voice Node (reference port) + Text Node (character dialogue) → generates dialogue in the cloned voice

Tips

  • For voice cloning, provide 10-30 seconds of clean speech: no background noise, music, or multiple speakers
  • STS quality depends on the input audio quality: a clear, well-recorded source produces better results
  • Use TTS when you have a script, STS when you have existing audio to transform
  • For video dubbing, connect the video to the performance port so the AI can match lip movements
  • Keep scripts under 500 words per generation for best quality
  • Test with short samples before generating long narrations
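
The 500-word tip can be applied before pasting text into a Text Node. The following is a minimal sketch, assuming you prepare scripts outside the app: `split_script` is a hypothetical helper, not part of the product, and it treats the limit as an upper bound per chunk while preferring to break at sentence ends.

```python
import re


def split_script(script: str, max_words: int = 500) -> list[str]:
    """Split a script into chunks of at most max_words words."""
    # Break after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", script.strip())
    chunks: list[str] = []
    current: list[str] = []
    for sentence in sentences:
        words = sentence.split()
        if not words:
            continue
        # A single sentence over the limit is cut on word boundaries.
        while len(words) > max_words:
            if current:
                chunks.append(" ".join(current))
                current = []
            chunks.append(" ".join(words[:max_words]))
            words = words[max_words:]
        # Start a new chunk when this sentence would exceed the limit.
        if current and len(current) + len(words) > max_words:
            chunks.append(" ".join(current))
            current = []
        current.extend(words)
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Each returned chunk can then be generated separately and the resulting audio files joined in order.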

Troubleshooting

  • Voice quality poor: Check that the reference audio sample is clean (no background noise, a single speaker, 10-30 seconds long).
  • Wrong language: Make sure the Language setting matches your script. Some voices may not support all languages.
  • Generation too slow: Long scripts take longer. Split into shorter segments if needed.
  • Audio clipping: Reduce the speed setting or break text into shorter paragraphs.
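
The length part of the reference check can be done before uploading. This is a rough sketch using Python's standard `wave` module, assuming the sample is an uncompressed WAV file; the function names are hypothetical, and the check covers duration only, not noise or the number of speakers.

```python
import wave


def reference_duration_seconds(path: str) -> float:
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        # Duration is the frame count divided by frames per second.
        return wav.getnframes() / wav.getframerate()


def is_valid_reference_length(seconds: float,
                              min_s: float = 10.0,
                              max_s: float = 30.0) -> bool:
    """Check a duration against the 10-30 second guideline."""
    return min_s <= seconds <= max_s
```

A call such as `is_valid_reference_length(reference_duration_seconds("sample.wav"))` (with `sample.wav` standing in for your own file) tells you whether the clip falls inside the recommended range.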

See Also

  • Audio Nodes: Record or upload audio for voice cloning
  • Text Nodes: Generate scripts for voice generation
  • Scenes: Create videos with AI-generated narration