Voice-Over Documentary Workflow

Create a narrated, documentary-style video with an AI-generated voice-over and matching visuals. The AI writes the script, generates the narration, and creates the video.

Difficulty

Intermediate · ~20 minutes · Nodes: Text × 2, AI Voice, Scene × 2

Ingredients

  • 1 Text Node — Script generation
  • 1 Text Node — Visual scene descriptions
  • 1 AI Voice Node — Narration generation
  • 2 Scene Nodes — Video generation with audio

Procedure

Step 1: Write the Narration Script

Add a Text Node in Generate mode:

  • Prompt: “Write a 2-paragraph documentary narration about deep ocean life. Professional narrator voice, vivid descriptions, educational tone. Each paragraph should describe a different scene.”
  • Model: Gemini 3 Pro Preview (free, good for long-form)

Step 2: Generate the Voice-Over

Connect the script Text Node → AI Voice Node:

  • Choose a deep, professional narrator voice
  • The AI Voice Node generates speech from the script text

Step 3: Create Visual Descriptions

Add a second Text Node connected to the script Text Node:

  • Prompt: “Based on this narration script, write detailed visual scene descriptions for each paragraph. Focus on what should appear on screen during each part of the narration.”
  • This produces one visual description per narration paragraph, so the visuals line up with what is being said

Step 4: Generate Video Scenes

Add two Scene Nodes and connect the visual description Text Node to each:

  • Scene 1: First paragraph’s visual description
  • Scene 2: Second paragraph’s visual description
  • Enable Audio on both scenes to add ambient sound
  • Duration: Roughly match each scene to the length of its narration paragraph (about 8–12 s per paragraph)
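To pick scene durations, you can estimate how long each narration paragraph takes to read aloud. The sketch below assumes a speaking rate of about 150 words per minute (a common rule of thumb for narration, not a measured property of any particular AI voice) and assumes the script separates paragraphs with blank lines; adjust both to match your voice preset and script. At that rate, the 8–12 s range corresponds to roughly 20–30 words per paragraph.

```python
# Assumed narration speed; time your chosen voice preset to refine this value.
WORDS_PER_MINUTE = 150


def split_paragraphs(script):
    """Split a script into paragraphs separated by blank lines (assumed format)."""
    return [p.strip() for p in script.split("\n\n") if p.strip()]


def estimate_seconds(paragraph, wpm=WORDS_PER_MINUTE):
    """Estimate the spoken duration of one paragraph, in seconds."""
    words = len(paragraph.split())
    return words * 60 / wpm


def plan_scenes(script, min_s=8.0, max_s=12.0):
    """Return one entry per paragraph with its word count, estimated
    duration, and whether that duration falls inside [min_s, max_s]."""
    plans = []
    for index, paragraph in enumerate(split_paragraphs(script), start=1):
        seconds = estimate_seconds(paragraph)
        plans.append({
            "scene": index,
            "words": len(paragraph.split()),
            "seconds": round(seconds, 1),
            "fits": min_s <= seconds <= max_s,
        })
    return plans


if __name__ == "__main__":
    # Paste the output of the script Text Node here.
    script_text = "First paragraph of narration...\n\nSecond paragraph of narration..."
    for plan in plan_scenes(script_text):
        print(plan)
```

A paragraph flagged with `"fits": False` is a hint to shorten it in the script prompt or to give that scene a longer duration.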

Step 5: Review and Iterate

Run the workflow. Review:

  1. Does the narration flow well? → Adjust the script Text Node prompt
  2. Do the visuals match the narration? → Adjust the visual description Text Node prompt
  3. Is the voice appropriate? → Try a different voice preset

Lock the nodes whose output you are happy with, then regenerate only the ones that need improvement.
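Locking can be thought of as a graph problem: when you change one node, the nodes downstream of it become stale, except those you have locked. The sketch below is a hypothetical model of this recipe's wiring (names like `script` and `scene_1` are illustrative labels, not identifiers from the tool) and uses Python's standard `graphlib` to order the reruns. It assumes a locked node keeps its output even when something upstream changes, so staleness does not propagate past it; the tool's actual behavior may differ.

```python
from graphlib import TopologicalSorter

# Hypothetical model of this recipe: each node maps to the set of nodes feeding it.
WORKFLOW = {
    "script": set(),
    "voice": {"script"},
    "visuals": {"script"},
    "scene_1": {"visuals"},
    "scene_2": {"visuals"},
}


def stale_nodes(graph, changed, locked=frozenset()):
    """Return the edited node plus every unlocked dependent reachable
    through unlocked nodes (assumption: locked nodes stop propagation)."""
    stale = {changed}  # the node you edited always reruns
    frontier = [changed]
    while frontier:
        node = frontier.pop()
        for child, parents in graph.items():
            if node in parents and child not in stale and child not in locked:
                stale.add(child)
                frontier.append(child)
    return stale


def regeneration_order(graph, changed, locked=frozenset()):
    """List the nodes to rerun, upstream nodes before the nodes that depend on them."""
    stale = stale_nodes(graph, changed, locked)
    order = TopologicalSorter(graph).static_order()
    return [node for node in order if node in stale]


if __name__ == "__main__":
    # Edit the visual description prompt while keeping Scene 1 locked.
    print(regeneration_order(WORKFLOW, "visuals", locked={"scene_1"}))
```

Under these assumptions, editing the visual description prompt with Scene 1 locked reruns only the visual description node and Scene 2, leaving the narration and Scene 1 untouched.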

Result

A documentary-style video with AI-generated narration and matching visuals — script, voice, and video all created by AI in one workflow.

Variations

  • Add an Audio Node with a voice sample to clone a voice (for example, your own)
  • Use Reference Images for visual consistency across scenes
  • Add an Upscaler for higher-quality output
  • Connect more Scene Nodes for longer documentaries
  • Use If/Else to branch narration based on topic analysis

See Also