Audio Nodes

Audio Nodes capture or upload audio content for multimodal AI processing. They bridge voice and text: connect an Audio Node to a Text Node, and the AI can transcribe, summarize, or analyze the recording. Audio Nodes also serve as voice references for AI Voice Nodes, enabling voice cloning.

What is an Audio Node?

An Audio Node is the voice input layer of your workflow. It lets you upload an audio file or record directly from your microphone, then feed that audio into downstream nodes for processing. Connect it to a Text Node and the AI will “hear” the audio — transcribing speech, summarizing meetings, or analyzing spoken content. Connect it to an AI Voice Node and it becomes a voice reference for cloning.

Inputs & Outputs

Port | Direction | Type | Description
(none) | Input | - | No input connections; audio is uploaded or recorded directly
Output | Output | Audio | Connects to Text Node (transcription/analysis), AI Voice Node (voice cloning reference), AI If/Else, Canvas

Inspector Controls

Upload Audio

Click to upload an audio file from your computer. Supported formats: MP3, WAV, WebM, OGG, M4A. The file is uploaded to cloud storage and a waveform visualization appears in the inspector.

Record Mic

Record audio directly from your browser’s microphone. Click the button, grant microphone permissions when prompted, speak, then click Stop when done. The recording is saved automatically.

Audio Playback

A waveform visualization with play/pause controls. Review your audio before connecting it to other nodes.

Delete Audio

Click the delete button to clear the uploaded or recorded audio and start fresh.

Supported Formats

Format | Extension | Notes
MP3 | .mp3 | Most common, good compression
WAV | .wav | Uncompressed, highest quality
WebM | .webm | Web-native format
OGG | .ogg | Open source format
M4A | .m4a | Apple format, good quality

How to Use

  1. Add an Audio Node to the canvas by right-clicking and selecting “Audio Node” from the context menu, or by dragging it from the sidebar.
  2. Upload an audio file or record from your microphone using the inspector controls.
  3. Preview the audio using the waveform player to verify it captured correctly.
  4. Connect the output to a Text Node for transcription or analysis, or to an AI Voice Node for voice cloning.
  5. Write a prompt in the connected Text Node describing what you want (e.g., “Transcribe this audio” or “Summarize the key points”).
  6. Generate — the AI listens to the audio and responds based on your text prompt.

Workflow Examples

Meeting Summarization

Audio Node (upload meeting recording) connects to a Text Node with the prompt “Summarize the key points and action items from this meeting.” The AI listens to the full recording and generates a structured summary with action items.

Voice-to-Video Pipeline

Audio Node (record narration) connects to a Text Node with the prompt “Write a visual scene description based on this narration.” The Text Node output then connects to a Scene Node, which generates a video matching the narration.

Voice Cloning Reference

Audio Node (upload voice sample) connects to an AI Voice Node. The AI Voice Node uses the audio as a voice reference for cloning — any text sent to the AI Voice Node will be spoken in the same voice as the sample.
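
The three workflows above can be written down as simple chains of node types and checked against the connection rules from the Inputs & Outputs table. This is only an illustrative sketch: the product does not expose workflows in this form, the names AUDIO_OUTPUT_TARGETS, WORKFLOWS, and chain_is_valid are hypothetical, and the check assumes the list of output targets in the table is exhaustive.

```python
# Targets listed for the Audio Node's output port in the Inputs & Outputs table.
# Assumption: that list is exhaustive.
AUDIO_OUTPUT_TARGETS = {"Text Node", "AI Voice Node", "AI If/Else", "Canvas"}

# Each workflow example as a chain of node types, in connection order.
WORKFLOWS = {
    "Meeting Summarization": ["Audio Node", "Text Node"],
    "Voice-to-Video Pipeline": ["Audio Node", "Text Node", "Scene Node"],
    "Voice Cloning Reference": ["Audio Node", "AI Voice Node"],
}


def chain_is_valid(chain):
    """Check the Audio Node connection rules for one chain of node types."""
    for source, target in zip(chain, chain[1:]):
        if target == "Audio Node":
            return False  # Audio Nodes accept no input connections
        if source == "Audio Node" and target not in AUDIO_OUTPUT_TARGETS:
            return False  # not a listed target for the audio output
    return True


if __name__ == "__main__":
    for name, chain in WORKFLOWS.items():
        print(name, "->", "valid" if chain_is_valid(chain) else "invalid")
```

Only connections that start or end at an Audio Node are checked; rules for other node types (for example, what a Text Node may connect to) are outside the scope of this page and are not modeled here.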

Tips & Best Practices

  • For transcription, keep the prompt simple: “Transcribe this audio word for word.”
  • For analysis, be specific: “List the main arguments in this podcast segment” or “Identify the speakers and summarize each person’s contributions.”
  • Recording quality matters — use a quiet environment, speak clearly, and minimize background noise for best results.
  • For voice cloning with AI Voice Nodes, provide 10 to 30 seconds of clean speech. Avoid background noise, music, or multiple speakers in the sample.
  • Audio files should be under 25 MB for reliable processing.
  • The AI processes the full audio — there is no need to trim it precisely before uploading.
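
The 10 to 30 second recommendation for voice cloning samples can be checked locally before uploading. The following is a minimal sketch that uses Python's standard wave module, so it only reads uncompressed PCM WAV files; the function names are illustrative and not part of the product.

```python
import wave

# Range recommended in the tips above for voice cloning samples.
MIN_SECONDS = 10.0
MAX_SECONDS = 30.0


def wav_duration_seconds(path: str) -> float:
    """Return the length of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_good_sample_length(path: str) -> bool:
    """True if the clip falls inside the recommended 10 to 30 second range."""
    return MIN_SECONDS <= wav_duration_seconds(path) <= MAX_SECONDS
```

For MP3, WebM, OGG, or M4A samples, a different reader would be needed, or the file could be converted to WAV first. The check covers length only; it says nothing about background noise or the number of speakers.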

Troubleshooting

Microphone not working

Check your browser's microphone permissions, typically under Settings > Privacy > Microphone (the exact path varies by browser). Make sure you have granted microphone access to the site, then refresh the page.

Upload fails

Verify the file format is supported (MP3, WAV, WebM, OGG, M4A). Check that the file size is under 25 MB. Try converting the file to MP3 if the format is not recognized.
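
Both checks can be run on a file before uploading it. The sketch below is illustrative and not part of the product: the upload_problems name is hypothetical, the limit is assumed to mean 25 * 1024 * 1024 bytes, and the check looks only at the file extension, not at the actual encoding inside the file.

```python
from pathlib import Path

# Extensions from the Supported Formats table.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".webm", ".ogg", ".m4a"}

# Assumption: the 25 MB limit is read as 25 * 1024 * 1024 bytes.
MAX_BYTES = 25 * 1024 * 1024


def upload_problems(path: str) -> list[str]:
    """Return reasons the file might be rejected; an empty list means none found."""
    file = Path(path)
    size = file.stat().st_size
    problems = []
    if file.suffix.lower() not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {file.suffix or '(none)'}")
    if size >= MAX_BYTES:  # the documented limit is "under" 25 MB
        problems.append(f"file is {size} bytes; limit is {MAX_BYTES}")
    return problems
```

A file that passes both checks may still fail to upload for other reasons, such as a network error or a mislabeled file whose contents do not match its extension.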

Transcription inaccurate

Audio quality is the main factor. Background noise, overlapping speakers, and low volume all reduce accuracy. Re-record in a quiet environment or use a higher-quality audio file.

No audio playback

Try a different browser (Chrome is recommended). Some audio formats may not play natively in all browsers. Converting to MP3 usually resolves playback issues.
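
One way to do the MP3 conversion is with the ffmpeg command-line tool, here wrapped in a short Python helper. This is a sketch under assumptions: ffmpeg must be installed separately and built with the libmp3lame encoder, the 128k bitrate is an arbitrary choice, and the function names are illustrative.

```python
import subprocess
from pathlib import Path


def mp3_command(source: str) -> list[str]:
    """Build an ffmpeg command that re-encodes `source` to an MP3 beside it."""
    src = Path(source)
    if src.suffix.lower() == ".mp3":
        raise ValueError("source is already an MP3 file")
    target = src.with_suffix(".mp3")
    return [
        "ffmpeg",
        "-i", source,              # input file
        "-vn",                     # drop any video stream (e.g. in WebM)
        "-codec:a", "libmp3lame",  # MP3 encoder
        "-b:a", "128k",            # audio bitrate; an arbitrary choice
        str(target),               # output path
    ]


def convert_to_mp3(source: str) -> Path:
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    command = mp3_command(source)
    subprocess.run(command, check=True)
    return Path(command[-1])
```

Calling convert_to_mp3("narration.webm") would write narration.mp3 in the same folder, which can then be uploaded to the Audio Node in place of the original.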

See Also

  • Text Nodes — Analyze or transcribe audio content
  • AI Voice Nodes — Use audio as a voice cloning reference
  • Scenes — Create videos from audio-driven workflows