Baldi Sync

Overview
Baldi Sync is a tool for viewing and creating facial animation that is aligned with recorded audio. The audio source can be either your own recorded voice or text-to-speech. Baldi Sync can read and write wave files, as well as its own special files called sound objects (sobs). The program's interface is shown below:


The Basics
To make Baldi speak with the default TTS voice, do the following:
  1. Type the text you wish him to speak into the text box.
  2. Click TTS to generate the audio.
  3. Wait for the wave and alignment to appear.
  4. Click Animate to see Baldi say the text.
To make him speak with your recorded voice:
  1. Click on the record button, say your utterance, and stop recording.
  2. Enter the text of what you said into the text box.
  3. Click Align.
  4. Wait until the alignment appears.
  5. Click Animate to see Baldi speak with your voice.
From the File menu, you can save the audio and alignment you have created as a sob file, which lets you use them later in other applications such as RAD. You can also load previously created sob files in order to edit them. By importing wave files, you can bring previously recorded audio into Baldi Sync for alignment with Baldi.

Imported wave files must be recorded in mono. The Toolkit does not currently support stereo wave files.
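If you are unsure whether a recording is mono, you can check it before importing. The following is a minimal sketch using Python's standard wave module; it is not part of Baldi Sync or the Toolkit. The function names channel_count and stereo_to_mono are hypothetical, and the downmix assumes 16-bit PCM audio and simply averages the left and right channels.

```python
import array
import sys
import wave


def channel_count(path):
    """Return the number of audio channels in a wave file (1 = mono)."""
    with wave.open(path, "rb") as wav:
        return wav.getnchannels()


def stereo_to_mono(src_path, dst_path):
    """Downmix a 16-bit PCM stereo wave file to mono by averaging channels.

    Assumption: 2 channels, 2 bytes per sample. Other formats are rejected.
    """
    with wave.open(src_path, "rb") as src:
        if src.getnchannels() != 2 or src.getsampwidth() != 2:
            raise ValueError("expected a 16-bit stereo PCM wave file")
        rate = src.getframerate()
        frames = src.readframes(src.getnframes())

    # Wave data is little-endian; array uses the machine's native order.
    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()

    # Samples are interleaved as left, right, left, right, ...
    mono = array.array(
        "h",
        ((samples[i] + samples[i + 1]) // 2 for i in range(0, len(samples), 2)),
    )
    if sys.byteorder == "big":
        mono.byteswap()

    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(rate)
        dst.writeframes(mono.tobytes())
```

For example, if channel_count("utterance.wav") returns 2, calling stereo_to_mono("utterance.wav", "utterance_mono.wav") writes a mono copy that can then be imported. The file names here are placeholders.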

Adjusting the Alignment
Once an alignment has been generated or loaded, you can adjust the boundaries of the words and phonemes. This will affect the animation, since it is driven by a series of time-aligned phonemes and words. The word boundaries affect only some secondary animation characteristics, such as blinking and eyebrow movement. The phoneme boundaries are the most relevant boundaries for the animation.
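To make the idea of a time-aligned sequence concrete, here is a small illustrative sketch in Python. It does not describe the sob file format or Baldi Sync's internals; the Segment class, the move_boundary function, and the phoneme labels and times are all hypothetical. It only shows that moving a boundary changes where one segment ends and the next begins, which in turn changes the timing that drives the animation.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    label: str    # phoneme or word label (hypothetical labels below)
    start: float  # start time in seconds
    end: float    # end time in seconds


def move_boundary(segments, index, new_time):
    """Move the boundary shared by segments[index] and segments[index + 1]."""
    if not 0 <= index < len(segments) - 1:
        raise IndexError("no boundary at that index")
    left, right = segments[index], segments[index + 1]
    # Keep both neighbouring segments at a positive duration.
    if not left.start < new_time < right.end:
        raise ValueError("boundary must stay between the neighbouring segments")
    left.end = new_time
    right.start = new_time


# Illustrative alignment for a short word; labels and times are made up.
phonemes = [Segment("h", 0.00, 0.08), Segment("aI", 0.08, 0.30)]
move_boundary(phonemes, 0, 0.10)  # the first phoneme now lasts until 0.10 s
```

Only the shared boundary moves: the end of the first segment and the start of the second stay equal, so the sequence remains contiguous.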