Tutorial 13

Creating and using natural speech prompts.

Instructions
Drag and arrange states onto the canvas so that you have the following setup.


RAD can substitute any recorded voice for the computer's synthetic speech. The recorded speech can be aligned with the animated agent so that lip, face and body movements are in sync. The process of creating a natural speech prompt involves four steps:

1. Record yourself speaking the prompt.
2. Transcribe your prompt in the text box.
3. Align the audio and text transcription.
4. Save the resulting sound file. (The file suffix is .sob, for "sound object".)


Double-click the "natural_speech" state to open its configuration dialogue. Select the "Recorded" tab to bring the recorded speech settings to the front.


Select "Edit" to edit a new sound object. This will open BaldiSync. BaldiSync allows you to record your voice, enter your transcription, align the two, and save the result.


In the text box, enter the following:

Can I borrow your towel? My car just hit a water buffalo.


Now you must record yourself saying "Can I borrow your towel? My car just hit a water buffalo." To record, select the record button, speak, then select the stop button. If your first recording is not satisfactory, simply re-record until you are pleased with the quality of the utterance. Try to avoid excessive leading or trailing silence and background noise in the recording (see tips below). Once you've recorded your speech, the display should resemble this:


Some common problems encountered in recording speech for alignment are:



1. Speech is too loud.
Speaking too loudly or too close to the microphone will produce a scratchy-sounding recording. Note that the peaks in the sound energy display are clipped and appear to extend past the boundaries of the display window. Move the microphone further from your mouth and speak more softly.



2. Too much silence.
You should begin speaking immediately after selecting the record button, and stop recording directly after you've stopped speaking. The record button can be selected a second time to stop recording.
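
BaldiSync shows both problems visually, but the same two checks can also be sketched in code. The Python example below is a rough illustration only: it assumes the recording is available as a 16-bit mono PCM WAV file (it does not read BaldiSync's .sob format), and the function names and threshold values are hypothetical choices, not settings taken from BaldiSync.

```python
import array
import sys
import wave


def read_wav_mono16(path):
    """Return (samples, sample_rate) for a 16-bit mono PCM WAV file."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("expected a 16-bit mono PCM WAV file")
        rate = wav.getframerate()
        samples = array.array("h")
        samples.frombytes(wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    return list(samples), rate


def check_recording(samples, rate, silence_level=500, clip_level=32700):
    """Report clipping and leading/trailing silence for one recording.

    silence_level and clip_level are illustrative thresholds (assumptions),
    expressed in raw 16-bit sample units.
    """
    total = len(samples)
    if total == 0:
        return {"clipped_fraction": 0.0,
                "leading_silence_s": 0.0,
                "trailing_silence_s": 0.0}

    # Samples at or near full scale indicate clipping (speech too loud).
    clipped = sum(1 for s in samples if abs(s) >= clip_level)

    # Indices of samples loud enough to count as speech rather than silence.
    loud = [i for i, s in enumerate(samples) if abs(s) > silence_level]
    if loud:
        leading = loud[0]
        trailing = total - 1 - loud[-1]
    else:
        # Nothing exceeds the silence threshold: the whole take is silence.
        leading = trailing = total

    return {"clipped_fraction": clipped / total,
            "leading_silence_s": leading / rate,
            "trailing_silence_s": trailing / rate}
```

For instance, calling `check_recording(*read_wav_mono16("prompt.wav"))` (the file name is hypothetical) returns a small dictionary. A clipped fraction above zero suggests moving the microphone away or speaking more softly, and large silence values suggest re-recording or trimming the take.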

An ideal recording will have no clipping and very little leading and trailing silence. It is possible to delete leading and trailing silence from a recording; consult the BaldiSync documentation for details. Below is an example of an ideal recording:





Once you are satisfied with the quality of your recording, select the "Align" button to align your speech with the text that you entered. The computer will generate phonetic and word labels, which are displayed below the waveform. Select "Animate" for a preview of how your speech will be synchronized with the animated agent. The alignment boundaries are adjustable; details can be found in the BaldiSync documentation.



Select "Save Sob" from the File menu to save the waveform and its alignment. Save the sob file in the same directory as your RAD application; keeping them together makes it easy to move your RAD application to other computers. BaldiSync will close after you save the sob.


Note that the name of the sob file now appears in the "File" field of the "Recorded" tab. Make sure that "recorded" is selected for the prompt type, then select "Ok" to close the dialogue.