Tutorial 19
Improving Recognition
Adjusting Speech Recognition Parameters
In this exercise you will become familiar with the audio and recognition parameters listed in RAD’s Preferences menu. By experimenting, you will gain a sense of how adjusting these parameters can affect your application's recognition performance.
We will create a small system that you can use to test the effects of adjusting various RAD parameters.

Building this Application
- Add a Generic Object with a prompt "What kind of Pizza would you like?" and some simple recognition vocabulary (like vegetarian, mushroom, pepperoni, etc.) entered in the Vocabulary box of its output port.
- Add another Generic Object with a prompt "Again?" and two output ports: one to recognize "yes," with a path leading back to the previous state, and the other to recognize "no," with a path leading to the last state.
- Turn "Repair Mode" on in the main preferences window.
- Finally, add a Goodbye Object.
Perform the following tests in the "question" object’s Preferences window, under the Recognition tab.

- When your test program is ready, adjust the out-of-vocabulary rejection median (OVRM). You will find this setting by double-clicking the object named "question" and selecting the Recognition tab. The changes we make to speech recognition here will only affect this object. To make global changes to the speech recognition parameters, use the File-preferences-recog/DTMF menu.
- To enable the use of the recognition sliders, click the checkbox in the upper left corner of the Recognition tab labeled "Recognizer."
- Set the OVRM slider to its highest possible setting (Reject less) and run your program. How was the recognition?
- Try it again and be sure to say some words that are not in the recognition vocabulary. Did your program think that the out-of-vocabulary words were words in its recognition vocabulary? If so, this is because you made the setting more forgiving by "rejecting less."
- Now adjust the out-of-vocabulary rejection median to its lowest setting (Reject more) and run your program again. Were any of the words you said recognized? If the OVRM is too low, even words within the recognition vocabulary may still be rejected.
- Take a look at the lines labeled araw(garbage,1) and araw(word,1) in the console window output of your program. To view the console window, select View -> Console. Is araw(garbage,1) always the highest score?
- How often is the word or phrase that you actually said the first one listed in the Recognition Results dialogue box? (If you can't see the Recognition Results dialogue box, you can open it with View -> Recognition Results in the main menu bar.)
- Adjust the out-of-vocabulary rejection median (rank) until your program recognizes the in-vocabulary words that you say but rejects the out-of-vocabulary ones. At what value is the parameter set when that occurs? Is it the same as the default value?
- Try using different speech recognizers. For example, try the children’s speech recognizer.
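
The trade-off you explored with the OVRM slider can be sketched in a few lines of Python. This is only an illustrative model, not RAD's actual rejection algorithm: it assumes the recognizer accepts a word when its score beats the garbage score by more than some margin. The scores, the `margin` parameter, and the names `TRIALS`, `accepts`, and `evaluate` are all hypothetical and are not taken from the araw lines of a real console log.

```python
# Hypothetical data: (what was said, in vocabulary?, best word score, garbage score).
# These numbers are invented for illustration only.
TRIALS = [
    ("pepperoni", True, -40.0, -55.0),
    ("mushroom", True, -42.0, -50.0),
    ("anchovy", False, -60.0, -58.0),
    ("hello there", False, -52.0, -55.0),
]


def accepts(word_score, garbage_score, margin):
    """Accept only if the word outscores the garbage model by more than margin."""
    return word_score - garbage_score > margin


def evaluate(trials, margin):
    """Count (false accepts, false rejects) over the trials for a given margin."""
    false_accepts = 0
    false_rejects = 0
    for _said, in_vocab, word_score, garbage_score in trials:
        accepted = accepts(word_score, garbage_score, margin)
        if accepted and not in_vocab:
            false_accepts += 1  # out-of-vocabulary speech mistaken for a word
        elif not accepted and in_vocab:
            false_rejects += 1  # a valid word was thrown away
    return false_accepts, false_rejects


if __name__ == "__main__":
    # A low margin plays the role of "Reject less", a high one "Reject more".
    for margin in (-10.0, 5.0, 20.0):
        print(margin, evaluate(TRIALS, margin))
```

In this toy model, a very forgiving margin lets every out-of-vocabulary utterance through, a very strict one rejects even the valid words, and a setting in between separates the two groups. That mirrors what you are asked to find by hand with the slider.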

To enable the use of the audio sliders, click the checkbox in the upper left corner of the audio tab labeled "Audio Parameters."
- Try the same type of experiment with the other parameters listed below, which are all accessible via the "question" object’s Preferences menu - Misc tab. By running your program with each parameter at both its lowest and highest settings you should be able to answer the question posed about that parameter:
Maximum Record Duration - When would this parameter's setting have some effect on the recognition performance of your program?
Leading Silence Duration - Can this parameter be placed at some setting that would make it impossible for your program to recognize anything?
Trailing Silence Duration - What happens if your program must recognize multi-word phrases like "small vegetarian pizza," and this parameter's value is very small, meaning that only a few milliseconds of trailing silence are required for your program to think that it has heard a complete utterance? See Tutorial 12 for an example of adjusting the Trailing Silence Duration.
Voice Detection Threshold (Telephone/Microphone) - This parameter is adjusted for you when you calibrate your audio device; however, you can also adjust it manually. What happens if you set this parameter to a value near zero? Do you suppose that your program might try to interpret the door opening in the background as a mushroom pizza order? In a noisy room, do you think that it would help recognition to have this setting be somewhat higher than usual?
Record Backoff - What do the waveforms of your recognition results look like in the Speech Viewer Tool when this parameter is set to a very small number? A very large number?
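
To see why these audio parameters interact, it can help to look at a toy energy-based endpoint detector. The sketch below is not how RAD is implemented. It assumes audio arrives as per-frame energy values, that the leading silence setting is a timeout for the start of speech, and that the trailing silence setting is the number of consecutive quiet frames that ends an utterance. The function `find_utterance`, its parameter names, and the `PHRASE` data are hypothetical, and Record Backoff is not modeled.

```python
def find_utterance(energies, threshold, max_leading, trailing_needed, max_record):
    """Return (start, end) frame indices of the detected utterance, or None.

    energies:        per-frame energy values (hypothetical units)
    threshold:       frames with energy above this count as voice
    max_leading:     give up if no voice appears within this many frames
    trailing_needed: this many consecutive quiet frames end the utterance
    max_record:      hard cap on the number of frames examined
    The end index is exclusive.
    """
    start = None
    quiet_run = 0
    for i, energy in enumerate(energies[:max_record]):
        voiced = energy > threshold
        if start is None:
            if voiced:
                start = i
            elif i + 1 >= max_leading:
                return None  # timed out waiting for speech to begin
        elif voiced:
            quiet_run = 0
        else:
            quiet_run += 1
            if quiet_run >= trailing_needed:
                return (start, i - quiet_run + 1)
    if start is None:
        return None
    # Recording ran out (or hit the cap) before enough trailing silence was seen.
    return (start, min(len(energies), max_record))


# Three "words" separated by short pauses, e.g. "small vegetarian pizza".
PHRASE = [0, 0, 9, 9, 0, 9, 9, 9, 0, 0, 9, 9, 0, 0, 0, 0]

if __name__ == "__main__":
    print(find_utterance(PHRASE, 5, 10, 1, 100))  # very short trailing silence
    print(find_utterance(PHRASE, 5, 10, 3, 100))  # longer trailing silence
```

With `trailing_needed=1` the detector stops at the first pause and captures only the first word. With `trailing_needed=3` it spans the whole phrase. Lowering `threshold` toward zero makes low-level background noise count as voice, and a small `max_record` cuts the phrase off early.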
Global Preferences vs. Object Preferences
Because we used the object’s preferences menu, the changes we made to speech recognition only affect that one object. To make global speech recognition changes, use the File-Preferences window. The settings within this menu will be in effect for the entire application.
An individual object can override the global settings when the enabling checkbox is selected within the object’s preferences menu. The enabling checkboxes are what you clicked in this lesson to make the sliders movable and more visible.
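
The override rule can be pictured as a simple lookup: start from the global settings and apply the object's own values only when its enabling checkbox is selected. The sketch below is an illustration of that idea, not RAD's internal data model. The setting names, their values, and the function `effective_prefs` are hypothetical.

```python
# Hypothetical global settings (names and values invented for illustration).
GLOBAL_PREFS = {"rejection_median": 50, "trailing_silence_ms": 300}


def effective_prefs(global_prefs, object_prefs, override_enabled):
    """Return the settings an object would actually use.

    Object-level values win only when the object's enabling
    checkbox (override_enabled) is selected.
    """
    merged = dict(global_prefs)  # copy so the global settings stay unchanged
    if override_enabled:
        merged.update(object_prefs)
    return merged


if __name__ == "__main__":
    question_prefs = {"rejection_median": 80}
    print(effective_prefs(GLOBAL_PREFS, question_prefs, True))
    print(effective_prefs(GLOBAL_PREFS, question_prefs, False))
```

Settings that the object does not override, such as `trailing_silence_ms` here, continue to come from the global preferences.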