This lesson is designed primarily for speech researchers. It explains how to improve speech recognition by fine-tuning the phonetic representation of the recognition vocabulary. If you’re feeling adventurous, though, please proceed.
The speech recognizers built into RAD use phone-based pronunciation models as the basis for recognition. The pronunciation strings are generated automatically when you enter words in the recognition windows. You can view the phonetic pronunciations by selecting Update All in the recognition dialog box before you click OK. For each word in the vocabulary to be recognized, the following sources are queried in order until a pronunciation is found:
- Custom dictionary: This contains speaker-specific pronunciation models produced by the custom pronunciation feature to be described later in this section.
- System dictionary: By default, RAD uses the CMU (Carnegie-Mellon University) pronunciation dictionary.
- Text-to-speech synthesizer: If the word is not found in any of the above dictionaries, a text-to-speech synthesizer is used to generate a pronunciation via letter-to-sound rules. This pronunciation is then stored in the local dictionary in case it is needed again.
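The lookup order described above can be sketched in a few lines of Python. This is only an illustration, not RAD's actual implementation: the function name lookup_pronunciation, the use of plain dicts for the dictionaries, and the letter_to_sound callable are all hypothetical, and checking the local dictionary before calling the synthesizer again is an assumption about how the cached result is reused.

```python
from typing import Callable, Dict


def lookup_pronunciation(
    word: str,
    custom: Dict[str, str],
    system: Dict[str, str],
    local: Dict[str, str],
    letter_to_sound: Callable[[str], str],
) -> str:
    """Return a pronunciation for `word`, trying each source in order."""
    key = word.lower()  # dictionary entries are stored in lower case

    # 1. Custom dictionary, then 2. system dictionary.
    for source in (custom, system):
        if key in source:
            return source[key]

    # 3. Fall back to letter-to-sound rules (hypothetical callable) and
    #    keep the result in the local dictionary so it is generated once.
    if key not in local:
        local[key] = letter_to_sound(key)
    return local[key]
```

Because the custom dictionary is consulted first, an entry there overrides the system dictionary entry for the same word.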
Creating your own pronunciation models
RAD pronunciations are specified in terms of the Worldbet phonetic symbol set. Phonetic symbols are used to represent how a word sounds. A table of legal Worldbet symbols, including examples, is available via the Help main menu entry in RAD.
Building pronunciation models by hand may look and sound intimidating, but the process is actually quite straightforward. It involves selecting the phonemes that characterize the individual sounds that make up the word. You will find the examples in the Worldbet table very useful because they describe how each phoneme sounds.
The process of hand-crafting usually starts by checking whether the system pronunciation is a faithful representation of how application users actually say the word. If it is, and there are no obvious alternative pronunciations, then there is no need to change the pronunciation. If it is obviously inaccurate because of a defective dictionary entry, or if you can think of alternative ways of pronouncing the word (because of local dialects, for example), then you will need to make some changes. There are several ways to do this:
- The pronunciations are displayed in the box headed Pronunciations in any object’s recognition Vocabulary dialog box. They can be modified directly by typing in a new pronunciation.
- A more permanent solution is to save your new hand-built pronunciation model to the custom dictionary. After entering your alternate pronunciation, simply click "Add to Custom" in the recognition Vocabulary dialog box. For example, the city Edinburgh is actually pronounced "Edinbruh." We can correct the pronunciation by changing the final phoneme to ^ and clicking "Add to Custom." As shown in the Worldbet table in the RAD Help menu, ^ is the "uh" sound.
- You can manually edit the entire custom dictionary from the global preferences window under the Dictionaries tab.
A few examples of increasingly complex pronunciation models for the word "January" are presented below. The following syntactic conventions apply:
- A pronunciation is a list of phonemes delimited by white space. Extra spaces or tabs are ignored. Every phoneme must be matched in order for the word to match.
dZ @ n j u E 9r i:
- Optional phonemes are enclosed in square brackets. The following example can be pronounced with or without the E:
dZ @ n j u [E] 9r i:
- Alternate phonemes are enclosed in curly braces. The following pronunciation permits either @ or E for the vowel in the first syllable:
dZ {@ E} n j u E 9r i:
- Phonemes can be forced into alternative groupings using parentheses. The following example allows two alternate pronunciations in the first syllable, either dZ @ n or dZ E m:
dZ {(@ n) (E m)} j u E 9r i:
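To check what a pattern actually permits, it can help to expand it into every phoneme sequence it matches. The following Python sketch does this for the three conventions above. It is not part of RAD. The function name expand is hypothetical, and the tokenizer assumes that phoneme symbols never contain bracket, brace, or parenthesis characters themselves.

```python
import re
from itertools import product
from typing import List, Optional, Tuple

Seq = Tuple[str, ...]

# A token is either a single delimiter or a run of other non-space characters.
_TOKEN = re.compile(r"[\[\]{}()]|[^\s\[\]{}()]+")
_CLOSERS = {"[": "]", "{": "}", "(": ")"}


def _concat(items: List[List[Seq]]) -> List[Seq]:
    """Combine per-item variants into every possible full sequence."""
    return [sum(combo, ()) for combo in product(*items)]


def expand(pattern: str) -> List[str]:
    """Return every phoneme sequence that `pattern` allows."""
    tokens = _TOKEN.findall(pattern)
    pos = 0

    def parse_items(closer: Optional[str]) -> List[List[Seq]]:
        nonlocal pos
        items: List[List[Seq]] = []
        while pos < len(tokens) and tokens[pos] != closer:
            tok = tokens[pos]
            pos += 1
            if tok in _CLOSERS:
                inner = parse_items(_CLOSERS[tok])
                if pos >= len(tokens):
                    raise ValueError(f"missing {_CLOSERS[tok]!r}")
                pos += 1  # consume the closing delimiter
                if tok == "{":
                    # Alternates: exactly one of the enclosed items.
                    variants = [v for item in inner for v in item]
                else:
                    # Group or optional: the enclosed items in order.
                    variants = _concat(inner)
                    if tok == "[":
                        variants = [()] + variants  # may also be skipped
                items.append(variants)
            elif tok in _CLOSERS.values():
                raise ValueError(f"unexpected {tok!r}")
            else:
                items.append([(tok,)])
        return items

    return [" ".join(seq) for seq in _concat(parse_items(None))]
```

For instance, expand("dZ {(@ n) (E m)} j u E 9r i:") returns the two sequences "dZ @ n j u E 9r i:" and "dZ E m j u E 9r i:".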
The dictionary format contains one entry per line. The first element of the line is the word in lower-case letters (e.g., january above), followed by white space, then a pronunciation. Words can be composed of any combination of printable characters, except that there can be no white space in the middle of the word.
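As a rough illustration of this format, the Python sketch below reads dictionary lines into a mapping from word to pronunciation. The function name parse_dictionary is hypothetical, and two behaviors are assumptions rather than documented RAD rules: blank lines are skipped, and a later entry for the same word replaces an earlier one.

```python
from typing import Dict, Iterable


def parse_dictionary(lines: Iterable[str]) -> Dict[str, str]:
    """Map each word to its pronunciation string, one entry per line."""
    entries: Dict[str, str] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        # The word ends at the first white space; the rest is the pronunciation.
        parts = line.split(None, 1)
        if len(parts) != 2:
            raise ValueError(f"entry has no pronunciation: {line!r}")
        word, pronunciation = parts
        # Extra spaces or tabs between phonemes are not significant.
        entries[word] = " ".join(pronunciation.split())
    return entries
```

Since a word cannot contain white space, splitting at the first run of white space is enough to separate it from its pronunciation.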
Activity 1
The following instructions introduce you to hand-crafting pronunciation models. Use the following words as the recognition vocabulary for your test program. Create a hand-crafted pronunciation model for each, and then run your test program to see if recognition improves.
portland
eighty-eight
Activity 2
Below are some common modifications that often improve the accuracy of word models. Perhaps you can identify some more of your own?
- Try making the plosive releases optional when followed by consonants or at the end of a word. Example: change the pronunciation string for portland from:
pc ph > 9r tc th l & n dc d
to:
pc ph > 9r tc [th] l & n dc [d]
- Try changing the alveolar plosives that occur in the middle of a word or phrase to flaps. Example: change the pronunciation string for eighty-eight from:
ei tc th i: [.pau] ei tc th
to:
ei {(tc th) (d_\)} i: [.pau] ei tc [th]
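The first modification is mechanical enough to try in code. The Python sketch below marks a plosive release as optional when it is followed by a consonant or ends the word. It is an illustration under stated assumptions: the function name make_releases_optional is hypothetical, the closure and release pairs are limited to those appearing in the examples above, and the vowel set is a small, incomplete sample that should be checked against the Worldbet table before use.

```python
# Closure -> release pairs seen in the examples above (extend as needed).
RELEASE_AFTER_CLOSURE = {"pc": "ph", "tc": "th", "dc": "d"}

# Assumption: a small, incomplete sample of Worldbet vowel symbols.
VOWELS = {">", "&", "@", "E", "^", "u", "i:", "ei"}


def make_releases_optional(pronunciation: str) -> str:
    """Bracket plosive releases that precede a consonant or end the word."""
    tokens = pronunciation.split()
    out = []
    for i, tok in enumerate(tokens):
        prev = tokens[i - 1] if i > 0 else None
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        # A release is the burst symbol directly after its closure symbol.
        is_release = RELEASE_AFTER_CLOSURE.get(prev) == tok
        if is_release and (nxt is None or nxt not in VOWELS):
            out.append(f"[{tok}]")
        else:
            out.append(tok)
    return " ".join(out)
```

Applied to the portland string "pc ph > 9r tc th l & n dc d", this yields "pc ph > 9r tc [th] l & n dc [d]", the same result as the hand edit shown above.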
Hopefully, this exercise has given you some hints for getting better recognition results from your programs.