Tutorial 15

Creating and using grammars for recognition.

Instructions
Open the application that accompanies this tutorial. Because the focus here is on using the grammar, we will not build the whole application. The loaded application should resemble the image below.


So far you have only asked your RAD programs to recognize single words or phrases from a list of choices, with possible garbage or silence at the beginning and end of the utterance. All of these recognition states have used a lexical tree as their language model. RAD also supports finite-state grammars, a more flexible and more specific language model.

Run the loaded tutorial application. It is very similar to the pizza demo that comes with RAD. The main difference is that it uses a single recognition state with a finite-state grammar to recognize the pizza order.

For example, the user may respond to "PizzaType" with:

A small pepperoni
A medium vegetarian
A large cheese

To get started, delete "PizzaType" and drag a new generic object into its place. Rename this object "PizzaType". The prompt for this state is "What size and type of pizza do you want?".

Once you've got the state in place and connected, double-click on the out port. This will open the vocabulary dialogue box.


In the upper left-hand corner of the vocabulary dialogue box is a check box labeled "Grammar". By default the vocabulary is constrained by a lexical tree. Select "Grammar" to change the language model.

Type "pizza" as the name of the grammar.

A grammar consists of a set of rules and variables that describe possible utterances. For this state, we begin with three variables. One represents the size of the pizza, one the topping choices, and the third combines all possibilities of the two.

Type the following lines into the Enter grammar box:

$size = small | medium | gigantic;
$topping = vegetarian | pepperoni | cheese;
$pizza = [*sil%% | *any%%] $size $topping [*sil%% | *any%%];

The names of the variables or non-terminals in this grammar are $size, $topping, and $pizza. The string associated with $size is:

small | medium | gigantic

The | character means "or", so "small", "medium", or "gigantic" can be recognized at the position where $size occurs in the grammar rule for $pizza. Similarly, "vegetarian", "pepperoni", or "cheese" can be recognized at the position where $topping occurs in the grammar rule for $pizza.
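To see what the $pizza rule allows, it can help to list every size and topping pairing. The following is a minimal Tcl sketch that runs in any Tcl shell, outside RAD. It ignores the optional bracketed parts of the rule, and the variable names sizes, toppings, and phrases are chosen only for this illustration.

```tcl
# Word lists copied from the $size and $topping rules above.
set sizes    {small medium gigantic}
set toppings {vegetarian pepperoni cheese}

# The $pizza rule places one $size word before one $topping word, so
# every pairing of the two lists is a phrase the grammar can recognize.
set phrases {}
foreach size $sizes {
    foreach topping $toppings {
        lappend phrases "$size $topping"
    }
}

puts "[llength $phrases] phrases:"
foreach phrase $phrases {
    puts "  $phrase"
}
```

Three sizes and three toppings give nine phrases. Adding a word to either rule adds a whole row of new combinations without any change to $pizza.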

Here are some syntactical rules to keep in mind when constructing simple grammars in RAD:

Character(s)   Usage
[ ]            Square brackets delimit parts of the grammar which are optional.
< >            Angle brackets delimit parts of the grammar which can be repeated one or more times.
{ }            Curly braces delimit parts of the grammar which can be repeated zero or more times.
|              Represents "or".
%%             Following a word, indicates that the word will not appear in the recognition results, even if recognized.
%              Following a word, substitutes the next word into the recognition results if the first word is recognized.

*sil and *any are special, built-in features of RAD. *sil is used to recognize silence, and *any is like a garbage/noise collector, used to recognize anything which doesn't match the specified recognition vocabulary (e.g. other words, sneezes, background noise). Thus, this grammar will recognize optional silence or garbage-noise followed by a $size word and a $topping word and ending in optional silence or garbage-noise.

Once you've entered the grammar, select "Extract word". All of the individual words in your grammar will appear in the vocabulary area. Select "Update All" to get pronunciation strings for these words. Notice that RAD uses special, non-phonetic pronunciation strings for *sil and *any. If you were to forget to type the '*' before either of these words in your grammar, then RAD would produce regular pronunciation strings for them and attempt to recognize them literally. The dialogue should look like this:



When you are done, select "OK" to close the vocabulary dialogue box.

Enter the following Tcl code in the PizzaType object's OnExit tab:

set pizzaSize [lindex $PizzaType(recog) 0]
set pizzaTopping [lindex $PizzaType(recog) 1]
tts "You ordered a $pizzaSize $pizzaTopping pizza."

If recognition is successful, the recognition result contains exactly two words: the grammar requires one $size word followed by one $topping word, and the %% markers keep *sil and *any out of the results. The first two lines of Tcl code assign the two words to individual variables, while the last line uses those two variables to synthesize a confirmation to the user.
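If you would rather not rely on the result always holding two words, you can check its length before indexing into it. The sketch below is one possible way to do that, written so that it runs in a plain Tcl shell. The procedure name confirmationText and the variable sample are hypothetical, and the sketch returns the text instead of calling tts, which is only available inside RAD.

```tcl
# Hypothetical helper: builds the confirmation text from a recognition
# result, or returns an empty string if the result is not two words.
proc confirmationText {recog} {
    if {[llength $recog] != 2} {
        return ""
    }
    set pizzaSize    [lindex $recog 0]
    set pizzaTopping [lindex $recog 1]
    return "You ordered a $pizzaSize $pizzaTopping pizza."
}

# Stand-in for $PizzaType(recog); inside RAD the recognizer sets it.
set sample "medium pepperoni"
puts [confirmationText $sample]
```

Inside the OnExit tab you would pass $PizzaType(recog) to the helper and hand a non-empty result to tts.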

Note that the final line (without the tts and the quotation marks) could also be placed directly in a prompt, but you would need an additional state for this purpose. tts is a RAD procedure which can be used in action boxes to cause the TTS engine to pronounce the words within quotes.

Build and run the application. To get some further practice with grammars, consider some of the following modifications and enhancements.

Add the following variable definition to your PizzaType grammar, so that at any time your customers can ask for help:

$other = help | wait | stop | repeat;

Integrate this as an alternative to the current grammar. Use the table above to help write the grammar. You'll need to handle the help request by checking the recognition results.
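One way to check the recognition results is to test whether the first recognized word is one of the $other words. The following is only a sketch under that assumption, runnable in a plain Tcl shell. The procedure name commandWord and the sample results are hypothetical, and how you respond to each command is left to your application.

```tcl
# Words copied from the $other rule above.
set otherWords {help wait stop repeat}

# Hypothetical helper: returns the command word if the recognition
# result starts with one of the $other words, or an empty string if
# the result looks like a pizza order instead.
proc commandWord {recog otherWords} {
    set first [lindex $recog 0]
    if {[lsearch -exact $otherWords $first] >= 0} {
        return $first
    }
    return ""
}

# Stand-ins for $PizzaType(recog); inside RAD the recognizer sets it.
foreach sample {"help" "small cheese"} {
    set command [commandWord $sample $otherWords]
    if {[string length $command] > 0} {
        puts "command: $command"
    } else {
        puts "order: $sample"
    }
}
```

Keeping the word list in one variable means the check stays correct if you later add more words to $other, as long as you update the list to match.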

Let the customer order drinks, or salads, or thick vs. thin crust, within an expanded version of the grammar. Make this new feature optional so that the program will work either as before, or with the new feature.

Create some new variables and add them to the grammar. Some phrases that you might want to recognize are:

gigantic thin crust pepperoni and a small coke
a large green salad with Italian dressing and a medium vegetarian pizza

Notice that along with order-category variables like $drink or $salad you may also want to define some throw-away variables to cover the prefixes, affixes, and other extraneous words such as articles, conjunctions, and prepositions that your customers might use as they order. For example:

$prefix = a%% | the%%;
$pizza = [*sil%%|*any%%|$prefix] $size [$crust] $topping ...

Recall that %% means to allow a recognition for the preceding terminal but then throw it away, because you don't care to have it in the output of your recognition stage.
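Once parts of the grammar are optional, the recognition result no longer has a fixed length, so picking words out by position stops working. One possible approach is to sort each recognized word into a slot by category. The sketch below runs in a plain Tcl shell and makes several assumptions: the crust words come from an assumed rule such as $crust = thin | thick; (with the word "crust" itself dropped from the results), and the names category and parseOrder are hypothetical.

```tcl
# Map each word to its category. Sizes and toppings come from the
# tutorial grammar; the crust words are an assumed rule definition.
array set category {
    small size          medium size          gigantic size
    thin crust          thick crust
    vegetarian topping  pepperoni topping    cheese topping
}

# Hypothetical helper: sorts recognized words into named slots, so the
# code does not depend on how many optional words were spoken.
# Returns a three-element list: size, crust, topping.
proc parseOrder {recog} {
    global category
    array set order {size "" crust "" topping ""}
    foreach word $recog {
        if {[info exists category($word)]} {
            set order($category($word)) $word
        }
    }
    return [list $order(size) $order(crust) $order(topping)]
}

# Stand-ins for $PizzaType(recog); inside RAD the recognizer sets it.
puts [parseOrder "gigantic thin pepperoni"]
puts [parseOrder "small cheese"]
```

A slot that was not spoken comes back as an empty string, which your code can test for before building the confirmation prompt.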