CSLU Toolkit Formats


Grammar Format
Version: 1.0
Created: 25 May 2003
Modified: 11 October 2005

Overview
The Grammar format specifies grammars for speech recognition.    


Synopsis
FILE-BASED GRAMMAR                              Tcl-LIST BASED GRAMMAR
token = rule ;                                  token = rule
lexicon <URI> ;                                 lexicon <URI>
oneLevelExpansion oneLevelValue ;               oneLevelExpansion oneLevelValue
// comment                                      --- comments are not allowed ---
/* comment */                                   --- comments are not allowed ---
clusterName := token1  token2  ...  tokenN ;    clusterName := token1  token2  ...  tokenN
token -> rule :: context __ context ;           token -> rule :: context __ context
--- blank lines ARE allowed ---                 --- blank lines are NOT allowed ---

Variables
Variable
Description
token
Any string of characters not including double quotes.  If the token contains whitespace, then double quotes must surround the token.  Whitespace at the beginning and end of the token is ignored.  The special characters described below may still be used in a token if they are preceded by a backslash.
rule
A sequence of token names and symbols that define how the token is to be expanded.  Valid symbols are ( and ) for grouping purposes; | to delimit the "or" operator between two tokens; [ and ] to indicate that whatever is within the brackets is optional; and < and > to identify a repeat operator.  Valid repeat operators are <+> and <1-> (indicating that the previous item is to be repeated one or more times), as well as <*> and <0-> (indicating that the previous item is to be repeated zero or more times).  A space does not have to separate the token from its repeat operator.
URI
A uniform resource identifier that specifies the lexicon file.  Currently, only local URIs are supported.
comment
Any text.  A comment defined by // is terminated by the end of the line, and a comment defined by /* is terminated by */.  Comments are not allowed when the grammar is specified using a Tcl list.
clusterName
A cluster name identifies a group of tokens.  This group of tokens can then be represented by clusterName in a context description.
context
A context string is a set of one or more tokens (or cluster names) that identifies the left or right context of a rule.  The symbols __ (two adjacent underscore symbols) identify the center of the context.  If a context is "anything" (to the left or right of the center of the context), then the context string may be omitted.
oneLevelValue
If oneLevelValue is 0 (the default), then rules are continuously expanded until no more rules can be applied.  If oneLevelValue is 1, then only one "level" of rules is applied to a token within a single call to statenet create or statenet add.  For example, consider the case in which a lexicon contains the words "I" and "did", the pronunciation of "I" is (Worldbet) aI, and the pronunciation of "did" is dc d I dc [d].  When expanding a grammar, all occurrences of "I" will be expanded to aI, and all occurrences of "did" will be expanded to dc d I dc [d].  If oneLevelValue is 0 (default), then the I in dc d I dc [d] will be further expanded with the rule I -> aI, yielding dc d aI dc [d].  If oneLevelValue is 1, then this second "level" of applying rules is not performed, and the pronunciation of "did" remains dc d I dc [d].  Because lexicons in general only require one level of applying rules, a lexicon grammar that is specified using the lexicon keyword within a higher-level grammar has oneLevelValue set to 1 by default.
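
The "I" / "did" case above can be written out as a small file-based grammar.  This is only a sketch: the file name is hypothetical, the rules are written with = as in a lexicon file, and placing the oneLevelExpansion line at the top of the file is an assumption rather than a documented requirement.

```text
// sketch.lexicon  (hypothetical file name)

// Apply only one level of rules per call to statenet create or statenet add.
oneLevelExpansion 1 ;

I   = aI ;
did = dc d I dc [d] ;

// With oneLevelExpansion 1, did remains      dc d I dc [d]
// With oneLevelExpansion 0, did is expanded  dc d aI dc [d]
```

Changing the value to 0 (or removing the line, since 0 is the default) gives the fully expanded form shown in the last comment.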

Keywords and Reserved Symbols
Keyword or Symbol
Description
/BOU
When this keyword occurs in a context string or cluster, it identifies the beginning of the utterance.  It is important to identify /BOU as the left context of every state that may begin the grammar.  This may be done easily in most cases by putting the /BOU keyword in the same cluster as the silence model.
/EOU
When this keyword occurs in a context string or cluster, it identifies the end of the utterance.  It is important to identify /EOU as the right context of every state that may end the grammar.  This may be done easily in most cases by putting the /EOU keyword in the same cluster as the silence model.
lexicon
The lexicon keyword at the beginning of a line, followed by a space, followed by a URI that is surrounded by angle brackets is reserved for defining a pronunciation dictionary of words.  If this keyword is encountered, the pronunciations for the words specified in the grammar will be obtained from the file indicated by the URI.
oneLevelExpansion
The oneLevelExpansion keyword at the beginning of a line, followed by a space, followed by either 0 or 1, is reserved for specifying how many levels of rule expansion are to be applied (one level or many levels).  The default for oneLevelExpansion is 0, meaning that rules are applied until expansion is no longer possible.  See comments in the description of the variable oneLevelValue, above.
=
The equals sign defines the boundary between a token name and its rule.
;
A semicolon defines the end of a rule in the file-based form of a grammar.  In the Tcl-list based form of a grammar, the end of a rule is defined by the end of the list item.
< ... >
Angle brackets have two special uses.  First, if a URI is expected (based on a preceding keyword), then the brackets surround the URI.  Second, in a rule, brackets may specify that the preceding item is to be repeated a number of times.  (In this case, the brackets and contents within the brackets are called a "repeat operator".)   If the symbol inside the brackets is "*" or "0-", then the repetition will occur zero or more times.  If the symbol inside the brackets is "+" or "1-", then the repetition will occur one or more times.  Other types of repeat operators in ABNF form are not yet supported.  If the symbols < ... > are not consistent with a repeat operator, then the entire token is treated as any other token.  If the symbols are consistent with a repeat operator, then a space does not have to separate the repeat operator from the previous item.  (For example, "$word<+>" and "$word <+>" are equivalent).
//
When these characters occur in sequence in a grammar file, they define the beginning of a comment.  The comment ends when the end of the line is reached.  Anything within the comment is ignored by the Statenet package.  If the grammar is specified using Tcl lists, comments are not allowed.
/*  ... */
These characters define a comment in a grammar file.  The comment begins with /* and ends with */.  Anything within the comment is ignored by the Statenet package.  If the grammar is specified using Tcl lists, comments are not allowed.
:=
These symbols in sequence define the separation between a cluster name and a sequence of tokens that are to be clustered.
->
These symbols in sequence define the boundary between a token name and a context-dependent rule.
::
These symbols in sequence define the boundary between a context-dependent rule and the context in which the rule is applied.
__
These symbols in sequence define the boundary between the left and right context in a context-dependent rule.
( ... )
Rounded parentheses define a grouping of tokens within a rule.  Parentheses are useful if more than one word is to be repeated using the repeat operator <*> or <+>, or for grouping tokens with the "or" operator.
|
The "or" operator specifies that the token (or grouping) to the left of the | and the token (or grouping) to the right of the | have a parallel structure in the state network.  As an example of grouping, the grammar "one two | three" is equivalent to "one (two | three)", and different from "(one two) | three".
[ ... ]
Square brackets identify whatever is within the brackets as optional.
\
The backslash may be used in front of any special character to prevent that character from being interpreted by the parser.  For example, the SAMPA symbol for the vowel in the word "bat" is {.  This symbol is reserved in ABNF form.  To ensure that this symbol will be treated as a token and not parsed, it may be preceded by a backslash, e.g. \{.
$
The dollar sign is NOT reserved.  However, it is considered good form to use token names that begin with a dollar sign at the level of the top grammar.
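
The operators described above can be seen together in one short file-based grammar.  This is an illustrative sketch: the file name and all token names ($string1, $fill, um, uh, please, and so on) are made up for this example and do not come from any shipped recognizer.

```text
// operators.grammar  (hypothetical file name)

$digit   = one | two | three ;

// Repeat operators: these two rules are equivalent (one or more digits).
$string1 = $digit<+> ;
$string2 = $digit <1-> ;

// Zero or more repetitions; the parentheses make the whole group repeat.
$fill    = (um | uh)<*> ;

// The "or" operator joins only its immediate neighbors.
// $a and $b describe the same network; $c does not.
$a = one two | three ;
$b = one (two | three) ;
$c = (one two) | three ;

// Square brackets mark an optional item.
$d = [please] $digit ;

// A backslash keeps a reserved character as part of a token
// (here the SAMPA vowel { in the word bat).
bat = b \{ t ;
```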


Description
The Grammar format specifies grammars for speech recognition using the CSLU Toolkit.  This format is a modified version of the Augmented Backus-Naur Form (ABNF) developed for automatic speech recognition.  This modified format is read by the Statenet package and is used to create state networks for Hidden Markov Model speech recognizers.  The grammar may be specified in a file or in a Tcl list; the format is slightly different in each case.  In particular, the Tcl list format does not allow comments, and the end of a rule, cluster assignment, or lexicon specification is marked by the end of the list item rather than by a semicolon.

The use of context-dependent rules allows context-sensitive grammars to be specified.  ABNF is typically context-free, and so the ability to specify context-sensitive grammars is a significant modification to ABNF that increases the range of grammars that can be specified.  In particular, words or phonemes may take on certain forms depending on their context, which is a phenomenon often observed in speech.
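
As a rough sketch of what such context-dependent rules look like in the file-based form, the fragment below combines a cluster, a rule with both contexts given, and a rule whose right context is omitted.  The cluster contents and the unit name s_init are hypothetical, and writing nothing after __ is one reading of the statement that an "anything" context may be omitted.

```text
// Cluster: silence plus the utterance boundaries.
$sil := .pau /BOU /EOU ;

// t is expanded to tc [th] only between ^ on the left and 3r on the right.
t -> tc [th] :: ^ __ 3r ;

// Right context omitted, so anything may follow:
// s maps to the hypothetical unit s_init after silence or at the start of the utterance.
s -> s_init :: $sil __ ;
```

A t or s that occurs outside the listed contexts is left unchanged by these rules.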


Examples
The following example specifies continuous digit recognition at the level of the grammar, the lexicon (in Worldbet notation), and the context-dependent categories, in three separate files.  The context-dependent rules shown here are only a fraction of the entire set of rules for this recognizer.  The last box illustrates how the grammar files are read by the Statenet package.
 
digit.grammar:
$digit   = zero | oh | one | two | three | four | five | six |
           seven | eight | nine;
$grammar = [separator%%] ($digit [separator%%])<+> [separator%%];

digit.lexicon:
zero        = z I 9r oU              ;
oh          = oU                     ;
one         = w ^ n [&]              ;
two         = tc th u                ;
three       = T 9r i:                ;
four        = f oU 9r                ;
five        = f aI v                 ;
six         = s I kc kh s            ;
seven       = s E v I n [&]          ;
eight       = ei tc [th]             ;
nine        = n aI n [&]             ;

separator   = .pau [.garbage] .pau   ;

digit.phnrules:
$sil := .pau .garbage tc kc /BOU /EOU ;
$den_l := s z th ;
$den_r := s z th ;

// .pau -> <.pau> ;
.pau = <.pau> ;

// s -> $LC<s s>$RC :: $LC __ $RC ;

s -> v<s s>E :: v __ E ;

s -> n<s s>E :: n __ E ;

s -> &<s s>E :: & __ E ;

s -> 9r<s s>E :: 9r __ E ;

grammarExample1.tcl:
package require Statenet

set stateNet [statenet create digit.grammar "grammar"]
statenet add $stateNet digit.lexicon "word"
statenet add $stateNet digit.phnrules "phoneme" -selfLoops 1

statenet print $stateNet

nuke $stateNet



The next example illustrates a context-dependent rule that expands t into tc (t-closure) followed by an optional th (t-burst), or into the flapped phoneme d_\( (note the use of the backslash to prevent parsing the parenthesis in the phoneme name), when the t occurs in the context of b and ^ on the left and 3r on the right.  This causes the pronunciation of the first word, butter, to be /b ^ (tc [th]) | d_\( 3r/ or /b I t 3r/ or /b E t 3r/, while the pronunciation of the second word, mutter, remains /m ^ t 3r/; only the t in the specified context is expanded.

cdexample1.txt:
butter = b (^ | I | E) t 3r;
mutter = m ^ t 3r;

$grammar = butter | mutter;

cdexample2.txt:
t        -> (tc [th]) | d_\(    :: b ^ __ 3r ;

grammarExample2.tcl:
package require Statenet

set stateNet [statenet create cdexample1.txt "grammar"]
statenet add $stateNet cdexample2.txt "word"
statenet print $stateNet

nuke $stateNet



See Also
The Statenet package

Author
John-Paul Hosom, hosom@{cslu, bme, cse}.ogi.edu