CSLU Toolkit Formats


recognizer Spec Format
Version: 1.0
Created: 25 May 2003
Modified: 25 May 2003

Overview
The recognizer Spec format contains specifications of a particular recognizer, creating a link between a general state network for word recognition and an HMM specific to a recognizer.    


Synopsis
FILE-BASED GRAMMAR
// comment
duration_model durationModelName ;
sampling_freq samplingFrequency ;
frame_size frameSize ;
features <URI> ;
feat_context <URI> ;
clusterName := token1  token2  ...  tokenN ;
map token := token1  token2  ...  tokenN ;
category index dParam1 dParam2 tiedCategory ;
--- blank lines ARE allowed ---

Tcl-LIST BASED GRAMMAR
--- comments are not allowed ---
duration_model durationModelName
sampling_freq samplingFrequency
frame_size frameSize
features <URI>
feat_context <URI>
clusterName := token1  token2  ...  tokenN
map token := token1  token2  ...  tokenN
category index dParam1 dParam2 tiedCategory
--- blank lines are NOT allowed ---
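In the file-based grammar, every statement ends with ";", and "//" starts a comment. The following is a minimal Python sketch of a reader for one such line. It assumes each statement fits on one line and that "//" never occurs inside a <URI>. The names parse_spec_line, Statement and HEADER_KEYWORDS are hypothetical. This is not the Statenet package's own reader.

```python
from dataclasses import dataclass, field

# Keywords that take exactly one value (see the Synopsis).
HEADER_KEYWORDS = {"duration_model", "sampling_freq", "frame_size",
                   "features", "feat_context"}


@dataclass
class Statement:
    kind: str                                   # "header", "cluster", "map", or "category"
    name: str                                   # keyword, cluster name, map target, or category
    values: list = field(default_factory=list)  # the remaining items of the statement


def parse_spec_line(line):
    """Parse one line of a file-based spec (hypothetical helper).

    Returns None for blank or comment-only lines. Assumes one statement per
    line and that "//" never appears inside a <URI>.
    """
    text = line.split("//", 1)[0].strip()   # drop any trailing comment
    if not text:
        return None                          # blank lines are allowed in files
    if not text.endswith(";"):
        raise ValueError(f"statement not terminated by ';': {text!r}")
    tokens = text[:-1].split()
    if not tokens:
        raise ValueError("empty statement")
    if tokens[0] in HEADER_KEYWORDS:
        if len(tokens) != 2:
            raise ValueError(f"expected '{tokens[0]} value ;': {text!r}")
        return Statement("header", tokens[0], [tokens[1]])
    if tokens[0] == "map" and len(tokens) >= 4 and tokens[2] == ":=":
        return Statement("map", tokens[1], tokens[3:])
    if len(tokens) >= 3 and tokens[1] == ":=":
        return Statement("cluster", tokens[0], tokens[2:])
    if len(tokens) == 5:
        name, index, d1, d2, tie = tokens
        return Statement("category", name, [int(index), int(d1), int(d2), tie])
    raise ValueError(f"unrecognized statement: {text!r}")


if __name__ == "__main__":
    for line in ["duration_model    minmax ;",
                 "$sil      := .pau /BOU /EOU .garbage ;",
                 "map uc    := tc kc ;",
                 "th>uc                 -1       10     5000        th>$sil    ;"]:
        print(parse_spec_line(line))
```

The Tcl-list form would need a different reader. That form has no ";" terminators and does not allow comments or blank lines.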

Variables
Variable
Description
comment
Any text.  A comment is terminated by the end of the line.  Comments are not allowed when the grammar is specified using a Tcl list.
durationModelName
The name of the duration model to be used in the Viterbi search.  Valid names are "exponential", "gamma", and "minmax", although currently only "minmax" is implemented.
samplingFrequency
The sampling frequency, in Hz, to be used in computing feature values from the waveform.  There is no default value.
frameSize
The frame size, in msec, at which to compute features and perform recognition.  There is no default value.
URI
The location and filename of Tcl code that computes the features (for the features keyword) or a context window of features (for the feat_context keyword). If the features keyword is omitted, default features are computed; if the feat_context keyword is omitted, a default context window is used. In that case, the classifier must have been trained with these default features and/or this default context window. If there is a mismatch, recognition performance will be extremely poor.
token
A token that is one level above the category level.  Typically, this will be at the level of phonemes, and each token will represent one phoneme.  
clusterName
A name that specifies a cluster of tokens.  This information may be used by the statenet addSpec command to expand phonemes into context-dependent categories.
category
The name of a category output by the classifier.
index
The index of the classifier output that is associated with the category name.
dParam1
The first duration-model parameter.  In the minmax duration model, this is the shortest duration (in msec) of the category before a penalty is applied.
dParam2
The second duration-model parameter.  In the minmax duration model, this is the longest duration (in msec) of the category before a penalty is applied.
tiedCategory
If this category is not an output of the classifier, tiedCategory identifies the category whose output is to be used instead (the example below ties th>uc to the category th>$sil). If category is an output of the classifier, then tiedCategory is the symbol "-". If category is not an output (that is, tiedCategory is not "-"), then the value of index for this category is typically -1.
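The following Python sketch shows how the numeric fields relate. samplingFrequency is in Hz, while frameSize and the dParam values are in msec. The sketch computes the number of waveform samples per frame and tests whether a segment falls outside the minmax window. The function names are hypothetical. The sketch assumes that a duration exactly equal to dParam1 or dParam2 is not penalized, following the wording "before a penalty is applied". It only flags a violation, because the form and size of the penalty are not described here.

```python
def samples_per_frame(sampling_freq_hz, frame_size_ms):
    """Number of waveform samples covered by one frame (hypothetical helper)."""
    return sampling_freq_hz * frame_size_ms / 1000.0


def outside_minmax(duration_frames, d_param1_ms, d_param2_ms, frame_size_ms):
    """True if a segment of `duration_frames` frames is shorter than dParam1
    or longer than dParam2, i.e. where the minmax model would apply a penalty.
    Durations equal to dParam1 or dParam2 are assumed not to be penalized."""
    duration_ms = duration_frames * frame_size_ms
    return duration_ms < d_param1_ms or duration_ms > d_param2_ms


if __name__ == "__main__":
    # Values taken from the example below: 8000 Hz, 10 msec frames,
    # and category I<9r with dParam1 = 20 and dParam2 = 100.
    print(samples_per_frame(8000, 10))   # 80.0
    for n in (1, 2, 10, 11):
        print(n, outside_minmax(n, 20, 100, 10))
```

With 10-msec frames, the I<9r window of 20 to 100 msec corresponds to 2 to 10 frames. Segments of 1 or 11 frames are therefore flagged.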

Keywords and Reserved Symbols
Keyword or Symbol
Description
//
When these characters occur in sequence in a grammar file, they define the beginning of a comment.  The comment ends when the end of the line is reached.  Anything within the comment is ignored by the Statenet package.  If the grammar is specified using Tcl lists, comments are not allowed.
duration_model
This specifies that the next word will be the type of duration model.
sampling_freq
This specifies that the next item will be the sampling frequency of the recognizer, in Hz.
frame_size
This specifies that the next item will be the frame size of the recognizer, in msec.
features
This specifies that the next item will be a URI pointing to Tcl code for computing features.
feat_context
This specifies that the next item will be a URI pointing to Tcl code for computing a context window.
map
This specifies that the next word will be the token onto which the tokens after := are mapped.
:=
This is the delimiter between a cluster name and tokens belonging to a cluster, as well as the delimiter between a token that will be the result of mapping and the tokens that will be mapped.
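The := delimiter plays two roles. In a cluster statement, it introduces the members of a named cluster. In a map statement, it introduces the tokens that are mapped onto the token named after map. The following is a minimal Python sketch that collects both kinds of definition into lookup tables, assuming one statement per line. The function names are hypothetical. How the statenet addSpec command actually applies clusters and mappings is not described here.

```python
def read_definitions(lines):
    """Collect cluster and map definitions from file-based spec lines.

    Returns (clusters, token_map): `clusters` maps a cluster name to its
    member tokens, and `token_map` maps each token listed after := in a map
    statement to the token it is mapped onto. Other lines are ignored.
    """
    clusters, token_map = {}, {}
    for line in lines:
        # Drop any comment and the ';' terminator, then split on whitespace.
        tokens = line.split("//", 1)[0].strip().rstrip(";").split()
        if len(tokens) >= 4 and tokens[0] == "map" and tokens[2] == ":=":
            for source in tokens[3:]:
                token_map[source] = tokens[1]
        elif len(tokens) >= 3 and tokens[1] == ":=":
            clusters[tokens[0]] = tokens[2:]
    return clusters, token_map


def map_token(token, token_map):
    """Return the token that `token` is mapped onto, or `token` if unmapped."""
    return token_map.get(token, token)


def in_cluster(token, cluster_name, clusters):
    """True if `token` is listed as a member of the named cluster."""
    return token in clusters.get(cluster_name, [])


if __name__ == "__main__":
    spec = ["$den_r    := s th ;",
            "$den_l    := s ks th ;",
            "map uc    := tc kc ;"]
    clusters, token_map = read_definitions(spec)
    print(map_token("tc", token_map), in_cluster("ks", "$den_l", clusters))
```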


Description
The recognizer Spec format contains specifications of a particular recognizer, creating a link between a general state network for word recognition and an HMM specific to a recognizer. A file in Spec format may be created from an existing ".olddesc" file using the Tcl script "olddesc2spec.tcl".

Rather than specifying a large number of context-dependent rules to map phonemes to context-dependent states, the statenet addSpec command may be used to perform this mapping. The addSpec command creates context-dependent states from the Spec format. It is faster than performing the mapping with the statenet add command and context-dependent rules. However, addSpec can only apply the rules it already knows for generating state specifications.

The statenet specUpdate command reads the recognizer specifications in the Spec format into the statenet object. These include duration-model parameters, category index values, frame size, and sampling rate.


Example
The following example specifies the categories used by a digits recognizer.  Only a fraction of the entire file is shown here.
 
digit.force.spec:
// .spec file
// created Mon May 12 13:20:11 Pacific Daylight Time 2003
duration_model    minmax ;
sampling_freq     8000 ;
frame_size        10 ;
features          <features.tcl#compute_feat> ;
feat_context      <features.tcl#compose_vec> ;

// context clusters
$sil      := .pau /BOU /EOU .garbage ;
$den_r    := s th ;
$den_l    := s ks th ;

// mappings
map uc    := tc kc ;

// name            index    dur_p1    dur_p2     tie
<.pau>                 0       10     5460        -    ;
I<9r                   1       20      100        -    ;
T<9r                   2       10      120        -    ;
9r>i:                  3       10      150        -    ;
9r>oU                  4       20      130        -    ;
$den_l<E               5       30      160        -    ;
E>v                    6       30      170        -    ;
f<\>r                  7       40      190        -    ;
...
th>$sil              161       20      180        -    ;
...
u<z                  214       20       90        -    ;
uc<z                 215       10      120        -    ;
v<z                  216       20      130        -    ;
z>I                  217       10      160        -    ;
<.garbage>           218       10     5000        -    ;
th>uc                 -1       10     5000        th>$sil    ;
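The last line of the example ties th>uc, which has index -1 and is therefore not a classifier output, to th>$sil. The following is a minimal Python sketch of resolving such ties to the classifier output index that is actually used. The function name and the dictionary layout are hypothetical. Because the Variables description calls tiedCategory an index while the example gives a category name, the sketch assumes either form may occur.

```python
def resolve_ties(categories):
    """Map each category name to the classifier output index used to score it.

    `categories` maps name -> (index, tie). tie is "-" for a classifier
    output; otherwise it is assumed to be either an index value (a string of
    digits) or the name of another category, as in the example above.
    """
    resolved = {}
    for name, (index, tie) in categories.items():
        if tie == "-":
            resolved[name] = index          # the category is itself an output
        elif tie.isdigit():
            resolved[name] = int(tie)       # tie given as an output index
        else:
            tied_index, tied_tie = categories[tie]
            if tied_tie != "-":
                raise ValueError(f"{name} is tied to {tie}, which is not an output")
            resolved[name] = tied_index     # tie given as a category name
    return resolved


if __name__ == "__main__":
    example = {
        "th>$sil": (161, "-"),
        "<.garbage>": (218, "-"),
        "th>uc": (-1, "th>$sil"),
    }
    print(resolve_ties(example))
```

For these three rows, th>uc resolves to output 161, the same output as th>$sil.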
 

See Also
The Statenet package

Author
John-Paul Hosom, hosom@{cslu, bme, cse}.ogi.edu