CSLU Toolkit Formats


State Specification Format
Version: 1.0
Created: 25 May 2003
Modified: 25 May 2003

Overview
The state specification format defines the types of context-dependent states used by the CSLU Toolkit.


Description

Multi-State Biphone

The multi-state biphone format is currently the only format supported by the CSLU Toolkit.  This format allows for both context-independent and context-dependent units.  Each context-dependent unit depends on only its left or only its right context, unlike a triphone, which depends on both the left and right context.  Rather than representing each phoneme with a fixed number of states (e.g. 3), the format lets the number of states depend on the phoneme's context dependencies.

For context-independent phonemes, a single state is used.
For phonemes that are dependent on only the left context, a single state is used.
For phonemes that are dependent on only the right context, a single state is used.
For phonemes that depend on both the left and right context, either two or three states may be used.  If three states are used, then the middle state is context-independent.
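
The four rules above can be restated as a small function. This is only an illustrative sketch in Python; the function name and its parameters are hypothetical and are not part of the CSLU Toolkit.

```python
def states_per_phoneme(left_dependent: bool,
                       right_dependent: bool,
                       middle_state: bool = False) -> int:
    """Return the number of states used to model one phoneme.

    middle_state only matters when the phoneme depends on both
    contexts; it selects the three-state layout whose middle state
    is context-independent.
    """
    if left_dependent and right_dependent:
        return 3 if middle_state else 2
    # Context-independent, left-only, and right-only phonemes
    # all use a single state.
    return 1
```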

The left context of a phoneme is indicated by the characters to the left of a "<" in the state name.
The right context of a phoneme is indicated by the characters to the right of a ">" in the state name.
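
As an illustration of this naming convention, the following Python sketch splits a state name into its left context, phoneme, and right context. The names StateName and parse_state_name are hypothetical and do not come from the Toolkit. The sketch also assumes that a context-independent state is written with empty contexts (as in "<n>"), following the examples in this document.

```python
from typing import NamedTuple, Optional


class StateName(NamedTuple):
    left: Optional[str]    # left context, or None if absent
    phoneme: str           # the center phoneme
    right: Optional[str]   # right context, or None if absent


def parse_state_name(name: str) -> StateName:
    """Split a state name such as '$sil<ih', 'ih>$fnt_r' or '<iy>'."""
    left = right = None
    rest = name
    if "<" in rest:
        left_part, rest = rest.split("<", 1)
        left = left_part or None      # an empty context counts as none
    if ">" in rest:
        rest, right_part = rest.split(">", 1)
        right = right_part or None
    return StateName(left, rest, right)


print(parse_state_name("$sil<ih"))
print(parse_state_name("ih>$fnt_r"))
print(parse_state_name("<iy>"))
```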

The left or right context of a phoneme may be an individual phoneme or a cluster of phonemes.  Using phonetic clusters as contexts greatly reduces the number of states required for many applications.  Clusters are defined in the recognizer spec format, and are typically marked by a dollar sign ($) at the beginning of the cluster name.


Clusters define phonemes that have a similar coarticulatory effect on the center phoneme.  Some phonemes, such as /ay/ or /oy/, have different coarticulatory effects depending on whether they are on the left or right of the center phoneme.  For example, /ay/ has the context of a front vowel when it occurs to the left of the center phoneme, and it has the context of a back vowel when it occurs to the right of the center phoneme.  Therefore, clusters may need to be separated into two types: those occurring on the left of the center phoneme and those occurring on the right.  These two types will contain different sets of phonemes.  It is common practice to append _l to the name of a context that occurs to the left of the center phoneme, and to append _r to the name of a context that occurs to the right of the center phoneme.  So, for example, there might be two clusters, one for front-vowel context and one for back-vowel context, and they might be defined as follows (using TIMIT notation):
    $fnt_l := iy ih eh ae ei ay oy ;
    $fnt_r := iy ih eh ae ei ;
    $bck_l := uw uh ah ao aa ow aw ;
    $bck_r := uw uh ah ao aa ow aw ay oy ;
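
The left/right asymmetry in these definitions can be checked with a short Python sketch. It reads only the "name := members ;" lines shown above and makes no claim about the rest of the recognizer spec format. The helper names parse_clusters and clusters_containing are hypothetical.

```python
from typing import Dict, List, Set

# The four example definitions from this document.
CLUSTER_TEXT = """
$fnt_l := iy ih eh ae ei ay oy ;
$fnt_r := iy ih eh ae ei ;
$bck_l := uw uh ah ao aa ow aw ;
$bck_r := uw uh ah ao aa ow aw ay oy ;
"""


def parse_clusters(text: str) -> Dict[str, Set[str]]:
    """Map each cluster name to the set of phonemes it contains."""
    clusters: Dict[str, Set[str]] = {}
    for statement in text.split(";"):
        if ":=" not in statement:
            continue  # skip blank text after the last ';'
        name, members = statement.split(":=", 1)
        clusters[name.strip()] = set(members.split())
    return clusters


def clusters_containing(phoneme: str, side: str,
                        clusters: Dict[str, Set[str]]) -> List[str]:
    """List clusters for one side ('l' or 'r') that contain phoneme."""
    suffix = "_" + side
    return sorted(name for name, members in clusters.items()
                  if name.endswith(suffix) and phoneme in members)


clusters = parse_clusters(CLUSTER_TEXT)
print(clusters_containing("ay", "l", clusters))  # ['$fnt_l']
print(clusters_containing("ay", "r", clusters))  # ['$bck_r']
```

With these definitions, /ay/ is grouped with the front vowels as a left context and with the back vowels as a right context, matching the description above.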


Examples

The phoneme /ih/ is a short phoneme, so it may be represented using only two context-dependent states, one dependent on the left context and one dependent on the right context:
left context-dependent states
$sil<ih   $fnt_l<ih   $bck_l<ih  etc.
right context-dependent states
ih>$sil   ih>$fnt_r   ih>$bck_r   etc.


The phoneme /iy/ is a long phoneme that usually has a steady-state region that is not heavily influenced by surrounding phonemes.  Therefore, /iy/ may be represented using three states: the left and right states are context-dependent, and the middle state is context-independent:
left context-dependent states
$sil<iy   $fnt_l<iy   $bck_l<iy  etc.
middle context-independent state
<iy>
right context-dependent states
iy>$sil   iy>$fnt_r   iy>$bck_r   etc.

The phoneme /n/ is hardly influenced by the phonemes that surround it (although it has a large influence on those surrounding phonemes).  We can therefore represent the /n/ phoneme using a single context-independent state:
context-independent state
<n>

The /p/ burst is always preceded by a /p/ closure (pcl), so it is not influenced by its left context.  In addition, plosive bursts tend to be short, and we don't want to model this phoneme using too many states.  Therefore, we can model /p/ using a single right-context-dependent state:
right context-dependent state
p>$sil   p>$fnt_r   p>$bck_r   etc.
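
The four examples above can be reproduced with a small Python sketch that builds state names for one phoneme. The function state_names and its layout labels ("independent", "left", "right", "two", "three") are hypothetical and are used here only for illustration. The context lists are limited to the three contexts named in the examples.

```python
from typing import List, Sequence


def state_names(phoneme: str, layout: str,
                left_contexts: Sequence[str] = (),
                right_contexts: Sequence[str] = ()) -> List[str]:
    """Build the state names for one phoneme under a given layout."""
    lefts = [f"{ctx}<{phoneme}" for ctx in left_contexts]
    rights = [f"{phoneme}>{ctx}" for ctx in right_contexts]
    middle = [f"<{phoneme}>"]          # context-independent state
    if layout == "independent":
        return middle
    if layout == "left":
        return lefts
    if layout == "right":
        return rights
    if layout == "two":                # left and right states
        return lefts + rights
    if layout == "three":              # left, middle, and right states
        return lefts + middle + rights
    raise ValueError(f"unknown layout: {layout}")


LEFT = ["$sil", "$fnt_l", "$bck_l"]
RIGHT = ["$sil", "$fnt_r", "$bck_r"]

print(state_names("ih", "two", LEFT, RIGHT))
print(state_names("iy", "three", LEFT, RIGHT))
print(state_names("n", "independent"))
print(state_names("p", "right", right_contexts=RIGHT))
```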

See Also
The Statenet package

Author
John-Paul Hosom, hosom@{cslu, bme, cse}.ogi.edu