NAME

alphabet - alphabet and alpha-digit recognizer


SYNOPSIS

 package require Alphabet
 alphabet initialize recogvar {directory NULL} {type ALPHA} {pau NULL}
    {btheap NULL} {infovar NULL}
 alphabet pipe recogvar w
 alphabet result recogvar {nbest 3}
 alphabet nuke recogvar
 alphabet reset recogvar

PARAMETERS

recogvar
associative array containing all the objects needed for recognition
directory
full path to the directory file, used for name lookup
type
recognizer vocabulary needed by the application: ALPHA, ALPHADIGIT, or DIGIT (default ALPHA)
pau
if set, use a recognizer with optional pauses between letters and digits (default NULL, no pauses)
btheap
optional shared backtrace heap handle
infovar
optional array of tunable search parameters (see below)
nbest
the maximum number of names to return from the directory (default 3)
w
a wave object sent through the recognizer pipeline

DESCRIPTION

alphabet implements the CSLU alphabet and alpha-digit recognition engine. Recognition is a three-stage process. First, a "conventional" frame-based recognizer with an alpha, digit, or alpha-digit vocabulary is run on the speech. Second, the letters and digits it finds are reclassified with a whole-word classifier (a larger neural network), which produces an output value for each of the 26 letters and 10 digits, plus one for NL (not a letter), a class trained on miscellaneous sounds in the alpha/digit corpus. Finally, these letter and digit scores are used to find the top-scoring names in a directory. The directory is tree-structured, so entries that share a common prefix share the nodes for that prefix; this makes it possible to search hundreds of thousands of names efficiently (given sufficient memory).
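To show why prefix sharing keeps the directory compact, here is a minimal sketch of a prefix tree (trie) built from nested Tcl dicts. It is an illustration under stated assumptions, not the .cn format that precompile produces: the proc names trie_insert, trie_contains, and trie_count_nodes and the "." end-of-name marker are all hypothetical.

```tcl
# Hypothetical illustration only: this is NOT the .cn format written by
# precompile. Each trie node is a Tcl dict mapping one character to a
# child node; the key "." (an assumed marker) ends a complete name.
proc trie_insert {trieVar name} {
    upvar 1 $trieVar trie
    # dict set creates any missing intermediate nodes along the path
    dict set trie {*}[split $name ""] . 1
}

proc trie_contains {trie name} {
    # a name is present only if its path ends in the "." marker
    return [dict exists $trie {*}[split $name ""] .]
}

proc trie_count_nodes {node} {
    # count character nodes, skipping the end-of-name markers
    set n 0
    dict for {key child} $node {
        if {$key eq "."} continue
        incr n
        incr n [trie_count_nodes $child]
    }
    return $n
}

set dir [dict create]
foreach name {jon jones jonas jordan} {
    trie_insert dir $name
}
puts "letters in names: 19, trie nodes: [trie_count_nodes $dir]"
```

The four sample names contain 19 letters but need only 11 trie nodes, because "jo" is stored once for all four names and "jon" once for three of them. A search that scores a node once on behalf of every name below it is what makes large directories tractable.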

The first pass takes the most time because there are many more frames than letters. Like the other CSLU frame-based recognizers, it can be pipelined using the alphabet pipe function, which can also be called once for the entire utterance.

The alphabet initialize call creates an instance of the alphabet/alpha-digit recognizer, so multiple recognition engines can be created by calling alphabet initialize more than once. The recognizer type is determined by the optional parameters type and pau.

The type parameter indicates whether the grammar contains letters of the alphabet only (type ALPHA), letters and digits (type ALPHADIGIT), or digits only (type DIGIT). The same recognition engine is used in all three cases, except that the first-pass grammar is constrained to the specified vocabulary, which leads to better segmentation and better performance. The returned scores matrix (see below) is likewise limited to the specified vocabulary.

The pau parameter, which defaults to NULL (no pauses), indicates whether the grammar expects fluently spoken letters or digits, or forced pauses between them. If you know that users will pause, setting pau (making it anything but {}) will improve performance.

The infovar array gives access to several tunable parameters. Their defaults are:

info(searchsize) (100000)
The initial size of the search list. Since the 1.8 release of the toolkit, this is not important because the search space grows as needed.
info(prunethresh) (0.0001)
During the directory search, partial name paths whose score is less than prunethresh times the maximum score are deleted. Smaller values make the search slower but more accurate.
info(deletepen) (0.05)
The penalty for a deleted letter, that is, a letter that is in the name but was not recognized. For example, for the recognized letter string "JONS" to return the name "Jones", the deleted "e" must be accounted for.
info(langpower) (0.0)
If the directory contains prior probabilities for the names, those probabilities are taken to this power and the name score is multiplied by the result. A value of 0.0 turns all name probabilities into 1.0, so they have no effect.
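The prunethresh and langpower rules above are simple enough to state in a few lines. The following is a minimal sketch of those two rules as described, not the toolkit's search code; the hypothetical procs prune_paths and apply_prior and the {partialName score} pair representation are assumptions made for illustration.

```tcl
# Hypothetical sketch of the prunethresh and langpower rules described
# above; not the toolkit's implementation. Each partial path is assumed
# to be a {partialName score} pair with a non-negative score.
proc prune_paths {paths prunethresh} {
    # find the best score among the current partial paths
    set max 0.0
    foreach p $paths {
        set s [lindex $p 1]
        if {$s > $max} { set max $s }
    }
    # delete paths scoring below prunethresh times the best score
    set cutoff [expr {$prunethresh * $max}]
    set kept {}
    foreach p $paths {
        if {[lindex $p 1] >= $cutoff} { lappend kept $p }
    }
    return $kept
}

proc apply_prior {score prior langpower} {
    # multiply the name score by prior^langpower; langpower 0.0 gives 1.0
    return [expr {$score * pow($prior, $langpower)}]
}

set paths {{jon 0.8} {jan 0.00005} {jim 0.0001}}
puts [prune_paths $paths 0.0001]
puts [apply_prior 0.5 0.2 0.0]
```

With the default prunethresh of 0.0001 and a best score of 0.8, the cutoff is 0.00008, so "jan" is dropped while "jim" survives. With the default langpower of 0.0, any prior raised to that power is 1.0, which is why the default leaves name scores unchanged.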

The alphabet result call performs the second and third stages: it runs the alpha/digit neural network on each letter or digit found in the first pass, then updates the directory search tree (if any) with the scores. If a directory was given, the top nbest names are returned along with their confidences. The returned list has this structure:

 0: {{name1 conf1} {name2 conf2} .. {nameN confN}}
 1: {{raw-let-1 letconf1} {raw-let-2 letconf2} .. {raw-let-M letconfM}}
 2: letter segmentation
 3: phoneme segmentation
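Because the result is an ordinary Tcl list, its parts can be pulled out with lindex. The sketch below uses a made-up result value shaped like the structure above; the names, letters, and confidences are invented, the two segmentation elements are left empty because their format is not described here, and the procs top_name and raw_letters are hypothetical helpers.

```tcl
# Made-up result value shaped like the four-element list above; the
# segmentation elements (2 and 3) are left empty in this sketch.
set res {
    {{jones 0.82} {jonas 0.41}}
    {{j 0.9} {o 0.8} {n 0.7} {s 0.6}}
    {}
    {}
}

proc top_name {res} {
    # element 0 holds {name conf} pairs, best first
    set names [lindex $res 0]
    if {[llength $names] == 0} { return {} }
    return [lindex $names 0 0]
}

proc raw_letters {res} {
    # element 1 holds {letter conf} pairs; join the letters into a string
    set s ""
    foreach pair [lindex $res 1] {
        append s [lindex $pair 0]
    }
    return $s
}

puts "top name: [top_name $res], raw letters: [raw_letters $res]"
```

Here the raw letter string "jons" still retrieves "jones" from the directory, the same situation the deletepen parameter is meant to handle.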

Confidence

The confidence for a letter is the likelihood of a true instance of that letter getting that score (the output of the neural network) or worse, relative to the likelihood of extraneous speech getting that score or better, as estimated on OGI speech corpora. A confidence of .5 means the score is equally likely to come from in-vocabulary speech as from out-of-vocabulary speech (assuming equal prior probabilities).
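One way to write this definition as a formula, assuming "relative likelihood" means a normalized ratio (the reading under which equal likelihoods give .5). Here s is the network output for the hypothesized letter, x is a score drawn from true instances (in) or from extraneous speech (out), and the priors pi are symbols introduced only for this illustration; the toolkit's exact estimator is not documented here.

```latex
c(s) = \frac{\pi_{\text{in}}\,P_{\text{in}}(x \le s)}
            {\pi_{\text{in}}\,P_{\text{in}}(x \le s) + \pi_{\text{out}}\,P_{\text{out}}(x \ge s)}
```

With equal priors the pi terms cancel, and c(s) = .5 exactly when the two tail probabilities are equal.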

The confidence for a name is the geometric mean of the corresponding letter confidences. Based on early experience, a rejection threshold of .3 seems to strike a good balance, but the best value will depend on many factors.
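A minimal sketch of the geometric-mean name confidence and the .3 rejection rule, assuming the letter confidences that belong to a name are already available as a list. The procs name_confidence and accept_name are hypothetical, and treating any zero confidence as a zero name confidence is a choice made here to avoid taking log(0).

```tcl
# Hypothetical helpers, not part of the Alphabet package.
proc name_confidence {letterConfs} {
    # geometric mean computed as exp(mean(log(c))) for numerical stability
    set n [llength $letterConfs]
    if {$n == 0} { return 0.0 }
    set sum 0.0
    foreach c $letterConfs {
        # a zero confidence makes the geometric mean zero
        if {$c <= 0.0} { return 0.0 }
        set sum [expr {$sum + log($c)}]
    }
    return [expr {exp($sum / $n)}]
}

proc accept_name {letterConfs {threshold 0.3}} {
    # reject the name when its confidence falls below the threshold
    return [expr {[name_confidence $letterConfs] >= $threshold}]
}

set confs {0.9 0.8 0.7 0.6}
puts [format "name confidence %.3f, accepted %d" \
    [name_confidence $confs] [accept_name $confs]]
```

Averaging in the log domain means a single very weak letter pulls the name confidence down sharply, which is usually what a rejection rule wants: confidences of .8 and .2 give .4 (accepted), while .4 and .1 give .2 (rejected).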

The raw scores are no longer in the return list, but are available as an arrayF structure in the scores element of the recognizer array:

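 set res [alphabet result abc]
 # the raw score matrix is stored in the recognizer array
 set scores $abc(scores)
 # extract the single element for the first vocabulary item
 # at the third position and print it
 puts [lindex [lindex [mx puts $scores.(0:0,2:2)] 0] 0]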

The above code prints the score for zero in the third position for ALPHADIGIT or DIGIT recognition, and prints the score for "a" in ALPHA recognition.
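When indexing the scores matrix it helps to map vocabulary symbols to index positions. The sketch below is built on an assumption: the description above only fixes index 0 ("0" for ALPHADIGIT and DIGIT, "a" for ALPHA), so the remaining order (digits 0 to 9, then letters a to z) is a guess, and whether an NL entry is present is unknown, so it is omitted. The procs score_row_symbols and score_row_index are hypothetical.

```tcl
# Hypothetical mapping from vocabulary symbol to scores-matrix index.
# ASSUMPTION: digits come first for ALPHADIGIT, then letters a-z; only
# index 0 is confirmed by the documentation, and NL is not included.
proc score_row_symbols {type} {
    set digits {0 1 2 3 4 5 6 7 8 9}
    set letters [split abcdefghijklmnopqrstuvwxyz ""]
    switch -- $type {
        ALPHA      { return $letters }
        DIGIT      { return $digits }
        ALPHADIGIT { return [concat $digits $letters] }
        default    { error "unknown recognizer type: $type" }
    }
}

proc score_row_index {type symbol} {
    # returns -1 when the symbol is outside the type's vocabulary
    return [lsearch -exact [score_row_symbols $type] $symbol]
}

puts "index of a for ALPHADIGIT: [score_row_index ALPHADIGIT a]"
```

Verify the assumed order against a known recording before relying on any index other than 0.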

The alphabet nuke function destroys (nukes) all memory associated with a recognizer and its search engine.

Directories

Directories are simply sets of alpha/digit strings, but a directory must be converted to a particular format before it can be used. The command-line program precompile, in the CSLUDIR bin directory, turns a list of alpha/digit strings into this usable (tree-structured) format:

 precompile list > list.cn
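Raw name lists often contain punctuation, mixed case, and duplicates. The sketch below prepares such a list and then runs precompile from Tcl. Several details are assumptions not confirmed by this page: that precompile reads one string per line, that lowercase letters and digits are the expected characters, and the helper names normalize_names and write_name_list, which are hypothetical.

```tcl
# Hypothetical helpers for preparing precompile input.
proc normalize_names {rawNames} {
    set out {}
    foreach raw $rawNames {
        # ASSUMPTION: keep only lowercase letters and digits
        set s [string tolower $raw]
        regsub -all {[^a-z0-9]} $s "" s
        if {$s ne ""} { lappend out $s }
    }
    # drop duplicates; the sort order is incidental
    return [lsort -unique $out]
}

proc write_name_list {path names} {
    # ASSUMPTION: precompile expects one string per line
    set f [open $path w]
    foreach n $names { puts $f $n }
    close $f
}

set names [normalize_names {"O'Brien" Jones jones Smith-Jones R2D2}]
write_name_list list $names
# Tcl's exec supports shell-style output redirection, so precompile can
# be run from the same script when it is found on the PATH.
if {[auto_execok precompile] ne ""} {
    exec precompile list > list.cn
}
```

Normalizing before precompile keeps spelled input such as "OBRIEN" from missing a directory entry stored with an apostrophe.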

EXAMPLE

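 package require Alphabet 1.0
 # create an ALPHA recognizer that searches the names_50k.cn directory
 alphabet initialize recog names_50k.cn
 # read the spelled-name recording and run the first pass over it
 set w [wave read firstname.wav]
 alphabet pipe recog $w
 # second and third stages: return up to 2 names
 set res [alphabet result recog 2]
 set names [lindex $res 0]
 set topname [lindex $names 0]
 set secondname [lindex $names 1]
 # each entry is a {name confidence} pair
 puts "the top name was [lindex $topname 0]"
 puts "the second was [lindex $secondname 0]"
 # free the recognizer and its search engine
 alphabet nuke recog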

RETURNS

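alphabet result returns a four-element list: the names retrieved from the directory with their confidences (if a directory was given), the raw recognized letters with their confidences, the word-level (letter) recognition alignment, and the phoneme-level alignment.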


AUTHOR

Johan Schalkwyk
Mark Fanty
Center for Spoken Language Understanding
Oregon Graduate Institute of Science & Technology


Last modified on Wed Mar 26 17:47:25 PST 1997.