Free Speech Journal
Volume 1, 1997
A Web Journal dedicated to the state of the art in human language technology


Newest publication (10/22/97)

Phoneme Probability Estimation with Dynamic Sparsely Connected Artificial Neural Networks

Nikko Ström (nikko@speech.kth.se)

Department of Speech, Music and Hearing, KTH, Stockholm, Sweden
Centre for Speech Technology, KTH, Stockholm, Sweden

Abstract

This paper presents new methods for training large neural networks for phoneme probability estimation. An architecture combining time-delay windows and recurrent connections is used to capture the important dynamic information of the speech signal. Because the number of connections in a fully connected recurrent network grows super-linearly with the number of hidden units, schemes for sparse connection and connection pruning are explored. It is found that sparsely connected networks outperform their fully connected counterparts with an equal number of connections. The implementation of the combined architecture and training scheme is described in detail. The networks are evaluated in a hybrid HMM/ANN system for phoneme recognition on the TIMIT database, and for word recognition on the WAXHOLM database. The achieved phone error rate of 27.8% for the standard 39-phoneme set on the core test set of the TIMIT database is among the lowest reported. All training and simulation software used is made freely available by the author, and detailed information about the software and the training process is given in an appendix.
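
As a rough illustration of the scaling argument in the abstract, the sketch below counts connections in a simplified network with one fully recurrent hidden layer and builds a random sparse mask for the hidden-to-hidden connections. It is not the author's software: the function names, the layer sizes, and the choice of uniformly random sparsity are assumptions made for this example, and the time-delay windows and the pruning scheme are left out.

```python
import random


def dense_connection_count(n_in, n_hidden, n_out):
    """Count connections when input->hidden, hidden->hidden (recurrent)
    and hidden->output are all fully connected. Bias terms are ignored."""
    return n_in * n_hidden + n_hidden * n_hidden + n_hidden * n_out


def sparse_recurrent_mask(n_hidden, fraction, seed=0):
    """Return an n_hidden x n_hidden 0/1 mask that keeps each recurrent
    connection independently with probability `fraction`.
    Random selection is an assumption, not the scheme from the paper."""
    rng = random.Random(seed)
    return [
        [1 if rng.random() < fraction else 0 for _ in range(n_hidden)]
        for _ in range(n_hidden)
    ]


def mask_connection_count(mask):
    """Number of connections kept by a 0/1 mask."""
    return sum(sum(row) for row in mask)


if __name__ == "__main__":
    # Hypothetical layer sizes, chosen only to show the trend.
    for n_hidden in (100, 200, 400):
        dense = dense_connection_count(10, n_hidden, 40)
        kept = mask_connection_count(sparse_recurrent_mask(n_hidden, 0.1))
        print(n_hidden, dense, kept)
```

Doubling the hidden layer doubles the input and output terms but quadruples the recurrent term, which is why a fixed connection budget is easier to respect with a sparse mask than with a smaller fully connected layer.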


We welcome any comments, suggestions, and feedback you might have: fsj_questions@cse.ogi.edu
FSJ is supported in part by The Center for Spoken Language Understanding
Last Updated by SDC, April 1, 1996