National Cellular v2.3

Overview

The Cellular Corpus consists of cellular telephone speech from 2336 callers at locations throughout the United States. The data collection protocol requests both fixed-vocabulary and continuous-speech utterances. About one minute of speech is collected from each caller.

Recording Conditions

The data were collected with the CSLU T1 digital data collection system. The sampling rate was 8 kHz, and the files were stored in 8-bit mu-law format on a UNIX file system.

File Name Conventions

A call is the series of files recorded during one recording session. Every call is identified by a unique call number, and each file in the call is further identified by an utterance type. The filename therefore encodes both the call number and the utterance type, for example:

NC000041.WAV

The first two capital letters, "NC", indicate the corpus, National Cellular. The next 5 digits are the call number. The last digit indicates the utterance type. The utterance types are shown in this table:
The extension "WAV" indicates that the file is a speech file.

Speech File Formats

The speech files in this distribution are stored as RIFF WAV files with 8 kHz sampling and 16-bit linear coding.

Distribution Directory Structure

At the top level of the distribution there are two directories, speech and trans. Immediately below each of these are numbered subdirectories (0, 1, 2, etc.). The files are split among these subdirectories by call number divided by 10 (integer division): subdirectory 0 holds the files for calls 0-9, subdirectory 1 holds the files for calls 10-19, and so on.

Transcription

Each utterance in the National Cellular corpus has an orthographic transcription. The transcriptions are in the trans directory.
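The naming and directory rules above can be sketched in Python as a rough aid for locating files. This is a minimal sketch under stated assumptions: the file-name pattern is taken from the NC000041.WAV example ("NC", a five-digit call number, a one-digit utterance type, and an uppercase ".WAV" extension), case-insensitive matching is an assumption, and the helper names `parse_name`, `speech_path`, and `has_expected_format` are hypothetical, not part of the distribution. Because transcription file names are not specified here, the sketch only builds paths under the speech directory. The format check uses Python's standard `wave` module to confirm 8 kHz sampling and 2-byte (16-bit) samples.

```python
import io
import re
import wave
from pathlib import Path

# Assumed file-name pattern, taken from the example NC000041.WAV:
# "NC" + 5-digit call number + 1-digit utterance type + ".WAV".
# Case-insensitive matching is an assumption, not part of the corpus docs.
_NAME_RE = re.compile(r"^NC(\d{5})(\d)\.WAV$", re.IGNORECASE)


def parse_name(filename: str) -> tuple[int, int]:
    """Split a speech file name into (call_number, utterance_type)."""
    match = _NAME_RE.match(filename)
    if match is None:
        raise ValueError(f"not a National Cellular speech file name: {filename!r}")
    return int(match.group(1)), int(match.group(2))


def speech_path(root: Path, call_number: int, utterance_type: int) -> Path:
    """Build root/speech/<call_number // 10>/NC<call><type>.WAV."""
    name = f"NC{call_number:05d}{utterance_type}.WAV"
    # Files are grouped by call number div 10: calls 0-9 -> "0", 10-19 -> "1", ...
    return root / "speech" / str(call_number // 10) / name


def has_expected_format(source) -> bool:
    """Return True if a WAV file has 8 kHz sampling and 16-bit (2-byte) samples.

    `source` is a path string or a binary file-like object; the standard
    `wave` module only reads uncompressed PCM WAV files.
    """
    with wave.open(source, "rb") as wav:
        return wav.getframerate() == 8000 and wav.getsampwidth() == 2


if __name__ == "__main__":
    print(parse_name("NC000041.WAV"))  # (4, 1)
    print(speech_path(Path("corpus"), 4, 1).as_posix())  # corpus/speech/0/NC000041.WAV

    # Check a synthetic in-memory WAV rather than a real corpus file.
    buf = io.BytesIO()
    with wave.open(buf, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)
        out.setframerate(8000)
        out.writeframes(b"\x00\x00" * 80)
    buf.seek(0)
    print(has_expected_format(buf))  # True
```

To check a file on disk, passing `str(speech_path(...))` to `has_expected_format` is the safe choice, since `wave.open` is documented to take a file name string or a file-like object.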