Stories v1.2

General Description

The Stories Corpus consists of extemporaneous speech collected from English speakers in the CSLU Multi-language Telephone Speech data collection. Each speaker was asked to speak on a topic of their choice for one minute; these utterances make up the corpus.

Recording Details

The data were recorded from an analog line using a Gradient Technologies analog-to-digital conversion box. The file format used is 8 kHz 16-bit linear with a 1024-byte NIST Sphere header.

File Naming Convention

Files are named according to the following convention:
ENcall-1003-G.story-bt.txt
The first field ("ENcall") is the prefix indicating the corpus to which the data belongs, and the second field ("1003" in the example above) is a unique ID number for the speaker. The remainder of the file name carries no relevant information.

The audio and text files are subdivided into directories named after the call number divided by 10, discarding the remainder. For example, the files for call 103 are found in the /10 subdirectory. The /trans and /labels directory structures exactly parallel the structure of the /speech directory.

File Formats

The data were recorded from an analog line using a Gradient Technologies analog-to-digital conversion box. The .wav files use the standard RIFF file format with 16-bit linear encoding.

Transcriptions

The text transcriptions follow the non-time-aligned word-level conventions described in the CSLU Labeling Guide.

Phonetic transcriptions are plain text files that carry time-aligned phonetic labels. The first two lines of each file are a header that defines the length of a "frame" in milliseconds. The rest of the file consists of lines that each contain two numbers defining a frame range and a label that applies to that range. For example:
MillisecondsPerFrame: 1.000000
In this example, a frame corresponds to 1 millisecond (ms) of time. From 2 to 113 ms into the file there is a pause (.pau), and the first phoneme (w) starts at 113 ms and extends to 191 ms. The word-level transcription files follow the same format, with word labels in place of the phonetic labels.

The .com files found alongside the .wrd files contain information about breathing during the speech. They use a similar time-aligned format.

Labels

The lola files are ASCII "location and label" files. They are similar to the ".phn" files of the TIMIT database except:
Each file in this distribution has the header:
MillisecondsPerFrame: 3.0
The header is followed by a series of lines, one per segment, of the form:
[begin frame] [end frame + 1] label
For example:
200 237 ah
The [ah] segment extends from frame 200 to frame 236 inclusive. The end value is written as 237 for historical reasons.
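
The segment convention described above can be sketched in a few lines of Python. This is an illustrative reader, not a tool shipped with the corpus: the names Segment and parse_label_text are hypothetical, and the sketch assumes that any line which is neither the MillisecondsPerFrame line nor a "number number label" line is header text that can be skipped.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    begin_frame: int  # first frame of the segment
    last_frame: int   # last frame of the segment, inclusive
    label: str
    start_ms: float   # start time in milliseconds
    end_ms: float     # end boundary in milliseconds (exclusive)


def parse_label_text(text: str) -> List[Segment]:
    """Parse the contents of a time-aligned label file into segments."""
    ms_per_frame = None
    segments = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("MillisecondsPerFrame:"):
            ms_per_frame = float(line.split(":", 1)[1])
            continue
        parts = line.split(None, 2)
        if len(parts) == 3 and parts[0].isdigit() and parts[1].isdigit():
            if ms_per_frame is None:
                raise ValueError("segment line found before MillisecondsPerFrame")
            begin = int(parts[0])
            end_plus_one = int(parts[1])  # stored value is last frame + 1
            segments.append(
                Segment(
                    begin_frame=begin,
                    last_frame=end_plus_one - 1,
                    label=parts[2],
                    start_ms=begin * ms_per_frame,
                    end_ms=end_plus_one * ms_per_frame,
                )
            )
        # Assumption: any other line is header text and is ignored.
    return segments


example = "MillisecondsPerFrame: 3.0\n200 237 ah\n"
for seg in parse_label_text(example):
    print(seg)
```

With the header and example line shown above, the [ah] segment comes out as frames 200 through 236 inclusive, which at 3.0 ms per frame spans 600 ms to 711 ms. Treating the stored end value as an exclusive boundary means consecutive segments share a boundary without overlapping.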