
13.9 Speech Communication Quality

Herman J. M. Steeneken
TNO Human Factors Research Institute, Soesterberg, The Netherlands

Speech is considered to be the major means of communication between people. In many situations, however, the speech signal we are listening to is degraded, and only a limited amount of information is transferred. The purpose of assessment is to quantify these limitations and to identify the degradations responsible for the loss of intelligibility. Three main types of evaluation method are used to assess speech communication systems:

  1. subjective intelligibility tests, based on scores for the correct recall of sentences, words, or phonemes;
  2. quality ratings, based on a subjective impression; and
  3. objective measures, based on physical properties of the speech transmission system.

A comprehensive overview is given by [Ste92].

13.9.1 Subjective Intelligibility Tests

These tests are based on various types of speech material evaluated in speaker-listener communication. Each test has its specific advantages and limitations, mostly related to the speech elements tested. Speech elements frequently used for testing are phonemes, words (digits, the alphabet, meaningful words, or nonsense consonant-vowel-consonant (CVC) words), sentences, and sometimes free conversation. The score is the percentage of the presented items that are correctly recalled. The recall procedure can be based on a given limited set of responses or on an open response design in which all possible alternatives are allowed as a response. A limited response set is used with the so-called rhyme tests. These tests are easy to administer and do not require extensive training of the listeners to arrive at stable scores. Depending on their design, however, rhyme tests may disregard specific phoneme confusions [HWHK65]. Open response tests, especially those that use nonsense words, require extensive training of the listeners. In addition to word and phoneme scores, however, they yield the confusions between phonemes, which allows diagnostic analysis. Redundant speech material (sentences, rhyme tests) suffers from ceiling effects (a 100% score is already reached at poor-to-fair conditions), while tests based on nonsense words can still discriminate between good and excellent conditions.
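The scoring of an open-response CVC test can be illustrated with a minimal sketch. It assumes that each presented item and each listener response is a tuple of phoneme symbols of equal length (three for a CVC word), so that phonemes can be aligned position by position; the function names and the toy phoneme symbols are hypothetical. Besides the word and phoneme scores, it tallies (presented, perceived) phoneme pairs, which is the raw material for the diagnostic confusion analysis mentioned above.

```python
from collections import Counter, defaultdict


def word_score(presented, responses):
    """Percentage of presented words that were recalled exactly."""
    if len(presented) != len(responses) or not presented:
        raise ValueError("need equally long, non-empty lists")
    correct = sum(p == r for p, r in zip(presented, responses))
    return 100.0 * correct / len(presented)


def phoneme_score(presented, responses):
    """Percentage of phonemes recalled correctly, compared position by position."""
    total = correct = 0
    for p_word, r_word in zip(presented, responses):
        if len(p_word) != len(r_word):
            raise ValueError("assumes equal-length (e.g. CVC) items")
        for p_ph, r_ph in zip(p_word, r_word):
            total += 1
            correct += p_ph == r_ph
    return 100.0 * correct / total


def confusion_counts(presented, responses):
    """Map each presented phoneme to a Counter of the phonemes perceived."""
    confusions = defaultdict(Counter)
    for p_word, r_word in zip(presented, responses):
        for p_ph, r_ph in zip(p_word, r_word):
            confusions[p_ph][r_ph] += 1
    return confusions
```

For example, if "bis" is presented but "pis" is heard, the word counts as wrong, two of its three phonemes still count as correct, and the confusion table records one b-to-p confusion.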

13.9.2 Quality Rating or Mean Opinion Scoring (MOS)


As noted in section 10.2, MOS is a more global method used to evaluate the user's acceptance of a transmission channel or speech output system. It reflects the listener's overall auditory impression of the speech. For quality ratings, normal test sentences or free conversation are used to obtain the listener's impression. The listener is asked to rate this impression on subjective scales such as intelligibility, quality, acceptability, and naturalness. MOS scores vary widely between listeners, and the MOS is not an absolute measure, because the scales used by the listeners are not calibrated.
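A minimal sketch of how the ratings for one test condition might be summarized is given below. It assumes a five-point rating scale (1 to 5) and uses a normal approximation (factor 1.96) for the 95% confidence interval; both are assumptions for illustration, and `mos_summary` is a hypothetical name. Reporting the spread alongside the mean makes the wide variation between listeners visible.

```python
import math
import statistics


def mos_summary(ratings, scale_min=1, scale_max=5):
    """Summarize listener ratings for one condition.

    Returns (MOS, sample standard deviation, approximate 95% CI half-width).
    The 1..5 scale and the normal-approximation factor 1.96 are assumptions.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings to estimate the spread")
    if any(r < scale_min or r > scale_max for r in ratings):
        raise ValueError("rating outside the assumed scale")
    mos = statistics.mean(ratings)
    sd = statistics.stdev(ratings)
    half_width = 1.96 * sd / math.sqrt(len(ratings))
    return mos, sd, half_width
```

For five listeners rating 4, 4, 5, 3, and 4, the MOS is 4 with a standard deviation of about 0.71. Because the scales are not calibrated, such values are best compared between conditions rated by the same listener panel rather than read as absolute quality levels.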

13.9.3 Objective Measures

Objective measures based on physical aspects quantify the effect on the speech signal, and the related loss of intelligibility, of degradations such as a limited frequency transfer, masking noise with various spectra, reverberation and echoes, and a nonlinear transfer resulting from peak clipping, quantization, or interruptions. Frequently used methods are the Articulation Index (AI) [Kry62] and the Speech Transmission Index (STI) [SH80]. The STI uses artificial test signals that are passed through the system under test and analyzed at the output. Such a measurement can typically be performed in 15 seconds [SVH93], whereas subjective measurements require at least one hour.
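The principle behind the STI can be illustrated with a simplified sketch; it is not the [SH80] procedure itself. Instead of measuring the modulation transfer with test signals, it computes the modulation index from a theoretical model for exponential reverberation (reverberation time T in seconds) combined with stationary noise, m(F) = [1 + (2πFT/13.8)^2]^(-1/2) · [1 + 10^(-SNR/10)]^(-1), for 14 modulation frequencies (0.63 to 12.5 Hz) in 7 octave bands (125 Hz to 8 kHz). Each modulation index is converted to an apparent signal-to-noise ratio, clipped to ±15 dB, mapped to a transmission index between 0 and 1, and averaged. The octave-band weighting factors are not given in this section, so equal weights are assumed; the function names are hypothetical.

```python
import math

# Modulation frequencies (Hz): 14 one-third-octave steps from 0.63 to 12.5 Hz.
MOD_FREQS = [0.63, 0.8, 1.0, 1.25, 1.6, 2.0, 2.5, 3.15, 4.0, 5.0, 6.3, 8.0, 10.0, 12.5]
N_BANDS = 7  # octave bands 125 Hz ... 8 kHz


def modulation_index(f_mod, rt60, snr_db):
    """Model-based modulation transfer for reverberation plus stationary noise."""
    reverb = 1.0 / math.sqrt(1.0 + (2.0 * math.pi * f_mod * rt60 / 13.8) ** 2)
    noise = 1.0 / (1.0 + 10.0 ** (-snr_db / 10.0))
    return reverb * noise


def transmission_index(m):
    """Convert a modulation index to an apparent SNR clipped to +/-15 dB, scaled to [0, 1]."""
    if m >= 1.0:
        snr_app = 15.0
    elif m <= 0.0:
        snr_app = -15.0
    else:
        snr_app = 10.0 * math.log10(m / (1.0 - m))
    snr_app = max(-15.0, min(15.0, snr_app))
    return (snr_app + 15.0) / 30.0


def sti(rt60, band_snrs_db, weights=None):
    """Simplified STI; equal octave-band weights are an assumption."""
    if len(band_snrs_db) != N_BANDS:
        raise ValueError("expected one SNR per octave band")
    if weights is None:
        weights = [1.0 / N_BANDS] * N_BANDS
    total = 0.0
    for snr, w in zip(band_snrs_db, weights):
        tis = [transmission_index(modulation_index(f, rt60, snr)) for f in MOD_FREQS]
        total += w * sum(tis) / len(tis)  # modulation transfer index of this band
    return total
```

For example, with no reverberation and a signal-to-noise ratio of 0 dB in every band, each modulation index is 0.5, the apparent SNR is 0 dB, and the sketch returns an STI of 0.5. Adding reverberation lowers the modulation index most at the higher modulation frequencies, and hence lowers the STI.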


Figure: Relation between some subjective intelligibility measures and the objective STI, with the corresponding subjective qualification.

The figure above shows the relation between some intelligibility measures and the STI. These relations are based on results accumulated over many years. A subjective qualification, based on an international comparison [HS84], is also given. The graph also demonstrates the ceiling effect of intelligibility tests that use redundant speech material: their scores already approach 100% at poor-to-fair conditions.
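Mapping an STI value to a subjective qualification can be sketched as a simple lookup. The band boundaries used here (bad below 0.30, poor up to 0.45, fair up to 0.60, good up to 0.75, excellent above) are commonly cited values and are an assumption in this sketch; they could not be read from the figure. The function name `qualify_sti` is hypothetical.

```python
import bisect

# Assumed qualification boundaries (commonly cited values, not taken from the figure).
_BOUNDS = [0.30, 0.45, 0.60, 0.75]
_LABELS = ["bad", "poor", "fair", "good", "excellent"]


def qualify_sti(value):
    """Return the subjective qualification for an STI value in [0, 1].

    A value exactly on a boundary is assigned to the higher category.
    """
    if not 0.0 <= value <= 1.0:
        raise ValueError("STI must lie between 0 and 1")
    return _LABELS[bisect.bisect_right(_BOUNDS, value)]
```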


