- ...
- The current version of this Survey of the
State of the Art in Human Language Technology will change in late
Spring/early Summer, 1996, to
reflect the copyediting by the publisher.
- ...phoneme
- Linguistic symbols presented
between slashes, e.g., /p/, /t/, /k/, refer to phonemes, the minimal
sound units whose substitution changes the meaning of a word. The acoustic
realizations of phonemes in speech are referred to as allophones,
phones, or phonetic segments, and are presented in brackets,
e.g., [p], [t], [k].
- ...,
- Here and in the following, the
notation 46#46 stands for the sequence 47#47.
- ...spelling.
- For
example, the LM treats the present tense and the past participle of
the verb read (I read vs. I have read) as the same word,
while the acoustic model has
different models corresponding to the different pronunciations.
- ...way
- Instead of having a single partition of the space of
histories, one can use the exponential family to define a set of
features that are used for computing the probability of an event. See
the discussion on Maximum Entropy in
[LRR93,DR72,BDPDP94] for
more details.
- ...tokenizer
- Tokenizing
English is fairly straightforward, since white space separates words
and simple rules can handle most punctuation. Special care
has to be taken with abbreviations. For oriental languages such as
Japanese and Chinese, word segmentation is a more complicated problem,
since spaces are not used between words.
- ...Understanding
- I am grateful to Victor Zue for many very helpful suggestions.
- Stephen Pulman....
- This survey draws in part on material prepared for the European
Commission LRE Project 62-051, FraCaS: A Framework for
Computational Semantics. I am grateful to the other members of
the project for their comments and contributions.
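
The tokenizer footnote above describes English tokenization as white-space splitting plus simple punctuation rules, with special care for abbreviations. A minimal sketch of that idea follows. It is not the tokenizer used by any system in the survey: the function name `tokenize`, the regular expression, and the tiny abbreviation list are all assumptions made for illustration.

```python
import re

# Hypothetical, deliberately tiny abbreviation list (an assumption for this
# sketch); a practical tokenizer would need a much larger, curated list.
ABBREVIATIONS = {"mr.", "mrs.", "dr.", "e.g.", "i.e.", "etc.", "vs."}

# A token is either a run of word characters, optionally joined by internal
# apostrophes or hyphens (don't, state-of-the-art), or one punctuation mark.
TOKEN_RE = re.compile(r"\w+(?:['-]\w+)*|[^\w\s]")


def tokenize(text):
    """Split text on white space, then separate punctuation from words.

    Chunks that exactly match a known abbreviation are kept whole, so the
    period in "Dr." is not split off as a separate token. An abbreviation
    followed directly by more punctuation (e.g. "etc.,") is not recognized
    by this simple exact-match rule.
    """
    tokens = []
    for chunk in text.split():
        if chunk.lower() in ABBREVIATIONS:
            tokens.append(chunk)
        else:
            tokens.extend(TOKEN_RE.findall(chunk))
    return tokens
```

For instance, `tokenize("Dr. Smith read the book, e.g. yesterday.")` keeps `Dr.` and `e.g.` intact while splitting the comma and the final period into separate tokens. Text written without spaces between words comes back from this sketch as one undivided run per chunk, which is why languages such as Japanese and Chinese need a separate word segmentation step.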
Maintained by
Mike Noel and
Wei Wei