For more than thirty years, researchers have been working on handwriting recognition. As in the case of speech processing, they aimed at designing systems able to understand personal encoding of natural language.
Over the last few years, the number of academic laboratories and companies involved in research on handwriting recognition has continually increased. Simultaneously, commercial products have become available. This new stage in the evolution of handwriting processing results from a combination of several elements: improvements in recognition rates, the use of complex systems integrating several kinds of information, the choice of relevant application domains, and new technologies such as high quality high speed scanners and inexpensive powerful CPUs. A selection of recent publications on this topic is: [Imp94,IWF93,Pla93,PM92,IS92,Wan91].
Methods and recognition rates depend on the level of constraints on handwriting. The constraints are mainly characterized by the types of handwriting, the number of scriptors, the size of the vocabulary and the spatial layout. Obviously, recognition becomes more difficult when the constraints decrease. Considering the types of roman script (roughly classified as hand printed, discrete script and cursive script), the difficulty is lower for handwriting produced as a sequence of separate characters than for cursive script which has much in common with continuous speech recognition. For other writing systems, character recognition is hard to achieve, as in the case of Kanji which is characterized by complex shapes and a huge number of symbols.
The characteristics which constrain hand writting may be combined in order to define handwriting categories for which the results of automatic processing are satisfactory. The trade-off between constraints and error rates give rise to applications in several domains. The resulting commercial products have proved that handwriting processing can be integrated into working environments. Most efforts have been devoted to mail sorting, bank check reading, forms processing in administration and insurance. These applications are of great economic interest, each of them concerning millions of documents.
Mail sorting is a good illustration of the evolution in the domain. In this case, the number of writers is unconstrained. In the early stages, only ZIP code was recognized. Then cities (and states as in the U.S.) were processed, implying the recognition of several types of handwriting: hand printed, cursive, or a mixture of both. The use of the redundancy between the ZIP code and the city name, as well as redundancy between numeral and literal amounts in bank checks, shows that combining several sources of information improves the recognition rates. Today, the goal is to read the full address, down to the level of the information used by the individual carrier. This necessitates precisely extracting the writing lines, manipulating a very large vocabulary and using contextual knowledge as the syntax of addresses (such as in the case of reading the literal amount of checks, the use of syntactic rules improves the recognition).
These new challenges bring the ongoing studies closer to unconstrained handwritten language processing which is the ultimate aim. The reading of all of the handwritten and printed information present on a document is necessary to process it automatically, to use content dependent criteria to store, access and transmit it and to check its content. Automatic handwritten language processing will also allow one to convert and to handle manuscripts produced over several centuries within a computer environment.
Recognition strategies heavily depends on the nature of the data to be recognized. In the cursive case, the problem is made complex by the fact that the writing is fundamentally ambiguous as the letters in the word are generally linked together, poorly written and may even be missing. On the contrary, hand printed word recognition is more related to printed word recognition, the individual letters composing the word being usually much easier to isolate and to identify. As a consequence of this, methods working on a letter basis (i.e., based on character segmentation and recognition) are well suited to hand printed word recognition while cursive scripts require more specific and/or sophisticated techniques. Inherent ambiguity must then be compensated by the use of contextual information.
Intense activity was devoted to the character recognition problem during the seventies and the eighties and pretty good results have been achieved [MSY92]. Current research is rather focusing on large character sets like Kanji and on the recognition of handwritten roman words. The recognition of handwritten characters being much related to printed character recognition, we will mainly focus on cursive word recognition.
Character Recognition techniques can be classified according to two criteria: the way preprocessing is performed on the data and the type of the decision algorithm.
Preprocessing techniques include three main categories: the use of global transforms (correlation, Fourier descriptors, etc.), local comparison (local densities, intersections with straight lines, variable masks, characteristic loci, etc.) and geometrical or topological characteristics (strokes, loops, openings, diacritical marks, skeleton, etc.).
Depending on the type of preprocessing stage, various kinds of decision methods have been used such as: various statistical methods, neural networks, structural matchings (on trees, chains, etc.) and stochastic processing (Markov chains, etc.). Many recent methods mix several techniques together in order to provide a better reliability to compensate the great variability of handwriting.
As pointed out in the chapter overview, two main types of strategies have
been applied to this problem since the beginning of research in this
field: the holistic approach and the analytical approach
[LB94,LL93,HHF
92,SBG94].
In the first case recognition is globally performed on the whole
representation of words and there is no attempt to identify characters
individually.
The main advantage of holistic methods is that they avoid word segmentation [RP93]. Their main drawback is that they are related to a fixed lexicon of word descriptions: as these methods do not rely on letters, words are directly described by means of features and adding new words to the lexicon require human training or the automatic generation of word descriptions from ASCII words. These methods are generally based on dynamic programming (DP) (edit distance, DP-matching, etc.) or model-discriminant hidden Markov models.
Analytical strategies deal with several levels of representation corresponding to increasing levels of abstraction (usually the feature level, the grapheme or pseudo-letter level and the word level). Words are not considered as a whole, but as sequences of smaller size units which must be easily related to characters in order to make recognition independent from a specific vocabulary.
These methods are themselves subclassed into two categories:
analytical methods with explicit (or external) segmentation where
grapheme or pseudo-letter segmentation takes place before recognition
[LC91]
and analytical methods with implicit (or internal) segmentation
[BMLC
92,CKZS92]
which perform segmentation and recognition simultaneously
(segmentation is then a by-product of recognition). In both cases,
lexical knowledge is heavily used to help recognition. This
lexical knowledge can either be described by means of a lexicon of
ASCII words (which is often represented by means of a lexical tree) or
by statistical information on letter co-occurrence (n-grams,
transitional probabilities, etc.). The advantage of
letter-based recognition methods is that the vocabulary can
be dynamically defined and modified without the need of word training.
Many techniques initially designed for character recognition
(like neural networks
[BMLC
92])
have been incorporated to analytical methods for recognizing tentative
letters or graphemes. The contextual phase is generally based on
dynamic programming and/or Markov chains (edit distance,
Viterbi algorithm, etc.). Fruitful research has been
realized in recent years in the field of analytic recognition with
implicit segmentation using various kinds of hidden Markov
models
[CKZS92].
Exploitable results can already be obtained when the data is sufficiently constrained. Commercial products are already available for hand printed characters recognition in forms and recent research projects have shown that cursive word recognition is feasible for small lexicons and/or when strong sentence syntax is provided. For instance recognition rates of 95% (respectively 90%) or more have been obtained for lexicons of American city names whose size varies between 10 and 100 (respectively 1000) words [KSC93].
Recent studies show the emergence of two promising tendencies: