Smoothing of HMM parameters for efficient recognition of online handwriting
Introduction
With the availability of pen computers at increasingly affordable prices, online handwriting recognition studies have received much attention. In the literature, there are several reports [1], [2], [3], [4], [5], [6], [7] on such online handwriting recognition studies of different scripts such as Latin, Japanese, Korean, Chinese and Arabic. Although the majority of existing online handwriting recognition studies [8], [9], [10], [11], [12], [13], [14], [15], [16], [17], [18], [19], [20], [21] on the Indic scripts considered only isolated characters, recently online handwritten word recognition studies [22], [23], [24], [25], [26], [27], [28] of a few Indic scripts have gained some momentum. The earliest work on Bangla (a major Indic script) online handwriting recognition [20] was based on its isolated basic characters. Later, Bhattacharya et al. [14] studied a direction code based feature vector for recognition of online handwritten Bangla basic characters. In 2008, Parui et al. [12] first identified the set of strokes used to write the basic characters of Bangla. Hidden Markov models were used to identify these strokes and a look-up table was used to recognize the characters in the second stage. In another study [11], strokes used to write isolated Bangla characters were recognized using a DTW-based technique and the characters were identified by a rule based method. In [10], a sub-stroke based recognition study of online handwritten Bangla basic characters was presented. In this study, recognition performance of an HMM-based classifier was compared with that of a DTW-based nearest-neighbor classifier. The first benchmark recognition results on databases of online handwritten isolated characters of major Indic scripts including Bangla were reported in [9]. In a recent study [8] of online handwritten isolated Bangla characters, a database of stroke level samples was semi-automatically developed from the isolated character database presented in [9] and a hidden Markov model based character classifier was studied where the stroke classes formed the state space.
Unconstrained handwriting in Bangla is mixed cursive in nature unlike other Indic scripts. This style of handwriting is a general combination of cursively written connected word fragments and discretely written isolated characters. This is the most difficult style of handwriting from the view point of its automatic recognition although it is the most realistic style required to be studied. On the other hand, Bangla has other common peculiarities of an Indic script compared to several other scripts as discussed in [23], [29]. This script is used by around 230 million people in the Indian subcontinent. It has a set of 50 basic character shapes 11 of which are vowels and the remaining 39 are consonants. Modified shapes of 10 characters among the basic vowels and another two among the basic consonants occur frequently. Shapes of these 62 characters (50 basic characters, 10 vowel modifiers and 2 consonant modifiers) are shown in Fig. 1. These modifiers are in fact diacritics and these are the modified shapes of a few basic characters when they appear in proximity of other characters such as the vowel modifier ‘AAKAR’ is the modified shape of the basic vowel ‘AA’ or ‘IKAR’ is the modified shape of the basic vowel ‘I’ as shown in Fig. 1. Two or more from the above 62 shapes often combine forming a large number of compound characters with complicated shapes. Although there are around 250 different compound characters, their occurrence percentage in a typical corpus is very low [30].
Only a few recognition studies [24], [25], [26], [27] of unconstrained Bangla online handwritten words are available in the literature. In the earliest such study [27], a segmentation-based strategy was used and the sub-strokes of the input word were recognized using a modified quadratic discriminant function (MQDF) classifier. A verification module used a set of rules to recognize the word from MQDF outputs. Fink et al. [26] studied a sub-stroke level feature representation and four hidden Markov model based writing models for recognition of these words. In this study, an input word sample was segmented wherever the pen along the trajectory turns more than a threshold value. Among different writing models studied in this work the largest model considered each pseudo-character in the actual context of its left and right neighbors as a distinct unit. On the other hand, a hybrid recognition strategy described in [25] used a multilayer perceptron (MLP) architecture for extraction of features from input word and a support vector machine (SVM) classifier trained on a fixed lexicon was used to recognize the word. Another study [24] identified 85 stroke classes used in cursive Bangla handwriting and recognized the strokes obtained from an input word using chain code histogram features and the SVM classifier. In the present attempt, we studied a non-homogeneous fully connected HMM-based classifier and a combination of linear and circular feature components to develop an efficient holistic recognition scheme for unconstrained cursive Bangla handwriting which often includes delayed strokes. The present approach has the advantage of incremental learning towards addition of new word classes into the lexicon. Additionally, we studied a novel two-level smoothing during estimation of HMM parameters to tackle the common problem of existence of several strokes occurring rarely. The lexicon used in the experimentation of the present study is so designed that it includes various difficult situations occurring in the real life scenario.
In the literature, both analytical and holistic approaches to machine reading of unconstrained handwriting have been studied. In an analytical approach, the input word sample is first segmented into characters or pseudo-characters which are recognized at a later stage to build the word-level interpretation. On the other hand, in the holistic approach, an input sample is considered as a single entity and it is recognized using certain global features of the word sample as a whole. The latter may successfully recognize poorly written word samples by avoiding segmentation ambiguities. Justifications of such holistic approaches to word recognition can be found in psychological studies of human reading, which shows that humans do not, in general, read words letter by letter. The importance of holistic approaches to handwriting recognition has been discussed in several reports [31], [32], [33]. In the present study of online unconstrained handwritten Bangla word recognition, the input sample is segmented into a variable number of sub-strokes each of which is either a character or part of a character. Unlike a typical holistic approach, here we do not consider a feature representation of the whole word sample as the basic element. On the other hand, unlike an analytic scheme our recognition strategy does not attempt to recognize the individual characters of the input word sample but recognizes the whole word sample as one entity.
The directional slope angle θ is a natural choice as a feature component of online handwriting samples. However, since θ is inherently circular in nature, commonly used statistical distributions cannot model them. As an alternative, the directional quantity θ is sometimes replaced [34] by two linear quantities and although these two are significantly dependent and also they increase the feature dimension. On the other hand, methodologies such as the von Mises distribution [35] are available to deal with directional or circular feature components. In the present study, we use both circular and linear feature components. We use the von Mises distribution to model circular feature components while the Gaussian distribution is used to model linear feature components. The von Mises distribution has practical benefits regarding parameter estimation as described in Section 3.4.10 of the book [35].
HMM based recognition algorithms for such unconstrained handwriting are considered to be more flexible and potentially more robust due to the enormous variability inherent in similar input data [36]. Moreover, vital contextual information such as morphological, lexical, and syntactical may be successfully exploited in the recognition process with the help of such a stochastic modeling. In a number of reports [2], [7], [23], [26], [31], [37], [38], the general understanding of the applicability of HMM in handwriting recognition tasks has been verified. Jung et al. [5] used a fully connected hidden Markov model with continuous observation probabilities to classify on-line Korean character space into clusters. Advantages of using such an HMM architecture include (i) it can achieve the highest likelihood due to its larger number of parameters, (ii) it can achieve the fastest convergence compared to the simpler models [39], (iii) it can exploit a long observation sequence by using experiences in longer past in comparison to simple left-to-right HMM. Although fully connected HMMs have been used before, we did not find any such use of non-homogeneous HMM in handwriting recognition tasks. Such a non-homogeneous HMM allows the transition probabilities to be different at different time instances. In the present task of unconstrained Bangla handwriting recognition, considering the enormous variation in the stroke order, we implemented a fully connected non-homogeneous HMM.
Determination of the number of states of an HMM is a crucial design issue. In [40], the same number of states is used irrespective of the class while in [41], intuitive knowledge was applied to decide the number of states in each underlying class. Successive state-splitting and state-merging according to maximum likelihood criterion were studied in [42] and [43] respectively. Lee et al. [44] studied a state-tying approach based on the structural similarity of patterns in a class. This approach was simulated for recognition of online Hangul handwriting. In modeling complex shape patterns of unconstrained handwriting of a script involving a large number of characters of widely varying cursive shapes such as those of Bangla script, an adequate number of states is usually high. However, finding a proper choice of the same is usually tricky and here we have taken a two-stage approach for this purpose. In the first stage, we pool together all the sub-strokes obtained from the whole set of training samples of the underlying classes, cluster them by K-means algorithm and use these clusters to determine the states of individual classes. A validation set of samples is used to obtain an optimal value of K. In the second stage, for each individual class we identify the clusters that are relevant to it. The number of states of a new (word) class consisting of characters appearing in the present lexicon may be obtained by computing only the latter stage of the above.
Finally, in this work, we have proposed a novel approach to estimation of initial and transition probabilities of an HMM. For a given observation, the state is determined so that it has the maximum posterior probability. In the classical approach, given an observation pair, the winner pair of states takes all where even the closely competing pairs of states do not get any contribution. This may cause overfitting of the training data affecting the generalization capability. In the proposed approach, we use a certain diffusion technique in estimating the transition and initial probabilities. Here, a pair of observations, instead of generating only one pair of winning states, contributes probabilities to all pairs of states such that a pair of states with higher posterior probabilities gets a higher contribution. Similarly, the initial state probabilities are also computed from the posterior probabilities, rather than determining the initial state of a training sample on the basis of the state giving the maximum posterior probability. Also, we diffuse a transition probability at a time instance over the preceding and the following time instances.
The rest of the paper is organized as follows. We describe our preprocessing operations in Section 2. Feature extraction approach is detailed in Section 3. Section 4 provides the detailed description of underlying HMMs, and Section 5 describes the estimation of the parameters of these HMMs. Section 7 provides the results of our simulations. Section 8 concludes the paper.
Section snippets
Preprocessing
As usual, the aim of our preprocessing operations is to reduce variations among samples of the same class. First of all, the repeated sample points (i.e., successive occurrences of the same point) are removed. This is followed by application of moving average operation for two times using window size 3. It helps to reduce noise. In the next step, we resample the points along each stroke separately so that the resulting sample points become equidistant. The size normalization of the word sample
Feature extraction
Generally, in a holistic recognition approach, the input word sample is treated as a single, indivisible entity for its feature representation [32]. However, the object recognition process among humans considers both the entire shape and the shapes of its various local parts. In [45], it has been observed that although geometrical features are important clues for human reading of cursive handwriting, special attentions are paid to specific parts of the word. In the present study, we first
Designing of HMM classifier
Several existing studies [48], [49], [50], [51] of online cursive handwriting recognition are based on HMMs. Such an HMM is a doubly embedded stochastic process with an underlying hidden process which can be observed through a second set of stochastic processes producing a sequence of observations [52]. A hidden process has several connected states. An HMM is completely specified by its number of states (K) and three probability distributions (π, A and B). The following notations will be used
Smoothing of HMM parameters
Often estimation of the initial and transition probabilities is adversely affected due to insufficient number of the training samples. In particular, some of these probabilities may be estimated as zero on the basis of the training samples, which may lead to over-fitting. To avoid this, that is, to make the estimated probabilities to have better generalization capabilities, certain smoothing operations are performed on the initial and transition probabilities. A trivial strategy is to smooth
Incremental learning
As described in Section 4, the generation of the proposed HMMs for the present classification task has two distinct steps. In the first step, the set of states is determined in an overall manner. In this step the underlying word classes are not dealt with individually but we pool together all the sub-strokes obtained from the entire set of training samples from all the word classes. This set of sub-strokes is clustered to identify the states of individual HMMs each representing a distinct word
Experimental results
For simulation purpose, we designed a lexicon of 110 Bangla words and developed a database of online handwritten samples on the basis of this lexicon. A few samples from this database are shown in Fig. 9. Lexe represents a general scenario in the sense that it includes various typical situations such as pairs of words with identical right tails like and or identical left tails such as and or identical at both the tails like and or one complete word appearing as the right tail of
Conclusions
In this work, we studied limited vocabulary recognition of unconstrained online handwritten Bangla words. A similar study of mixed cursive handwriting involving a large character set is rarely found. The lexicon set used in the present study has been carefully selected to include various possible difficulties. Difficult situations include lexicon set (i) consisting of words of lengths as small as 2 and also as large as 10, (ii) consisting of several groups of words with identical left or right
Conflict of interest
None declared.
Acknowledgments
This research work has been partially supported by the Technology Development for Indian Languages (TDIL) Programme of the Dept. of Information Technology, Govt. of India.
Oendrilla Samanta received her Bachelor degree with Hons. in Mathematics from Calcutta University in 2005 and Master of Computer Application from West Bengal University of Technology in 2008. She joined the Indian Statistical Institute, Kolkata in 2008 as a research scholar. She is working on online handwriting recognition of Bangla towards her Ph.D. thesis.
References (54)
- et al.
On-line arabic handwriting recognition with templates
Pattern Recognit.
(2009) - et al.
Semi-continuous HMMs with explicit state duration for unconstrained Arabic word modeling and recognition
Pattern Recognit. Lett.
(2008) - et al.
Continuous HMM applied to quantization of on-line Korean character spaces
Pattern Recognit. Lett.
(2000) - et al.
On-line recognition of handwritten Chinese characters based on hidden Markov models
Pattern Recognit.
(1997) - et al.
Handwritten Farsi (Arabic) word recognitiona holistic approach using discrete HMM
Pattern Recognit.
(2001) - et al.
Holistic cursive word recognition based on perceptual features
Pattern Recognit. Lett.
(2007) - et al.
Modeling and recognition of cursive words with hidden Markov models
Pattern Recognit.
(1995) - et al.
Evaluation of incremental learning algorithms for HMM in the recognition of alphanumeric characters
Pattern Recognit.
(2009) - et al.
Character segmentation in handwritten words—an overview
Pattern Recognit.
(1996) - et al.
Knowledge-based English cursive script segmentation
Pattern Recognit. Lett.
(2000)
An HMM-based character recognition network using level building
Pattern Recognit.
Writer independent on-line handwriting recognition using an HMM approach
Pattern Recognit.
Online recognition of Chinese charactersthe state-of-the-art
IEEE Trans. Pattern Anal. Mach. Intell.
The state of art in Japanese online handwriting recognition compared to techniques in western handwriting recognition
Int. J. Doc. Anal. Recognit.
On-line and off-line handwriting recognition: a comprehensive survey
IEEE Trans. Pattern Anal. Mach. Intell.
Cited by (41)
Age, gender and handedness prediction using handwritten text: A comprehensive survey
2024, Engineering Applications of Artificial IntelligenceOnline handwritten Gurmukhi word recognition using fine-tuned Deep Convolutional Neural Network on offline features
2021, Machine Learning with ApplicationsRNN based online handwritten word recognition in Devanagari and Bengali scripts using horizontal zoning
2019, Pattern RecognitionCitation Excerpt :After relating the popularity distribution of Indian languages shown in this figure with the scripts used to write these languages, it can be established that Devanagari and Bengali are the most and second most popular Indian scripts respectively. Some research works are available on online handwritten word recognition in different Indian scripts as well such as Devanagari [15,16], Bengali [12–15], and Tamil scripts [16–18]. Online recognition of handwritten text in Indian scripts faces some unique challenges as compared to Latin script due to large number of symbols, symbol order variations—especially the variations in temporal ordering of vowel modifiers occur in different samples of the same word.
Personal digital bodyguards for e-security, e-learning and e-health: A prospective survey
2018, Pattern RecognitionAn HMM framework based on spherical-linear features for online cursive handwriting recognition
2018, Information SciencesCitation Excerpt :On the other hand, Schomaker and Segers [44] observed that although geometrical shape of a word is an important clue for human reading of cursive handwriting, special attentions are paid to its specific parts. In fact, explicit segmentation of the input word sample was used as a preprocessing step in some holistic handwriting recognition studies [27,41,42]. Segmentation-based and segmentation-free recognition approaches have been compared in [50].
Oendrilla Samanta received her Bachelor degree with Hons. in Mathematics from Calcutta University in 2005 and Master of Computer Application from West Bengal University of Technology in 2008. She joined the Indian Statistical Institute, Kolkata in 2008 as a research scholar. She is working on online handwriting recognition of Bangla towards her Ph.D. thesis.
Ujjwal Bhattacharya received B. Sc. (Hons.), M. Sc. and M. Phil. degrees in Pure Mathematics from the Calcutta University in the years 1986, 1989 and 1991 respectively. He did Post-graduate Diploma in Computer Applications in 1990. He joined the Indian Statistical Institute, Kolkata in 1991 as a research scholar. He received National Scholarship (1980, 1982), ISCA Young Scientist Award (1995) and Amiya K. Pujari Award for best paper (2006). The topic of his Ph.D. thesis was offline handwritten Bangla character recognition. He is a member of the IUPRAI affiliated to IAPR. He joined the Computer Vision and Pattern Recognition Unit of Indian Statistical Institute as a faculty member in 1999. He is currently working at the same place in an Associate Professor (equiv.) position. His research interests include online/offline handwriting recognition, analysis of camera captured scene texts, historical document analysis and machine learning. He was one of the Guest Editors of the forthcoming Special Issue of Pattern Recognition Letter Journal on Frontiers of Handwriting Processing.
S.K. Parui received his Master׳s degree in Statistics and his Ph.D. degree from the Indian Statistical Institute in 1975 and 1986, respectively. He is now a Professor in Computer Vision and Pattern Recognition Unit of the Indian Statistical Institute. His current research interests include image processing, handwriting recognition, statistical pattern recognition, neural networks and information retrieval. Prof. Parui has published over 130 papers in refereed journals, edited volumes and conference proceedings. He has supervised seven Ph.D. students and is currently supervising the Ph.D. work of three students.