
Research On Recovery Of Dynamic Information From Handwritten Chinese Character Images

Posted on: 2010-06-17
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z W Su
GTID: 1118360302471177
Subject: Computer software and theory
Abstract/Summary:
Recovery of dynamic information (RDI) is the process of extracting temporal writing information from static handwritten character images; it can also be viewed as converting a two-dimensional static image into one-dimensional temporal signal sequences. As the management and application of handwritten Chinese character data develop, retrieving dynamic information from static handwritten Chinese character images becomes increasingly important. It not only facilitates the storage and retrieval of large numbers of such images, but also bridges the gap between online and offline handwriting, allowing online methods to be applied to offline data and thereby improving on the performance of offline methods.

We studied RDI from handwritten Chinese characters in depth, using structural analysis and template matching. The work covers five main aspects: stroke extraction based on ambiguous zone detection, structural modeling of Chinese characters, RDI based on the structural model, skeletonization of handwriting images, and RDI by template matching.

To identify and interpret ambiguous zones (AZs) in handwritten Chinese character images effectively, a new stroke extraction method based on AZ detection is presented as a preprocessing step for RDI. First, approximate center points (ACPs) of AZs are identified from feature points of the skeleton, and the AZs are detected using the ACPs and the contour information around them. Then, a graph is built to model sub-strokes and AZs, and a Bayesian classifier is built to analyze the continuity of sub-stroke pairs. Accordingly, several constraints are proposed for searching stroke paths in the graph, and two criteria are used to handle multi-traced sub-strokes. Finally, sub-stroke sequences are obtained by searching paths in the graph, and thinned strokes are recovered by B-spline interpolation. The proposed method is effective and accurate for both AZ detection and stroke extraction, and it avoids stroke distortions in AZs.

After stroke extraction, a model for RDI from offline handwritten Chinese characters is proposed, based on an analysis of the hierarchical structure of Chinese characters. In this model, each Chinese character is represented by a four-layer hierarchy whose layers hold the character, its components, their subcomponents, and the strokes, respectively. Characters are decomposed into components, and each component is in turn decomposed into subcomponents, using four decomposition operators. Five decomposition relations between subcomponents are then formed. The totally ordered relations between subcomponents are obtained by defining rules that relate the decomposition relations to a partially ordered set (poset) of subcomponents. Subcomponents are the basic recovery primitives of the model; their writing orders are recovered by classifying strokes and pairs of crossing strokes. The structural analysis method is more effective for Chinese characters with complete hierarchical structures.

An unavoidable problem with most existing skeletonization algorithms for handwriting images is that they produce undesired artifacts or pattern distortions. To address this, a method for identifying these unreliable segments is proposed to improve the skeletons of handwriting images. First, a novel feature called iteration time is introduced, with which each unreliable segment can be treated as a set of points with exceptional iteration times. At the same time, an undirected graph is built from the skeleton; its nodes correspond to feature points and to the sets of surrounding points with exceptional iteration times. To avoid the influence of segment length, a novel distance measure based on iteration time is also defined to weight the edges of the graph. The problem of identifying unreliable segments is then converted into finding a certain number of sub-graphs, which is done by a graph clustering algorithm with an effective clustering quality estimation function. When correcting unreliable segments, a best-match method is used to determine the continuous pairs of reliable segments. Finally, the unreliable parts of the skeleton are reconstructed by cubic B-spline interpolation. The correction method is effective for both Chinese and English handwriting images.

In RDI by template matching, input character images are matched against template sequences from the perspective of subsequence matching. First, the matching cost between points of the input character and points of the template sequence is defined by shape context. After the skeleton is segmented, each separate skeleton segment of the input character is matched against the template sequence. To reduce the computational cost, an efficient algorithm for subsequence matching under the dynamic time warping (DTW) distance is proposed. Finally, the best-matched global paths are found by sub-graph mapping and sub-graph intra-recovery. RDI by template matching is robust to both geometric transformations and cursive handwriting, and achieves high recovery performance.
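To make the idea of subsequence matching under DTW distance more concrete, the following is a minimal sketch of the textbook quadratic-time formulation. It is not the efficient algorithm proposed in the dissertation, whose details are not given in this abstract. The function name `subsequence_dtw` is hypothetical, and the Euclidean distance between 2D points is an assumed stand-in for the shape context cost. The whole query (for example, one skeleton segment) must be matched, but it may start and end anywhere in the template sequence.

```python
import math


def subsequence_dtw(query, template):
    """Align all of `query` to the best-matching contiguous span of `template`.

    Returns (cost, start, end): the accumulated DTW cost and the inclusive
    range of template indices that the query was matched to.
    """
    n, m = len(query), len(template)
    if n == 0 or m == 0:
        raise ValueError("query and template must be non-empty")

    def dist(p, q):
        # Assumed point cost: Euclidean distance (a stand-in for shape context).
        return math.hypot(p[0] - q[0], p[1] - q[1])

    inf = float("inf")
    # acc[i][j]: cheapest cost of aligning query[0..i] to a span ending at j.
    acc = [[inf] * m for _ in range(n)]
    # start[i][j]: template index at which that cheapest alignment begins.
    start = [[0] * m for _ in range(n)]

    # Free start: the first query point may match any template point.
    for j in range(m):
        acc[0][j] = dist(query[0], template[j])
        start[0][j] = j

    for i in range(1, n):
        acc[i][0] = acc[i - 1][0] + dist(query[i], template[0])
        start[i][0] = 0
        for j in range(1, m):
            best_cost, best_start = acc[i - 1][j], start[i - 1][j]
            for cost, s in ((acc[i][j - 1], start[i][j - 1]),
                            (acc[i - 1][j - 1], start[i - 1][j - 1])):
                if cost < best_cost:
                    best_cost, best_start = cost, s
            acc[i][j] = best_cost + dist(query[i], template[j])
            start[i][j] = best_start

    # Free end: pick the cheapest cell in the last row.
    end = min(range(m), key=lambda j: acc[n - 1][j])
    return acc[n - 1][end], start[n - 1][end], end


template = [(x, 0) for x in range(6)]
query = [(2, 0), (3, 0), (4, 0)]
print(subsequence_dtw(query, template))  # (0.0, 2, 4)
```

This version fills the full cost table, so it takes time proportional to the product of the two sequence lengths; reducing that cost is the motivation the abstract gives for a more efficient algorithm.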
Keywords/Search Tags: Handwritten Chinese Character, Dynamic Information Recovery, Skeletonization, Ambiguous-zone Detection, Hierarchical Model, Template Matching