Chinese Character Recognition Of Printed Document Images

Posted on: 2012-03-20 | Degree: Master | Type: Thesis
Country: China | Candidate: Y X He | Full Text: PDF
GTID: 2178330338990750 | Subject: Optical Engineering
Abstract/Summary:
With the rapid growth in demand for information exchange, automatically converting documents stored on paper into digital form has become increasingly valuable. Chinese character recognition in documents is an active topic in pattern recognition and digital image processing. Building on a detailed review of existing work in China and abroad, this thesis designs a recognition system for Chinese characters in printed document images.

First, the thesis analyzes the layout of document images that have already been preprocessed. To avoid the empirical threshold tuning required by traditional algorithms, it uses a layout analysis algorithm based on fuzzy nearest-neighbor connection strength and line confidence. The method builds on a connected-region search, merges the low-level text columns in two passes, and extracts text areas accurately.

Second, the thesis focuses on character segmentation within the text areas. To address the high error rate of traditional segmentation algorithms, it improves the vertical projection algorithm. To raise segmentation accuracy further, a naive Bayes classifier is used to identify the language of each character. Building on conventional algorithms, the thesis then presents a component-merging algorithm for Chinese characters that is driven by recognition feedback, and an improved drop-fall algorithm that addresses inaccurate selection of the starting position and damage to characters.

Finally, the thesis adopts a two-stage recognition scheme for Chinese characters. First-stage recognition has conventionally used the stroke traversal count under full breakthrough; the thesis adds a half-breakthrough stroke count and performs the first stage with the two combined. This addresses the low efficiency observed for some Chinese characters and reduces the workload of the second stage. Energy density is then used in the second stage to recognize the characters that the first stage could not. The system is implemented in MATLAB, the feasibility of the algorithms is verified by simulation, and the resulting Chinese character recognition accuracy is 90.8%.
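
Two of the steps named in the abstract, vertical projection segmentation and the stroke traversal count, can be illustrated with a minimal sketch. This is not the thesis's implementation (which was written in MATLAB and includes improvements the abstract does not detail). It assumes a binarized text line stored as nested lists with 1 for ink and 0 for background, and the names `vertical_projection`, `segment_columns`, `crossing_count`, and `row_crossings` are hypothetical. Only the plain full-traversal count is shown, since the abstract does not define the half-breakthrough variant.

```python
from typing import List, Tuple

# Assumption: a binary image as nested lists, 1 = ink, 0 = background.
Image = List[List[int]]


def vertical_projection(img: Image) -> List[int]:
    """Count the ink pixels in each column of the image."""
    if not img:
        return []
    return [sum(row[x] for row in img) for x in range(len(img[0]))]


def segment_columns(img: Image, min_ink: int = 1) -> List[Tuple[int, int]]:
    """Return (start, end) column ranges, end exclusive, where the
    projection is at least min_ink. Gaps below min_ink split characters."""
    spans: List[Tuple[int, int]] = []
    start = None
    for x, count in enumerate(vertical_projection(img)):
        if count >= min_ink and start is None:
            start = x                    # entering a run of inked columns
        elif count < min_ink and start is not None:
            spans.append((start, x))     # leaving the run
            start = None
    if start is not None:                # run reaches the right edge
        spans.append((start, len(img[0])))
    return spans


def crossing_count(pixels: List[int]) -> int:
    """Number of separate ink runs met along one scan line, i.e. how many
    strokes a line crossing the whole character passes through."""
    runs = 0
    previous = 0
    for p in pixels:
        if p == 1 and previous == 0:
            runs += 1
        previous = p
    return runs


def row_crossings(glyph: Image) -> List[int]:
    """Crossing count for every horizontal scan line of one glyph."""
    return [crossing_count(row) for row in glyph]


# A toy text line holding two glyphs separated by two blank columns.
line = [
    [1, 1, 1, 0, 0, 1, 0, 1],
    [1, 0, 1, 0, 0, 1, 0, 1],
    [1, 1, 1, 0, 0, 1, 1, 1],
]

for start, end in segment_columns(line):
    glyph = [row[start:end] for row in line]
    print((start, end), row_crossings(glyph))
# prints (0, 3) [1, 2, 1]
# prints (5, 8) [2, 2, 1]
```

Plain vertical projection splits wherever a blank column appears, so it over-segments Chinese characters built from left and right components and cannot separate touching characters. Those are the cases the component-merging and drop-fall steps in the abstract are meant to handle.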
Keywords/Search Tags:Document image, Image preprocessing, Layout analysis, Character segmentation, Chinese character recognition