The Application Of OCR Technology To Bamboo Scripts Image Digitization

Posted on: 2008-03-04
Degree: Master
Type: Thesis
Country: China
Candidate: Y Liu
Full Text: PDF
GTID: 2178360242966324
Subject: Signal and Information Processing
Abstract/Summary:
Bamboo scripts preserve rich historical and cultural information. As a writing material of ancient China, they are a valuable part of the cultural heritage. Digitization is an important means of protecting and retrieving bamboo scripts, but existing character recognition software cannot be applied to them: bamboo script images contain several major sources of noise, and some of the characters written on them are no longer in use. This causes great difficulty for the digital processing of bamboo scripts. Based on an in-depth study of the features of bamboo script images and a large number of experiments at every stage of digital image processing, this paper proposes a series of character processing algorithms suited to these images, designs a bamboo script character recognition system around them, and implements the following functional modules:

(1) Compared with general text images, the background of a bamboo script image shows obvious gray-level variation, which makes it very difficult for existing binarization algorithms to separate characters from background. To address this, the paper designs the 8 Gray Margin of Neighborhood method. The method relies on the observation that the gray difference between character and background is greater than the gray difference within the background. It first finds, within each pixel's 8-neighborhood, the gray value with the largest gray-level margin, then sets a gray threshold, and finally assigns the pixel's gray value. Experiments show that the method extracts characters from a complex background and effectively reduces edge noise.

(2) The paper proposes a character segmentation algorithm suited to bamboo script images. The algorithm first cuts out each column of characters using the vertical projection map, and then finds the approximate location of each character using the horizontal projection map. Several compensation measures, such as text merging, outward expansion, and noise removal, are applied to deal with bamboo edge noise, node noise, and corrosion noise.

(3) The paper presents a fast and efficient method for finding hole features. The method first fills the blank regions outside the character using a hole filling algorithm, and then fills the holes inside the character until no blank region remains. Experiments show that the method effectively finds how many holes a character has and where they are.

(4) The paper improves an existing morphological thinning algorithm. Two retention templates are added to handle cases where character connectivity is destroyed and key information is lost because pixels are deleted by mistake. To handle strokes that break where they are two pixels wide, the paper records pixel locations in an array and consults that array so that retained pixels are not deleted. Experiments show that the improved thinning algorithm preserves the original character connectivity well.

(5) The paper implements a median filtering algorithm, tuning the filter parameters on bamboo script image samples to remove salt-and-pepper noise. Characters are normalized to the same size using bilinear interpolation, and experiments show that most of the textual information is retained.

(6) The paper makes a preliminary study of several common character features and selects several stable ones: hole features, feature points, horizontal projection features, and vertical projection features. These features have been extracted.

The paper emphasizes combining theory with practice and is oriented toward application. Balancing project schedule, resources, and quality, the following platforms were chosen as the basis of the study: 1) the Internet, VIP, CDMD, and the Chengdu University of Technology Library as information resources; 2) an x86 desktop PC as the general hardware platform; 3) Windows XP as the software development and application platform; 4) Visual Studio .NET as the development tool, MFC as the software framework, Paintlib as the image manipulation library, and C++ as the programming language. With these platforms in place, research results can be turned into applications relatively quickly, auxiliary work is reduced, and the author can concentrate on the key issues.
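
The binarization in (1) and the projection step in (2) can be sketched as follows. This is a minimal Python illustration (the thesis itself was implemented in C++), and it rests on assumptions that the abstract does not state: the function names gray_margin_binarize and ink_runs are hypothetical, a pixel is treated as ink when its brightest 8-neighbor exceeds it by at least a caller-supplied threshold, and a projection run is any stretch of columns or rows that contains at least one ink pixel. The compensation measures for edge, node, and corrosion noise are not modeled.

```python
def gray_margin_binarize(image, threshold):
    """Binarize a grayscale image given as a list of rows of 0-255 ints.

    Assumed rule (not taken from the thesis): a pixel becomes ink (0) when
    its brightest 8-neighbor is at least `threshold` gray levels brighter
    than the pixel itself; every other pixel becomes background (255).
    """
    height = len(image)
    width = len(image[0]) if height else 0
    result = [[255] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            margin = 0
            # Largest (neighbor - pixel) difference over the 8-neighborhood,
            # skipping the pixel itself and positions outside the image.
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if (dy or dx) and 0 <= ny < height and 0 <= nx < width:
                        margin = max(margin, image[ny][nx] - image[y][x])
            if margin >= threshold:
                result[y][x] = 0
    return result


def ink_runs(binary, along_columns):
    """Return half-open (start, end) index ranges that contain ink.

    along_columns=True uses the vertical projection (ink count per column);
    along_columns=False uses the horizontal projection (ink count per row).
    """
    height = len(binary)
    width = len(binary[0]) if height else 0
    if along_columns:
        profile = [sum(1 for y in range(height) if binary[y][x] == 0)
                   for x in range(width)]
    else:
        profile = [sum(1 for value in row if value == 0) for row in binary]

    runs = []
    start = None
    for index, count in enumerate(profile):
        if count > 0 and start is None:
            start = index                  # a run of ink begins here
        elif count == 0 and start is not None:
            runs.append((start, index))    # the run ended just before index
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs
```

Because the rule compares each pixel only with its immediate neighbors, a slow gray gradient across the bamboo background stays below a threshold that a character edge exceeds. Calling ink_runs first with along_columns=True and then, on each column strip, with along_columns=False gives rough character boxes. How the thesis actually chooses its threshold is not given in the abstract.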
Keywords/Search Tags:bamboo scripts, character recognition, binarization, character segmentation, thinning