A Uighur Words Recognition Technology Based On Contour

Posted on: 2016-01-07  Degree: Master  Type: Thesis
Country: China  Candidate: J Wang  Full Text: PDF
GTID: 2308330476450407  Subject: Software engineering
Abstract/Summary:
This thesis addresses the recognition of Uighur words in scanned images, with the aim of digitizing paper documents containing new Uighur words. The work has two main parts. The first is preprocessing. When Uighur text is scanned, the input document is inevitably skewed, yet existing methods perform only a preliminary tilt correction. To address this, and to make the segmentation and recognition of scanned Uighur characters easier, the thesis proposes an approach that combines a minimum-area bounding rectangle method based on the convex polygon with a baseline fitting method to detect and correct skewed text. First, the minimum-area bounding rectangle method is used to perform an initial correction of the document. Next, each text line is extracted and its tilt is corrected individually by baseline fitting. Finally, the corrected text lines are reassembled into a document. The experimental results show that the method is accurate and effective: compared with existing methods, it raises character segmentation accuracy by about 5% in general and by up to 7% in the best case. The second part is word recognition. Recognition of new Uighur words normally relies on high-dimensional features to achieve high precision, but this is very time consuming; conversely, low-dimensional features save time but give low precision. An ideal recognizer needs both high speed and high precision. To achieve this, the thesis adopts whole-word recognition, which avoids the shortcomings of recognizing a word by first recognizing each of its letters. The main features selected are contours: the outer contour above the baseline, the inner contour within the baseline, and the outer contour below the baseline.
For words that could not be uniquely identified in the first pass, a second recognition pass using complementary features was carried out. In testing, this method retained the high speed of low-dimensional features while achieving the high precision of high-dimensional ones. The experiments show that the system's recognition rate is 95% and that its average speed is within the range acceptable for practical use.
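The per-line tilt correction by baseline fitting can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes the baseline of one text line has already been sampled as (x, y) pixel coordinates, fits a straight line to those samples by ordinary least squares, and rotates the points so that the fitted baseline becomes horizontal. The function names fit_baseline, skew_angle, and deskew are hypothetical and chosen only for this example.

```python
import math


def fit_baseline(points):
    """Least-squares fit of y = slope * x + intercept to baseline samples.

    `points` is a list of (x, y) pairs; how they are sampled from the
    scanned text line is outside this sketch (an assumption).
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two baseline points")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("baseline points must span more than one column")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def skew_angle(points):
    """Tilt of the fitted baseline in radians (0 means horizontal)."""
    slope, _ = fit_baseline(points)
    return math.atan(slope)


def deskew(points, angle):
    """Rotate points by -angle about their centroid to level the baseline."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    c, s = math.cos(-angle), math.sin(-angle)
    return [
        (cx + (x - cx) * c - (y - cy) * s, cy + (x - cx) * s + (y - cy) * c)
        for x, y in points
    ]


# Synthetic baseline with a slope of 0.1 (about 5.7 degrees of tilt).
samples = [(x, 0.1 * x + 5.0) for x in range(0, 100, 10)]
angle = skew_angle(samples)
leveled = deskew(samples, angle)
```

In a full pipeline the same rotation would be applied to every pixel of the extracted text line rather than only to the baseline samples, after the coarse whole-page correction has been done.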
Keywords/Search Tags:Uighur, Tilt Correction, Baseline Fitting, A Line Of Text, Feature Extraction, Word Recognition