The Study Of Uyghur Language Detection In Images With Complex Background

Posted on: 2017-03-31
Degree: Master
Type: Thesis
Country: China
Candidate: S Liu
GTID: 2348330488953380
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the Internet and mobile devices, more and more situations require machines to understand the text information in images. Text is the most intuitive and most easily acquired information in an image, so detecting it accurately is the most important step toward understanding image content. Text detection and recognition in images is an important research direction in pattern recognition and image processing, but because of the complexity of backgrounds and text, results still fall short of expectations. Many researchers have worked in this area and made important contributions. Because the Uyghur language has a large number of users, detecting Uyghur text in images is of great significance. We built a robust and efficient system to detect Uyghur text in images with complex backgrounds.

The key problem is how to extract connected text regions from images and videos efficiently. To address it, we analyzed the Maximally Stable Extremal Regions (MSERs) algorithm, which many researchers have adopted. The MSERs algorithm has the advantage of being invariant to affine transformations of image intensities. However, for images in which the contrast between the text and the background is low, the MSERs algorithm is insufficient. To avoid this disadvantage, we built our text detection system on the Channel-enhanced MSERs algorithm. First, we used the Channel-enhanced MSERs algorithm to detect candidate connected text regions. The algorithm finds most of the connected text regions, along with some non-text noise. Classifying text and non-text is an important and difficult problem in text detection, and its accuracy directly influences the result of the whole system.
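
As background for the MSERs discussion above, the following is a minimal pure-Python sketch of the standard stability measure that underlies MSERs, not the thesis's Channel-enhanced variant (whose details the abstract does not give). It assumes a tiny grayscale image stored as a list of lists, dark text on a bright background, and 4-connectivity; the names `component_area` and `stability` and the sample values are hypothetical.

```python
from collections import deque


def component_area(img, seed, threshold):
    """Area of the 4-connected region of pixels <= threshold that contains seed."""
    rows, cols = len(img), len(img[0])
    if img[seed[0]][seed[1]] > threshold:
        return 0  # the seed itself is not part of any region at this threshold
    seen = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and (nr, nc) not in seen and img[nr][nc] <= threshold:
                seen.add((nr, nc))
                queue.append((nr, nc))
    return len(seen)


def stability(img, seed, threshold, delta):
    """Relative area change (|Q(t+delta)| - |Q(t-delta)|) / |Q(t)|.

    Smaller values mean the region barely changes as the threshold moves,
    i.e. it is more stable. Returns None when no region contains the seed.
    """
    area = component_area(img, seed, threshold)
    if area == 0:
        return None
    grown = component_area(img, seed, threshold + delta)
    shrunk = component_area(img, seed, threshold - delta)
    return (grown - shrunk) / area


# Illustrative image: a dark 2x2 blob (intensity 10) on a bright background (200).
image = [
    [200, 200, 200, 200, 200],
    [200, 10, 10, 200, 200],
    [200, 10, 10, 200, 200],
    [200, 200, 200, 200, 200],
]

stable = stability(image, (1, 1), threshold=100, delta=50)
unstable = stability(image, (1, 1), threshold=180, delta=50)
print(stable, unstable)
```

A full MSER detector tracks every extremal region across all thresholds and keeps those where this measure reaches a local minimum; the sketch only evaluates the measure for one seed pixel at chosen thresholds.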
To solve this classification problem, we first pruned simple noise from the large set of text candidate regions using heuristic rules, and then removed the remaining non-text regions with an SVM (Support Vector Machine) with a polynomial kernel applied to HOG (Histogram of Oriented Gradients) features extracted from these regions. Among the remaining MSERs, regions with similar features are connected into text line candidates, and short chains are expanded by an extension algorithm to pick up missed MSERs. We used line-level heuristic rules to prune text line noise and then used a Random Forest classifier, applied to a collection of extracted texture features, to identify the text line candidates.

To test the performance of our system, we introduced a new dataset named IMAGE570. Experimental comparisons on this dataset show that our algorithm is effective for detecting Uyghur text in images with complex backgrounds. The F-measure is 85%, much better than the state-of-the-art performance of 75.5%.
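
For readers unfamiliar with the F-measure reported above, here is a minimal sketch of how it is commonly computed, assuming the usual definition as the harmonic mean of precision and recall. The abstract does not state how detections are matched to ground truth, so the function name `f_measure` and the counts below are purely illustrative and are not results from the thesis.

```python
def f_measure(true_positives, false_positives, false_negatives):
    """Return (precision, recall, F-measure) from detection counts.

    Assumes the standard harmonic-mean definition F = 2PR / (P + R).
    """
    detected = true_positives + false_positives  # everything the detector reported
    actual = true_positives + false_negatives    # everything in the ground truth
    precision = true_positives / detected if detected else 0.0
    recall = true_positives / actual if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Made-up counts for illustration only: 8 correct detections,
# 2 false alarms, and 4 missed text lines.
precision, recall, f_score = f_measure(8, 2, 4)
print(f"precision={precision:.3f} recall={recall:.3f} F={f_score:.3f}")
```

Because it is a harmonic mean, the F-measure is pulled toward the lower of precision and recall, so a detector cannot score well by favoring one at the expense of the other.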
Keywords/Search Tags: Uyghur language analysis, images with complex background, text detection, channel-enhanced maximally stable extremal regions, classifier