The Study Of Methods To Extract Text In Ancient Yi Documents Under Complex Background

Posted on: 2012-02-05; Degree: Master; Type: Thesis
Country: China; Candidate: R Xiao; Full Text: PDF
GTID: 2218330341951456; Subject: Computer application technology
Abstract/Summary:
Ancient Yi documents carry the ancient civilization of the Yi people, but they survive in very poor condition and urgently need to be protected by digital means. Accurately extracting text from these documents is an important precondition for recognizing it. Because the documents are severely degraded and have complex backgrounds, studying methods for extracting text from them not only benefits their protection and use but also explores new ideas for extracting text from complex backgrounds.

This thesis first introduces the basic steps of extracting text from a complex background: text detection, text localization, and foreground-background separation. It then compares and analyzes existing methods, focusing on text detection. To overcome the shortcomings of methods based on edge features alone or texture features alone, we propose a new method that detects text in Ancient Yi documents accurately by using edge and texture features together, and then propose a complete solution for extracting text from them. Our work includes the following:

Firstly, Yi characters are composed of strokes in four directions (horizontal, vertical, up-right slanting, and up-left slanting), and stroke pixels have strong edges. We therefore use the Sobel edge operator to detect edges in the four directions and extract features for every pixel from the four resulting edge maps. At the same time, text in ancient documents can also be viewed as a kind of regular texture. We apply a wavelet transform to decompose the original image and extract features for every pixel from the high-frequency sub-images, which capture texture detail. Combining the edge and texture features reflects the characteristics of text in Ancient Yi documents accurately.

Secondly, we adopt the Gradient Boosting Decision Tree (GBDT) learning method to design a classifier that separates text pixels from non-text pixels. GBDT combines boosting with decision trees and greatly improves on the performance of a single decision tree. In addition, it helps avoid the overfitting problem. Because the base learner is a decision tree, features of different dimensions do not need to be normalized, and the resulting decision rules are interpretable. This makes the method well suited to classifying text and non-text pixels in images.

Finally, we apply morphological transformations and some prior rules to localize text regions. To separate text from background within those regions, we first apply a Wiener filter to smooth the background, remove noise, and enhance contrast, and then binarize the regions with a local adaptive thresholding method.

Experiments show that, compared with methods based on edge or texture features alone, text detection based on both kinds of features together greatly improves detection accuracy for Ancient Yi documents, and that our solution extracts text from these documents accurately.
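
The four-direction edge detection described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the thesis implementation. The abstract does not list the kernels or the per-pixel features, so the 3x3 kernels below are the commonly used Sobel variants, the feature is simply the absolute kernel response, and the names `KERNELS`, `directional_edge_maps`, and `pixel_features` are hypothetical.

```python
# Assumed 3x3 Sobel-style kernels, one per stroke direction.
# Image rows are indexed top to bottom, columns left to right.
KERNELS = {
    "horizontal": [[-1, -2, -1], [0, 0, 0], [1, 2, 1]],   # edges of "-" strokes
    "vertical":   [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]],   # edges of "|" strokes
    "up_right":   [[-2, -1, 0], [-1, 0, 1], [0, 1, 2]],   # edges of "/" strokes
    "up_left":    [[0, 1, 2], [-1, 0, 1], [-2, -1, 0]],   # edges of "\" strokes
}
DIRECTIONS = ("horizontal", "vertical", "up_right", "up_left")


def directional_edge_maps(image):
    """Return one edge map per direction for a grayscale image.

    `image` is a list of equal-length rows of numbers. Each map holds the
    absolute kernel response; border pixels are left at 0.
    """
    height, width = len(image), len(image[0])
    maps = {}
    for name in DIRECTIONS:
        kernel = KERNELS[name]
        out = [[0] * width for _ in range(height)]
        for r in range(1, height - 1):
            for c in range(1, width - 1):
                acc = 0
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        acc += kernel[dr + 1][dc + 1] * image[r + dr][c + dc]
                out[r][c] = abs(acc)
        maps[name] = out
    return maps


def pixel_features(maps, r, c):
    """Collect the four directional responses at pixel (r, c) as a feature vector."""
    return [maps[name][r][c] for name in DIRECTIONS]


if __name__ == "__main__":
    # A 5x5 image with a single bright horizontal stroke on row 2.
    stroke = [[255 if r == 2 else 0 for _ in range(5)] for r in range(5)]
    edge_maps = directional_edge_maps(stroke)
    print(pixel_features(edge_maps, 1, 2))
```

On the pixel just above the horizontal stroke, the horizontal kernel gives the strongest response and the vertical kernel gives none. A per-pixel vector of this kind is the sort of edge feature that could be joined with wavelet texture features before classification.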
Keywords/Search Tags: Complex Background, Text Extraction, Text Detection, Edge Detection