Research On Ancient Literature Text Image Segmentation And Otherness Comparison Method

Posted on: 2017-03-13
Degree: Master
Type: Thesis
Country: China
Candidate: X J Wu
Full Text: PDF
GTID: 2348330488987611
Subject: Computer software and theory
Abstract/Summary:
Ancient literature has important historical and academic research value. As research on ancient literature deepens, the comparison of differences between versions (version otherness comparison) has become an important research field in philology. Compared with manual methods, image processing technology is more effective and can improve the efficiency and accuracy of segmentation. However, because existing ancient documents are poorly preserved, the images obtained from them are often unclear and may have lost much important information. In addition, Chinese characters pose many difficult cases, such as highly variable handwriting, overlapping, conglutination (touching characters), and varied script styles (e.g., xiaozhuan, lishu, regular script), all of which complicate text image segmentation and version otherness comparison of ancient literature.

Based on the characteristics of ancient Chinese documents and characters, this thesis proposes a text line segmentation algorithm, two character segmentation algorithms, and an otherness comparison method for ancient literature. Text line segmentation uses statistics-based circular projection filtering. Character segmentation employs a multi-step segmentation algorithm based on piecewise projection and a multi-step segmentation algorithm based on a variable window. Version otherness comparison adopts a sliding window matching algorithm based on feature extraction.

The circular projection filtering method is based on statistics. First, it computes the vertical projection of the ancient document image, counting the number of black pixels in each column. Then it applies a loop filter to the projection statistics repeatedly until reasonably uniform columns are isolated.
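The projection-and-filter idea described above can be illustrated with a minimal Python sketch. The abstract does not specify the loop filter or the stopping rule, so everything here is an assumption for illustration only: the filter is a simple moving average, the foreground threshold is half the mean of the profile, "uniform" means the spread of run widths is small relative to their mean, and all function names and parameters (`segment_text_columns`, `max_passes`, `tolerance`) are hypothetical. The input is assumed to be a non-empty binary image given as rows of 0/1 values, with 1 for a black pixel.

```python
from statistics import mean, pstdev


def vertical_projection(image):
    """Count black pixels (value 1) in each pixel column of a binary image."""
    return [sum(column) for column in zip(*image)]


def smooth(profile, radius=1):
    """One pass of a moving-average filter (assumed stand-in for the loop filter)."""
    n = len(profile)
    smoothed = []
    for i in range(n):
        window = profile[max(0, i - radius):min(n, i + radius + 1)]
        smoothed.append(sum(window) / len(window))
    return smoothed


def find_runs(profile, threshold):
    """Return half-open (start, end) ranges where the profile exceeds the threshold."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value > threshold and start is None:
            start = i
        elif value <= threshold and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def segment_text_columns(image, max_passes=10, tolerance=0.25):
    """Filter the projection repeatedly until the detected runs have similar widths."""
    profile = vertical_projection(image)
    runs = []
    for _ in range(max_passes):
        # Assumed threshold: half the mean of the current profile.
        runs = find_runs(profile, 0.5 * mean(profile))
        widths = [end - start for start, end in runs]
        # Assumed uniformity test: width spread is small relative to mean width.
        if widths and pstdev(widths) <= tolerance * mean(widths):
            break
        profile = smooth(profile)
    return runs
```

Because smoothing spreads each peak slightly, the returned boundaries can be a pixel or so wider than the ink itself; a real implementation would likely refine them against the unfiltered projection.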
Even in complex cases, such as heavy noise, a certain amount of skew, or non-uniform column heights, the algorithm still performs well.

Building on text line segmentation, the multi-step segmentation algorithm based on piecewise projection first uses projection segmentation to split Chinese characters that are separated from each other. Then, for characters that are not separated, it successively applies piecewise projection segmentation and the segmentation method of stroke features at top and bottom (SM-SFTB). Finally, it uses a context-combining method to check the segmentation. Following the idea of dichotomy, the piecewise projection method divides overlapping or touching characters into two parts and projects each part separately; analyzing the resulting projection arrays yields the segmentation path. SM-SFTB then uses the characteristics of Chinese character strokes to find over-segmentation and under-segmentation, so that the segmentation can be adjusted. The experimental results show that the proposed methods perform well on historical Chinese documents under complex conditions with a large amount of overlap and adhesion.

Similarly, building on text line segmentation, the multi-step segmentation algorithm based on a variable window first uses projection segmentation to split Chinese characters that are separated from each other. Then, for characters that are not separated, a variable window is used to find the segmentation path of every character in the character string. The experimental results show that this approach makes full use of the trend information of stroke pixels and achieves better accuracy in cases of overlapping and conglutination.

The sliding window matching algorithm based on feature extraction is a method for otherness comparison of ancient literature, applied after the characters have been segmented.
First, we normalize the handwritten Chinese characters. Next, we obtain feature vectors through feature extraction using wavelet transform, distance calculation, and peak analysis. Then, we calculate the differences and similarities between Chinese characters from the feature vectors. Finally, we compare the two images using a sliding-window method and mark the locations in the original text images where the contents of the two images differ. In the experiments, the algorithm achieves a high accuracy rate.
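The final comparison step can be sketched roughly as follows. This is not the thesis's algorithm: the abstract gives no details of the matching rule, so the sketch assumes each segmented character has already been reduced to a numeric feature vector (the wavelet, distance, and peak features are not reproduced here), uses Euclidean distance with an arbitrary threshold, and accepts the earliest match inside a forward-looking window. The names `compare_versions`, `window`, and `threshold` are hypothetical.

```python
import math


def distance(u, v):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def compare_versions(feats_a, feats_b, window=2, threshold=0.5):
    """Align two sequences of character features with a sliding window.

    Returns (unmatched_a, unmatched_b): indices of characters in each
    version that found no counterpart in the other version.
    """
    unmatched_a, unmatched_b = [], []
    j = 0  # next unconsumed position in feats_b
    for i, feat in enumerate(feats_a):
        match = None
        # Search only a small window ahead of the current alignment point.
        for k in range(j, min(len(feats_b), j + window + 1)):
            if distance(feat, feats_b[k]) <= threshold:
                match = k
                break
        if match is None:
            unmatched_a.append(i)
        else:
            # Characters of version B skipped over to reach the match.
            unmatched_b.extend(range(j, match))
            j = match + 1
    # Anything left at the end of version B has no counterpart.
    unmatched_b.extend(range(j, len(feats_b)))
    return unmatched_a, unmatched_b
```

In a full pipeline, the returned indices would be mapped back to the bounding boxes of the segmented characters so that the differing locations can be marked on the original images. The window keeps the alignment from drifting when one version contains a few extra or missing characters.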
Keywords/Search Tags: Ancient Document, Character Segmentation, Otherness Comparison