
Research And Application Of Handwritten Text Segmentation Algorithm In Ancient Books

Posted on: 2019-09-06
Degree: Master
Type: Thesis
Country: China
Candidate: M Su
Full Text: PDF
GTID: 2428330548467871
Subject: Computer technology
Abstract/Summary:
Handwritten text in ancient Chinese books is mostly written in regular script or cursive script. On a given page, regular-script strokes are dispersed and the characters are roughly uniform in size, whereas cursive strokes are more concentrated and character sizes vary widely. The two kinds of touching (adhesive) characters therefore call for different segmentation methods. Improving the segmentation accuracy for regular-script and cursive characters remains an important and difficult problem in the segmentation of ancient Chinese characters. Existing methods for segmenting handwritten text images still have shortcomings: they often fail to locate the correct segmentation point, which leaves redundant or missing strokes in the resulting character image, and different kinds of touching characters place different requirements on the algorithm. To address these problems, this thesis reviews character segmentation research in China and abroad, and uses an improved drop fall algorithm and an improved Self-Organizing Map (SOM) clustering algorithm to segment regular-script and cursive characters, respectively. The main work and contributions are as follows:

(1) Chinese characters in ancient books are written vertically; their strokes are scattered, vertically adjacent characters touch, and the touching points tend to overlap. For this case, the thesis proposes a drop fall algorithm for segmenting ancient documents. First, the projection of black pixels is used to separate the page into columns of characters. Second, the improved drop fall algorithm segments each column. The length threshold of a segmented character is estimated by determining the minimum points in each column string. Because character lengths fall within a certain range, conditions can be set to filter the length threshold, so that the initial drop point of each character in the string can be determined and the string can be segmented iteratively. Finally, segmentation is performed according to the drop rules. Experimental results show that the method is suitable for segmenting touching characters in the Siku Quanshu, has a degree of generality, and produces well-separated single characters from ancient documents.

(2) Because cursive characters vary in size, it is difficult to determine a segmentation threshold, and some scholars have proposed segmenting touching characters with a self-organizing neural network. However, segmentation based on SOM clustering places high demands on the connectivity of the characters and has high time complexity. To solve these problems, the thesis proposes a segmentation algorithm for cursive script, the false neighbor neuron algorithm, based on improved SOM clustering. First, the image is preprocessed (denoising and binarization). Second, the width of the adhesion area is determined. Third, the improved SOM clustering is applied to the white pixels in the adhesion area. Finally, the shortest path between cluster points on the left and right sides of the black pixels in the area is searched for, which determines the segmentation path. Experimental results show that combining the shortest-path search with the false neighbor neuron algorithm yields a complete scheme for segmenting touching cursive characters. The results are better and the success rate is higher for samples with thicker strokes and more severe adhesion. Compared with existing algorithms, the proposed algorithm clusters neurons more accurately and achieves higher segmentation accuracy.
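
The column-division step in (1) relies on projecting black pixels. A minimal sketch of that idea follows, assuming a binarized page stored as a list of rows with 1 for a black pixel and 0 for background. The function name column_spans and the min_ink parameter are illustrative choices, not taken from the thesis.

```python
from typing import List, Tuple


def column_spans(image: List[List[int]], min_ink: int = 1) -> List[Tuple[int, int]]:
    """Return (start, end) x-ranges, end exclusive, of text columns.

    `image` is a binary page: 1 = black (ink) pixel, 0 = background.
    `min_ink` is a hypothetical threshold: a pixel column counts as text
    when it holds at least this many black pixels.
    """
    if not image:
        return []
    width = len(image[0])
    # Vertical projection profile: black-pixel count of each pixel column.
    profile = [sum(row[x] for row in image) for x in range(width)]

    spans: List[Tuple[int, int]] = []
    start = None
    for x, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = x                    # entering a text column
        elif ink < min_ink and start is not None:
            spans.append((start, x))     # leaving a text column
            start = None
    if start is not None:
        spans.append((start, width))     # column touches the right edge
    return spans


page = [
    [0, 1, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 1, 1],
    [0, 0, 1, 0, 0, 1, 0],
]
print(column_spans(page))  # [(1, 3), (5, 7)]
```

On a real scan the gaps between columns may contain noise, so a min_ink value above 1 or a smoothed profile would likely be needed; both are assumptions outside what the abstract states.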
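
The abstract does not spell out the drop rules of the improved algorithm. For orientation only, the sketch below follows a common textbook form of the basic drop fall idea: a simulated drop starts at the top of the image, rolls through white pixels, and cuts straight down through ink only when it is blocked. The rule order, the top-to-bottom direction, and the name drop_fall_path are assumptions; the thesis's improvements (length-threshold filtering and choice of initial drop point) are not reproduced here.

```python
from typing import List, Tuple


def drop_fall_path(image: List[List[int]], start_x: int) -> List[Tuple[int, int]]:
    """Trace a basic drop fall cut path from the top row to the bottom row.

    `image` is binary: 1 = black (ink), 0 = white. Returns (x, y) points.
    """
    height, width = len(image), len(image[0])

    def is_white(x: int, y: int) -> bool:
        return 0 <= x < width and image[y][x] == 0

    x, y = start_x, 0
    path = [(x, y)]
    last_dx = 0  # direction of the previous sideways move, to avoid oscillating
    while y < height - 1:
        if is_white(x, y + 1):                        # fall straight down
            dx, dy = 0, 1
        elif is_white(x + 1, y + 1):                  # roll down-right
            dx, dy = 1, 1
        elif is_white(x - 1, y + 1):                  # roll down-left
            dx, dy = -1, 1
        elif is_white(x + 1, y) and last_dx != -1:    # slide right
            dx, dy = 1, 0
        elif is_white(x - 1, y) and last_dx != 1:     # slide left
            dx, dy = -1, 0
        else:                                         # blocked: cut through ink
            dx, dy = 0, 1
        x, y = x + dx, y + dy
        last_dx = dx if dy == 0 else 0
        path.append((x, y))
    return path


blob = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
print(drop_fall_path(blob, 2))  # [(2, 0), (3, 0), (4, 1), (4, 2), (4, 3)]
```

Because the characters in the thesis are stacked vertically, the cut between two touching characters would run across the column rather than down it; one way to reuse this sketch would be to apply it to a transposed column image, which is an assumption here rather than the thesis's stated procedure.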
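
Step three of (2) clusters the white pixels of the adhesion area with a SOM. The sketch below shows a plain one-dimensional SOM trained on pixel coordinates, to illustrate what such clustering produces. It is a standard SOM, not the false neighbor neuron variant, and the learning-rate schedule, neighbourhood function, and all names are assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def train_som(points: Sequence[Point], n_neurons: int = 4,
              epochs: int = 50, seed: int = 0) -> List[Point]:
    """Fit a 1-D chain of neurons to 2-D points and return their weights."""
    rng = random.Random(seed)
    # Initialise each neuron on a randomly chosen data point.
    weights = [list(rng.choice(points)) for _ in range(n_neurons)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = 0.5 * (1.0 - frac)                          # decaying learning rate
        radius = max(n_neurons / 2 * (1.0 - frac), 0.5)  # shrinking neighbourhood
        for px, py in rng.sample(list(points), len(points)):
            # Best matching unit: the neuron closest to the sample.
            bmu = min(range(n_neurons),
                      key=lambda i: (weights[i][0] - px) ** 2
                                    + (weights[i][1] - py) ** 2)
            for i, w in enumerate(weights):
                # Gaussian neighbourhood along the neuron chain.
                h = math.exp(-((i - bmu) ** 2) / (2 * radius ** 2))
                w[0] += lr * h * (px - w[0])
                w[1] += lr * h * (py - w[1])
    return [(w[0], w[1]) for w in weights]


# Hypothetical white-pixel coordinates from an adhesion area: two loose groups.
white_pixels = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0),
                (10.0, 10.0), (11.0, 10.0), (10.0, 11.0)]
neurons = train_som(white_pixels, n_neurons=2, epochs=30, seed=1)
print(neurons)
```

The trained neuron positions act as cluster points; in the scheme described above, a shortest path between cluster points on either side of the touching strokes would then give the segmentation path.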
Keywords/Search Tags:Regular Script and Cursive, Column Segmentation, Drip Algorithm, SOM Clustering, False Neighboring Neuron Algorithm