Clustering Based Word Extraction From Uyghur Handwritten Documents

Posted on: 2018-10-14
Degree: Master
Type: Thesis
Country: China
Candidate: Y S D T A B L Z A
GTID: 2348330533456491
Subject: Engineering, information and communication engineering
Abstract/Summary:
Word extraction from handwritten text lines is an important part of the study of handwritten text images. It is a key step in keyword search, word recognition, and character segmentation, and the segmentation results directly affect the subsequent recognition results. Handwritten Uyghur text is characterized by uniqueness, randomness, and regularity; as a result, the distances between Uyghur words show no regular pattern, and overlapping and touching (adhesion) between words occur frequently. Taking these writing characteristics as a basis, this work uses a clustering algorithm to divide the blank spaces between connected components into two categories, combines the text regions according to the classification results, and finally obtains the segmentation points.

The method proceeds as follows. First, the text line images are preprocessed; this stage addresses adhesion and overlap between words and removes noise points. Next, the vertical projection method is applied to the preprocessed text line images to obtain the initial segmentation points, and the lengths of the blank spaces and of the connected components are recorded at the same time. The clustering algorithm is then applied to the blank spaces, classifying them into two categories: inter-character gaps and inter-word gaps. The clustering algorithm is applied again to divide the connected components into three categories, and the connected components are merged to obtain the final segmentation points. Finally, the text regions delimited by the segmentation points are colored.

Three clustering algorithms are used: k-means, fuzzy c-means (FCM), and a fusion of k-means and FCM. Their performance is analyzed in detail through comparison experiments. The experiments show that, of the three, k-means takes the shortest time, while FCM and the fusion algorithm achieve the same segmentation accuracy; the fusion algorithm, however, takes less time than FCM. The fusion algorithm obtained an average accuracy of 75.66%.
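The gap-classification step described above can be illustrated with a minimal sketch. It is not the thesis implementation. It assumes a binarized text line given as a list of rows (1 = ink), uses column runs from the vertical projection in place of true connected components, and uses plain 1-D k-means with k = 2 only. The preprocessing stage, the three-category clustering of connected components, and the FCM and fusion variants are omitted. The function names `column_runs`, `two_means_1d`, and `extract_words` are hypothetical.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start_col, end_col_exclusive)


def column_runs(image: List[List[int]]) -> Tuple[List[Span], List[Span]]:
    """Split a binary text-line image (1 = ink) into ink runs and blank runs.

    Only blank runs that lie between two ink runs are returned as gaps,
    so gaps[i] always sits between components[i] and components[i + 1].
    """
    width = len(image[0])
    # Vertical projection: number of ink pixels in each column.
    profile = [sum(row[x] for row in image) for x in range(width)]

    components: List[Span] = []
    gaps: List[Span] = []
    x = 0
    while x < width:
        is_ink = profile[x] > 0
        start = x
        while x < width and (profile[x] > 0) == is_ink:
            x += 1
        if is_ink:
            components.append((start, x))
        elif components and x < width:
            gaps.append((start, x))  # blank run with ink on both sides
    return components, gaps


def two_means_1d(values: List[float], max_iter: int = 50) -> List[int]:
    """1-D k-means with k = 2: label 0 = narrow cluster, 1 = wide cluster."""
    lo, hi = float(min(values)), float(max(values))  # initial centres
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [0 if abs(v - lo) <= abs(v - hi) else 1 for v in values]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        narrow = [v for v, lab in zip(values, labels) if lab == 0]
        wide = [v for v, lab in zip(values, labels) if lab == 1]
        if narrow:
            lo = sum(narrow) / len(narrow)
        if wide:
            hi = sum(wide) / len(wide)
    return labels


def extract_words(image: List[List[int]]) -> List[Span]:
    """Merge ink runs separated by narrow gaps; cut at wide (inter-word) gaps."""
    components, gaps = column_runs(image)
    if not gaps:
        return components
    labels = two_means_1d([end - start for start, end in gaps])
    words: List[Span] = []
    word_start = components[0][0]
    for i, label in enumerate(labels):
        if label == 1:  # wide gap: treat as a word boundary
            words.append((word_start, components[i][1]))
            word_start = components[i + 1][0]
    words.append((word_start, components[-1][1]))
    return words


if __name__ == "__main__":
    row = [int(c) for c in "1101100001011"]
    print(extract_words([row, row]))
```

In this toy line the gap widths are 1, 4, and 1 columns, so only the middle gap is treated as an inter-word gap. One limitation of the sketch: when all gaps have the same width, every gap falls into the narrow cluster and the whole line is returned as a single word.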
Keywords/Search Tags:Uyghur, Handwritten text images, Word extraction, Clustering, Coloring