Research and Implementation of Uyghur Text Clustering Based on Word Stems

Posted on: 2013-01-09 | Degree: Master | Type: Thesis
Country: China | Candidate: Y Liu
GTID: 2218330374967001 | Subject: Computer application technology
Abstract:
Text clustering is a key technology for processing and organizing large volumes of text data, and it goes a long way toward solving information retrieval over large-scale, unordered text collections. Its goal is to partition a text collection into several clusters according to the similarity between texts, so that texts in the same cluster are highly similar in content while texts in different clusters are dissimilar. Research on Uyghur text clustering started late and is not yet mature. Building on previous work on Uyghur clustering and on the word-formation rules of Uyghur, this thesis implements stemming of the Uyghur words in a text, proposes a new stem-based feature extraction algorithm for Uyghur clustering, and uses the k-means algorithm in the clustering experiments. The results show that the method effectively reduces the dimensionality of the feature space and, at the same time, improves the precision, recall, and F-measure of the clustering results to some extent.

Based on these experiments, the thesis implements a Uyghur text clustering system, Uyghur Text Cluster, in Java. The system provides preprocessing of the text set, stem-based feature selection for Uyghur clustering, vector space representation of Uyghur texts, text clustering with the k-means algorithm, and presentation of the clustering results.
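The abstract does not give the thesis's stemming rules, weighting scheme, or clustering parameters, so the following is only a minimal Java sketch of the general pipeline it describes (stemming, stem-based features, vector space model, k-means, F-measure), not the author's implementation. Everything in it is an assumption: the class and method names (`StemClusterSketch`, `stem`, `dimension`, `vectorize`, `cluster`, `fMeasure`) are hypothetical; the suffix list is a tiny illustrative sample written in Uyghur Latin script (ULY) rather than the Arabic script; tokens are split on whitespace; features are raw term frequencies; and the clustering step is a cosine-similarity variant of k-means seeded with the first k documents.

```java
import java.util.*;

// Minimal sketch of stem-based Uyghur text clustering, NOT the thesis implementation.
// Assumptions: Uyghur Latin script (ULY), whitespace tokenization, raw term-frequency
// weights, cosine similarity, and k-means seeded with the first k documents.
final class StemClusterSketch {
    // HYPOTHETICAL sample of inflectional suffixes, longest first so that
    // "lardin" is tried before "lar" and "din".
    static final String[] SUFFIXES = {"lardin", "lerdin", "larni", "lerni",
            "lar", "ler", "din", "ni", "da", "de"};
    static final int MIN_STEM = 3; // never strip a word below three letters

    // Strip one matching suffix, then recurse (e.g. "mekteplerde" -> "mektepler" -> "mektep").
    static String stem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        for (String s : SUFFIXES)
            if (w.endsWith(s) && w.length() - s.length() >= MIN_STEM)
                return stem(w.substring(0, w.length() - s.length()));
        return w;
    }

    // Feature-space dimension: number of distinct surface words, or of distinct stems.
    static int dimension(List<String> docs, boolean useStems) {
        Set<String> vocab = new HashSet<>();
        for (String doc : docs)
            for (String tok : doc.trim().split("\\s+"))
                if (!tok.isEmpty()) vocab.add(useStems ? stem(tok) : tok.toLowerCase(Locale.ROOT));
        return vocab.size();
    }

    // Vector space model: one term-frequency vector over stems per document.
    static Map<String, Double> vectorize(String doc) {
        Map<String, Double> v = new HashMap<>();
        for (String tok : doc.trim().split("\\s+"))
            if (!tok.isEmpty()) v.merge(stem(tok), 1.0, Double::sum);
        return v;
    }

    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0, na = 0, nb = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0.0);
            na += e.getValue() * e.getValue();
        }
        for (double x : b.values()) nb += x * x;
        return (na == 0 || nb == 0) ? 0 : dot / Math.sqrt(na * nb);
    }

    // k-means (cosine variant): assign each document to its most similar centroid, then
    // move each centroid to the mean of its members; stop when assignments stop changing.
    static int[] cluster(List<String> texts, int k, int maxIter) {
        List<Map<String, Double>> docs = new ArrayList<>();
        for (String t : texts) docs.add(vectorize(t));
        List<Map<String, Double>> centroids = new ArrayList<>();
        for (int i = 0; i < k; i++) centroids.add(new HashMap<>(docs.get(i)));
        int[] assign = new int[docs.size()];
        Arrays.fill(assign, -1);
        for (int iter = 0; iter < maxIter; iter++) {
            boolean changed = false;
            for (int d = 0; d < docs.size(); d++) {
                int best = 0;
                double bestSim = -1;
                for (int c = 0; c < k; c++) {
                    double s = cosine(docs.get(d), centroids.get(c));
                    if (s > bestSim) { bestSim = s; best = c; }
                }
                if (assign[d] != best) { assign[d] = best; changed = true; }
            }
            if (!changed) break;
            for (int c = 0; c < k; c++) {
                Map<String, Double> sum = new HashMap<>();
                int n = 0;
                for (int d = 0; d < docs.size(); d++) {
                    if (assign[d] != c) continue;
                    n++;
                    docs.get(d).forEach((t, x) -> sum.merge(t, x, Double::sum));
                }
                if (n == 0) continue; // an empty cluster keeps its previous centroid
                final int size = n;
                sum.replaceAll((t, x) -> x / size);
                centroids.set(c, sum);
            }
        }
        return assign;
    }

    // F-measure of one cluster against one reference class:
    // P = both / |cluster|, R = both / |class|, F = 2PR / (P + R).
    static double fMeasure(int both, int clusterSize, int classSize) {
        if (both == 0) return 0;
        double p = (double) both / clusterSize, r = (double) both / classSize;
        return 2 * p * r / (p + r);
    }
}
```

On four invented toy documents ("kitablar mektep kitabni oqughuchilar", "mekteplerde kitab oqughuchi", "tamaqlar bazarda tamaqni", "bazarlardin tamaq"), `dimension` counts 12 distinct surface words but only 5 stems, which is the kind of feature-space reduction the abstract reports, and `cluster(docs, 2, 20)` separates the two school-related documents from the two food-related ones. A real Uyghur stemmer would also need to handle vowel alternation at morpheme boundaries and a far larger suffix inventory, which this sketch deliberately ignores.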
Keywords: Text Clustering, Uyghur, Uyghur Stem, Feature Selection