Research And Implementation Of Uyghur Text Clustering Based On Word Stems

Posted on:2013-01-09

Degree:Master

Type:Thesis

Country:China

Candidate:Y Liu

GTID:2218330374967001

Subject:Computer application technology

Abstract/Summary:

As a key technology for processing and organizing large volumes of text data, text clustering can, to a great extent, support information retrieval over large-scale, unordered text collections. The goal of text clustering is to divide a text set into several clusters according to the similarity among texts, so that texts in the same cluster are highly similar in content while texts in different clusters have low similarity.

Research on Uyghur text clustering started late and is not yet mature. Building on previous work on Uyghur text clustering and on the word-formation rules of Uyghur, this thesis implements stemming of the Uyghur words in a text, proposes a new feature extraction algorithm for Uyghur text clustering based on stem extraction, and uses the k-means algorithm in the clustering experiments. The results show that the method effectively reduces the dimension of the feature space while improving, to some extent, the precision, recall, and F-measure of the clustering results.

On the basis of these experiments, a Uyghur text clustering system named Uyghur Text Cluster was implemented in Java. The system provides preprocessing of the text set, stem-based clustering feature selection for Uyghur, representation of Uyghur texts in the vector space, text clustering with the k-means algorithm, and presentation of the clustering results.
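The abstract describes a pipeline (stemming, stem-based features, vector space representation, k-means) but does not give the thesis's word-formation rules or term-weighting scheme. The following is a minimal Java sketch of such a pipeline under stated assumptions, not the author's implementation: the class and method names (`StemCluster`, `stem`, `features`, `tfidf`, `kMeans`, `cluster`), the tiny suffix list, the `MIN_STEM` threshold, TF-IDF weighting, and farthest-first centroid seeding are all hypothetical choices for illustration. Words are written in Latin transliteration for readability; real Uyghur text in Arabic script would need its own suffix inventory.

```java
import java.util.*;

/**
 * Sketch: suffix-stripping stemmer, TF-IDF vectors over stems, k-means.
 * ASSUMPTIONS (not the thesis's rules): SUFFIXES is a tiny hypothetical sample of
 * Uyghur suffixes in Latin transliteration; centroids are seeded farthest-first.
 */
public class StemCluster {
    // Hypothetical suffix list, longest first so longer matches are tried first.
    static final String[] SUFFIXES = {"ning", "ler", "lar", "gha", "din", "da", "de", "ni"};
    static final int MIN_STEM = 3; // never cut a word below 3 characters

    /** Strips known suffixes from the end of a word until none applies. */
    static String stem(String word) {
        String w = word.toLowerCase();
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String s : SUFFIXES)
                if (w.endsWith(s) && w.length() - s.length() >= MIN_STEM) {
                    w = w.substring(0, w.length() - s.length()); // drop one suffix, rescan
                    changed = true;
                    break;
                }
        }
        return w;
    }

    /** Tokenizes on non-letters and maps every token to its stem (the features). */
    static List<String> features(String text) {
        List<String> out = new ArrayList<>();
        for (String tok : text.split("[^\\p{L}]+"))
            if (!tok.isEmpty()) out.add(stem(tok));
        return out;
    }

    /** L2-normalized TF-IDF vectors with one dimension per distinct stem. */
    static double[][] tfidf(List<String> docs) {
        Map<String, Integer> index = new HashMap<>();
        List<List<String>> feats = new ArrayList<>();
        for (String d : docs) {
            feats.add(features(d));
            for (String t : feats.get(feats.size() - 1)) index.putIfAbsent(t, index.size());
        }
        int n = docs.size(), v = index.size();
        double[][] x = new double[n][v];
        int[] df = new int[v];
        for (int i = 0; i < n; i++) {
            for (String t : feats.get(i)) x[i][index.get(t)]++;   // term frequency
            for (int j = 0; j < v; j++) if (x[i][j] > 0) df[j]++; // document frequency
        }
        for (double[] row : x) {
            double norm = 0;
            for (int j = 0; j < v; j++) { row[j] *= Math.log((double) n / df[j]); norm += row[j] * row[j]; }
            double len = Math.sqrt(norm);
            if (len > 0) for (int j = 0; j < v; j++) row[j] /= len;
        }
        return x;
    }

    static double dist2(double[] a, double[] b) {
        double s = 0;
        for (int j = 0; j < a.length; j++) s += (a[j] - b[j]) * (a[j] - b[j]);
        return s;
    }

    /** k-means with squared Euclidean distance; returns one cluster label per document. */
    static int[] kMeans(double[][] x, int k, int maxIter) {
        double[][] c = new double[k][];
        c[0] = x[0].clone();
        for (int m = 1; m < k; m++) {           // seed: document farthest from chosen centroids
            int best = 0; double bestD = -1;
            for (int i = 0; i < x.length; i++) {
                double d = Double.MAX_VALUE;
                for (int q = 0; q < m; q++) d = Math.min(d, dist2(x[i], c[q]));
                if (d > bestD) { bestD = d; best = i; }
            }
            c[m] = x[best].clone();
        }
        int[] label = new int[x.length]; Arrays.fill(label, -1);
        for (int it = 0; it < maxIter; it++) {
            boolean changed = false;
            for (int i = 0; i < x.length; i++) {  // assignment step: nearest centroid
                int arg = 0;
                for (int q = 1; q < k; q++) if (dist2(x[i], c[q]) < dist2(x[i], c[arg])) arg = q;
                if (arg != label[i]) { label[i] = arg; changed = true; }
            }
            if (!changed) break;                   // converged: no label changed
            for (int q = 0; q < k; q++) {          // update step: centroid = mean of members
                double[] sum = new double[x[0].length]; int cnt = 0;
                for (int i = 0; i < x.length; i++)
                    if (label[i] == q) { cnt++; for (int j = 0; j < sum.length; j++) sum[j] += x[i][j]; }
                if (cnt > 0) { for (int j = 0; j < sum.length; j++) sum[j] /= cnt; c[q] = sum; }
            }
        }
        return label;
    }

    static int[] cluster(List<String> docs, int k) { return kMeans(tfidf(docs), k, 100); }
}
```

For example, `cluster(List.of("kitablar kitabni oqush", "kitablarda oqush kitab", "mekteplerde oqughuchilar mektep", "mektepler oqughuchi mektepni"), 2)` returns `[0, 0, 1, 1]`. Here the eleven distinct surface forms collapse to four stems (`kitab`, `oqush`, `mektep`, `oqughuchi`), which illustrates the kind of feature-space reduction the abstract reports. Vectors are L2-normalized so that document length does not dominate the distance computation.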

Keywords/Search Tags:

Text Clustering, Uyghur, Uyghur stem, Feature Selection

Related items

1	Uyghur Text Clustering System Design And Implementation Based On Python
2	Research On Uyghur Text Classification And System Development
3	Study On The Text Classification Feature Selection Method-the Uyghur Language
4	Design And Implementation. The Uighur Text, And Text Converter
5	Research On Uyghur Text Recognition In The Scene Image
6	Clustering Algorithms Research For Uyghur Text
7	Research On The Filtering Method Of Uyghur Adverse Text Information
8	A Technique For Locating The Overlaid Uyghur Text Lines In Video Images
9	Recognition Of Printed Uyghur Words Based On Segmentation
10	Automatic Extraction Of Uyghur Ontology Concept Classification Relationship Based On Seed Bootstrap