Research On Chinese Texts Clustering

Posted on: 2011-12-13
Degree: Master
Type: Thesis
Country: China
Candidate: W J Huang
GTID: 2178360308452588
Subject: Communication and Information System
Abstract/Summary:
Text clustering, an important field of text processing, is significant for research on network supervision and control, information filtering and retrieval, and related tasks. After discussing and studying text vector formats and text clustering algorithms, this thesis implements a Chinese text clustering system based on a new algorithm.

Building on traditional text clustering algorithms and systems, the thesis presents several innovations. Firstly, clustering ideas based on partition, density and arrangement are combined to improve the detection of cluster shape. Secondly, the advantages of shape detection, and of combining shape detection with clustering efficiency, are verified through theory and experiments; the accuracy of the new method is 4% higher than that of the traditional text clustering algorithm. Thirdly, a simplified model of Chinese text clustering is introduced, based on the traditional model, as an application of the new text clustering algorithm. Fourthly, the thesis gives a theoretical introduction to the new text clustering algorithm and describes its implementation in detail.

In the last part, the Chinese text clustering system is realized on the basis of the vector space model of texts and the new text clustering algorithm. Experiments on pre-classified documents, with comparison against the K-means algorithm, the Chameleon algorithm and the K-C algorithm, show that the dimension of the text vectors affects clustering accuracy. The results show that the Chinese text clustering system implemented in this thesis and the K-C algorithm have better performance and stability.
Keywords/Search Tags: vector space model of texts, system of Chinese texts clustering, K-means algorithm, Chameleon algorithm, K-C algorithm
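
The abstract names the vector space model and the K-means algorithm as the baseline setting for the experiments. As a rough illustration only, the following minimal sketch builds normalized term-frequency vectors and clusters them with plain K-means. It is not the new algorithm of the thesis, which the abstract does not specify. The function names (`build_vectors`, `kmeans`), the toy documents, the term-frequency weighting, and the choice of the first k documents as initial centroids are all assumptions made for this example; the documents are assumed to be already segmented into words, as Chinese text would need to be.

```python
from collections import Counter
import math

def build_vectors(docs):
    """Turn tokenized documents into unit-length term-frequency vectors."""
    vocab = sorted({term for doc in docs for term in doc})
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vec = [float(counts[term]) for term in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0  # guard against empty docs
        vectors.append([x / norm for x in vec])
    return vocab, vectors

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, max_iter=100):
    """Plain K-means; returns one cluster label per vector."""
    # Assumption: seed with the first k vectors so the run is deterministic.
    centroids = [list(v) for v in vectors[:k]]
    labels = [None] * len(vectors)
    for _ in range(max_iter):
        # Assignment step: each vector goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical, already-segmented toy documents (two finance, two sports).
docs = [
    ["stock", "market", "price"],
    ["football", "match", "goal"],
    ["stock", "price", "bank"],
    ["football", "goal", "team"],
]
vocab, vectors = build_vectors(docs)
labels = kmeans(vectors, k=2)
print(labels)  # [0, 1, 0, 1]
```

The vocabulary size here plays the role of the vector dimension that the abstract reports as affecting accuracy; restricting `vocab` to fewer terms would be one way to vary it in such a sketch.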