Research And Implementation Of Text Clustering Based On DK-Means

Posted on: 2009-09-19  Degree: Master  Type: Thesis
Country: China  Candidate: L L Yu  Full Text: PDF
GTID: 2178360308479257  Subject: Computer application technology
Abstract/Summary:
With the spread of information technology across many fields, the volume of application data is growing exponentially. Processing this data effectively and extracting useful knowledge from it is an urgent problem. Data mining is a technology for understanding and applying the information and knowledge contained in data, and clustering is an effective way to discover the structure and knowledge within it. Cluster analysis divides data into several categories, or clusters, according to the similarity between data items. It is also a useful preprocessing step for collected data before statistical analysis.

Cluster analysis is an unsupervised process driven by similarity. Applied to documents, it groups them into clusters that users can understand, so that users can quickly grasp the content of a large number of texts, speed up analysis, and support decision-making. Cluster analysis has been used in many fields, such as pattern recognition, image processing, and information retrieval (IR). The data sets involved vary with the application and include ordinal, scalar, text, and other types. This thesis focuses on the clustering of text.

This thesis studies the dimensionality reduction methods and clustering algorithms used in text clustering. First, for text preprocessing, it proposes a word segmentation method combined with word frequency, which improves segmentation accuracy and prepares for building the text model and reducing its dimensionality. Second, it proposes a dimensionality reduction method based on text similarity, which extracts the words most relevant to a text by calculating each word's contribution to the text category, improving the efficiency and precision of text clustering. Finally, it proposes a text clustering algorithm based on DK-Means that improves both the accuracy and the speed of clustering.

The thesis first introduces cluster analysis as a technique in the field of data mining. It then introduces the techniques related to text clustering, including text preprocessing, the text model, feature dimensionality reduction, and text clustering algorithms, and proposes the new feature dimensionality reduction method based on text similarity and the new text clustering algorithm. Finally, it designs and implements text clustering using the new dimensionality reduction method and clustering algorithm. Experiments show that the approach improves not only the accuracy and purity of the clustering results but also the clustering speed.
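
The abstract reports results in terms of purity but does not say how it was computed. As a rough illustration only, the sketch below uses the common textbook definition of purity: the fraction of documents that carry the majority label of their assigned cluster. The function name `purity` and the toy cluster assignments and labels are hypothetical, and nothing here reproduces the DK-Means algorithm or the thesis experiments.

```python
from collections import Counter, defaultdict


def purity(cluster_ids, true_labels):
    """Return the fraction of documents that carry their cluster's majority label.

    Textbook definition, assumed here; the thesis may define purity differently.
    """
    if len(cluster_ids) != len(true_labels):
        raise ValueError("cluster_ids and true_labels must have the same length")
    if not cluster_ids:
        raise ValueError("at least one document is required")

    # Count how often each true label occurs inside each cluster.
    label_counts = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, true_labels):
        label_counts[cluster][label] += 1

    # Each cluster contributes the size of its largest label group.
    majority_total = sum(max(counts.values()) for counts in label_counts.values())
    return majority_total / len(cluster_ids)


# Hypothetical toy data: six documents assigned to two clusters.
clusters = [0, 0, 0, 1, 1, 1]
labels = ["sports", "sports", "finance", "finance", "finance", "sports"]
score = purity(clusters, labels)
```

Purity rises toward 1.0 as clusters become dominated by a single category. It also rises trivially as the number of clusters grows, so it is usually only compared between runs that use the same number of clusters.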
Keywords/Search Tags: data mining, cluster analysis, text clustering, feature dimensionality reduction, clustering algorithm