The Study Of The Rough Set Theory In K-means

Posted on: 2012-01-06 | Degree: Master | Type: Thesis
Country: China | Candidate: L Y Zhang
GTID: 2178330332975525 | Subject: Information networks and security
Abstract/Summary:
Data mining is a technology for finding hidden knowledge and patterns. It is both a procedure for acquiring knowledge and a process for handling data. From an engineering point of view, data mining is an iterative process. It is widely used in business management, production control, marketing analysis, and many other fields to extract information.

Rough set theory is widely used in data mining. This thesis studies rough set theory, the integration of fuzziness, and rough entropy in depth. The study shows that the fuzziness, the knowledge entropy, and the rough entropy all decrease monotonically as the partition of knowledge is refined. On this basis, the thesis proposes a new hybrid algorithm for attribute reduction, called KRS, which combines the k-means algorithm with rough set theory. KRS builds on the attribute-frequency reduction algorithm and is a new reduction algorithm based on the discernibility matrix.

The thesis improves on the traditional clustering algorithm, which determines the similarity of texts by text distance alone and ignores the imprecision of the clustering process. Its innovations are as follows.

First, a common feature selection method is used to reduce the dimensionality of the text, and a new rough set reduction algorithm is proposed for the pre-selected text attributes. It produces several reducts, which are then used to remove redundant attributes. Second, the K-means clustering algorithm is used for text clustering, and at each step rough set theory is applied to cluster the texts once more. Experiments show that the resulting clusters are closer to the actual classification.
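The abstract does not give the details of KRS, so the following is only a minimal sketch of the general idea it builds on: a greedy, frequency-based attribute reduction over a discernibility matrix. The function names, the toy decision table, and the tie-breaking rule are assumptions made for illustration, and the greedy heuristic is not guaranteed to return a minimal reduct.

```python
from collections import Counter
from itertools import combinations


def discernibility_entries(rows, decisions):
    """Collect, for every pair of objects with different decisions,
    the set of attribute indices on which the two objects differ."""
    entries = []
    for i, j in combinations(range(len(rows)), 2):
        if decisions[i] != decisions[j]:
            diff = frozenset(
                a for a in range(len(rows[i])) if rows[i][a] != rows[j][a]
            )
            if diff:  # identical objects with different decisions are skipped
                entries.append(diff)
    return entries


def frequency_reduct(rows, decisions):
    """Greedy heuristic (an assumption, not the thesis's KRS algorithm):
    repeatedly pick the attribute that occurs in the most remaining
    discernibility entries until every entry is covered."""
    entries = discernibility_entries(rows, decisions)
    selected = []
    while entries:
        counts = Counter(a for entry in entries for a in entry)
        # Most frequent attribute first; ties go to the lowest index.
        best = min(counts, key=lambda a: (-counts[a], a))
        selected.append(best)
        entries = [entry for entry in entries if best not in entry]
    return sorted(selected)


# Hypothetical decision table: the decision equals attribute 1,
# attribute 0 is irrelevant and attribute 2 is constant.
ROWS = [(0, 0, 1), (0, 1, 1), (1, 0, 1), (1, 1, 1)]
DECISIONS = [0, 1, 0, 1]

print(frequency_reduct(ROWS, DECISIONS))  # prints [1]
```

In this toy table attribute 1 appears in every discernibility entry, so it alone is enough to separate all objects with different decisions.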
This processing effectively combines supervised feature selection with clustering. The whole procedure has two stages: first, the multi-reduct algorithm serves as a preprocessing tool for feature selection; then K-means is applied to the reduced attributes. This significantly lowers the attribute dimension and the amount of computation, so classification becomes much faster. Third, a new rough set filtering method is proposed for topic-specific text filtering. Experiments show that, as the reduction parameter m increases, it is more efficient and more accurate than both filtering without rough set attribute reduction and the fast rough set attribute reduction method.

Based on the above study, the thesis uses WEKA, a data mining platform, to analyze the data, and extends it through secondary development. Analysis of a large amount of data shows that the proposed algorithm performs better. The improved k-means module was implemented in MyEclipse, and validation shows that the program is feasible.
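The two-stage procedure described above (attribute reduction first, then K-means on the reduced attributes) can be sketched as follows. This is an illustrative sketch only: it uses plain Lloyd-style k-means with the first k points as initial centroids, the selected attribute indices are a hypothetical reduct, and the rough-set refinement applied at each clustering step is omitted because the abstract does not specify it.

```python
def project(rows, attrs):
    """Keep only the attributes (column indices) chosen by the reduction stage."""
    return [tuple(row[a] for a in attrs) for row in rows]


def kmeans(points, k, iters=20):
    """Plain k-means: assign each point to its nearest centroid, then
    recompute centroids, until they stop changing or iters is reached."""
    centroids = [list(p) for p in points[:k]]  # deterministic initialization
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [
            min(
                range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])),
            )
            for p in points
        ]
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    [sum(col) / len(members) for col in zip(*members)]
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical feature vectors; attribute 0 is assumed to be the reduct.
ROWS = [(0.0, 5.0, 0.1), (9.0, 5.1, 0.2), (0.2, 4.9, 0.0), (9.1, 5.0, 0.1)]
REDUCT = [0]

labels, centroids = kmeans(project(ROWS, REDUCT), k=2)
print(labels)  # prints [0, 1, 0, 1]
```

Clustering on the projected vectors means each distance computation touches only the retained attributes, which is where the reduction in computation claimed above would come from.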
Keywords/Search Tags: Rough set theory, Rough entropy, Text filtering, Vector space model, K-means