Font Size: a A A

Research And Application Of Categorization Algorithms Based On Rough Sets Attributes Reduction

Posted on:2014-02-10Degree:MasterType:Thesis
Country:ChinaCandidate:P WangFull Text:PDF
GTID:2248330398450535Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
Data classification is an important topic in the field of data mining. It produces a classification model which maps the not marked data to a given specific category according to the characteristics of the dataset. One of the classification technologies is based on the traditional classify technology. The other one is software-based computing technology which can deal with the uncertainty, integrity and non-uniform data. Rough set theory is a mathematical tool in soft computing methods for dealing with vagueness and uncertainty. The main idea is to export classification rules under the premise of the same classification ability by the attribute reduction.In the rough set theory, one of the most important problems is attribute reduction which has been proved to be an NP-hard problem. Both algorithms based on Skowron Matrix and algorithms based on an attribute importance heuristic are often used. In this paper we first introduce the basic algorithms and analyze their advantages and disadvantages. And then find the relations between the minimum attribute reduction of rough set and minimum set cover problem. We propose an attribute reduction algorithm based on improved correlation matrix which can effectively avoid the presence of empty and repeating elements. Based on the improved conventional matrix we propose a minimum attribute reduction algorithm which is able to find the minimum attribute reduction and saves storage space. The theoretical analysis and experiments show that the algorithm can reduce the search space and improve the efficiency of reduction.But rough set is noise-sensitive, need to combine with other soft computing theories. When the text feature has high dimension will make the neural network not easy to convergence and text classification accuracy is low. So in this paper we propose a new classification model named RS-BPNN that transform vector space which has been feature selected into the decision-making table and discrete them. Finally, use the new model to classify the Chinese text categorization. The experimental results show that:the new model has a higher classification accuracy, recall and F1values. Finally, using RS-BPNN classification model to analyze the impact of the various habits of body weight, the results show the prediction accuracy is77.6%.
Keywords/Search Tags:Rough Set, Attributes Reduction, Relation matrix, Neural network, TextCategorization
PDF Full Text Request
Related items