An Attribute Reduction Algorithm Based On Dynamic Neighborhood Rough Set For Text Classification

Posted on: 2018-03-21
Degree: Master
Type: Thesis
Country: China
Candidate: X H Zhao
GTID: 2348330518998521
Subject: Software engineering
Abstract/Summary:
With the rapid development of machine learning theory and methods, the ability of computers to process massive data has greatly improved. However, massive data contains a large amount of redundant and incomplete information, which seriously degrades the performance and data processing ability of machine learning algorithms. To address this problem, scholars have proposed the concept of data reduction: removing redundant information from data while keeping the classification capability of the original data unchanged. How to carry out data reduction effectively while preserving useful information is an important research direction in machine learning and data mining.

In recent years, rough set theory has become an effective tool for dealing with imprecise, inconsistent and incomplete data, and it has been widely applied in machine learning and many other fields. As an extension of the rough set, the neighborhood rough set model handles continuous data well, which avoids the information loss and the dependence on the discretization method found in the classical rough set. This thesis studies attribute reduction algorithms based on the neighborhood rough set model. The main work includes:

(1) To better determine the neighborhood value for a particular data set and improve the reduction effect, the FCM (fuzzy c-means) algorithm is combined with the neighborhood rough set. Using attribute importance as the heuristic condition, a forward greedy attribute reduction algorithm is constructed on the Canopy-FCM asymmetric variable neighborhood rough set model. The algorithm determines a specific neighborhood value for each sample on each attribute, so that the neighborhood values are set entirely according to the distribution of the data, which avoids the drawbacks of a single global neighborhood value. The algorithm can accurately select the attributes that contribute most to decision-making ability.
Experimental results on public UCI data sets show that the proposed algorithm retains fewer conditional attributes while improving classification accuracy.

(2) The proposed attribute reduction algorithm is applied to Chinese text classification to extract key feature words and reduce the influence of redundant vocabulary on classification performance. Li Ronglu's Chinese corpus is used as the experimental data. The results show that the proposed algorithm reduces the number of text feature words and the dimensionality of the text set, improves the classification of text data, and captures key information more accurately, so it has practical significance.
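
For readers unfamiliar with the technique, the following is a minimal sketch of generic forward greedy attribute reduction on a neighborhood rough set. It is not the thesis's Canopy-FCM variable neighborhood method: it assumes one fixed global neighborhood radius `delta`, a Chebyshev distance, and features already scaled to a comparable range. The function names `dependency` and `forward_greedy_reduct` are hypothetical. Attribute importance is taken here as the gain in dependency (the fraction of samples whose neighborhood contains only samples of the same class).

```python
import numpy as np

def dependency(X, y, attrs, delta):
    """Fraction of samples whose delta-neighborhood over `attrs` is label-pure."""
    if not attrs:
        return 0.0
    sub = X[:, attrs]
    # Pairwise Chebyshev distance restricted to the chosen attributes (assumed metric).
    dist = np.abs(sub[:, None, :] - sub[None, :, :]).max(axis=2)
    neighbors = dist <= delta
    same_label = y[:, None] == y[None, :]
    # A sample belongs to the positive region if every neighbor shares its label.
    consistent = np.all(~neighbors | same_label, axis=1)
    return float(consistent.mean())

def forward_greedy_reduct(X, y, delta=0.1, eps=1e-9):
    """Add the attribute with the largest dependency gain until no gain remains."""
    remaining = list(range(X.shape[1]))
    selected = []
    best = 0.0
    while remaining:
        scores = [dependency(X, y, selected + [a], delta) for a in remaining]
        i = int(np.argmax(scores))
        if scores[i] - best <= eps:
            break  # no remaining attribute improves the dependency
        best = scores[i]
        selected.append(remaining.pop(i))
    return selected, best

if __name__ == "__main__":
    # Toy data (illustrative only): attribute 0 separates the classes, attribute 1 is constant.
    X = np.array([[0.0, 0.5], [0.05, 0.5], [1.0, 0.5], [0.95, 0.5]])
    y = np.array([0, 0, 1, 1])
    print(forward_greedy_reduct(X, y, delta=0.1))
```

A single global `delta` is exactly the setting the thesis argues against. A per-sample, per-attribute radius would replace the scalar comparison `dist <= delta` with a comparison against data-dependent thresholds, which this sketch does not attempt.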
Keywords/Search Tags: Rough Set, Neighborhood Rough Set Model, Attribute Reduction, Text Classification