An Attribute Reduction Algorithm Based On Dynamic Neighborhood Rough Set For Text Classification

Posted on:2018-03-21

Degree:Master

Type:Thesis

Country:China

Candidate:X H Zhao

Full Text:PDF

GTID:2348330518998521

Subject:Software engineering

Abstract/Summary:

With the rapid development of machine learning theory and methods, the ability of computers to process massive data has greatly improved. However, massive data contains a great deal of redundant and incomplete information, which seriously degrades the performance and data-processing ability of machine learning algorithms. To address this problem, some scholars have proposed the concept of data reduction. Data reduction removes redundant information from data while keeping the classification capability of the original data unchanged. How to carry out data reduction effectively while preserving the useful information is an important research direction in machine learning and data mining.

In recent years, rough set theory has become an effective tool for dealing with imprecise, inconsistent and incomplete data, and it has been widely applied in machine learning and many other fields. As an extension of the rough set, the neighborhood rough set model handles continuous data well, which avoids the information loss and the dependence on the discretization method found in the classical rough set. This thesis studies attribute reduction algorithms based on the neighborhood rough set model. The main work is as follows:

(1) To better determine the neighborhood value for a particular data set and improve the reduction effect, the FCM algorithm is combined with the neighborhood rough set. Using attribute importance as the heuristic condition, a forward greedy attribute reduction algorithm based on the Canopy-FCM asymmetric variable neighborhood rough set model is constructed. The algorithm determines a specific neighborhood value for each data point on each attribute, so that the neighborhood value is set entirely according to the distribution of the data, avoiding the drawbacks of a single global neighborhood value. The algorithm can accurately select the attributes that contribute most to decision-making ability. Experimental results on public UCI data sets show that the proposed algorithm retains fewer conditional attributes and can improve classification accuracy.

(2) The proposed attribute reduction algorithm is applied to Chinese text classification to extract key feature words and reduce the influence of redundant vocabulary on classification. Li Ronglu's Chinese corpus is used as the experimental data. The experimental results show that the proposed algorithm reduces the number of text feature words and the dimensionality of the text set, improves the classification ability on text data, and captures key information more accurately, so it has certain practical significance.
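The forward greedy strategy described in (1) can be illustrated with a minimal sketch. This is not the thesis's Canopy-FCM algorithm: it assumes a single global neighborhood radius `delta` (exactly the simplification the thesis sets out to avoid), Euclidean distance, and the usual neighborhood dependency measure (the fraction of samples whose neighborhood contains only samples of the same class) as the attribute importance heuristic. The function names `positive_region`, `dependency` and `greedy_reduct` are hypothetical and chosen only for illustration.

```python
import numpy as np


def positive_region(X, y, attrs, delta):
    """Boolean mask of samples whose delta-neighborhood over `attrs` is label-pure."""
    if not attrs:
        # With no attributes every sample is a neighbor of every other one;
        # treat the positive region as empty for the purposes of this sketch.
        return np.zeros(len(y), dtype=bool)
    sub = X[:, attrs]
    # Pairwise Euclidean distances restricted to the chosen attributes.
    dist = np.sqrt(((sub[:, None, :] - sub[None, :, :]) ** 2).sum(axis=2))
    is_neighbor = dist <= delta
    same_label = y[:, None] == y[None, :]
    # A sample is consistent if every neighbor shares its label.
    return np.all(~is_neighbor | same_label, axis=1)


def dependency(X, y, attrs, delta):
    """Fraction of samples in the positive region (neighborhood dependency)."""
    return positive_region(X, y, attrs, delta).mean()


def greedy_reduct(X, y, delta=0.2, eps=1e-9):
    """Forward greedy selection: add the attribute with the largest dependency gain."""
    n_attrs = X.shape[1]
    reduct, current = [], 0.0
    while len(reduct) < n_attrs:
        best_attr, best_gamma = None, current
        for a in range(n_attrs):
            if a in reduct:
                continue
            gamma = dependency(X, y, reduct + [a], delta)
            if gamma > best_gamma + eps:
                best_attr, best_gamma = a, gamma
        if best_attr is None:
            # No remaining attribute increases the dependency: stop.
            break
        reduct.append(best_attr)
        current = best_gamma
    return reduct, current
```

Under these assumptions, `greedy_reduct(X, y)` takes a numeric feature matrix `X` (ideally scaled to a common range so that one `delta` is meaningful) and a label vector `y`, and returns the indices of the selected attributes together with the dependency they achieve. Replacing the global `delta` with per-sample, per-attribute neighborhood values, as the thesis proposes, would change how `is_neighbor` is computed but not the greedy loop itself.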

Keywords/Search Tags:

Rough Set, Neighborhood Rough Set Model, Attribute Reduction, Text Classification

Related items

1	Research And Application Of Attribute Reduction Algorithm Based On Neighborhood Rough Set
2	Research On Attribute Reduction Algorithms Based On Extended Rough Set Model
3	Research Of Attribute Reduction Algorithm Based On Neighborhood Rough Set
4	Attribute Reduction Algorithm Of Neighborhood Rough Sets And Its Application In Classifier
5	Research And Application Of Attribute Reduction Algorithm Based On Neighborhood Rough Set
6	Research Of Attribute Reduction And Sample Reduction Algorithms Based On Neighborhood Rough Sets And Application In Text Categorization
7	Attribute Reduction Algorithm For Neighborhood Rough Sets And Its Application In Classifiers
8	Research On Heuristic Attribute Reduction Algorithm For Neighbourhood Rough Set
9	Research On Accelerated Algorithm Of Attribute Reduction In Rough Sets And Its Neighborhood Model
10	Research On Three Branch Acceleration Method For Neighborhood Rough Set Attribute Reduction