With the rapid development of big data technologies, higher demands are being placed on the understanding and processing of high-dimensional data sets, especially those containing large amounts of noisy, irrelevant, and redundant information. The curse of dimensionality also poses enormous challenges to research fields such as data mining, knowledge discovery, and pattern recognition. Rough set theory, an important granular computing tool, can help mine valuable knowledge and information from complex data efficiently, and it has been widely used in feature selection (also known as attribute reduction). Existing feature selection algorithms can be divided into three categories: (1) filter methods, (2) wrapper methods, and (3) embedded methods. These algorithms can select representative attribute subsets with strong generalization ability and achieve the desired feature selection results. However, further study of neighborhood rough set theory shows that some deficiencies remain. One is how to choose a better measure for evaluating the relevance and redundancy of features. Another is that most algorithms ignore the interaction between features, so some important hidden information is lost during feature selection, which may leave the resulting feature subset with weak generalization ability and low classification accuracy. To address these problems, this paper first proposes measuring the relevance between condition attributes and decision attributes, and the redundancy among condition attributes, using neighborhood symmetric uncertainty, and measuring the interaction among condition attributes using conditional mutual information. Then, based on the principles of maximum relevance, minimum redundancy, and maximum interaction, an objective evaluation function for feature selection is proposed. On this basis, a feature selection algorithm based on neighborhood rough sets is proposed. Finally, in the experiments, the proposed algorithm is compared with five classical feature selection algorithms on nine real data sets and evaluated using three evaluation metrics. The experimental results show that the algorithm obtains attribute subsets with strong generalization ability and high classification accuracy, achieving the intended goals of feature selection.
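
To make the relevance, redundancy, and interaction criteria concrete, the following is a minimal sketch, not the algorithm proposed in this paper. It assumes discrete-valued features and uses the classical Shannon-entropy form of symmetric uncertainty, SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)), in place of the neighborhood version. The function names and the additive score (relevance minus mean redundancy plus mean interaction) are hypothetical choices made for illustration.

```python
import math
from collections import Counter


def entropy(*cols):
    """Shannon entropy (in bits) of the joint distribution of discrete columns."""
    joint = list(zip(*cols))
    n = len(joint)
    return -sum((c / n) * math.log2(c / n) for c in Counter(joint).values())


def mutual_info(x, y):
    """I(X; Y) = H(X) + H(Y) - H(X, Y)."""
    return entropy(x) + entropy(y) - entropy(x, y)


def symmetric_uncertainty(x, y):
    """Classical SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)), defined as 0 if both are constant."""
    hx, hy = entropy(x), entropy(y)
    return 0.0 if hx + hy == 0 else 2.0 * mutual_info(x, y) / (hx + hy)


def cond_mutual_info(x, y, z):
    """I(X; Y | Z) = H(X, Z) + H(Y, Z) - H(X, Y, Z) - H(Z)."""
    return entropy(x, z) + entropy(y, z) - entropy(x, y, z) - entropy(z)


def greedy_select(features, labels, k):
    """Greedy forward selection with a hypothetical additive score.

    features: dict mapping feature name -> list of discrete values.
    labels:   list of class labels (the decision attribute).
    """
    selected, remaining = [], list(features)

    def score(f):
        rel = symmetric_uncertainty(features[f], labels)  # relevance to the decision
        if not selected:
            return rel
        m = len(selected)
        # Mean redundancy with the features already chosen.
        red = sum(symmetric_uncertainty(features[f], features[s]) for s in selected) / m
        # Mean interaction: information about the labels that f adds given s.
        inter = sum(cond_mutual_info(features[f], labels, features[s]) for s in selected) / m
        return rel - red + inter

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    # XOR labels: neither "a" nor "b" is informative alone, but together they are.
    feats = {"a": [0, 0, 1, 1], "a_copy": [0, 0, 1, 1], "b": [0, 1, 0, 1]}
    y = [0, 1, 1, 0]
    print(greedy_select(feats, y, 2))
```

In this toy XOR data set every feature has zero individual relevance, so a criterion based on relevance and redundancy alone cannot distinguish the duplicate `a_copy` from `b`. The conditional mutual information term is what rewards `b` for the information it contributes jointly with `a`.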