
Research On Feature Selection Method Based On Neighborhood Rough Set For Unbalanced Distribution Mixed Dat

Posted on: 2024-03-18  Degree: Master  Type: Thesis
Country: China  Candidate: Z T Yuan  Full Text: PDF
GTID: 2568307106486284  Subject: Applied statistics
Abstract/Summary:
Feature selection is one of the most important topics in machine learning and in artificial intelligence more broadly. Feature selection based on neighborhood rough sets has proved to be an effective method. However, the sensitivity of existing algorithms to unbalanced data is an important drawback in practical applications. This thesis discusses how to use neighborhood rough sets for feature selection when heterogeneous data are unevenly distributed. Feature selection algorithms are applied in a wide range of fields, such as fraud detection and recommendation systems. Because real-world data are not always evenly distributed, and the data in these fields are particularly unbalanced, the feature selection algorithms proposed in this thesis can achieve better results in these areas and can help researchers and industry practitioners obtain more effective results on practical problems. Neighborhood rough sets are widely used for processing numerical data. However, most existing neighborhood rough set models cannot distinguish mixed samples well in classification problems; that is, they cannot classify effectively when the data are unevenly distributed.

The innovative work of this thesis is as follows.

(1) To address data imbalance, a K-nearest neighbor model based on neighborhood entropy is studied, and several neighborhood entropies and a neighborhood conditional mutual information are defined on this model. Based on the neighborhood mutual information, an attribute reduction algorithm for the K-nearest neighbor rough set is then studied. The experimental results show that the proposed static algorithm is more effective than existing algorithms based on the δ-neighborhood rough set model and the k-nearest neighborhood rough set model: of the 60 precision results (20 data sets, three classifiers), 43 are better than those of the other algorithms.

(2) An attribute reduction algorithm for the K+ nearest neighbor rough set under a dynamic mechanism is explored. In practice, data are often updated continuously, so attribute reduction under dynamic conditions becomes a necessity. In this thesis, a new neighborhood conditional mutual information is defined using the information entropy of the static case, and a new attribute reduction algorithm is designed that obtains a feasible attribute set in a short time in the dynamic setting. The experiments compare the running time of the static and dynamic algorithms and find that the speedup ratio grows as more objects are added: when the object set grows by 40% or 50%, the dynamic algorithm is at least 2 times faster than the static algorithm, so the dynamic algorithm K+NCMI outperforms the static algorithm in running time. Therefore, when multiple objects are added to K+NDS, the algorithm K+NCMI obtains a feasible reduct more efficiently.
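The abstract does not give the thesis's formal definitions, so the following is only a rough sketch of the general idea behind entropy-driven attribute reduction on a k-nearest neighborhood, not the thesis's algorithm. It makes several assumptions: each sample's neighborhood is itself plus its k nearest other samples, numeric features are pre-scaled to [0, 1], categorical features are strings compared by simple mismatch, and a conditional neighborhood entropy stands in for the neighborhood (conditional) mutual information defined in the thesis. All function names are hypothetical.

```python
import math

def mixed_distance(a, b, features):
    """Distance over the chosen features (assumed form, not from the thesis):
    absolute difference for numbers, 0/1 mismatch for strings."""
    total = 0.0
    for f in features:
        if isinstance(a[f], str):
            d = 0.0 if a[f] == b[f] else 1.0
        else:
            d = abs(a[f] - b[f])
        total += d * d
    return math.sqrt(total)

def knn_neighborhoods(X, features, k):
    """For each sample i, return {i} plus the indices of its k nearest other samples."""
    n = len(X)
    hoods = []
    for i in range(n):
        others = sorted(
            (mixed_distance(X[i], X[j], features), j) for j in range(n) if j != i
        )
        hoods.append({i} | {j for _, j in others[:k]})
    return hoods

def conditional_neighborhood_entropy(X, y, features, k):
    """-(1/n) * sum_i log(|N(x_i) with the same label as x_i| / |N(x_i)|).
    It is 0 when every neighborhood is label-pure."""
    hoods = knn_neighborhoods(X, features, k)
    total = 0.0
    for i, hood in enumerate(hoods):
        same = sum(1 for j in hood if y[j] == y[i])  # at least 1, since i is in hood
        total += math.log(same / len(hood))
    return -total / len(X)

def greedy_reduct(X, y, k, eps=1e-9):
    """Forward selection: repeatedly add the feature that lowers the entropy most,
    and stop when no candidate improves it by more than eps."""
    remaining = list(range(len(X[0])))
    selected, best = [], float("inf")
    while remaining:
        scores = {
            f: conditional_neighborhood_entropy(X, y, selected + [f], k)
            for f in remaining
        }
        f_best = min(scores, key=scores.get)
        if best - scores[f_best] <= eps:
            break
        selected.append(f_best)
        remaining.remove(f_best)
        best = scores[f_best]
    return selected

# Toy mixed data: column 0 is numeric and separates the classes,
# column 1 is categorical noise.
X = [[0.0, "a"], [0.1, "b"], [0.05, "a"], [0.9, "b"], [1.0, "a"], [0.95, "b"]]
y = [0, 0, 0, 1, 1, 1]
reduct = greedy_reduct(X, y, k=2)
```

On this toy data the selection keeps only column 0, because neighborhoods built from it are already label-pure. A fixed neighbor count k, unlike a fixed radius δ, gives every sample a neighborhood of the same size regardless of local density, which is one plausible reason such models are considered for unevenly distributed data.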
Keywords/Search Tags: unbalanced distribution, granular computing, neighborhood rough set, neighborhood mutual information, feature selection