
Research on Feature Selection Methods Based on Neighborhood Rough Sets and Lebesgue Measure

Posted on: 2021-09-17
Degree: Master
Type: Thesis
Country: China
Candidate: L Y Wang
GTID: 2480306197995739
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, information technology has developed rapidly and the amount of available information has grown explosively. This growth produces redundant information that can degrade decision making, so preprocessing to remove redundancy has become increasingly urgent. Feature selection, also known as attribute reduction, is a core data preprocessing step and has advanced rapidly. Its main objective is to select the most effective features from the original information system, improving data processing efficiency while maintaining the classification ability of the original data. Feature selection is now widely used in artificial intelligence, data mining, pattern recognition, and other fields. Traditional feature selection models study uncertainty measures from either the algebraic view or the information view alone. In this thesis, the Lebesgue measure is introduced to build three feature selection methods for mixed information systems containing both symbolic and numerical data, and the effectiveness of the proposed methods and algorithms is verified through experiments and analysis. The main research consists of three parts:

(1) For information systems with mixed symbolic and numerical data, a feature selection method based on Lebesgue and entropy measures in neighborhood rough sets is proposed to analyze the uncertainty measures of neighborhood rough sets from both the algebraic and information views. The method handles mixed datasets directly and improves classification performance while preserving the original information. First, the Lebesgue measure is introduced into the neighborhood rough set model to overcome a drawback of most traditional rough set models: they cannot analyze countably infinite sets. Second, the neighborhood roughness (algebraic view) and the joint entropy (information view) are given as uncertainty measures, and the neighborhood roughness joint entropy is defined from them. Then, based on
Lebesgue and entropy measures, a feature selection algorithm in neighborhood rough sets is designed to handle mixed data. Finally, simulation experiments on five UCI datasets and four gene datasets show that the proposed method selects the most relevant feature subset and achieves better classification performance.

(2) For mixed and incomplete datasets with symbolic and numerical data, a feature selection method for incomplete neighborhood decision systems based on Lebesgue and entropy measures is proposed. It can handle mixed, incomplete datasets with missing values, and it can analyze infinite sets theoretically. First, a neighborhood tolerance relation based on the Lebesgue measure is constructed to study the positive region and dependency degree of incomplete neighborhood decision systems from the algebraic view. Second, neighborhood tolerance entropy based on the Lebesgue measure is defined from the information view, and the neighborhood tolerance dependency joint entropy is defined by combining the two views. Then, based on the neighborhood tolerance dependency joint entropy, a feature selection algorithm in neighborhood rough sets is designed to handle mixed and incomplete datasets. Finally, numerical experiments on seven UCI datasets and eight gene expression datasets show that the proposed method is effective at selecting the most relevant features in incomplete neighborhood decision systems.

(3) Most rough set models based on a single binary relation have high computational complexity during feature selection. To address this, the multi-granularity rough set model is applied to mixed and incomplete information systems with symbolic and numerical data, and a feature selection method based on neighborhood multi-granularity rough sets using Lebesgue and entropy measures is constructed. First, the optimistic and pessimistic neighborhood
multi-granularity rough set models for incomplete neighborhood decision systems are given and combined with the Lebesgue measure. Second, measures of the optimistic and pessimistic neighborhood multi-granularity rough sets are given from the algebraic view, neighborhood multi-granularity entropy is given from the information view, and the pessimistic neighborhood multi-granulation dependency joint entropy is defined. Then, based on the pessimistic neighborhood multi-granulation dependency joint entropy, a feature selection algorithm in neighborhood multi-granularity rough sets is designed to handle mixed and incomplete neighborhood decision systems. Finally, simulation experiments on seven UCI datasets and eight gene expression datasets demonstrate that the proposed method is effective.
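Item (1) builds on the classical neighborhood rough set quantities. On the algebraic side these are the delta-neighborhood on mixed data, the positive region, the dependency degree and the roughness. On the information side there is neighborhood entropy. The Python sketch below illustrates those classical quantities on a toy mixed dataset. It is not the thesis's algorithm:

* It uses plain cardinality where the thesis substitutes the Lebesgue measure.
* It does not implement the neighborhood roughness joint entropy.
* It treats the following as illustrative assumptions, not details from the thesis:
  * the max-norm distance;
  * the 0/1 mismatch distance for symbolic features;
  * the greedy stopping rule;
  * all function names (`distance`, `dependency`, `roughness`, `neighborhood_entropy`, `greedy_select`).

```python
import math
from typing import List, Sequence, Set

Sample = Sequence[object]  # mixed row: floats for numeric, str for symbolic features


def distance(x: Sample, y: Sample, features: Sequence[int]) -> float:
    """Max-norm distance on the chosen features (an assumed choice): numeric
    features contribute |x_a - y_a|, symbolic ones 0 if equal and 1 otherwise."""
    d = 0.0
    for a in features:
        if isinstance(x[a], str) or isinstance(y[a], str):
            diff = 0.0 if x[a] == y[a] else 1.0
        else:
            diff = abs(x[a] - y[a])
        d = max(d, diff)
    return d


def neighborhoods(data: List[Sample], features: Sequence[int], delta: float) -> List[Set[int]]:
    """delta-neighborhood n_B(x_i) = {j : dist_B(x_i, x_j) <= delta}, as index sets."""
    return [{j for j, y in enumerate(data) if distance(x, y, features) <= delta}
            for x in data]


def dependency(data, labels, features, delta) -> float:
    """gamma_B(D) = |POS_B(D)| / |U| (cardinality as the measure): x_i is in the
    positive region when every neighbor of x_i has the same label as x_i."""
    nbrs = neighborhoods(data, features, delta)
    pos = [i for i, nb in enumerate(nbrs) if all(labels[j] == labels[i] for j in nb)]
    return len(pos) / len(data)


def roughness(data, labels, features, delta, cls) -> float:
    """rho_B(X) = 1 - |lower(X)| / |upper(X)| for the decision class X = {i : label == cls}."""
    nbrs = neighborhoods(data, features, delta)
    X = {i for i, y in enumerate(labels) if y == cls}
    lower = [i for i, nb in enumerate(nbrs) if nb <= X]  # neighborhood inside X
    upper = [i for i, nb in enumerate(nbrs) if nb & X]   # neighborhood meets X
    return 0.0 if not upper else 1.0 - len(lower) / len(upper)


def neighborhood_entropy(data, features, delta) -> float:
    """NE_B = -(1/|U|) * sum_i log(|n_B(x_i)| / |U|)."""
    n = len(data)
    return -sum(math.log(len(nb) / n) for nb in neighborhoods(data, features, delta)) / n


def greedy_select(data, labels, delta, eps=1e-12) -> List[int]:
    """Forward selection: repeatedly add the feature that most increases the
    dependency, stopping once the full feature set's dependency is reached."""
    remaining = list(range(len(data[0])))
    target = dependency(data, labels, remaining, delta)
    selected: List[int] = []
    best = 0.0
    while remaining and best < target - eps:
        scores = {a: dependency(data, labels, selected + [a], delta) for a in remaining}
        a_star = max(scores, key=scores.get)  # the first feature wins ties
        if scores[a_star] <= best + eps:
            break  # no single feature improves the dependency any further
        selected.append(a_star)
        remaining.remove(a_star)
        best = scores[a_star]
    return selected


if __name__ == "__main__":
    data = [(0.10, "a", 0.50), (0.15, "a", 0.10), (0.80, "b", 0.45), (0.85, "b", 0.90)]
    labels = [0, 0, 1, 1]
    print(greedy_select(data, labels, delta=0.2))          # [0]
    print(dependency(data, labels, [2], delta=0.2))        # 0.5
    print(roughness(data, labels, [2], delta=0.2, cls=0))  # 0.6666666666666667
```

In the toy data, feature 2 is noisy relative to the labels, so its dependency is only 0.5. Feature 0 alone already separates the two classes, so the greedy search stops after selecting it.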
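Item (2) rests on a neighborhood tolerance relation for incomplete data. This sketch assumes one common formulation rather than taking it from the thesis:

* A missing value (`None`) is compatible with any value.
* Symbolic features must be equal.
* Numeric features must satisfy |a(x) - a(y)| <= delta.

The code computes tolerance classes and the resulting dependency degree, using cardinality as the measure. It does not reproduce the thesis's Lebesgue-measure-based neighborhood tolerance entropy or joint entropy. The function names `tolerant`, `tolerance_classes` and `tolerance_dependency` are hypothetical.

```python
from typing import List, Optional, Sequence, Set, Union

Value = Optional[Union[float, str]]  # None marks a missing value


def tolerant(x: Sequence[Value], y: Sequence[Value], features: Sequence[int],
             delta: float) -> bool:
    """Assumed neighborhood tolerance test on the features in `features`:
    a missing value matches anything, symbolic values must be equal,
    numeric values must differ by at most delta."""
    for a in features:
        u, v = x[a], y[a]
        if u is None or v is None:
            continue  # missing value: tolerated against any value
        if isinstance(u, str) or isinstance(v, str):
            if u != v:
                return False
        elif abs(u - v) > delta:
            return False
    return True


def tolerance_classes(data: List[Sequence[Value]], features: Sequence[int],
                      delta: float) -> List[Set[int]]:
    """Tolerance class of each object, stored as a set of row indices."""
    return [{j for j, y in enumerate(data) if tolerant(x, y, features, delta)}
            for x in data]


def tolerance_dependency(data, labels, features, delta: float) -> float:
    """|POS_B(D)| / |U|: x is in the positive region when every object in its
    tolerance class has the same decision label as x (cardinality as the measure)."""
    classes = tolerance_classes(data, features, delta)
    pos = [i for i, c in enumerate(classes) if all(labels[j] == labels[i] for j in c)]
    return len(pos) / len(data)


if __name__ == "__main__":
    data = [(0.10, "a"), (None, "a"), (0.80, None), (0.85, "b")]
    labels = [0, 0, 1, 1]
    print(tolerance_classes(data, [0, 1], 0.2))             # [{0, 1}, {0, 1, 2}, {1, 2, 3}, {2, 3}]
    print(tolerance_dependency(data, labels, [0, 1], 0.2))  # 0.5
```

Because a missing value matches anything, this relation is reflexive and symmetric but not transitive. In the example, x1 and x2 tolerate each other only because each has a missing entry. That enlarges both of their classes and lowers the dependency.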
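Item (3) replaces the single neighborhood relation with several granulations, here taken to be feature subsets. Under the usual multigranulation definitions:

* An object is in the optimistic lower approximation of X if at least one granulation's neighborhood of it lies inside X.
* It is in the pessimistic lower approximation only if every granulation's neighborhood does.
* The upper approximations are the duals, ~lower(~X).

The sketch below computes both approximations and a dependency degree built from the lower approximations. It assumes numeric features, a max-norm distance, and cardinality in place of the Lebesgue measure. It does not implement the thesis's pessimistic neighborhood multi-granulation dependency joint entropy. The function names are hypothetical.

```python
from typing import List, Sequence, Set, Tuple


def neighborhoods(data: List[Sequence[float]], features: Sequence[int],
                  delta: float) -> List[Set[int]]:
    """Numeric-only delta-neighborhoods under the max-norm on `features` (assumption)."""
    return [{j for j, y in enumerate(data)
             if max(abs(x[a] - y[a]) for a in features) <= delta}
            for x in data]


def multigranulation_approx(data, granulations, X: Set[int], delta: float,
                            optimistic: bool = True) -> Tuple[Set[int], Set[int]]:
    """Lower/upper approximations of X over granulations A_1..A_m.

    optimistic:  lower = {x : SOME  n_Ai(x) <= X},  upper = {x : EVERY n_Ai(x) & X nonempty}
    pessimistic: lower = {x : EVERY n_Ai(x) <= X},  upper = {x : SOME  n_Ai(x) & X nonempty}
    """
    systems = [neighborhoods(data, A, delta) for A in granulations]
    quant_lower = any if optimistic else all
    quant_upper = all if optimistic else any
    idx = range(len(data))
    lower = {i for i in idx if quant_lower(nb[i] <= X for nb in systems)}
    upper = {i for i in idx if quant_upper(bool(nb[i] & X) for nb in systems)}
    return lower, upper


def multigranulation_dependency(data, labels, granulations, delta: float,
                                optimistic: bool = False) -> float:
    """|POS| / |U| with POS the union of the lower approximations of all decision classes."""
    pos: Set[int] = set()
    for cls in set(labels):
        X = {i for i, y in enumerate(labels) if y == cls}
        pos |= multigranulation_approx(data, granulations, X, delta, optimistic)[0]
    return len(pos) / len(data)


if __name__ == "__main__":
    data = [(0.10, 0.50), (0.15, 0.10), (0.80, 0.45), (0.85, 0.90)]
    labels = [0, 0, 1, 1]
    grans = [[0], [1]]  # two granulations: feature 0 alone, feature 1 alone
    print(multigranulation_approx(data, grans, {0, 1}, 0.2, optimistic=True))   # ({0, 1}, {0, 1})
    print(multigranulation_approx(data, grans, {0, 1}, 0.2, optimistic=False))  # ({1}, {0, 1, 2})
    print(multigranulation_dependency(data, labels, grans, 0.2, optimistic=False))  # 0.5
    print(multigranulation_dependency(data, labels, grans, 0.2, optimistic=True))   # 1.0
```

The pessimistic lower approximation is always contained in the optimistic one. So the pessimistic dependency (0.5 here) can never exceed the optimistic dependency (1.0 here), which makes the pessimistic variant the stricter criterion.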
Keywords/Search Tags: Neighborhood rough sets, Feature selection, Uncertainty measure, Lebesgue measure, Neighborhood entropy