Research On Ensemble Classification Method Under Attribute Reduction

Posted on: 2022-01-29
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Mao
Full Text: PDF
GTID: 2518306557478534
Subject: Software engineering
Abstract/Summary:
With the rapid development of science and technology, we are entering a data era in which the scale of data is growing explosively. Such large-scale data often contain substantial redundant information and some noisy data, so processing and analyzing them efficiently becomes the key problem. Complex and diverse data of this kind can be approached from three aspects: (1) For high-dimensional data, the attribute dimension can be reduced, typically by attribute reduction methods. (2) For noisy data, the noise can be identified at the source to limit its impact on subsequent processing and analysis, typically by noise filtering methods based on model prediction. (3) To analyze data from multiple angles, multiple learners can be combined so that the data are examined from different perspectives, typically by ensemble learning methods.

Building on a comprehensive introduction to attribute reduction and ensemble learning, this thesis starts from the construction of a new rough set model and explores attribute reduction and classification methods for complex and diverse data, taking the influence of noisy data into account. Finally, attribute reduction is applied to the process of ensemble classification. The details are as follows:

(1) In the neighborhood rough set, a radius is generally specified to constrain the similarity between samples, which realizes neighborhood information granulation. If the radius is too large, samples from different classes may fall into the same neighborhood, producing imprecise or inconsistent information. The pseudo-label neighborhood strategy has been proposed to alleviate this problem. Nevertheless, in both the traditional neighborhood and the pseudo-label neighborhood, the similarity of samples is measured only by the distance between them, while the structural relationship among the neighborhoods of the different samples contained in one neighborhood information granule is ignored. In view of this, a measure of neighborhood distance is introduced and a mechanism of co-occurrence neighborhood information granulation is proposed. Based on this mechanism, a co-occurrence neighborhood rough set model and a pseudo-label co-occurrence neighborhood rough set model are constructed, and a heuristic attribute reduction algorithm based on approximation quality is employed to obtain the corresponding reducts. The experimental results demonstrate that, compared with reducts based on the neighborhood relation and the pseudo-label neighborhood relation, reducts based on the co-occurrence neighborhood can provide a higher reduction rate without decreasing classification accuracy.

(2) In practical applications, a data set may contain noisy data, and performing attribute reduction directly on such data often produces various adverse effects. The process is therefore divided into a noise filtering stage and an attribute reduction stage. In the noise filtering stage, the density of each sample's neighbors is considered first, and by combining the neighborhood mechanism with a classification strategy, a degree noise filtering method is proposed. In the attribute reduction stage, randomized reduction is used to produce different attribute subsets, which are then combined, and an ensemble classification method based on conditional entropy randomized reduction is proposed. This method relaxes the constraints of the traditional heuristic algorithm so that multiple different reducts can be obtained. The results of the base classifier on these attribute subsets are then combined by voting to obtain the final classification result. The experimental results show that the degree noise filtering method can reduce the impact of noisy data and improve the performance of the classifier on noisy data. The ensemble classification method under attribute reduction not only improves classification accuracy effectively but also maintains good classification stability.
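
The role of the radius in the classical neighborhood rough set, and the heuristic search for a reduct, can be illustrated with a minimal Python sketch. It covers only the standard neighborhood relation that the abstract uses as its baseline; the co-occurrence and pseudo-label variants are not specified here and are not implemented. The function names (`neighborhood`, `approximation_quality`, `greedy_reduct`), the Euclidean distance, and the forward greedy stopping rule are assumptions chosen for illustration, not the thesis's own algorithm.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def neighborhood(X, i, attrs, radius):
    """Indices of samples within `radius` of sample i, using only `attrs`."""
    xi = [X[i][a] for a in attrs]
    return [j for j, row in enumerate(X)
            if dist(xi, [row[a] for a in attrs]) <= radius]


def approximation_quality(X, y, attrs, radius):
    """Fraction of samples whose whole neighborhood shares their label."""
    if not attrs:
        return 0.0  # convention assumed here for the empty attribute set
    consistent = sum(
        all(y[j] == y[i] for j in neighborhood(X, i, attrs, radius))
        for i in range(len(X))
    )
    return consistent / len(X)


def greedy_reduct(X, y, radius):
    """Forward greedy search: repeatedly add the attribute that raises the
    approximation quality most, until it matches the full attribute set."""
    all_attrs = list(range(len(X[0])))
    target = approximation_quality(X, y, all_attrs, radius)
    reduct = []
    remaining = list(all_attrs)
    while remaining and approximation_quality(X, y, reduct, radius) < target:
        best = max(remaining,
                   key=lambda a: approximation_quality(X, y, reduct + [a], radius))
        reduct.append(best)
        remaining.remove(best)
    return reduct


# Toy data (hypothetical): attribute 0 separates the classes, attribute 1 is constant.
X = [[0.0, 0.5], [0.1, 0.5], [0.9, 0.5], [1.0, 0.5]]
y = [0, 0, 1, 1]
print(greedy_reduct(X, y, radius=0.3))
```

On this toy data, a radius large enough to span both classes (for example 1.0 on attribute 0) would place differently labeled samples in one neighborhood and drive the approximation quality to zero, which is the inconsistency the abstract attributes to an overly large radius.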
Keywords/Search Tags:Attribute reduction, Ensemble learning, Rough set, Co-occurrence neighborhood, Classification