Incomplete Data Ensemble Classification Based on Tolerance Relationship

Posted on: 2022-06-07
Degree: Master
Type: Thesis
Country: China
Candidate: M L Yang
Full Text: PDF
GTID: 2518306542963369
Subject: Computer technology
Abstract/Summary:
In recent years, the development of internet technology has made classification an important topic in many fields. At the same time, the large volume of data generated in daily life provides a foundation for research in machine learning, data mining, and related fields. However, a variety of factors can cause values to go missing, which produces incomplete data. Because most machine learning classification methods are designed for complete data, they are not suitable for data with missing values, which makes classifying incomplete data difficult. Although some algorithms are designed for incomplete data, their performance deteriorates sharply as the number of missing values increases. Research on incomplete data classification algorithms has therefore attracted growing attention.

Ensemble learning is an important family of methods for incomplete data classification. However, existing ensemble-based methods for incomplete data take insufficient account of data redundancy, which degrades classification performance. It is therefore important to find a method that eliminates redundant attributes effectively. Rough set theory recognizes that attributes differ in their effect on the final classification result, and it provides attribute reduction methods that improve classification performance while keeping the classification ability of the original dataset unchanged. In view of this, this dissertation studies how to apply rough sets to the classification of incomplete data in order to improve performance.

First, the ways in which incomplete data arise and the significance of classifying incomplete data are introduced. Then, existing classification methods for incomplete data are analyzed in detail. The main work covers the following two aspects:

1. This dissertation first proposes an incomplete data ensemble classification algorithm based on the tolerance relation, called REIC. The method uses the tolerance relation from rough set theory to eliminate redundant attributes of the dataset and obtains a single reduct by extracting important attributes. Multiple reducts are then obtained by iteratively deleting the non-core attributes of that single reduct, which yields different attribute subspaces. Different attribute subspaces increase the diversity of the base classifiers and thereby improve the performance of the ensemble.

2. REIC improves the performance of incomplete data ensemble classification, but it still struggles in complex cases. For example, it cannot make a prediction for a sample that has missing values on the attributes of every reduct. To further improve the prediction rate and the classification performance, this dissertation introduces missing patterns into the reduction ensemble classification framework and proposes an incomplete dataset classification method called RMPF, which combines the missing pattern set and the reducts through a fusion strategy. The method first detects the missing pattern set of the incomplete dataset, and then optimizes the missing pattern set with the reducts to remove redundant attributes that harm the classification result. An ensemble classifier is then constructed. This method extends the range of samples for which a prediction can be made and improves classification performance.
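
To illustrate two notions the abstract relies on, here is a minimal Python sketch. It is not the REIC or RMPF implementation from the thesis. It assumes the standard tolerance relation for incomplete information systems (two samples are tolerant on an attribute set if, on every attribute, their values are equal or at least one of them is missing), and it takes a sample's missing pattern to be the set of attributes observed in that sample. The names `MISSING`, `tolerant`, `tolerance_classes`, and `missing_patterns`, as well as the toy data, are hypothetical.

```python
from collections import defaultdict

MISSING = None  # assumed marker for a missing value (hypothetical convention)


def tolerant(x, y, attrs):
    """True if x and y agree on every attribute in attrs,
    treating a missing value on either side as a wildcard."""
    return all(
        x[a] is MISSING or y[a] is MISSING or x[a] == y[a]
        for a in attrs
    )


def tolerance_classes(rows, attrs):
    """For each row index i, the set of row indices tolerant with row i."""
    return [
        {j for j, y in enumerate(rows) if tolerant(x, y, attrs)}
        for x in rows
    ]


def missing_patterns(rows):
    """Group row indices by the tuple of attribute positions that are observed."""
    groups = defaultdict(list)
    for i, x in enumerate(rows):
        observed = tuple(a for a, v in enumerate(x) if v is not MISSING)
        groups[observed].append(i)
    return dict(groups)


# Toy incomplete dataset: four samples, three condition attributes.
rows = [
    (1, 0, MISSING),
    (1, MISSING, 2),
    (0, 0, 2),
    (1, 1, 2),
]

classes = tolerance_classes(rows, attrs=[0, 1, 2])
patterns = missing_patterns(rows)
print(classes)
print(patterns)
```

In this toy data, sample 0 is tolerant with sample 1 and sample 1 is tolerant with sample 3, but sample 0 is not tolerant with sample 3. The relation is reflexive and symmetric but not transitive, so tolerance classes can overlap instead of forming a partition. Restricting `attrs` to a subset, as a reduct would, generally enlarges the classes. A reduct is a subset that still preserves the classification ability of the full attribute set.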
Keywords/Search Tags:Incomplete data, Rough set, Compatible relation, Attribute reduction, Missing pattern, Ensemble learning