
Incomplete Data Ensemble Classification Using Imputation-Revision Framework With Local Neighborhood Information

Posted on: 2021-02-25
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Wu
GTID: 2428330620465549
Subject: Software engineering
Abstract/Summary:
Incomplete data classification is one of the most important branches of machine learning (ML). Sensor technology, information technology and other fields are developing rapidly, so data are acquired in increasingly diverse ways, which brings great opportunities for ML. In practice, however, some loss of information is inevitable. Typical causes include damaged storage equipment, failed pixels, the limited capacity of data acquisition equipment, and unanswered survey questions. Most existing data analysis methods are designed for complete data, so the presence of missing values (MVs) makes them inapplicable. A few algorithms can handle MVs directly, but their classification performance degrades greatly when the original dataset contains a large number of MVs. Research on incomplete data classification has therefore gradually become a hot topic.

Missing value imputation (MVI) is a mainstream approach to handling incomplete data. In recent years, various MVI methods, including single imputation and multiple imputation, have been used to recover the MVs before classification. Traditional classification algorithms are then applied to the completed data. Most existing MVI methods use techniques from statistics and ML to replace the missing attributes with plausible values. A large number of studies show that each method has its own advantages in specific scenarios. It is therefore of great significance to combine the strengths of different MVI methods to improve the classification performance on incomplete data. In addition, few imputation methods take the spatial distribution of the samples into account. In view of this, this thesis studies how to use the local spatial neighborhood information of samples to improve both imputation effectiveness and classification performance on incomplete data.

First, the difficulties and main problems of incomplete data classification are introduced. Then, the existing traditional incomplete data classification methods are reviewed and their core ideas briefly analyzed. Building on these ideas and difficulties, the thesis makes two main contributions:

1. A new framework, CCA-IR, is proposed. It can be combined with many existing imputation methods to enhance their imputation effect. It consists of three modules:
   - pre-filling;
   - spatial neighborhood information mining;
   - revision of the pre-filling results.
   The main idea is to cover the complete datasets that existing MVI methods have pre-filled. The framework then further mines the information carried by the samples' spatial distribution. Finally, it revises the pre-filling results using each sample's local neighborhood information, which improves the imputation of incomplete data.

2. CCA-IR improves an existing MVI method by revising the result of that single method. Building on the CCA-IR result, this thesis further proposes two methods for classifying incomplete data:
   a) a weighted imputation-revision framework based on accuracy (WIRFA);
   b) ensemble classification using an imputation-revision framework with local spatial neighborhood information (E-IRSNI).
   WIRFA revises MVs using weights derived from a series of accuracies. These accuracies are obtained on several complete datasets, each pre-filled by a different MVI method. E-IRSNI mines the local neighborhood information (LNI) of samples and revises MVs based on the LNI. It then performs ensemble learning on the resulting datasets to further improve classification accuracy.
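The three CCA-IR modules can be illustrated with a minimal sketch. This is not the thesis's implementation.
- Column-mean imputation stands in for "an existing MVI method".
- A plain k-nearest-neighbor search in the pre-filled space is swapped in for the covering step that CCA-IR uses to locate a sample's local neighborhood.
- The revision rule, which blends the pre-filled value with the neighborhood mean, is an assumed form.
- The parameters `k` and `alpha` and both function names are hypothetical.

```python
import math
from statistics import mean
from typing import List, Optional

Row = List[Optional[float]]  # None marks a missing value


def mean_prefill(data: List[Row]) -> List[List[float]]:
    """Pre-filling module: replace each missing value with its column mean.

    Column-mean imputation is a stand-in for any existing MVI method (assumption)."""
    n_cols = len(data[0])
    col_means = [mean(row[j] for row in data if row[j] is not None) for j in range(n_cols)]
    return [[col_means[j] if v is None else v for j, v in enumerate(row)] for row in data]


def revise_with_neighbors(data: List[Row], prefilled: List[List[float]],
                          k: int = 2, alpha: float = 0.5) -> List[List[float]]:
    """Revision module: pull each pre-filled value toward its local neighborhood.

    Neighbors are the k nearest other samples in the pre-filled space. This plain kNN
    search replaces the covering step (hypothetical parameters k and alpha)."""
    revised = [row[:] for row in prefilled]
    for i, row in enumerate(data):
        missing = [j for j, v in enumerate(row) if v is None]
        if not missing:
            continue  # complete samples are left untouched
        # Neighborhood-information module: rank the other samples by Euclidean distance.
        ranked = sorted((math.dist(prefilled[i], prefilled[r]), r)
                        for r in range(len(data)) if r != i)
        neighbors = [r for _, r in ranked[:k]]
        for j in missing:
            local = mean(prefilled[r][j] for r in neighbors)
            # Blend the pre-filled value with the neighborhood mean (assumed rule).
            revised[i][j] = alpha * prefilled[i][j] + (1 - alpha) * local
    return revised


if __name__ == "__main__":
    data = [[1.0, 2.0], [1.1, None], [5.0, 6.0], [5.2, 6.1]]
    pre = mean_prefill(data)
    print(pre[1][1], revise_with_neighbors(data, pre, k=1)[1][1])
```

Take the sample `[1.1, None]` with `k=1`. Its missing attribute is first pre-filled with the column mean, about 4.7. The revision step then pulls it halfway toward its nearest neighbor's value of 2.0, giving about 3.35. This shows how local information can correct a global estimate that ignores where the sample lies.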
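WIRFA's idea of weighting candidate imputations by classification accuracy can be sketched as follows. The abstract does not give the exact rule, so this sketch rests on two assumptions:
- Each pre-filled dataset is scored by leave-one-out 1-nearest-neighbor accuracy, a stand-in for whatever classifier the thesis uses.
- Each missing cell receives the accuracy-weighted mean of the candidate values.

The function names are hypothetical, and the neighborhood-revision step is omitted for brevity.

```python
import math
from typing import List, Optional, Sequence, Tuple

Row = List[Optional[float]]  # None marks a missing value


def loo_1nn_accuracy(X: Sequence[Sequence[float]], y: Sequence[int]) -> float:
    """Leave-one-out accuracy of a 1-nearest-neighbor classifier on a completed dataset."""
    correct = 0
    for i in range(len(X)):
        nearest = min((r for r in range(len(X)) if r != i),
                      key=lambda r: math.dist(X[i], X[r]))
        correct += y[nearest] == y[i]  # bool counts as 0 or 1
    return correct / len(X)


def accuracy_weighted_fill(data: List[Row], candidates: List[List[List[float]]],
                           y: Sequence[int]) -> Tuple[List[List[float]], List[float]]:
    """Fuse several pre-filled datasets (one per MVI method).

    Each missing cell gets the accuracy-weighted mean of the candidate values
    (hypothetical rule); observed cells are copied unchanged."""
    accuracies = [loo_1nn_accuracy(c, y) for c in candidates]
    total = sum(accuracies)
    if total == 0:
        raise ValueError("at least one candidate dataset must have non-zero accuracy")
    weights = [a / total for a in accuracies]
    fused = [
        [sum(w * c[i][j] for w, c in zip(weights, candidates)) if v is None else v
         for j, v in enumerate(row)]
        for i, row in enumerate(data)
    ]
    return fused, accuracies
```

Consider two candidate fillings of the same dataset. The filling that keeps the classes separable earns the higher accuracy and therefore dominates the fused value. The other filling still contributes in proportion to its score.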
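The ensemble stage of E-IRSNI can be sketched as majority voting over base classifiers, one per revised dataset. The abstract names neither the base classifier nor the combination rule. The 1-nearest-neighbor base learner and the plain majority vote below are therefore illustrative assumptions, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def predict_1nn(X: Sequence[Sequence[float]], y: Sequence[int],
                query: Sequence[float]) -> int:
    """Base learner: return the label of the training sample nearest to the query."""
    nearest = min(range(len(X)), key=lambda r: math.dist(X[r], query))
    return y[nearest]


def ensemble_predict(members: List[Tuple[List[List[float]], List[float]]],
                     y: Sequence[int]) -> int:
    """Majority vote over base learners, one per revised dataset.

    Each member pairs a revised training set with the query sample as completed
    by that same imputation-revision run. Ties go to the label voted first,
    because Counter.most_common keeps first-encountered order for equal counts."""
    votes = [predict_1nn(X, y, query) for X, query in members]
    return Counter(votes).most_common(1)[0][0]
```

In a full pipeline, each member would come from a different pre-filling method followed by neighborhood revision. The diversity among the revised datasets is what the ensemble exploits.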
Keywords/Search Tags: Data Mining, Incomplete Data, Neighborhood Information, Missing Value Imputation, Ensemble Learning