Research On Adaptive And Robust Missing Value Imputation Algorithm

Posted on:2022-06-04

Degree:Master

Type:Thesis

Country:China

Candidate:W L Dong

GTID:2518306557975329

Subject:Computer Science and Technology

Abstract/Summary:

In recent years, machine learning and data mining have become active research fields. With the rapid development of internet technology, researchers face ever larger volumes of data. However, many machine learning and data mining algorithms require complete data sets, a precondition that creates practical difficulty for users who encounter incomplete data. Many imputation algorithms exist to fill in missing data. However, most existing imputation methods are designed for static data and ignore data that arrives as a stream; moreover, most of them rely on a single model, so their robustness is generally poor. How to extend traditional imputation algorithms to online, dynamic data streams and how to improve their robustness have therefore become important issues.

In this thesis, adaptive and robust missing value imputation algorithms are studied and developed. To address the adaptability problem of traditional imputation algorithms, two strategies based on a sliding time window are proposed to alleviate the imputation error caused by concept drift: an ordinary average strategy and a log-weighted average strategy, i.e., one that gradually increases the weight of instances along the time axis. The proposed strategies are combined with three imputation algorithms: mean imputation (MI), KNN imputation (KNNI), and Bayesian principal component analysis imputation (BPCAI). The experimental results indicate that the effectiveness of the strategies is independent of the specific imputation technique.

To improve the robustness of traditional imputation algorithms, the idea of ensemble learning is adopted and verified on gene expression data. First, the Pearson correlation coefficient is used to construct the correlation space, and the space is then divided into multiple random subspaces. Next, an ELM (extreme learning machine) regression model is trained on each random subspace. Finally, the mean of the outputs of all models is taken as the imputed value for the missing value. The results show that the proposed schemes improve the adaptability and robustness of missing value imputation algorithms.
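The sliding-window strategies described in the abstract can be illustrated with mean imputation. The following is a minimal sketch and not the thesis implementation: the class name `SlidingWindowMeanImputer`, the choice to keep raw (unfilled) instances in the window, and the weight formula log(1 + i) for the i-th oldest instance are all assumptions made for illustration. The abstract only states that the weight increases gradually along the time axis.

```python
from collections import deque

import numpy as np


class SlidingWindowMeanImputer:
    """Mean imputation over a sliding window of the most recent stream instances."""

    def __init__(self, window_size, log_weighted=False):
        # deque with maxlen drops the oldest instance automatically
        self.window = deque(maxlen=window_size)
        self.log_weighted = log_weighted

    def _weights(self, n):
        if not self.log_weighted:
            return np.ones(n)  # ordinary average strategy
        # Assumed weighting: position i (1 = oldest) gets log(1 + i),
        # so newer instances count slightly more than older ones.
        return np.log1p(np.arange(1, n + 1))

    def impute(self, x):
        """Return a copy of x with NaNs filled from the window, then store x."""
        raw = np.asarray(x, dtype=float)
        filled = raw.copy()
        if self.window:
            data = np.vstack(list(self.window))
            w = self._weights(len(data))
            for j in np.flatnonzero(np.isnan(raw)):
                col = data[:, j]
                seen = ~np.isnan(col)  # use only observed values of feature j
                if seen.any():
                    filled[j] = np.average(col[seen], weights=w[seen])
        # Assumption: store the raw instance so imputed values are not reused.
        self.window.append(raw)
        return filled


if __name__ == "__main__":
    imputer = SlidingWindowMeanImputer(window_size=3, log_weighted=True)
    for instance in ([1.0, 10.0], [3.0, 20.0], [np.nan, 30.0]):
        print(imputer.impute(instance))
```

Because the window only holds the most recent instances, values observed before a concept drift eventually stop influencing the estimate. The log weighting shifts the estimate further toward the newest instances without discarding the older ones in the window.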
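The ensemble scheme (Pearson correlation space, random subspaces, one ELM regressor per subspace, mean of the predictions) can be sketched as follows. This is an illustration under simplifying assumptions, not the thesis code: the data is taken to be a samples-by-features matrix with a single missing entry at `(row, col)` and all other entries observed, and the function names and parameters (`n_corr`, `n_models`, `subspace_size`, `n_hidden`) are hypothetical. The ELM here is the basic form, with a random tanh hidden layer and output weights solved by least squares.

```python
import numpy as np


def fit_elm(X, y, n_hidden, rng):
    """Fit a basic ELM: random fixed hidden layer, least-squares output weights."""
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)           # hidden-layer activations
    beta = np.linalg.pinv(H) @ y     # Moore-Penrose solution for output weights
    return W, b, beta


def predict_elm(model, X):
    W, b, beta = model
    return np.tanh(X @ W + b) @ beta


def ensemble_impute(X, row, col, n_corr=8, n_models=10,
                    subspace_size=4, n_hidden=10, seed=0):
    """Estimate X[row, col] as the mean prediction of ELMs on random subspaces."""
    rng = np.random.default_rng(seed)
    train = np.delete(np.arange(X.shape[0]), row)    # rows with the target observed
    others = np.delete(np.arange(X.shape[1]), col)   # candidate predictor features
    y = X[train, col]

    # Correlation space: features with the largest |Pearson r| to the target.
    # Assumes no candidate feature is constant over the training rows.
    corr = np.array([np.corrcoef(X[train, j], y)[0, 1] for j in others])
    top = others[np.argsort(-np.abs(corr))[:n_corr]]

    preds = []
    for _ in range(n_models):
        # Random subspace: a random subset of the correlated features.
        sub = rng.choice(top, size=min(subspace_size, len(top)), replace=False)
        model = fit_elm(X[train][:, sub], y, n_hidden, rng)
        preds.append(predict_elm(model, X[row:row + 1, sub])[0])
    return float(np.mean(preds))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    data = rng.normal(size=(60, 6))
    data[:, 0] = 0.7 * data[:, 1] - 0.3 * data[:, 2] + 0.05 * rng.normal(size=60)
    true_value = data[5, 0]
    data[5, 0] = np.nan  # hide one entry
    print("true:", true_value, "imputed:", ensemble_impute(data, row=5, col=0))
```

Averaging over several models trained on different feature subsets reduces the dependence of the estimate on any single random hidden layer or any single set of predictors, which is the usual motivation for random-subspace ensembles.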

Keywords/Search Tags:

Missing value imputation, Data stream, Sliding time window, ELM, Ensemble learning

Related items

1	Studies On Missing Data Imputation
2	Incomplete Data Ensemble Classification Using Imputation-Revision Framework With Local Neighborhood Information
3	Research On Data Imputation Methods Oriented Specific Domains
4	Comparative Study On Imputation Methods Of Missing Data In XGBOOST Model Under Complete Random Missing Mechanism
5	Nonparametric Imputation For Missing Data
6	Attribute Correlation Modeling And Missing Value Imputation Of Incomplete Data Based On Fuzzy Partition
7	The Analysis And Improvement Research Of Knn-imputation Algorithm
8	Attribute Associated Neuron Modeling And Missing Value Imputation Based On Neural Network
9	Research On Missing Value Imputation Of Incomplete Data
10	The Online Imputation Method Of Missing Value Based On KNN And Its Application In Credit Evaluation