Design For Missing Value Imputation Scheme Based On Deep Neural Network Generative Model

Posted on:2024-03-29

Degree:Master

Type:Thesis

Country:China

Candidate:Y Q Wang

Full Text:PDF

GTID:2530307088951039

Subject:Statistics

Abstract/Summary:

With the advent of the data era, acquiring massive amounts of data is no longer a problem, but improving the quality of the data obtained has become increasingly important. High-quality data is a prerequisite for training high-quality models. During data acquisition, subjective or objective factors almost always lead to some degree of missing data, and this missingness greatly affects the quality of the data itself. At present, common methods for handling missing data include direct deletion, imputation based on probability models, and imputation based on traditional machine learning. With the development of artificial intelligence, researchers have begun to use generative models based on deep neural networks to handle missing data. The variational autoencoder (VAE), along with other generative models, has been shown to capture the underlying structure of large volumes of complex, high-dimensional data efficiently and accurately. However, existing VAEs still cannot directly handle datasets with missing values. In this paper, we propose a VAE-based generative model built on a multi-head self-attention mechanism, which can impute missing values in incomplete data whose attributes have mixed data types. The proposed model maps values of categorical and continuous data types separately into a high-dimensional space: each single value is mapped to a high-dimensional vector, and a multi-head self-attention module operating in that space captures the relationships between the vectors. In the decoder, these relationships are used to restore the original values, which yields the estimates for the missing part of the data. To verify the effectiveness of the generative model, this paper selects four classification datasets with attributes of different data types and simulates missing data on them. The missing values in the four datasets are then imputed using traditional imputation methods and the proposed generative model. Comparative experiments show that the imputation performance of the proposed method is better than that of existing methods on most datasets. Further experiments show that the proposed generative model outperforms the partially supervised classification model in the label prediction task of binary classification.
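The mapping of individual values to vectors followed by multi-head self-attention can be illustrated with a small sketch. This is not the thesis implementation: the sizes `D_MODEL` and `N_HEADS`, the helper names, the random stand-ins for trained weights, and the choice to exclude missing values from the attention keys are all assumptions made for illustration, and the usual output projection and the VAE encoder/decoder are omitted.

```python
import math
import random

D_MODEL, N_HEADS = 8, 2          # illustrative sizes, not taken from the thesis
D_HEAD = D_MODEL // N_HEADS
rng = random.Random(0)

def rand_matrix(rows, cols):
    """Random matrix standing in for trained weights (assumption)."""
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]

def matmul(a, b):
    """Plain-Python product of a (n x k) and b (k x m)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def embed_row(row, cont_weights, cat_tables):
    """Map every value to a D_MODEL-vector; None marks a missing value."""
    tokens, observed = [], []
    for name, value in row.items():
        observed.append(value is not None)
        if value is None:
            tokens.append([0.0] * D_MODEL)        # placeholder, masked out as a key
        elif name in cont_weights:
            w, b = cont_weights[name]             # per-feature linear embedding
            tokens.append([value * wi + bi for wi, bi in zip(w, b)])
        else:
            tokens.append(list(cat_tables[name][value]))  # lookup-table embedding
    return tokens, observed

def multi_head_self_attention(tokens, observed, wq, wk, wv):
    """Scaled dot-product attention per head; heads are concatenated.

    Assumes at least one observed value, since missing tokens are never keys.
    """
    q, k, v = matmul(tokens, wq), matmul(tokens, wk), matmul(tokens, wv)
    n = len(tokens)
    out = [[] for _ in range(n)]
    for h in range(N_HEADS):
        lo, hi = h * D_HEAD, (h + 1) * D_HEAD
        for i in range(n):
            scores = [
                sum(a * b for a, b in zip(q[i][lo:hi], k[j][lo:hi])) / math.sqrt(D_HEAD)
                if observed[j] else float("-inf")
                for j in range(n)
            ]
            weights = softmax(scores)
            out[i].extend(
                sum(weights[j] * v[j][c] for j in range(n)) for c in range(lo, hi)
            )
    return out

# Hypothetical mixed-type record with one missing continuous value.
cont_weights = {name: (rand_matrix(1, D_MODEL)[0], rand_matrix(1, D_MODEL)[0])
                for name in ("age", "income")}
cat_tables = {"sex": {"F": rand_matrix(1, D_MODEL)[0], "M": rand_matrix(1, D_MODEL)[0]}}
wq, wk, wv = (rand_matrix(D_MODEL, D_MODEL) for _ in range(3))

row = {"age": 0.4, "income": None, "sex": "F"}
tokens, observed = embed_row(row, cont_weights, cat_tables)
context = multi_head_self_attention(tokens, observed, wq, wk, wv)
```

In this sketch the context vector at the `income` position is built only from the observed `age` and `sex` tokens, which is the kind of cross-feature relationship a decoder could use to estimate the missing value.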

Keywords/Search Tags:

Missing value imputation, Variational autoencoder, Self-attention mechanism, Missing completely at random
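Under the missing completely at random (MCAR) assumption named in the keywords, every cell is hidden with the same probability, independently of its value and of all other cells. The sketch below simulates MCAR missingness on a toy mixed-type table and fills the gaps with a simple mean/mode baseline of the kind a traditional method might use. The function names, the toy data, and the 30% missing rate are hypothetical and are not taken from the thesis.

```python
import random
from collections import Counter
from statistics import mean

def mask_mcar(rows, missing_rate, seed=0):
    """Hide each cell independently with probability missing_rate (MCAR)."""
    rng = random.Random(seed)
    return [[None if rng.random() < missing_rate else v for v in row] for row in rows]

def fit_column_fills(masked_rows):
    """Mean for numeric columns, most frequent value for categorical columns."""
    fills = []
    for column in zip(*masked_rows):
        seen = [v for v in column if v is not None]
        if not seen:
            fills.append(None)                    # nothing observed, nothing to learn
        elif all(isinstance(v, (int, float)) for v in seen):
            fills.append(mean(seen))
        else:
            fills.append(Counter(seen).most_common(1)[0][0])
    return fills

def impute(masked_rows, fills):
    """Replace each missing cell with its column's fill value."""
    return [[fills[j] if v is None else v for j, v in enumerate(row)]
            for row in masked_rows]

# Toy table: one continuous column and one categorical column (hypothetical data).
rows = [[float(i), "x" if i % 3 else "y"] for i in range(20)]
masked = mask_mcar(rows, missing_rate=0.3, seed=1)
imputed = impute(masked, fit_column_fills(masked))
```

Because the original `rows` are known, the imputed cells can be compared against the hidden ground truth, which is the general idea behind evaluating imputation methods on simulated missing data.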

Related items

1	The Data Analysis Of GEE And QIF Used In The Asthma Data With Missing Data
2	Research On Forecasting Method Of Short-term Heavy Precipitation And Hail Based On Physical Field Data
3	The Comparison Of Nine Common Imputation Methods For Missing Values
4	Expectation Estimator In Missing Data
5	Nonparametric Imputation Under Random Missing Model
6	Response Missing At Random Variable Coefficients Of Statistical Inference
7	Empirical Likelihood Inferences For Three Classes Of Statistical Models With Missing Data
8	Statistical Inference For Varying-coefficient Partially Linear Models With Missing Responses
9	Comparison And Empirical Analysis Of Imputation Methods For Missing Data
10	Research On Time Series Modeling And Imputation Based On The Variational Auto-Encoders