
Research On Missing Data Imputation Method Based On Generative Adversarial Network

Posted on: 2022-03-05 | Degree: Master | Type: Thesis
Country: China | Candidate: Y B Xu | Full Text: PDF
GTID: 2518306731997879 | Subject: Computer technology
Abstract/Summary:
Missing data imputation is an important part of data preprocessing and matters for data mining, artificial intelligence, and other data-driven technologies. With the development of 5G communication and Internet of Things technology, the volume of data collected and stored across industries has grown rapidly, and data quality problems have become increasingly prominent. Among these problems, missing data is the main cause of data integrity issues and is attracting growing attention. Statistical imputation methods struggle to adapt to high-dimensional, large-scale data, so imputation methods based on machine learning have become a research hot spot. To perform imputation more accurately, this thesis analyzes and optimizes existing machine learning imputation models and proposes three missing data imputation models. The main work is as follows:

1. To address the low accuracy of the variational autoencoder and the uncontrollability of the generative adversarial network in imputation, an imputation model based on the variational autoencoder generative adversarial network is proposed. In this model, adversarial training against the discriminator improves the accuracy of the estimates. The discriminator is also modified to suit the imputation task: it discriminates sample-to-sample rather than element-to-element. The experimental results show that this model has an advantage when imputing high-dimensional, large-scale data, reducing time and computing costs while improving imputation accuracy.

2. To improve imputation accuracy by exploiting the similarity between data samples, an imputation model based on the conditional variational autoencoder is proposed. Building on the variational autoencoder generative adversarial network, condition variables control the generation of the estimated samples, and samples with low similarity are isolated into different spaces to reduce imputation errors. Based on this model, a multiple imputation method is also proposed that can be used whether or not complete condition labels are available. The experimental results show that adding condition variables further improves imputation accuracy.

3. To improve imputation accuracy by exploiting the correlation among attributes, a grouped variational autoencoder generative adversarial network model based on the mutual information between attributes is proposed. Building on the variational autoencoder generative adversarial network imputation model, the network is sparsified according to the correlation between attributes: many connections between weakly correlated attributes are removed in order to strengthen the influence between highly correlated ones. The experimental results show that the optimized model achieves better imputation results.
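The attribute grouping behind the third model can be illustrated with a small sketch. It is not the thesis's implementation: the histogram-based mutual information estimate, the bin count, the threshold, the connected-components grouping rule, and every function name below are assumptions made for illustration, and the estimate is computed on fully observed toy data. The sketch estimates pairwise mutual information, groups attributes whose mutual information exceeds a threshold, and builds a 0/1 mask that keeps only within-group connections.

```python
import numpy as np

def mutual_information(x, y, bins=8):
    """Histogram estimate of the mutual information (in nats) of two 1-D arrays."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()                 # joint probabilities
    px = pxy.sum(axis=1, keepdims=True)       # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)       # marginal of y, shape (1, bins)
    nz = pxy > 0                              # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

def mi_matrix(data, bins=8):
    """Symmetric matrix of pairwise mutual information between columns."""
    d = data.shape[1]
    mi = np.zeros((d, d))
    for i in range(d):
        for j in range(i + 1, d):
            mi[i, j] = mi[j, i] = mutual_information(data[:, i], data[:, j], bins)
    return mi

def group_attributes(mi, threshold):
    """Label attributes by connected component of the graph mi > threshold."""
    d = mi.shape[0]
    labels = -np.ones(d, dtype=int)
    current = 0
    for start in range(d):
        if labels[start] >= 0:
            continue
        labels[start] = current
        stack = [start]
        while stack:                          # depth-first search over linked attributes
            i = stack.pop()
            for j in range(d):
                if labels[j] < 0 and mi[i, j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

def connection_mask(labels):
    """0/1 mask that keeps a connection only between attributes in the same group."""
    return (labels[:, None] == labels[None, :]).astype(float)

# Toy data (hypothetical): columns 0 and 1 are strongly related, as are 2 and 3.
rng = np.random.default_rng(0)
a = rng.normal(size=2000)
c = rng.normal(size=2000)
data = np.column_stack([a, a + 0.1 * rng.normal(size=2000),
                        c, c + 0.1 * rng.normal(size=2000)])

labels = group_attributes(mi_matrix(data), threshold=0.3)  # threshold is an assumed value
mask = connection_mask(labels)
```

Multiplying a layer's weight matrix element-wise by such a mask is one common way to remove connections between weakly related attributes; how the thesis wires the groups into its encoder, generator, and discriminator is not stated in the abstract.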
Keywords/Search Tags: missing data, imputation, VAE, GAN, neural network