Research On Missing Data Imputation And Prediction Methods Based On Generative Adversarial Nets

Posted on: 2020-10-19
Degree: Master
Type: Thesis
Country: China
Candidate: Y Wang
GTID: 2428330590460643
Subject: Computer Science and Technology
Abstract/Summary:
The information era has produced a huge amount of data containing much valuable information, but data quality problems arise frequently. Because part of the data is often lost during acquisition, recording, and storage, the resulting incomplete data sets are less useful, which makes subsequent data mining difficult and degrades the quality of the data used for decision-making. How to handle incomplete data effectively and make high-quality decisions based on it is therefore of great practical significance. In recent years, Generative Adversarial Nets (GANs) have become a hot research direction in deep learning. Given their ability to fit the distribution of high-dimensional data, this thesis uses them to learn the mapping from missing data to the complete data distribution. The thesis mainly analyzes and studies methods for imputing, and predicting from, incomplete data sets of high dimension and high missing rate, using Generative Adversarial Nets and other imputation algorithms. The main work and innovations are as follows:

(1) The applicable conditions and limitations of commonly used algorithms for processing incomplete data are studied. First, the causes, mechanisms, and patterns of missingness in incomplete data are analyzed. Second, the missing-data problem under large sample sizes is studied. Then, several data imputation methods based on deep learning are analyzed. The analysis finds that most imputation algorithms do not use label data effectively and have difficulty imputing incomplete data sets of large size and high missing rate. Based on this research, the thesis proposes using Generative Adversarial Nets to solve these problems.

(2) A Missing Data Imputation Generative Adversarial Nets (MIGAN) model is proposed. MIGAN can impute incomplete data sets effectively, and its auxiliary prediction network, trained collaboratively, makes the imputation results strongly correlated with the labels. The thesis compares experimental results on three UCI data sets and on the MNIST data set. The results show that MIGAN performs well both at imputing incomplete data sets and at predicting from the imputed results under different circumstances, especially when the incomplete data sets have high dimension and a high missing rate. In addition, the images MIGAN generates on the MNIST data set are easier for humans to recognize.

(3) A semi-supervised Missing Data Imputation Generative Adversarial Nets model (semi-MIGAN) is proposed to impute incomplete data sets in which some of the labels are missing. semi-MIGAN is a refined version of MIGAN that handles the special case of missing labels in incomplete data sets. Experiments show that semi-MIGAN has better imputation performance than other algorithms.
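
The evaluation setting the abstract describes (hiding entries at a chosen missing rate, imputing them, and scoring the result) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the abstract does not give MIGAN's architecture, loss, or metric, so column-mean imputation stands in for the generator, missingness is assumed to be completely at random, and the function names `make_mask`, `column_means`, `merge`, and `rmse_on_missing` are hypothetical.

```python
import math
import random


def make_mask(n_rows, n_cols, missing_rate, rng):
    """Return a 0/1 mask: 1 = observed, 0 = missing (assumed completely at random)."""
    return [[0 if rng.random() < missing_rate else 1 for _ in range(n_cols)]
            for _ in range(n_rows)]


def column_means(data, mask):
    """Mean of the observed entries in each column (0.0 if a column has none)."""
    means = []
    for j in range(len(data[0])):
        observed = [row[j] for row, m in zip(data, mask) if m[j] == 1]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return means


def merge(data, mask, proposals):
    """Keep observed values; use the imputer's proposal only where the mask is 0."""
    return [[x if m == 1 else p for x, m, p in zip(x_row, m_row, p_row)]
            for x_row, m_row, p_row in zip(data, mask, proposals)]


def rmse_on_missing(truth, imputed, mask):
    """Root-mean-square error computed over the missing entries only."""
    errors = [(t - v) ** 2
              for t_row, v_row, m_row in zip(truth, imputed, mask)
              for t, v, m in zip(t_row, v_row, m_row) if m == 0]
    return math.sqrt(sum(errors) / len(errors)) if errors else 0.0


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic complete data, used only so the true values are known for scoring.
    truth = [[rng.random() for _ in range(5)] for _ in range(200)]
    mask = make_mask(len(truth), 5, missing_rate=0.5, rng=rng)
    means = column_means(truth, mask)
    # Stand-in imputer: every row proposes the column means (NOT the thesis's model).
    proposals = [list(means) for _ in truth]
    imputed = merge(truth, mask, proposals)
    print("RMSE on missing entries:", rmse_on_missing(truth, imputed, mask))
```

A learned imputer such as a GAN generator would replace the `proposals` step; the mask-based merge and the scoring restricted to missing entries would stay the same under these assumptions.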
Keywords/Search Tags: MIGAN, incomplete data, label, imputation, prediction