Study on Filling Random Missing Data Based on Generative Adversarial Networks and Its Effect

Posted on: 2021-03-27
Degree: Master
Type: Thesis
Country: China
Candidate: J J Wu
Full Text: PDF
GTID: 2427330626954370
Subject: Applied statistics
Abstract/Summary:
Missing data is an unavoidable problem in statistical analysis, and handling it, together with outlier processing, is one of the two main parts of data preprocessing. Missing data not only reduces the information available in a sample but also makes many statistical learning methods unusable. Because the results of statistical analysis depend on data quality, the final results are unlikely to be representative if missing data is not handled properly. There are two main ways to handle missing data: deletion and filling (imputation). Because deletion reduces sample information, filling is generally preferred. With the arrival of the big data era, growth in data dimensionality has brought growth in the amount of missing data, and filling missing data accurately and quickly has become an urgent problem. In recent years, the generative adversarial network (GAN) has been studied increasingly in deep learning and has distinctive strengths in sample generation. Building on domestic and international research and on the similarity between missing data filling and image restoration, this thesis applies the theoretical framework of GAN to missing data filling, designs a new network structure suited to the problem, and compares the method with multiple imputation, the missing forest method, and the EM method to analyze the applicability of each. In the simulation study, computer-generated random numbers with complex distributions are used. For different combinations of number of observations, number of variables, and missing proportion, the three existing methods and the GAN method are applied repeatedly to the same missing data sets, and filling precision, filling effect, and filling speed are compared. The conclusion is that the GAN method often fills better than the other three methods under the same conditions. In the empirical analysis, the same methods are used to fill missing values in Canadian weather data, and the conclusion is consistent with that of the simulation study. Finally, the thesis summarizes the universality of missing data and the applicability of the different filling methods, and offers suggestions for further research that address its own shortcomings.
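The simulation protocol described in the abstract (generate complete data, remove entries at random, fill them, and score the fill against the known true values) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the thesis: the gamma distribution, the sample sizes, the use of RMSE as the precision measure, the function names, and the column-mean fill, which merely stands in for the GAN, multiple imputation, missing forest, and EM methods that the thesis actually compares.

```python
import numpy as np

def mask_completely_at_random(shape, missing_rate, rng):
    """Return a boolean array; True marks an entry to be treated as missing."""
    return rng.random(shape) < missing_rate

def mean_impute(x_observed):
    """Fill NaN entries with the mean of the observed entries in the same column.

    This is only a placeholder baseline, not one of the methods in the thesis.
    """
    col_means = np.nanmean(x_observed, axis=0)
    return np.where(np.isnan(x_observed), col_means, x_observed)

def rmse_on_missing(x_true, x_filled, missing):
    """Root mean squared error over the masked entries only."""
    diff = (x_true - x_filled)[missing]
    return float(np.sqrt(np.mean(diff ** 2)))

# Hypothetical settings: one (observations, variables, missing proportion) combination.
rng = np.random.default_rng(0)
n_obs, n_vars, missing_rate = 500, 5, 0.2

# Skewed synthetic data, chosen purely for illustration.
x_true = rng.gamma(shape=2.0, scale=1.5, size=(n_obs, n_vars))

missing = mask_completely_at_random(x_true.shape, missing_rate, rng)
x_observed = np.where(missing, np.nan, x_true)

x_filled = mean_impute(x_observed)
score = rmse_on_missing(x_true, x_filled, missing)
print(f"RMSE on masked entries: {score:.3f}")
```

In a fuller comparison, the same `x_observed` would be passed to each filling method in turn, the loop would be repeated over several random masks and parameter combinations, and running time would be recorded alongside the error.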
Keywords/Search Tags:missing value, GAN, multiple imputation method, missing forest method, EM algorithm