Several Studies To Improve Deep Data Imputation

Posted on: 2024-05-18
Degree: Master
Type: Thesis
Country: China
Candidate: Y F Wang
Full Text: PDF
GTID: 2568307055468894
Subject: Computational Mathematics
Abstract/Summary:
Missing data is a common anomaly in data analysis that hinders the analysis and use of data. At present, the most effective remedy is data imputation: the observed values are used to predict the missing values, yielding complete data for subsequent analysis. With the development of deep neural networks, many deep data imputation models have been proposed in recent years, and they have outperformed traditional imputation methods. However, existing deep data imputation models have the following problems: 1. Since data imputation is a typical ill-posed problem, the solution space for predicting complete data from missing data is too large, so it is difficult to obtain satisfactory optimization results; 2. Current deep data imputation models usually do not explicitly use the category information hidden in the data when imputing, which leads to low utilization of data features; 3. There is currently a lack of general data augmentation methods that can be applied directly to deep imputation models.

To address the first problem, this thesis designs a new information-guided deep image inpainting model for image inpainting tasks. Specifically, based on observation and analysis, the method first proposes hybrid guidance information, a mixture of several kinds of abstract image information that is better suited to guiding image inpainting, and designs an unsupervised hybrid information extractor that extracts this guidance information from the image without any external prior. A dual-stream progressive interactive inpainting network then reconstructs the hybrid guidance information and the image features of the damaged image in a progressive, interactive way. Extensive comparative and ablation experiments demonstrate the strong performance of the method and the effectiveness of its design.

To address the second problem, this thesis proposes an unsupervised missing data imputation method based on the generative adversarial framework, which uses the latent category information of the missing data to strengthen the imputation model. Specifically, the method first pre-trains the imputation model on the low-missing-rate subset of the data and obtains latent category information by unsupervised clustering of the imputation results on that subset; next, a simple auxiliary classifier is trained with the pseudo-labels obtained from the clustering; finally, the parameters of the classifier are fixed and the classifier is incorporated into the imputation model to help the generator produce higher-quality imputation results. Experimental results show that the proposed method achieves better imputation performance than the baseline methods.

To address the third problem, this thesis proposes a data augmentation method called missing augmentation, which can be applied directly to many existing deep data imputation frameworks to further improve their performance. Specifically, in each iteration the method uses the output of the imputation model to dynamically generate additional trainable missing samples as augmented samples, and then constrains these augmented samples with a simple reconstruction loss. This reconstruction loss plus the original loss of the imputation model constitutes the final optimization objective. Experimental results show that the method improves the performance of many existing deep data imputation models on various datasets and is robust to model structure and data type.
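The missing augmentation idea can be illustrated with a minimal Python sketch. This is one possible reading of the abstract, not the thesis implementation: the function names `mean_impute` and `augmentation_loss`, the `drop_rate` parameter, the use of a fresh random mask to create augmented samples, and the column-mean imputer standing in for a deep model are all assumptions made for illustration.

```python
import random


def mean_impute(rows, mask):
    """Placeholder imputer (an assumption, standing in for a deep model):
    fill each missing entry with the mean of the observed values in its column.
    Entries where mask is 0 are never read, so they may hold any placeholder."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r, m in zip(rows, mask) if m[j] == 1]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [
        [v if m_j == 1 else means[j] for j, (v, m_j) in enumerate(zip(r, m))]
        for r, m in zip(rows, mask)
    ]


def augmentation_loss(rows, mask, impute_fn, rng, drop_rate=0.2):
    """Hypothetical reconstruction loss on augmented samples.

    1. Impute the incomplete data to get pseudo-complete samples.
    2. Hide a random subset of their entries (assumed augmentation step).
    3. Impute again and measure mean squared error against step 1.
    """
    completed = impute_fn(rows, mask)
    # Fresh random mask: 1 = kept, 0 = hidden with probability drop_rate.
    aug_mask = [[0 if rng.random() < drop_rate else 1 for _ in r] for r in completed]
    recon = impute_fn(completed, aug_mask)
    sq_err = [
        (p - t) ** 2
        for p_row, t_row in zip(recon, completed)
        for p, t in zip(p_row, t_row)
    ]
    return sum(sq_err) / len(sq_err)


if __name__ == "__main__":
    rng = random.Random(0)
    rows = [[1.0, 2.0], [3.0, None], [None, 6.0], [5.0, 4.0]]
    mask = [[1, 1], [1, 0], [0, 1], [1, 1]]
    print(augmentation_loss(rows, mask, mean_impute, rng, drop_rate=0.5))
```

In a training loop, this term would be added to the model's original loss, as the abstract describes. Any weighting between the two terms is not stated in the abstract and would be a further assumption.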
Keywords/Search Tags: incomplete data, data imputation, image data, tabular data, deep learning, generative model