With the rapid development of information technology, large amounts of data are produced every day in the real world, and valuable information can be extracted from them and put to use. High-quality data is a prerequisite for efficient and accurate data analysis. However, high-quality data is difficult to obtain directly, and missing data has a major impact on data quality. An efficient imputation method for missing data can ensure data integrity, improve data quality and provide support for data analysis. Current research on missing data imputation faces two difficulties. First, most data sets are incomplete and carry no label information, so the latent category information of the data is ignored during imputation. Second, in order to make full use of the existing data, some imputation methods include the missing entries when training the imputation model, which degrades the model's performance. Missing data imputation is therefore essential for obtaining high-quality data, and it is necessary to design and develop an efficient data imputation system.

To address these problems, this thesis proposes the following solutions. First, to mine and exploit the latent category information of unlabeled data sets, it proposes a pseudo-label-based data set partitioning method, which divides the data set to be imputed into several subsets according to the latent categories of the data. Second, it proposes a Generative Adversarial Denoise Imputation Network (GADIN) to impute the missing data in each subset. GADIN incorporates a denoising method to reduce the influence of missing data on the construction of the imputation model. Finally, experimental results demonstrate the effectiveness of this imputation method.

Based on a detailed requirements analysis, this thesis designs and implements a data imputation system based on the generative adversarial denoise network. The system has five functional modules: user login, user management, data pre-processing, model building and data imputation. The user login module verifies users' login information; the user management module maintains user information; the data pre-processing module cleans the data so that it meets the requirements of the imputation method; the model building module trains the data imputation model; and the data imputation module uses the trained model to fill in incomplete data sets. The system is a general-purpose data imputation system that can provide imputation for any domain. To test its functionality, this thesis takes data imputation in the financial field as an example.
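The pseudo-label-based partitioning described above can be illustrated with a minimal NumPy sketch. The abstract does not say how pseudo-labels are produced, so this sketch assumes k-means clustering (with deterministic farthest-first initialization) on a temporarily column-mean-filled copy of the data. The function names `pseudo_label_partition`, `impute_by_partition` and `subset_mean_impute` are hypothetical, and per-subset column-mean filling is only a placeholder for the step where GADIN would be trained and applied to each subset.

```python
import numpy as np

def farthest_first_init(X_filled, k):
    """Pick k initial centroids: row 0, then repeatedly the row farthest from all chosen centroids."""
    centroids = [X_filled[0]]
    for _ in range(1, k):
        d = np.min([((X_filled - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(X_filled[d.argmax()])
    return np.array(centroids)

def pseudo_label_partition(X, k, max_iter=100):
    """Assign each row of X (NaN = missing) a pseudo-label with k-means.

    Assumption: missing entries are temporarily replaced by global column
    means (each column needs at least one observed value) so distances can
    be computed; the thesis's actual procedure may differ.
    """
    X_filled = np.where(np.isnan(X), np.nanmean(X, axis=0), X)
    centroids = farthest_first_init(X_filled, k)
    labels = None
    for _ in range(max_iter):
        # Squared Euclidean distance from every row to every centroid
        dists = ((X_filled[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        centroids = np.array([
            X_filled[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
    return labels

def subset_mean_impute(X_sub):
    """Hypothetical stand-in for GADIN: fill NaNs with the subset's column means (0 if a column is fully missing)."""
    observed = ~np.isnan(X_sub)
    counts = observed.sum(axis=0)
    sums = np.where(observed, X_sub, 0.0).sum(axis=0)
    means = np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)
    return np.where(observed, X_sub, means)

def impute_by_partition(X, k, impute_subset):
    """Partition rows by pseudo-label, then impute each subset independently."""
    labels = pseudo_label_partition(X, k)
    X_out = X.copy()
    for j in np.unique(labels):
        rows = labels == j
        X_out[rows] = impute_subset(X[rows])
    return X_out, labels

# Toy data with two latent categories and one missing value in each
X = np.array([
    [1.0, 1.0, 1.0],
    [1.1, np.nan, 0.9],
    [0.9, 1.0, 1.1],
    [9.0, 9.0, 9.0],
    [np.nan, 9.2, 8.8],
    [9.1, 8.9, 9.0],
])
X_imp, labels = impute_by_partition(X, k=2, impute_subset=subset_mean_impute)
print(labels)
print(X_imp)
```

On this toy data, the missing second feature of row 1 is filled with 1.0, the mean of its own pseudo-label subset, whereas a global column mean would give 5.82; preserving this kind of latent category information is the motivation for partitioning before imputation.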