Research On Imputation And Classification Of Incomplete Data Based On Variables For Missing Values

Posted on: 2021-01-09
Degree: Master
Type: Thesis
Country: China
Candidate: X Wu
GTID: 2428330611451406
Subject: Software engineering
Abstract/Summary:
The explosive growth of data brings unprecedented opportunities and challenges for human society, and finding effective ways to mine the potential value in data has become a crucial topic. Classification, a common form of data analysis, can provide detailed insight into the inherent regularities of data. However, real-world datasets are susceptible to missing values, which make data mining harder and lower the reliability of inference results. Against this background, the thesis presents a progressive two-stage study of missing value imputation and incomplete data classification, as follows.

(1) For missing value imputation, we propose a tracking-removed autoencoder, obtained by dynamically redesigning the input structure of the hidden neurons of an autoencoder. Moreover, to account for data incompleteness, we design a scheme that treats missing values as variables and lets them participate in network training. Imputation is complete when training ends. The proposed method makes full use of the present values in the incomplete dataset and models the correlations among attributes with the tracking-removed autoencoder, enabling effective imputation under complicated missing patterns. Experiments show that the proposed method performs well on imputation.

(2) For incomplete data classification, we first build a regression model with the tracking-removed autoencoder to mine the attribute interdependencies within the data, then reorganize the neurons in the output layer and construct a multi-task learning model that performs imputation and classification simultaneously. Because the model input is incomplete, missing values are treated as variables during both training and prediction and are updated dynamically together with the model parameters. This dynamic optimization of missing values drives the model to match the regression and classification structures implied in the incomplete data. Experiments on UCI datasets validate the effectiveness of the proposed method.

The thesis thus examines incomplete data in depth from these two aspects and proposes effective solutions for each. In the era of big data, where data quality is difficult to guarantee, this research has important practical significance.
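The idea in stage (1) of treating missing values as trainable variables can be illustrated with a small sketch. This is not the thesis's network. It assumes a purely linear stand-in in which output j is predicted from every attribute except attribute j (one simple reading of "tracking-removed"), the loss is taken over observed entries only, and missing entries are updated by the same gradient descent as the weights. The function name `impute`, the hyperparameters `lr` and `epochs`, and the toy data are all hypothetical.

```python
# Sketch only: a linear stand-in for the tracking-removed autoencoder.
# All names (impute, lr, epochs) are illustrative, not taken from the thesis.

def impute(rows, lr=0.02, epochs=2000):
    """Jointly fit weights and missing entries (None) by gradient descent."""
    n, d = len(rows), len(rows[0])
    observed = [[v is not None for v in r] for r in rows]
    # Initialise each missing entry with the mean of its column's observed values.
    col_mean = []
    for j in range(d):
        vals = [rows[i][j] for i in range(n) if observed[i][j]]
        col_mean.append(sum(vals) / len(vals))
    x = [[rows[i][j] if observed[i][j] else col_mean[j] for j in range(d)]
         for i in range(n)]
    w = [[0.0] * d for _ in range(d)]  # w[j][k]: weight from input k to output j
    b = [0.0] * d
    history = []
    for _ in range(epochs):
        gw = [[0.0] * d for _ in range(d)]
        gb = [0.0] * d
        gx = [[0.0] * d for _ in range(n)]
        loss = 0.0
        for i in range(n):
            for j in range(d):
                if not observed[i][j]:
                    continue  # a missing entry provides no reconstruction target
                # Output j never sees input j, so it cannot copy its own input.
                pred = b[j] + sum(w[j][k] * x[i][k] for k in range(d) if k != j)
                err = pred - x[i][j]
                loss += 0.5 * err * err
                gb[j] += err
                for k in range(d):
                    if k == j:
                        continue
                    gw[j][k] += err * x[i][k]
                    if not observed[i][k]:
                        gx[i][k] += err * w[j][k]  # gradient w.r.t. a missing input
        history.append(loss)
        for j in range(d):
            b[j] -= lr * gb[j]
            for k in range(d):
                w[j][k] -= lr * gw[j][k]
        for i in range(n):
            for k in range(d):
                if not observed[i][k]:
                    x[i][k] -= lr * gx[i][k]  # missing values move with the weights
    return x, history


# Toy data with linearly related attributes and three missing entries.
data = [[a / 10, 1 - a / 10, 0.05 * a + 0.25] for a in range(10)]
data[2][1] = None
data[5][0] = None
data[7][2] = None
filled, history = impute(data)
print(filled[2][1], filled[5][0], filled[7][2])
```

When training stops, the values stored in the missing positions serve as the imputations, which mirrors the statement that imputation is completed at the end of training. The multi-task classification model of stage (2) is not covered by this sketch.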
Keywords/Search Tags: Incomplete Data, Imputation of Missing Values, Classification, Tracking-removed Autoencoder, Coupling Modeling