Research And Application Of Incomplete Data Imputation Algorithm

Posted on: 2018-11-09
Degree: Master
Type: Thesis
Country: China
Candidate: S Q Liu
GTID: 2348330536460863
Subject: Software engineering
Abstract/Summary:
With the continuing development of the Internet and artificial intelligence, the volume of data is growing exponentially. For various reasons, however, much of this data is incomplete, which directly hampers further analysis and mining and prevents the data from realizing its full value. The study of incomplete data imputation is therefore of great significance. Traditional one-pass imputation methods use all of the data to fill each missing value, which not only increases the computational cost but also ignores the correlation between records. Moreover, most algorithms cannot extract features directly from incomplete data. Iterative imputation algorithms, for their part, mostly suffer from slow convergence, low precision, or other problems. To address these problems, this thesis first proposes an incomplete data imputation algorithm based on deep belief networks. Features are extracted directly from the incomplete data by a denoising deep belief network, which makes the features more robust. These features are then used for clustering; within each cluster, a co-occurrence matrix and the partial-distance strategy are applied to the relevant data to compute decision scores, which are finally converted into weighted scores used to fill the missing values. The thesis then proposes an incomplete data imputation algorithm based on multi-kernel estimation. This is an iterative algorithm that constructs kernel functions for discrete attributes and kernel estimators for continuous attributes, and combines them to obtain a multi-kernel estimator for mixed attributes. To improve the convergence rate, the partial-distance strategy is used to pre-impute the missing values; the kernel estimator is then used to impute them iteratively. Finally, the algorithm is adapted and optimized and applied to the imputation and standardization of US import and export trade data. Validation analysis shows an imputation accuracy of up to 85%. The experimental results indicate that the proposed algorithm improves imputation accuracy while maintaining convergence speed. It meets the requirements of academic research and industrial application and has both theoretical and practical value.
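The two-stage idea described above (a partial-distance pre-imputation followed by iterative kernel re-estimation) can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the nearest-donor fill rule, the Gaussian kernel, and the `bandwidth`, `iterations`, and `tol` parameters are all assumptions chosen for illustration. Only continuous attributes are handled, so the discrete-attribute kernel and the multi-kernel combination are omitted. Missing cells are represented as `None`, and every attribute is assumed to be observed in at least one other record.

```python
import math

def partial_distance(a, b):
    """Squared Euclidean distance over attributes observed in both records,
    rescaled by (total attributes / shared attributes)."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    return len(a) / len(shared) * sum((x - y) ** 2 for x, y in shared)

def pre_impute(rows):
    """Fill each missing cell from the nearest record (by partial distance)
    in which that attribute is observed. Illustrative fill rule only."""
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            donors = [r for k, r in enumerate(rows) if k != i and r[j] is not None]
            if donors:  # assumed non-empty for every attribute
                nearest = min(donors, key=lambda r: partial_distance(row, r))
                filled[i][j] = nearest[j]
    return filled

def kernel_refine(rows, filled, bandwidth=1.0, iterations=10, tol=1e-6):
    """Re-estimate each originally missing cell as a Gaussian-kernel weighted
    average (Nadaraya-Watson form) of that attribute over the other records,
    repeating until the largest update falls below tol."""
    current = [list(r) for r in filled]
    for _ in range(iterations):
        updated = [list(r) for r in current]
        change = 0.0
        for i, row in enumerate(rows):
            for j, value in enumerate(row):
                if value is not None:
                    continue  # observed cells are never modified
                num = den = 0.0
                for k, other in enumerate(current):
                    if k == i:
                        continue
                    # Distance uses every attribute except the one being estimated.
                    d2 = sum((current[i][c] - other[c]) ** 2
                             for c in range(len(row)) if c != j)
                    w = math.exp(-d2 / (2 * bandwidth ** 2))
                    num += w * other[j]
                    den += w
                if den > 0:
                    updated[i][j] = num / den
                    change = max(change, abs(updated[i][j] - current[i][j]))
        current = updated
        if change < tol:
            break
    return current

# Hypothetical toy data: the second record is missing its second attribute.
rows = [[1.0, 2.0], [1.1, None], [5.0, 10.0], [0.9, 1.8]]
start = pre_impute(rows)
result = kernel_refine(rows, start)
```

In this toy example the pre-imputation step copies the value from the closest record, and the kernel step then pulls the estimate toward a weighted average dominated by the two nearby records, while the distant record contributes almost nothing. Starting from a reasonable pre-imputed value rather than an arbitrary one is what the abstract credits for the faster convergence.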
Keywords/Search Tags: Incomplete Data, Deep Belief Networks, Kernel Function, Data Imputation