Font Size: a A A

Research On Missing Data Imputation Based On Tensor Decomposition

Posted on:2015-02-12Degree:MasterType:Thesis
Country:ChinaCandidate:Y J ZhuFull Text:PDF
GTID:2268330428465099Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
With the extensive application of computer technology in various fields, thedimension of data to be processed is becoming higher and higher. In the field ofnetwork traffic analysis and chemical data analysis, the sampling data is oftenpartially missing in high dimensition pattern. To analysis and process the highdimension data with missing values, we usually have to impute the missing part.Traditional methods to handle missing values mostly focus on data in vector or matrixform. For high dimension data, they are often expanded into vector or matrix forms,which destroys the data structure and leads to low accuracy of imputation. Recentyear’s research combines tensor decomposition and high dimension data imputation,and proposes a series of algorithms. This paper conducts research on these algorithms,puts forward three new algorithms, and makes a comparison between them bysimulation experiments. For the problem of missing data imputation based on tensordecomposition, the detailed work done by this paper are as follows:1、Focusing on the problem of dense tensor imputation in small scale, we proposean algorithm called Tucker-ALS. The classical PARAFAC-ALS algorithm adopts CPdecompostion model in tensor decomposition, and use the expectation maximizationmethod to solve the least square problem, but the speed is low, and the imputationaccuracy is not high enough. The Tucker-ALS algorithm proposed in this paper issimilar to it, but picks the Tucker decompositon model instead, which brings fasterspeed.2、Focusing on the problem of dense tensor imputation in small and medium scale,we present an algorithm called Tucker-WOPT. The good PARAFAC-LM algorithmsolves the least squares problem by computing the second order partial derivative,while CP-WOPT algorithm does so by computing the first order partial derivative,and both of them pick CP decompostion model from tensor decomposition. TheTucker-WOPT algorithm proposed in this paper chooses Tucker decompostion modelinstead, solves the least squares problem by computing the first order partialderivative, and deduces a fast calculation formula to calculate it. Tucker-WOPTalgorithm can achieve higher accuracy than that of the two algorithms, and with theincreasing percentage of missing values, this advantage tends to be more apparent.3、Focusing on the problem of sparse tensor imputation in large scale, we putforward an algorithm called Tucker-SOPT. This algorithm’s idea is similar to that of CP-WOPT algorithm of sparse tensor, but we choose the Tucker decomposition modelinstead. This paper deduces a calculation method of n-mode product of a sparse tensorand a matrix and deduces a method to calculate the first order partial derivative ofTucker decomposition model’s factor matrices of sparse tensor. The Tucker-SOPTalgorithm has almost the same speed with CP-WOPT algorithm of sparse tensor, butits imputation accuracy is higher.
Keywords/Search Tags:tensor decomposition, missing data, expectation maximization method, gradient optimization, high dimension data imputation
PDF Full Text Request
Related items