Research On Missing Data Imputation Based On Tensor Decomposition

Posted on:2015-02-12

Degree:Master

Type:Thesis

Country:China

Candidate:Y J Zhu

Full Text:PDF

GTID:2268330428465099

Subject:Computer software and theory

Abstract/Summary:

PDF Full Text Request

With the extensive application of computer technology in various fields, thedimension of data to be processed is becoming higher and higher. In the field ofnetwork traffic analysis and chemical data analysis, the sampling data is oftenpartially missing in high dimensition pattern. To analysis and process the highdimension data with missing values, we usually have to impute the missing part.Traditional methods to handle missing values mostly focus on data in vector or matrixform. For high dimension data, they are often expanded into vector or matrix forms,which destroys the data structure and leads to low accuracy of imputation. Recentyear’s research combines tensor decomposition and high dimension data imputation,and proposes a series of algorithms. This paper conducts research on these algorithms,puts forward three new algorithms, and makes a comparison between them bysimulation experiments. For the problem of missing data imputation based on tensordecomposition, the detailed work done by this paper are as follows:1、Focusing on the problem of dense tensor imputation in small scale, we proposean algorithm called Tucker-ALS. The classical PARAFAC-ALS algorithm adopts CPdecompostion model in tensor decomposition, and use the expectation maximizationmethod to solve the least square problem, but the speed is low, and the imputationaccuracy is not high enough. The Tucker-ALS algorithm proposed in this paper issimilar to it, but picks the Tucker decompositon model instead, which brings fasterspeed.2、Focusing on the problem of dense tensor imputation in small and medium scale,we present an algorithm called Tucker-WOPT. The good PARAFAC-LM algorithmsolves the least squares problem by computing the second order partial derivative,while CP-WOPT algorithm does so by computing the first order partial derivative,and both of them pick CP decompostion model from tensor decomposition. TheTucker-WOPT algorithm proposed in this paper chooses Tucker decompostion modelinstead, solves the least squares problem by computing the first order partialderivative, and deduces a fast calculation formula to calculate it. Tucker-WOPTalgorithm can achieve higher accuracy than that of the two algorithms, and with theincreasing percentage of missing values, this advantage tends to be more apparent.3、Focusing on the problem of sparse tensor imputation in large scale, we putforward an algorithm called Tucker-SOPT. This algorithm’s idea is similar to that of CP-WOPT algorithm of sparse tensor, but we choose the Tucker decomposition modelinstead. This paper deduces a calculation method of n-mode product of a sparse tensorand a matrix and deduces a method to calculate the first order partial derivative ofTucker decomposition model’s factor matrices of sparse tensor. The Tucker-SOPTalgorithm has almost the same speed with CP-WOPT algorithm of sparse tensor, butits imputation accuracy is higher.

Keywords/Search Tags:

tensor decomposition, missing data, expectation maximization method, gradient optimization, high dimension data imputation

PDF Full Text Request

Related items

1	An Expectation Maximization Application For Decision Tree Classifiers On Datasets With Missing Values
2	Studies On Missing Data Imputation
3	Research And Implementation Of Imputation Method For Missing Data In The Trash Pickup Logistics Mangagement System
4	The Online Imputation Method Of Missing Value Based On KNN And Its Application In Credit Evaluation
5	Research On Missing Data Imputation Method Based On Generative Adversarial Network
6	Research On Missing Value Imputation Method Based On Mixed Information System
7	Comparative Study On Imputation Methods Of Missing Data In XGBOOST Model Under Complete Random Missing Mechanism
8	Research On Theory And Applications Of Sparsity Reconstruction Of High-Dimension Data From Quantized Measurements
9	Nonparametric Imputation For Missing Data
10	Research On Passenger Transport Data Quality Detection And Missing Data Imputation