
Research On Missing Data Recovery In Data Sets

Posted on: 2016-02-20
Degree: Master
Type: Thesis
Country: China
Candidate: J F Zhu
GTID: 2308330482953075
Subject: Communication and Information System
Abstract/Summary:
With the rapid development of science and technology, sensor and multimedia technologies have been widely applied in every aspect of people's lives, producing large amounts of data. These data are large in scale, high in dimension, and complex in structure, and they may be lost or corrupted during acquisition, transmission, and storage. How to recover the original data has become a hot and difficult problem in data mining, machine learning, pattern recognition, and computer vision.

After surveying a large body of relevant literature, we review the state of development of missing data recovery. Traditional recovery methods mostly assume that the data set is small or that the data dimension is relatively low. In the current era of big data, larger data sets and higher dimensions must be considered. Aiming at missing data recovery for relatively large data sets and for high-dimensional tensors, this thesis proposes a corresponding recovery method for each case. The work is organized as follows.

First, traditional algorithms for recovering low-rank matrices with missing data, such as APG and FPCA, mostly require a singular value decomposition (SVD), and their computational complexity is very high. To avoid the SVD, this thesis proposes an algorithm based on successive over-relaxation (SOR) iteration with L2-norm minimization using matrix factorization. Simulation experiments show that the algorithm achieves the same accuracy as GS while running faster, especially for large matrices.

Second, a tensor, as the higher-order (third order and above) generalization of the vector (first order) and the matrix (second order), can better express the intrinsic structure of complex data.
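To make the first contribution more concrete, the following is a minimal sketch of SVD-free matrix completion by factorization with an over-relaxed update. The abstract does not give the thesis's actual update rules, so everything here is an assumption for illustration: the function name `complete_sor`, the relaxation weight `omega`, the fallback to an unrelaxed step, and the small ridge term are all hypothetical choices, not the author's algorithm. The model is M ≈ X Y with X of size m × r and Y of size r × n, and only r × r linear systems are solved.

```python
import numpy as np

def complete_sor(M, mask, rank, omega=1.3, n_iter=500, seed=0):
    """Fill entries of M where mask is False using a rank-`rank` model X @ Y.

    Illustrative sketch only: an over-relaxed alternating least-squares
    sweep, not the algorithm from the thesis.
    """
    rng = np.random.default_rng(seed)
    m, n = M.shape
    X = rng.standard_normal((m, rank))
    Y = rng.standard_normal((rank, n))
    ridge = 1e-10 * np.eye(rank)          # assumption: tiny ridge for stability

    # Residual restricted to observed entries (hidden values are discarded).
    S = np.where(mask, M - X @ Y, 0.0)
    res = np.linalg.norm(S)
    w = omega
    for _ in range(n_iter):
        # Over-relaxed target: current model plus w times the observed residual.
        # With w = 1 this is a plain alternating (Gauss-Seidel-style) sweep.
        Z = X @ Y + w * S
        # Least-squares updates via r x r normal equations; no SVD is needed.
        X_new = np.linalg.solve(Y @ Y.T + ridge, Y @ Z.T).T
        Y_new = np.linalg.solve(X_new.T @ X_new + ridge, X_new.T @ Z)
        S_new = np.where(mask, M - X_new @ Y_new, 0.0)
        res_new = np.linalg.norm(S_new)
        if res_new > res and w > 1.0:
            w = 1.0                       # reject the step, retry unrelaxed
            continue
        X, Y, S, res = X_new, Y_new, S_new, res_new
        w = omega                         # restore relaxation after a good step
    # Keep observed entries as given, fill the rest from the model.
    return np.where(mask, M, X @ Y)

# Usage on synthetic data: a random rank-2 matrix with about 30% of entries hidden.
rng = np.random.default_rng(42)
M_true = rng.standard_normal((40, 2)) @ rng.standard_normal((2, 40))
mask = rng.random(M_true.shape) < 0.7
filled = complete_sor(M_true, mask, rank=2)
rel_err = (np.linalg.norm((filled - M_true)[~mask])
           / np.linalg.norm(M_true[~mask]))
print(f"relative error on missing entries: {rel_err:.2e}")
```

The fallback to `w = 1.0` is a simple safeguard: the unrelaxed sweep never increases the residual on observed entries, so a rejected over-relaxed step can always be replaced by a safe one.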
Missing data recovery for tensors can therefore take full advantage of the intrinsic structure implied by such complex data. However, most existing tensor recovery algorithms simply apply a low-rank matrix recovery framework to the tensor, reducing the tensor problem to matrix nuclear norm problems, which destroys the structural characteristics of the tensor. Inspired by the alternating least squares algorithm based on the tensor PARAFAC decomposition, a gradient optimization algorithm based on the PARAFAC decomposition (PARAFAC-Grad) is proposed. Simulation results show that the proposed method recovers data more accurately than the other two algorithms compared (Tucker-als and PARAFAC-als).
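The idea of gradient-based completion on a PARAFAC model can be sketched as follows for a third-order tensor. This is not the PARAFAC-Grad algorithm itself, whose details the abstract does not state; it is plain gradient descent with a backtracking line search on the masked squared error, and the names `cp_reconstruct` and `cp_complete_grad`, the step-size rule, and the iteration count are assumptions made for illustration. The model is T[i, j, k] ≈ Σ_r A[i, r] B[j, r] C[k, r], fitted on observed entries only, so the tensor is never unfolded into a matrix nuclear norm problem.

```python
import numpy as np

def cp_reconstruct(A, B, C):
    """Rebuild a third-order tensor from its PARAFAC factor matrices."""
    return np.einsum('ir,jr,kr->ijk', A, B, C)

def cp_complete_grad(T, mask, rank, n_iter=300, seed=0):
    """Fill entries of T where mask is False by gradient descent on the
    masked PARAFAC fit. Illustrative sketch, not the thesis's PARAFAC-Grad."""
    rng = np.random.default_rng(seed)
    I, J, K = T.shape
    A = rng.standard_normal((I, rank))
    B = rng.standard_normal((J, rank))
    C = rng.standard_normal((K, rank))

    def loss(A, B, C):
        # Residual on observed entries only; hidden entries contribute zero.
        R = np.where(mask, cp_reconstruct(A, B, C) - T, 0.0)
        return 0.5 * np.sum(R ** 2), R

    f, R = loss(A, B, C)
    losses = [f]
    step = 1.0
    for _ in range(n_iter):
        # Gradients of the masked loss with respect to each factor matrix.
        gA = np.einsum('ijk,jr,kr->ir', R, B, C)
        gB = np.einsum('ijk,ir,kr->jr', R, A, C)
        gC = np.einsum('ijk,ir,jr->kr', R, A, B)
        g2 = np.sum(gA ** 2) + np.sum(gB ** 2) + np.sum(gC ** 2)
        if g2 == 0.0:
            break
        # Backtracking (Armijo) line search: halve the step until the loss
        # decreases sufficiently, so the loss never increases.
        while True:
            A1, B1, C1 = A - step * gA, B - step * gB, C - step * gC
            f1, R1 = loss(A1, B1, C1)
            if f1 <= f - 1e-4 * step * g2:
                break
            step *= 0.5
        A, B, C, f, R = A1, B1, C1, f1, R1
        losses.append(f)
        step *= 2.0                       # try a larger step next iteration
    return np.where(mask, T, cp_reconstruct(A, B, C)), losses

# Usage on synthetic data: a random rank-2 tensor with about 30% of entries hidden.
rng = np.random.default_rng(7)
T_true = cp_reconstruct(rng.standard_normal((10, 2)),
                        rng.standard_normal((10, 2)),
                        rng.standard_normal((10, 2)))
mask = rng.random(T_true.shape) < 0.7
filled, losses = cp_complete_grad(T_true, mask, rank=2)
print(f"masked loss: {losses[0]:.3e} -> {losses[-1]:.3e}")
```

Unlike alternating least squares, which solves for one factor at a time, this sketch updates all three factors simultaneously along the negative gradient. Plain gradient descent can converge slowly on such models, so a practical method would likely use a stronger optimizer.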
Keywords/Search Tags: Missing Data Recovery, Successive Over-Relaxation Method, Tensor Decomposition, Gradient Optimization Algorithm