
A Tensor-based Approach For Unified Representation And Dimensionality Reduction Of Big Data

Posted on: 2017-09-18
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L W Kuang
Full Text: PDF
GTID: 1318330485450830
Subject: Computer system architecture
Abstract/Summary:
There are two fundamental approaches to the study of big data, namely unified representation and dimensionality reduction. However, over the past decades, no concise model has been presented that efficiently represents unstructured, semi-structured, and structured data. In addition, the presence of inconsistent, redundant, and noisy data has imposed an unprecedented burden on big data processing algorithms, which face great challenges in both computational efficiency and accuracy. There is therefore an urgent need for a unified model that represents heterogeneous data and for efficient algorithms that extract high-quality core data. After a thorough analysis of the four characteristics of big data, namely volume, variety, velocity, and value, this thesis proposes a tensor-based approach that represents big data as a unified tensor model. Based on this model, three dimensionality reduction methods are presented: an incremental method, a distributed method, and a secure method. The main contributions of this thesis are summarized as follows.

Firstly, a tensor-based approach is proposed for the unified representation of big data, built on a unified tensor model. To solve the problem of feature conflict, a tensor-based fusion method is explored that stacks all the features of the heterogeneous data into a high-order tensor space. Additionally, to meet the requirements of big data applications, this thesis presents an integration framework comprising five functionally complementary processes.

Secondly, this thesis presents an incremental approach for dimensionality reduction of big data. Big data processing has two distinguishing features, namely large volume and huge intermediate computational results, which make traditional algorithms ineffective.
An incremental method is proposed to dynamically update the truncated orthogonal bases using the projection results of the additional columns. A core tensor equivalence theorem is proven to address the problem of order inconsistency, and a recursive algorithm is designed for dimensionality reduction of big data. Experimental results demonstrate that the proposed method is competitive.

Thirdly, a distributed approach is investigated for dimensionality reduction of big data. The approach includes a distributed algorithm, a distributed computing environment, and a tensor block partitioning approach. A chunk tensor method is presented to fuse unstructured, semi-structured, and structured data into a unified model in which all characteristics of the heterogeneous data are appropriately arranged along the tensor orders. A Lanczos-based High Order Singular Value Decomposition (HOSVD) algorithm is proposed to decompose the unified tensor model. A four-objective optimization model is proposed to distribute the tensor blocks to computing devices in a near-optimal way.

Finally, secure approaches are explored for dimensionality reduction of big data. Two methods are presented, one based on a partially homomorphic encryption scheme and one based on a fully homomorphic encryption scheme. In both, the heterogeneous data are represented as low-order sub-tensors, which are then encrypted. The method using the partially homomorphic scheme consists of a secure bidiagonalization algorithm, a secure singular value decomposition algorithm, and a mode-n product algorithm. In the method using the fully homomorphic scheme, a unified high-order cipher tensor model is constructed by collecting all the cipher sub-tensors and embedding them into a base tensor space. The cipher tensor is then decomposed by a proposed secure algorithm in which the square root operations are eliminated from the Lanczos procedure and the division operations are transferred to the clients.
Theoretical analyses and experimental results demonstrate that the two methods differ in both computational efficiency and security level.
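
The core operation behind the dimensionality reduction methods summarized above is a truncated High Order Singular Value Decomposition, which compresses a tensor into a small core tensor plus one orthogonal basis per mode. The following is a minimal sketch of that idea in Python with NumPy, not the thesis's algorithm. It uses a dense SVD from `numpy.linalg.svd` in place of the Lanczos procedure, and the function names (`unfold`, `mode_n_product`, `truncated_hosvd`, `reconstruct`) are illustrative assumptions.

```python
import numpy as np


def unfold(tensor, mode):
    """Mode-n unfolding: bring axis `mode` to the front, flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_n_product(tensor, matrix, mode):
    """Mode-n product: multiply `tensor` by `matrix` along axis `mode`."""
    # Contract the columns of `matrix` with the chosen tensor axis; the new
    # axis comes out first, so move it back to position `mode`.
    result = np.tensordot(matrix, tensor, axes=(1, mode))
    return np.moveaxis(result, 0, mode)


def truncated_hosvd(tensor, ranks):
    """Return (core, factors) with one truncated orthogonal basis per mode.

    Assumption: a dense SVD is used here for clarity; the thesis describes
    a Lanczos-based variant instead.
    """
    factors = []
    for mode, rank in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])  # keep the leading `rank` left vectors
    core = tensor
    for mode, u in enumerate(factors):
        core = mode_n_product(core, u.T, mode)  # project onto each basis
    return core, factors


def reconstruct(core, factors):
    """Expand a core tensor back to the original space."""
    out = core
    for mode, u in enumerate(factors):
        out = mode_n_product(out, u, mode)
    return out
```

As a usage example, calling `truncated_hosvd(x, (2, 2, 2))` on a 6 x 5 x 4 tensor `x` yields a 2 x 2 x 2 core tensor, and `reconstruct` maps it back to the original shape. The reconstruction is exact (up to rounding) only when `x` already has multilinear rank at most (2, 2, 2); otherwise it is an approximation.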
Keywords/Search Tags: Big Data, Tensor Model, Unified Representation Approach, Dimensionality Reduction, Incremental Computing, Distributed Computing, Homomorphic Encryption Scheme