Research And Implementation Of Incremental Dimensionality Reduction Methods For Big Data

Posted on: 2020-07-10    Degree: Master    Type: Thesis
Country: China    Candidate: X B Yan    Full Text: PDF
GTID: 2438330575455710    Subject: Computer technology
Abstract/Summary:
Nowadays, with the rapid development of information technology, the scale of data is growing exponentially, and the value of big data is attracting more and more attention. In the Internet, bioinformatics, finance, and many other fields, the volume of data is increasing rapidly. This growth provides valuable opportunities for many industries, but it also poses a series of challenges for big-data dimensionality reduction. In recent years, designing efficient dimensionality reduction models that extract the core information from high-dimensional data has become a major focus of research.

Building on a study of high-dimensional data, this paper uses the powerful data analysis capability of a Spark cluster to design an incremental, distributed dimensionality reduction system, the "smart eye" system. The system offers users three operation modes: the common mode corresponds to the IDPCA module, the common_v1 mode to the DIDPCA module, and the common_v2 mode to the SVD-based incremental dimensionality reduction module.

The core idea of this paper is that, during dimensionality reduction, only the most recently changed (incremental) data are extracted and processed; the overall dimensionality reduction result is then updated with the incremental result to obtain the latest result. The advantage of incremental processing is that it relates the incremental result to the historical result, which saves a large amount of computing resources and significantly improves the efficiency of dimensionality reduction. It is a very effective approach when the data change incrementally (a minimal sketch of this idea appears after the work list below).

The specific work of this paper is as follows:

1. Proposed the IDPCA dimensionality reduction algorithm and implemented the IDPCA incremental dimensionality reduction module of the smart eye system based on it. The user configures the module and submits it for execution, and the results are then used for decision analysis.

2. Proposed the DIDPCA algorithm and implemented the distributed incremental dimensionality reduction module of the smart eye system on a Spark cluster. The user first configures the module; the parameters are cached in Redis, from which the Spark engine retrieves the configuration for task scheduling, and the task is submitted to the Spark cluster for computation. The correlation coefficient matrix, the eigenvalue and eigenvector computation, and the projection stage are all executed in parallel on the cluster, which improves dimensionality reduction efficiency compared with the standard PCA algorithm (a baseline Spark PCA sketch follows this list).

3. Proposed an incremental dimensionality reduction method based on singular value decomposition (SVD) and implemented the third module of the smart eye system by deploying the algorithm on the cluster. The core idea of this module is to decompose the original data set to obtain a diagonal matrix whose entries are the singular values, sorted from largest to smallest. In general, the sum of squares of the top 10% of the singular values accounts for more than 95% of the sum of squares of all singular values, so the original matrix can be approximated with only the first k singular values (see the truncation sketch below). The system module based on this algorithm can effectively reduce the dimensionality of the data.
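The incremental idea described above can be illustrated with scikit-learn's IncrementalPCA, which updates its principal components from each new batch instead of refitting on the full history. This is a minimal sketch of the general technique under made-up data and dimensions, not the thesis's IDPCA implementation.

    # Minimal sketch of incremental PCA: fit once on history,
    # then fold in each new increment without refitting everything.
    import numpy as np
    from sklearn.decomposition import IncrementalPCA

    rng = np.random.default_rng(0)
    ipca = IncrementalPCA(n_components=10)

    history = rng.standard_normal((1000, 100))   # hypothetical historical data
    ipca.partial_fit(history)                    # initial fit

    increment = rng.standard_normal((200, 100))  # hypothetical new arrivals
    ipca.partial_fit(increment)                  # update components, not refit

    reduced = ipca.transform(increment)          # 200 x 10 projection

Only the new batch is passed to partial_fit, so the cost of each update depends on the increment size rather than on the accumulated history, which is the resource saving the abstract describes.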
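For the distributed module, pyspark.ml.feature.PCA gives a baseline of PCA running on a Spark cluster. This sketch is a plain, non-incremental Spark PCA, not the DIDPCA algorithm itself; the application name, column names, and toy rows are illustrative.

    # Baseline distributed PCA on Spark; the fit runs on the cluster.
    from pyspark.sql import SparkSession
    from pyspark.ml.feature import PCA
    from pyspark.ml.linalg import Vectors

    spark = SparkSession.builder.appName("pca-sketch").getOrCreate()

    rows = [(Vectors.dense([1.0, 0.0, 7.0]),),
            (Vectors.dense([2.0, 1.0, 5.0]),),
            (Vectors.dense([4.0, 3.0, 2.0]),)]
    df = spark.createDataFrame(rows, ["features"])

    pca = PCA(k=2, inputCol="features", outputCol="pca_features")
    model = pca.fit(df)                   # covariance and eigen work distributed
    model.transform(df).show(truncate=False)

DIDPCA additionally parallelizes the correlation coefficient matrix, the eigen-decomposition, and the projection stages and combines them with the incremental update, which this baseline does not show.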
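The SVD-based truncation in item 3 can be sketched in NumPy: keep the smallest k whose squared singular values cover 95% of the total squared energy, then rebuild a rank-k approximation. The matrix here is random, so the resulting k will vary; the thresholding logic is the point.

    # Rank-k SVD approximation: choose k by the 95% squared-energy rule.
    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((500, 80))           # hypothetical data matrix

    U, s, Vt = np.linalg.svd(A, full_matrices=False)

    energy = np.cumsum(s**2) / np.sum(s**2)      # cumulative squared energy
    k = int(np.searchsorted(energy, 0.95)) + 1   # smallest k covering 95%

    A_k = (U[:, :k] * s[:k]) @ Vt[:k, :]         # rank-k approximation
    print(k, np.linalg.norm(A - A_k) / np.linalg.norm(A))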
Keywords/Search Tags: High-dimensional data, Spark cluster, Principal component analysis, Singular value decomposition, Incremental dimensionality reduction