
Low Rank Approximation Of High Dimensional Matrix Based On Sampling And Its Application

Posted on: 2022-12-30
Degree: Master
Type: Thesis
Country: China
Candidate: X X Ren
Full Text: PDF
GTID: 2518306767999529
Subject: Automation Technology
Abstract/Summary:
In the era of big data, most massive data sets take the form of high-dimensional matrices, and reducing the dimension of such matrices has become a hot topic in machine learning. Sampling has proved to be an effective way to reduce the dimension and computational complexity of high-dimensional data, but the errors produced by different sampling and matrix reconstruction methods differ considerably during dimensionality reduction. From the sampling point of view, this thesis studies methods and error measures for low-rank approximation of high-dimensional matrices, focusing on improving the accuracy of the low-rank approximation while reducing the computational complexity. The main work includes the following aspects.

Firstly, the Nyström method is a relatively effective low-rank approximation technique for large-scale data sets: it extracts some columns from the original data matrix and uses them to reconstruct a low-rank approximation of that matrix. Since the choice of sampling method strongly affects the accuracy of the reconstruction, a Nyström method that combines unequal probability sampling with randomized singular value decomposition (SVD) is proposed to improve the accuracy of the low-rank approximation and reduce the computational complexity of matrix reconstruction. The results show that the proposed Nyström method reconstructs the matrix with high accuracy and greatly reduces the computational complexity.

Secondly, in the analysis of high-dimensional big data matrices, it is common to approximate the original data matrix with a small number of principal components. These components are linear combinations of the rows and columns of the matrix, so they are difficult to interpret in terms of the original features of the data. To address this, unequal probability sampling and adaptive sampling are proposed as sampling methods suited to CUR matrix decomposition and are combined with randomized SVD: randomized SVD is applied to the matrices C and R obtained by sampling, so as to control the computational complexity while improving the accuracy of the low-rank reconstruction. The results show that the CUR matrix decomposition method based on unequal probability adaptive sampling combined with randomized SVD achieves high accuracy and stability in low-rank approximation of matrices.

Finally, the two methods are applied in empirical studies. The Nyström method based on unequal probability sampling and randomized SVD is extended to spectral clustering, and an empirical analysis is carried out on the financial ratio data of listed companies. A feature extraction method based on the unequal probability sampling Nyström method is proposed. By extracting the main feature indicators that affect the performance of listed companies, it retains as much of the original data information as possible while reducing the data dimension and the computational cost. Spectral clustering of the listed companies is then carried out on the selected feature variables. The results show that, at a sampling ratio of 20%, the features extracted from the original indicators evenly cover the 10 categories of first-level indicators in the original data, which indicates that the extracted features are representative. The spectral clustering divides the 73 listed companies selected in this thesis into 4 categories, and the clustering evaluation criterion gives R² = 0.72, indicating a good clustering effect.

The CUR matrix decomposition based on unequal probability sampling and randomized SVD is extended to preference feature extraction and tested empirically on user-movie rating data. Because the preference features are sampled directly from the raw data, they are highly interpretable and have a clear meaning. The results show that the preference feature extraction algorithm based on CUR matrix decomposition performs better, and that the extracted user or product features reflect the features of the original data well. As the number of sampled columns and rows increases, the accuracy of preference feature extraction increases and the compression rate decreases. The accuracy of the preference feature extraction method based on CUR matrix decomposition is much higher than that of the method based on SVD.
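
The Nyström step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the sampling probabilities or the details of the SVD step, so the choices here are assumptions: columns are drawn with probability proportional to their squared norms, and the sampled block W is factored with a standard Gaussian-projection randomized SVD. All function names (`column_probabilities`, `randomized_svd`, `nystrom`, `relative_error`) are hypothetical and are not taken from the thesis.

```python
import numpy as np


def column_probabilities(A):
    """Unequal sampling probabilities, proportional to squared column norms.

    Assumption: the thesis may use a different probability distribution.
    """
    sq_norms = np.sum(A * A, axis=0)
    return sq_norms / sq_norms.sum()


def randomized_svd(A, rank, oversample=5, rng=None):
    """Rank-`rank` SVD via a Gaussian random projection (range finder + small SVD)."""
    rng = np.random.default_rng(rng)
    k = min(rank + oversample, min(A.shape))
    omega = rng.standard_normal((A.shape[1], k))   # random test matrix
    Q, _ = np.linalg.qr(A @ omega)                 # orthonormal basis for the sampled range
    U_small, s, Vt = np.linalg.svd(Q.T @ A, full_matrices=False)
    return (Q @ U_small)[:, :rank], s[:rank], Vt[:rank]


def nystrom(K, c, rank, rng=None):
    """Nystrom approximation K ~ C W^+ C^T of a symmetric PSD matrix K.

    c    : number of sampled columns
    rank : rank used for the pseudo-inverse of the sampled block W
    """
    rng = np.random.default_rng(rng)
    idx = rng.choice(K.shape[0], size=c, replace=False, p=column_probabilities(K))
    C = K[:, idx]                 # sampled columns
    W = K[np.ix_(idx, idx)]       # intersection block
    U, s, Vt = randomized_svd(W, rank, rng=rng)
    keep = s > 1e-10 * s[0]       # drop numerically zero singular values
    W_pinv = (Vt[keep].T / s[keep]) @ U[:, keep].T
    return C @ W_pinv @ C.T


def relative_error(K, K_hat):
    """Relative Frobenius-norm error of the approximation."""
    return np.linalg.norm(K - K_hat, "fro") / np.linalg.norm(K, "fro")


if __name__ == "__main__":
    # Synthetic rank-5 PSD matrix, used only for illustration.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((60, 5))
    K = X @ X.T
    K_hat = nystrom(K, c=20, rank=5, rng=1)
    print("relative error:", relative_error(K, K_hat))
```

On a matrix that is exactly low rank, sampling more columns than the rank should recover it up to rounding error; on real data the relative error depends on how quickly the spectrum decays and on the sampling distribution, which is the trade-off the abstract refers to.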
Keywords/Search Tags: Nyström method, CUR matrix decomposition method, Unequal probability sampling, Unequal probability adaptive sampling, Random SVD decomposition, Relative error, Computational complexity