
Research On High Dimensional Data Analysis Method Based On Dimensionality Reduction And Feature Selection

Posted on: 2020-03-06
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Zhang
GTID: 2480306308970569
Subject: Electronics and Communications Engineering
Abstract/Summary:
High-dimensional data carry a great deal of information, but they also contain a large amount of redundant information, and studying them often runs into the curse of dimensionality. Moreover, real-world high-dimensional data are inevitably corrupted by noise or outliers, which degrades data quality and hampers their analysis. Against this background, this thesis studies low-rank component recovery algorithms for high-dimensional data. The main research contents and contributions are as follows.

To improve low-rank matrix recovery, this thesis proposes the RPCA-L21 algorithm, an optimized variant of robust principal component analysis. Because the l2,1 norm captures structured (column-wise) sparsity, it replaces the l1 norm in the original optimization problem, so that the sparse term better matches the distribution of real-world noise. The optimization problem is solved with randomized singular value decomposition, which reduces the computational complexity of the algorithm. Experiments on simulated and real data sets show that RPCA-L21 reduces the low-rank matrix recovery error to 10⁻⁷. The algorithm performs well in image denoising, image classification, and related tasks; in particular, in handwritten digit recognition the classification accuracy increases by 86.36%.

To improve low-rank tensor recovery, this thesis proposes the RTRPCA algorithm, an optimized variant of tensor robust principal component analysis. Built on low-rank tensor recovery theory, the algorithm processes tensor data directly. To handle contamination by outliers, a tensor l2,1 norm is added to the optimization problem. For partial contamination, a contamination detection step based on the Rank-Ordered Logarithmic Difference (ROLD) value is also incorporated: only the data points judged to be contaminated are recovered, which preserves the quality of the recovered low-rank tensor. Experiments on simulated and real data sets show that RTRPCA reduces the low-rank tensor recovery error to 10⁻¹⁰. In classifying dangerous driver behavior, the algorithm effectively handles interference in real-world sensor signals and increases classification accuracy by 9.23%.
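The abstract does not state the RPCA-L21 objective or solver. The following is a minimal sketch, assuming the problem takes the form min ||L||_* + λ||S||_{2,1} subject to X = L + S, with the l2,1 norm summed over columns, and that it is solved by a standard inexact augmented Lagrange multiplier (ALM) loop. The helper names `svt`, `col_shrink` and `rpca_l21`, the value λ = 0.5, the penalty schedule and the synthetic data are all illustrative assumptions. A full SVD stands in for the randomized SVD the thesis uses.

```python
import numpy as np


def svt(M, tau):
    """Singular value thresholding: proximal operator of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt


def col_shrink(M, tau):
    """Proximal operator of tau * l2,1 norm: shrink every column's l2 norm by tau."""
    norms = np.linalg.norm(M, axis=0)
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    return M * scale  # one scale factor per column; weak columns become exactly 0


def rpca_l21(X, lam, rho=1.1, tol=1e-7, max_iter=500):
    """Inexact ALM sketch for min ||L||_* + lam * ||S||_{2,1} s.t. X = L + S."""
    norm_X = np.linalg.norm(X)
    mu = 1.25 / np.linalg.norm(X, 2)  # common starting penalty heuristic (assumed)
    mu_max = mu * 1e10
    L = np.zeros_like(X)
    S = np.zeros_like(X)
    Y = np.zeros_like(X)  # Lagrange multiplier for the constraint X = L + S
    for _ in range(max_iter):
        L = svt(X - S + Y / mu, 1.0 / mu)
        S = col_shrink(X - L + Y / mu, lam / mu)
        R = X - L - S
        Y = Y + mu * R
        mu = min(rho * mu, mu_max)
        if np.linalg.norm(R) / norm_X < tol:
            break
    return L, S


# Hypothetical test data: rank-2 matrix with 5 columns replaced by large random outliers.
rng = np.random.default_rng(0)
m, n, r = 100, 60, 2
L0 = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
outliers = [3, 17, 25, 41, 58]
X = L0.copy()
X[:, outliers] = 10.0 * rng.standard_normal((m, len(outliers)))

L, S = rpca_l21(X, lam=0.5)
col_norms = np.linalg.norm(S, axis=0)
print(sorted(int(j) for j in np.argsort(col_norms)[-5:]))  # five columns with largest S norm
```

The column-wise shrinkage either scales a whole column down or sets it to zero, which is why an l2,1 penalty suits corruption that affects entire samples rather than scattered individual entries.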
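Randomized singular value decomposition is named in the abstract but not specified. A sketch of the common random range-finder scheme follows: a Gaussian test matrix, a QR factorization, a few power iterations, then an exact SVD of the small projected matrix. The function name `randomized_svd` and the parameters `k`, `oversample` and `n_iter` are assumptions for illustration. Inside an RPCA loop only the leading singular triplets matter for thresholding, which is presumably where the complexity reduction comes from, although the target rank k then has to be guessed or adapted.

```python
import numpy as np


def randomized_svd(A, k, oversample=10, n_iter=2, seed=None):
    """Approximate rank-k SVD of A using a random range finder."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    p = min(k + oversample, m, n)            # number of random samples
    Omega = rng.standard_normal((n, p))      # Gaussian test matrix
    Q, _ = np.linalg.qr(A @ Omega)           # orthonormal basis for the sampled range
    for _ in range(n_iter):                  # power iterations, re-orthonormalized
        Q, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Q)
    B = Q.T @ A                              # small p x n projection of A
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :k], s[:k], Vt[:k, :]


rng = np.random.default_rng(1)
A = rng.standard_normal((300, 5)) @ rng.standard_normal((5, 200))  # exactly rank 5
U, s, Vt = randomized_svd(A, k=5, seed=2)
print(np.allclose((U * s) @ Vt, A))  # True: a rank-5 matrix is captured exactly
```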
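The tensor model behind RTRPCA is not given here either. Assuming it uses the t-SVD tensor nuclear norm common in tensor robust PCA, the low-rank proximal step can be sketched in three stages: apply an FFT along the third mode, threshold the singular values of every frontal slice, and apply the inverse FFT. `tensor_svt` is a hypothetical helper. The tensor l2,1 term and the outer ALM loop are omitted.

```python
import numpy as np


def tensor_svt(T, tau):
    """Proximal operator of tau * (t-SVD) tensor nuclear norm for a real n1 x n2 x n3 tensor.

    Transform along the third mode, soft-threshold the singular values of every
    frontal slice in the Fourier domain, then transform back.
    """
    Tf = np.fft.fft(T, axis=2)
    out = np.empty_like(Tf)
    for i in range(T.shape[2]):
        U, s, Vh = np.linalg.svd(Tf[:, :, i], full_matrices=False)
        out[:, :, i] = (U * np.maximum(s - tau, 0.0)) @ Vh
    # Conjugate-symmetric slices give a real result up to rounding error.
    return np.real(np.fft.ifft(out, axis=2))


rng = np.random.default_rng(3)
T = rng.standard_normal((20, 15, 8))
print(np.allclose(tensor_svt(T, 0.0), T))  # True: tau = 0 leaves the tensor unchanged
```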
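The abstract also names the ROLD detector without defining it. Below is a sketch for a 1-D signal scaled to [0, 1], assuming the usual definition from impulse-noise detection. Each absolute difference d to a neighbor is mapped to D = 1 + max(log_a d, −b)/b, and the score of a sample is the sum of its m smallest D values. The window size, m, a, b, the 1-D setting, the helper name `rold_1d` and the test signal are illustrative assumptions; the thesis applies the detector to tensor-valued sensor data.

```python
import numpy as np


def rold_1d(x, m=3, half_width=3, a=2.0, b=5.0):
    """ROLD score per sample of a 1-D signal with values in [0, 1]."""
    n = len(x)
    scores = np.empty(n)
    for i in range(n):
        lo, hi = max(0, i - half_width), min(n, i + half_width + 1)
        nbrs = np.concatenate([x[lo:i], x[i + 1:hi]])  # window without sample i
        d = np.abs(x[i] - nbrs)
        with np.errstate(divide="ignore"):
            log_d = np.log(d) / np.log(a)  # log base a; log(0) gives -inf
        D = 1.0 + np.maximum(log_d, -b) / b  # logarithmic difference, in [0, 1]
        scores[i] = np.sort(D)[:m].sum()     # sum of the m smallest differences
    return scores


x = 0.5 + 0.2 * np.sin(np.linspace(0.0, 2.0 * np.pi, 100))
x[40] = 0.0  # inject a single impulse
scores = rold_1d(x)
print(int(np.argmax(scores)))  # 40
```

Samples whose score exceeds a chosen threshold would be flagged as contaminated. Under the scheme described in the abstract, only those entries would then be replaced by their recovered low-rank values.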
Keywords: high-dimensional data, low-rank matrix recovery, low-rank tensor recovery