Partial Least Squares Models Based On Random Projection Algorithm And Their Applications

Posted on: 2024-02-03
Degree: Master
Type: Thesis
Country: China
Candidate: X Zeng
GTID: 2568307157487914
Subject: Applied Statistics
Abstract/Summary:
In the era of big data, data has become a new kind of "petroleum": a resource that is leading a new wave of technological innovation and has become a new driving force for economic transformation. How to extract important information from massive data is a research hotspot in many fields. However, the modeling and analysis of data often run into the "curse of dimensionality": the high dimensionality of the data makes traditional statistical ideas and methods inapplicable. To handle high-dimensional data better, a large number of new methods have emerged. Partial least squares (PLS) is a widely used modeling method for high-dimensional problems. However, when the data contain many redundant variables, the model obtained by PLS is not sparse, which reduces its prediction accuracy and interpretability. To obtain PLS models with high prediction accuracy and strong interpretability, this thesis introduces random projection, a dimensionality reduction technique with good theoretical properties and simple computation, into the PLS model, and develops new high-dimensional regression and classification methods for processing and analyzing high-dimensional data. The main work of this thesis is as follows.

First, we introduce the idea of axis-aligned sparse random projection into the PLS model and propose a new sparse partial least squares regression algorithm, SPLSvRP. Results on simulated data and four real near-infrared spectral datasets show that SPLSvRP has better predictive performance and a better ability to identify important variables than four high-performance models: PLS, PCR, SPLS-SIMPLS, and SPLS-NIPALS.

Next, we apply the idea of random projection ensemble learning to the partial least squares discriminant analysis (PLS-DA) model and develop a new ensemble classification algorithm, RP_PLS-DA. Numerical results on simulated data and two publicly available real datasets indicate that the algorithm not only improves the classification performance of its base classifier, PLS-DA, but also outperforms RP_LDA, RP_QDA, RP_knn, and SPLS-DA when projecting onto subspaces of the same dimension. The RP_PLS-DA algorithm also shows good model stability and robustness.

The new methods developed in this thesis can effectively process high-dimensional complex data such as near-infrared spectral data and terrain data, provide advanced data analysis methods for research fields such as chemometrics and geomatics, and enrich the set of techniques for analyzing high-dimensional complex data.
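The abstract gives no algorithmic detail, so the following is only a minimal sketch of the general random-projection ensemble idea it refers to. It is not the thesis's RP_PLS-DA or SPLSvRP algorithm. Every design choice here is an assumption made for illustration: binary labels coded 0/1; a single-response PLS model fitted by NIPALS and thresholded at 0.5 as the PLS-DA base classifier; B1 ensemble members, each keeping the best of B2 random projections by training error; and a final vote with threshold `alpha`. The function names `pls1_fit`, `haar_projection`, `axis_aligned_projection`, `rp_ensemble_fit` and `rp_ensemble_predict` are hypothetical. Setting `axis_aligned=True` replaces the dense orthonormal projections with random subsets of the original variables, which is one simple reading of an axis-aligned projection.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model (PLS1) with the NIPALS algorithm.

    Returns (beta, intercept) such that predictions are X @ beta + intercept.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xc.T @ yc                      # weight: direction of max covariance with y
        norm = np.linalg.norm(w)
        if norm < 1e-12:                   # no covariance left to explain
            break
        w = w / norm
        t = Xc @ w                         # score vector
        tt = t @ t
        p = Xc.T @ t / tt                  # X loading
        c = (yc @ t) / tt                  # y loading
        Xc = Xc - np.outer(t, p)           # deflate X
        yc = yc - c * t                    # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    if not W:
        return np.zeros(X.shape[1]), y_mean
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    beta = W @ np.linalg.solve(P.T @ W, q)  # regression coefficients in X space
    return beta, y_mean - x_mean @ beta


def haar_projection(p, d, rng):
    """Random p x d matrix with orthonormal columns (Haar-distributed)."""
    Q, R = np.linalg.qr(rng.standard_normal((p, d)))
    return Q * np.sign(np.diag(R))


def axis_aligned_projection(p, d, rng):
    """p x d matrix that simply keeps d randomly chosen original variables."""
    return np.eye(p)[:, rng.choice(p, size=d, replace=False)]


def rp_ensemble_fit(X, y, d, n_components, B1=50, B2=20, axis_aligned=False, seed=0):
    """Illustrative random-projection ensemble with a PLS-DA base classifier.

    Assumed design (not the thesis's exact algorithm): for each of B1 members,
    draw B2 candidate projections and keep the one whose PLS-DA fit has the
    lowest training (resubstitution) error. Labels must be coded 0/1.
    """
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    draw = axis_aligned_projection if axis_aligned else haar_projection
    models = []
    for _ in range(B1):
        best = None
        for _ in range(B2):
            A = draw(p, d, rng)
            Z = X @ A                                   # data in the d-dim subspace
            beta, b = pls1_fit(Z, y.astype(float), n_components)
            err = np.mean((Z @ beta + b > 0.5) != y)    # PLS-DA: threshold fit at 0.5
            if best is None or err < best[0]:
                best = (err, A, beta, b)
        models.append(best[1:])
    return models


def rp_ensemble_predict(models, X, alpha=0.5):
    """Assign class 1 when the fraction of members voting 1 exceeds alpha."""
    votes = np.mean([(X @ A @ beta + b > 0.5) for A, beta, b in models], axis=0)
    return (votes > alpha).astype(int)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((200, 50))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)   # only 2 of 50 variables matter
    models = rp_ensemble_fit(X[:150], y[:150], d=5, n_components=2, B1=30)
    acc = np.mean(rp_ensemble_predict(models, X[150:]) == y[150:])
    print(f"held-out accuracy: {acc:.2f}")
```

Selecting projections by resubstitution error is the cheapest option in this sketch; leave-one-out or a held-out split would give a less optimistic selection criterion at extra cost. The axis-aligned variant has the side benefit that each member uses a subset of the original variables, so the selected subsets can be inspected directly.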
Keywords/Search Tags: Random projection, Partial least squares, Ensemble learning, Dimensionality reduction, High-dimensional data analysis