
Fast Sparse Multinomial Logistic Regression And Distributed Parallelism

Posted on: 2020-04-23
Degree: Master
Type: Thesis
Country: China
Candidate: M Du
Full Text: PDF
GTID: 2370330590971973
Subject: Software engineering
Abstract/Summary:
In recent years, Sparse Multinomial Logistic Regression (SMLR) has been widely used in hyperspectral image classification, multi-class object recognition, disease diagnosis, and other applications, because it embeds feature selection into the classification process. Since the SMLR objective function contains an ℓ1 regularization term, a closed-form solution cannot be obtained directly, so the problem is usually solved iteratively. SMLR was first solved by the Iteratively Reweighted Least Squares (IRLS) method, but IRLS is sensitive to the feature dimension and the number of classes, and its computational complexity is high on high-dimensional datasets or datasets with many classes. Advanced SMLR optimization algorithms are therefore increasingly in demand.

To improve the accuracy and speed of the serial SMLR solution, this thesis designs and proposes the Fast Sparse Multinomial Logistic Regression (FSMLR) algorithm, based on the Alternating Direction Method of Multipliers (ADMM). Experimental results show that FSMLR achieves the best classification accuracy on multiple datasets and greatly outperforms the IRLS algorithm in running time.

Since serial optimization algorithms for SMLR can no longer meet the time and memory requirements of large-scale data processing, this thesis further proposes a Sample Partitioning based Distributed SMLR (SP-SMLR) algorithm for the large-sample scenario and a Feature Partitioning based Distributed SMLR (FP-SMLR) algorithm for the large-feature scenario. Both algorithms exploit the decomposability of ADMM, achieving task parallelism by splitting the single SMLR objective function into multiple sub-objectives. In addition, the original large-scale dataset is divided into multiple sub-datasets in different ways, and each task is optimized on its own sub-dataset; this data parallelism greatly reduces the communication cost of each task in the distributed environment. SP-SMLR and FP-SMLR are implemented on the Spark distributed computing framework and evaluated on real large-scale datasets. The big-data experiments show that the distributed parallel SMLR algorithms scale to massive samples and large-scale features, achieve high accuracy, and solve the problem at a faster speed.
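The thesis text does not give the FSMLR update formulas, but the ADMM splitting it describes can be illustrated with a minimal serial sketch: the ℓ1-regularized multinomial logistic regression objective is split into a smooth part (handled by gradient steps) and an ℓ1 part (handled by a closed-form soft-thresholding step), coupled by a consensus constraint W = Z. All function names, step sizes, and iteration counts below are illustrative assumptions, not the thesis's actual FSMLR implementation.

```python
import numpy as np

def soft_threshold(v, kappa):
    """Elementwise soft-thresholding: the proximal operator of the l1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)

def softmax(scores):
    """Row-wise softmax with max-subtraction for numerical stability."""
    scores = scores - scores.max(axis=1, keepdims=True)
    e = np.exp(scores)
    return e / e.sum(axis=1, keepdims=True)

def smlr_admm(X, Y, lam=0.1, rho=1.0, n_iter=200, n_inner=20, lr=0.1):
    """Illustrative ADMM sketch for l1-regularized multinomial logistic regression.

    Splits  min_W  nll(W) + lam * ||W||_1  into
        min_{W,Z}  nll(W) + lam * ||Z||_1   s.t.  W = Z,
    alternating: (1) a smooth W-update (a few gradient steps on the
    augmented Lagrangian), (2) a closed-form Z-update via soft-thresholding,
    and (3) the dual update U += W - Z.
    """
    n, d = X.shape
    k = Y.shape[1]
    W = np.zeros((d, k))
    Z = np.zeros((d, k))
    U = np.zeros((d, k))  # scaled dual variable
    for _ in range(n_iter):
        # W-update: gradient steps on nll(W) + (rho/2) * ||W - Z + U||^2
        for _ in range(n_inner):
            P = softmax(X @ W)
            grad = X.T @ (P - Y) / n + rho * (W - Z + U)
            W -= lr * grad
        # Z-update: proximal step, gives exact zeros (the sparsity of SMLR)
        Z = soft_threshold(W + U, lam / rho)
        # dual ascent on the consensus constraint W = Z
        U += W - Z
    return Z
```

The same decomposability is what the distributed variants exploit: in SP-SMLR each Spark partition would run the smooth W-update on its own sample block, while the Z-update and dual update act as the global consensus step; FP-SMLR partitions along features instead.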
Keywords/Search Tags:Big Data, Sparse Multinomial Logistic Regression, Alternating Direction Method of Multipliers, Distributed Parallel