
Fast Sparse Multinomial Logistic Regression And Distributed Parallelism

Posted on: 2020-04-23
Degree: Master
Type: Thesis
Country: China
Candidate: M Du
Full Text: PDF
GTID: 2370330590971973
Subject: Software engineering
Abstract/Summary:
In recent years, Sparse Multinomial Logistic Regression (SMLR) has been widely used in hyperspectral image classification, multi-class object recognition, disease diagnosis, and other applications, because it embeds feature selection into the classification process. Since the SMLR objective function contains an ℓ1 regularization term, a closed-form solution cannot be obtained directly, so the problem is usually solved iteratively. SMLR was first solved by the Iteratively Reweighted Least Squares (IRLS) method, but IRLS is sensitive to the feature dimension and the number of classes, and its computational complexity is high on high-dimensional datasets or datasets with many classes. Advanced SMLR optimization algorithms are therefore increasingly in demand.

To improve the accuracy and speed of the serial SMLR solution, this thesis designs and proposes the Fast Sparse Multinomial Logistic Regression (FSMLR) algorithm, based on the Alternating Direction Method of Multipliers (ADMM). Experimental results show that FSMLR achieves the best classification accuracy on multiple datasets and greatly outperforms the IRLS algorithm in running time.

Since serial optimization algorithms for SMLR can no longer meet the time and memory requirements of large-scale data processing, this thesis further proposes a Sample Partitioning based Distributed SMLR (SP-SMLR) algorithm for the large-sample scenario and a Feature Partitioning based Distributed SMLR (FP-SMLR) algorithm for the large-feature scenario. Both algorithms exploit the decomposability of ADMM, achieving task parallelism by splitting the single SMLR objective function into multiple sub-objectives. In addition, the original large-scale dataset is divided into multiple sub-datasets in different ways, and each task is optimized on its own sub-dataset; this data parallelism greatly reduces the communication cost of each task in the distributed environment. SP-SMLR and FP-SMLR are implemented on the Spark distributed computing framework and evaluated on real large-scale datasets. The big-data experiments show that the distributed parallel SMLR algorithms scale to massive samples and large-scale features, achieve high accuracy, and solve the problem at a faster speed.
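The thesis text does not give the FSMLR update formulas, but the ADMM splitting it describes can be illustrated with a minimal serial sketch: the ℓ1-regularized multinomial logistic regression objective is split into a smooth part (handled by gradient steps) and an ℓ1 part (handled by a closed-form soft-thresholding step), coupled by a consensus constraint W = Z. All function names, step sizes, and iteration counts below are illustrative assumptions, not the thesis's actual FSMLR implementation.

```python
import numpy as np

def soft_threshold(v, kappa):
    """Elementwise soft-thresholding: the proximal operator of the l1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)

def softmax(scores):
    """Row-wise softmax with max-subtraction for numerical stability."""
    scores = scores - scores.max(axis=1, keepdims=True)
    e = np.exp(scores)
    return e / e.sum(axis=1, keepdims=True)

def smlr_admm(X, Y, lam=0.1, rho=1.0, n_iter=200, n_inner=20, lr=0.1):
    """Illustrative ADMM sketch for l1-regularized multinomial logistic regression.

    Splits  min_W  nll(W) + lam * ||W||_1  into
        min_{W,Z}  nll(W) + lam * ||Z||_1   s.t.  W = Z,
    alternating: (1) a smooth W-update (a few gradient steps on the
    augmented Lagrangian), (2) a closed-form Z-update via soft-thresholding,
    and (3) the dual update U += W - Z.
    """
    n, d = X.shape
    k = Y.shape[1]
    W = np.zeros((d, k))
    Z = np.zeros((d, k))
    U = np.zeros((d, k))  # scaled dual variable
    for _ in range(n_iter):
        # W-update: gradient steps on nll(W) + (rho/2) * ||W - Z + U||^2
        for _ in range(n_inner):
            P = softmax(X @ W)
            grad = X.T @ (P - Y) / n + rho * (W - Z + U)
            W -= lr * grad
        # Z-update: proximal step, gives exact zeros (the sparsity of SMLR)
        Z = soft_threshold(W + U, lam / rho)
        # dual ascent on the consensus constraint W = Z
        U += W - Z
    return Z
```

The same decomposability is what the distributed variants exploit: in SP-SMLR each Spark partition would run the smooth W-update on its own sample block, while the Z-update and dual update act as the global consensus step; FP-SMLR partitions along features instead.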
Keywords/Search Tags:Big Data, Sparse Multinomial Logistic Regression, Alternating Direction Method of Multipliers, Distributed Parallel