
Improvement Of Logistic Regression Algorithm And Research On Parallelization Based On TensorFlow

Posted on: 2020-03-21
Degree: Master
Type: Thesis
Country: China
Candidate: C Qin
Full Text: PDF
GTID: 2428330575477347
Subject: Computer technology
Abstract/Summary:
As an important branch of data mining, data classification has become a research hotspot in the era of big data. Logistic regression is an important classification algorithm in machine learning. With its simple model, efficient training, and good fit to linear data, it has been widely applied in Internet, medical, financial, and other fields.

The loss function of logistic regression is usually minimized with gradient descent, an iterative algorithm, and among the gradient descent variants, mini-batch gradient descent is generally preferred for its performance. Traditional mini-batch gradient descent uses a fixed learning rate, which has two drawbacks: a suitable rate is hard to select, and because the rate stays constant throughout training, the loss function cannot reach its optimal value. Training efficiency and classification accuracy suffer as a result, and as sample sizes grow ever larger the approach no longer meets demand. Improving the algorithm is therefore one research direction of this thesis.

At the same time, as sample data grows exponentially, the demands on the training efficiency of logistic regression keep rising. Advances in GPU hardware have drawn research toward general-purpose GPU computing, which offers a new direction for accelerating logistic regression. TensorFlow, one of the most popular machine learning frameworks, supports GPU execution and thus provides a good platform for this acceleration.

To address the two problems above, this thesis completes the following two tasks:

(1) To improve on the fixed learning rate of mini-batch gradient descent, a cosine-transform learning rate schedule is introduced (a sketch follows this abstract). The method first selects a suitable range for the initial learning rate, then varies the rate within that range along a cosine curve, so that the learning rate decays at different speeds in different phases of training and the loss function of logistic regression converges to its optimum more effectively. In addition, a data-shuffling (rearrangement) module and a regularization penalty are added to improve the generalization ability of the algorithm. Experimental results show that the improved logistic regression algorithm achieves better training efficiency and classification accuracy.

(2) Combining the matrix computing power of the GPU with TensorFlow, this thesis designs two parallel implementations of logistic regression on the TensorFlow platform: single-GPU on a single machine and multi-GPU on a single machine (both sketched below). The core design idea is to vectorize the logistic regression algorithm, executing its large matrix computations on the GPU through TensorFlow. The single-GPU version applies model parallelism, while the multi-GPU version combines data parallelism with model parallelism. In experiments, both methods achieve significant speedups on large-scale data sets.

Starting from the improvement of the logistic regression algorithm, this thesis raises training efficiency and classification accuracy, and parallelizes the algorithm through TensorFlow to speed up its execution, which has certain research significance and practical value.
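To make the cosine-transform schedule of task (1) concrete, here is a minimal NumPy sketch of mini-batch gradient descent for logistic regression with per-epoch shuffling and an L2 penalty. The schedule uses the common cosine-annealing form, eta_t = eta_min + (eta_max - eta_min)(1 + cos(pi*t/T))/2; the thesis's exact transform is not given in the abstract, and all hyperparameter values below are illustrative assumptions.

```python
import numpy as np

def cosine_lr(t, total_steps, lr_max, lr_min):
    """Cosine annealing: decay from lr_max to lr_min over total_steps."""
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + np.cos(np.pi * t / total_steps))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logreg(X, y, epochs=20, batch_size=64,
                 lr_max=0.5, lr_min=0.001, l2=0.01, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    total_steps = epochs * int(np.ceil(n / batch_size))
    step = 0
    for _ in range(epochs):
        order = rng.permutation(n)            # data shuffling (rearrangement)
        for i in range(0, n, batch_size):
            idx = order[i:i + batch_size]
            Xb, yb = X[idx], y[idx]
            p = sigmoid(Xb @ w + b)
            # gradient of the cross-entropy loss plus an L2 penalty on w
            gw = Xb.T @ (p - yb) / len(idx) + l2 * w
            gb = np.mean(p - yb)
            lr = cosine_lr(step, total_steps, lr_max, lr_min)
            w -= lr * gw
            b -= lr * gb
            step += 1
    return w, b
```

Early in training the cosine curve keeps the rate near lr_max for fast progress; near the end it flattens toward lr_min, which is what lets the loss settle close to its optimum instead of oscillating.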
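The vectorization idea behind task (2) reduces each training step to a few large matrix operations, sigmoid(XW + b), which TensorFlow can place on a GPU. Below is a minimal single-GPU sketch using the TF2 API (tf.device, tf.GradientTape); the thesis's actual model-parallel partitioning is not described in the abstract, so this only illustrates the GPU-resident matrix computation, with illustrative names, shapes, and synthetic data.

```python
import tensorflow as tf

# Synthetic data with illustrative shapes (not from the thesis).
n, d = 4096, 200
X = tf.random.normal((n, d))
y = tf.cast(tf.random.uniform((n, 1)) > 0.5, tf.float32)

W = tf.Variable(tf.zeros((d, 1)))
b = tf.Variable(0.0)

@tf.function
def train_step(Xb, yb, lr):
    # Pin the heavy matrix work to one GPU (soft placement falls back to CPU).
    with tf.device("/GPU:0"):
        with tf.GradientTape() as tape:
            p = tf.sigmoid(tf.matmul(Xb, W) + b)   # one big matrix op per batch
            loss = tf.reduce_mean(
                tf.keras.losses.binary_crossentropy(yb, p))
        gW, gb = tape.gradient(loss, [W, b])
        W.assign_sub(lr * gW)
        b.assign_sub(lr * gb)
    return loss

for step in range(100):
    loss = train_step(X, y, tf.constant(0.1))
```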
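For the multi-GPU case, a common way to express data parallelism in current TensorFlow is tf.distribute.MirroredStrategy, which replicates the model on every visible GPU and splits each batch among them. The sketch below combines that with TensorFlow's built-in cosine decay schedule, shuffling, and an L2 penalty; it is a stand-in under those assumptions, not the thesis's own data-plus-model-parallel design, and the dataset, batch size, and hyperparameters are illustrative.

```python
import tensorflow as tf

# Synthetic data with illustrative shapes (not from the thesis).
n, d = 65_536, 200
X = tf.random.normal((n, d))
y = tf.cast(tf.random.uniform((n, 1)) > 0.5, tf.float32)

# MirroredStrategy mirrors the variables on each GPU and splits
# every batch across the replicas (data parallelism).
strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    # A single sigmoid Dense layer is exactly vectorized logistic regression.
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(d,)),
        tf.keras.layers.Dense(1, activation="sigmoid",
                              kernel_regularizer=tf.keras.regularizers.l2(0.01)),
    ])
    schedule = tf.keras.optimizers.schedules.CosineDecay(
        initial_learning_rate=0.5, decay_steps=10_000, alpha=0.002)
    model.compile(optimizer=tf.keras.optimizers.SGD(schedule),
                  loss="binary_crossentropy", metrics=["accuracy"])

ds = (tf.data.Dataset.from_tensor_slices((X, y))
        .shuffle(n).batch(1024))       # shuffle + mini-batches
model.fit(ds, epochs=5)
```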
Keywords/Search Tags:Logistic regression, Mini-batch gradient descent, Learning rate, TensorFlow, GPU, Parallel algorithm