Improved Logistic Regression Model Under High Dimensional Data And Its Application

Posted on: 2020-10-16
Degree: Master
Type: Thesis
Country: China
Candidate: Q Wei
Full Text: PDF
GTID: 2370330596481723
Subject: Applied Statistics
Abstract/Summary:
Advances in information technology have steadily reduced the cost of data acquisition. As massive amounts of data continue to emerge, the dimensionality of data keeps increasing. Generally speaking, the higher the dimension of the data, the greater the computational complexity and the more serious the negative impact of noise and redundant features; and the larger the volume of data, the longer the model takes to train.

This thesis constructs an improved logistic regression model based on stochastic gradient descent and random projection. The model consists of three parts. The first part addresses the dimensionality of the dataset: principal component analysis (PCA) and random projection (RP) are compared as dimensionality reduction methods and, based on a case study, random projection is selected because it is faster to compute. The second part addresses the size of the dataset: the data reduced by random projection are fed to logistic regression models trained with batch gradient descent and with stochastic gradient descent, and stochastic gradient descent, which converges faster, is chosen as the parameter update method. The third part combines and optimizes the first two: Lasso is added on top of stochastic gradient descent to further screen features, and the logistic regression model is compared before and after Lasso is added, verifying that adding Lasso can improve the accuracy of the model.

Three simulated datasets are used for simulation experiments to verify the effectiveness of the improved model, which is then applied to real data. The improved logistic regression model can improve both computational efficiency and accuracy, and can be widely applied to the classification of high-dimensional data in fields such as finance and imaging. Because cat and dog image data have a higher dimension and are easier to obtain than financial data, cat and dog images are chosen as the experimental objects and the model is applied to them. The accuracy of the classification model is 79.2%, and computational efficiency is also significantly improved.

From the experimental analysis, the following conclusions can be drawn about the accuracy and computational efficiency of the algorithm. First, on high-dimensional datasets, random projection not only maintains classification accuracy but also greatly reduces model training time, and can be widely applied to high-dimensional datasets such as images and text. Second, the logistic regression model obtained by combined optimization can further eliminate useless features on top of dimensionality reduction, especially for large-scale sparse features, which makes the model's predictions more accurate. Third, the optimization algorithm based on stochastic gradient descent is faster than the traditional batch gradient descent method on large-scale datasets.
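The three-part pipeline summarized in the abstract (random projection, then logistic regression trained by stochastic gradient descent, with a Lasso penalty) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the thesis: the Gaussian form of the projection matrix, the handling of the L1 penalty by a soft-thresholding step after each update, the synthetic two-class data, and all function names and hyperparameters (`gaussian_projection_matrix`, `fit_sgd_lasso`, `lr`, `lam`, and so on). It uses only the Python standard library so that it stays self-contained; a practical implementation would more likely rely on NumPy or scikit-learn.

```python
import math
import random


def gaussian_projection_matrix(d, k, rng):
    """Return a d x k matrix with i.i.d. N(0, 1/k) entries (assumed RP variant)."""
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(k)] for _ in range(d)]


def project(x, R):
    """Map one d-dimensional sample x to k dimensions: z = x R."""
    k = len(R[0])
    return [sum(x[j] * R[j][c] for j in range(len(x))) for c in range(k)]


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def soft_threshold(w, t):
    """Shrink w toward zero by t; this is the proximal step for an L1 penalty."""
    return math.copysign(max(abs(w) - t, 0.0), w)


def fit_sgd_lasso(Z, y, lr=0.01, lam=0.001, epochs=20, rng=None):
    """Logistic regression fitted by per-sample SGD with an L1 (Lasso) penalty."""
    if rng is None:
        rng = random.Random(0)
    w = [0.0] * len(Z[0])
    b = 0.0
    order = list(range(len(Z)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit samples in a fresh random order each epoch
        for i in order:
            p = sigmoid(sum(wj * zj for wj, zj in zip(w, Z[i])) + b)
            g = p - y[i]  # gradient of the log-loss with respect to the logit
            w = [soft_threshold(wj - lr * g * zj, lr * lam)
                 for wj, zj in zip(w, Z[i])]
            b -= lr * g  # the intercept is not penalized
    return w, b


def predict(Z, w, b):
    """Predict class 1 when the linear score is positive."""
    return [1 if sum(wj * zj for wj, zj in zip(w, z)) + b > 0 else 0 for z in Z]


# Synthetic data (illustrative only): two Gaussian classes in d dimensions
# whose means differ in every coordinate.
rng = random.Random(42)
d, k, n = 200, 20, 200
y = [i % 2 for i in range(n)]
X = [[rng.gauss(0.6 if label else -0.6, 1.0) for _ in range(d)] for label in y]

R = gaussian_projection_matrix(d, k, rng)
Z = [project(x, R) for x in X]  # reduce each sample from d to k features
w, b = fit_sgd_lasso(Z, y, rng=rng)
accuracy = sum(p == t for p, t in zip(predict(Z, w, b), y)) / n
print(f"training accuracy after projecting {d} -> {k} dims: {accuracy:.3f}")
```

Unlike PCA, the projection matrix here is drawn without looking at the data, which is one reason random projection is cheap to compute. The accuracy printed is for this synthetic example only and says nothing about the thesis's reported results.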
Keywords/Search Tags: High dimensional data, Logistic regression, Random projection, Stochastic gradient descent, Combination algorithm