
Research And Application Of Multi Classification Logistic Regression Algorithm In Big Data Environment

Posted on: 2019-12-31
Degree: Master
Type: Thesis
Country: China
Candidate: J Y Deng
Full Text: PDF
GTID: 2428330572463631
Subject: Computer technology
Abstract/Summary:
With the explosive growth of industry data, more and more attention is being paid to the value of big data, and how to mine useful information from it is an important research topic. Logistic regression is one of the common methods of data classification. Because its model is simple and trains quickly, it is widely applied in fields such as medical treatment and finance. In practice, however, as the training dataset grows, training a logistic regression model requires more memory, and the traditional logistic regression algorithm performs poorly. This thesis implements multi-classification logistic regression based on HBase. Because the training dataset may exceed the memory of the client machine that performs the computation, a Chunk BGD algorithm is proposed to compute the coefficients of the regression model. The main work is summarized as follows. First, the training dataset is stored in HBase, and the Chunk BGD algorithm is used to overcome the memory limitation: by setting the StartRow and StopRow parameters of the scan object, a data chunk of appropriate size, containing the training sample data and the classification result values, is obtained. To avoid frequent RPC calls from client to server, each chunk is iterated over multiple times, which also accelerates the convergence of the coefficients. When a chunk reaches the specified number of iterations, the next chunk is fetched in row-key order. This cycle repeats until the coefficients converge or the loop control threshold is reached. In addition, the multi-classification problem is solved by converting it into binary (two-class) classification models: a result-value column qualifier is added to the HBase training table for each class and, combined with the training sample column family, the regression coefficients of each class are obtained by the Chunk BGD algorithm. The experimental results show that the testing samples can be classified accurately using the regression coefficients.
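The chunked training loop and the reduction to two-class models described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the HBase table is replaced by an in-memory dict keyed by row key, `scan` is a hypothetical stand-in for a scan bounded by StartRow and StopRow (with the stop row treated as exclusive), and the function names, learning rate, chunk size, and toy data are all invented for the example.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def scan(table, start_row, stop_row):
    # Hypothetical stand-in for an HBase scan: rows with
    # start_row <= key < stop_row, returned in row-key order.
    return [(k, table[k]) for k in sorted(table) if start_row <= k < stop_row]

def chunk_bgd(table, label, chunk_size=6, iters_per_chunk=10,
              max_passes=200, lr=0.1, tol=1e-4):
    # Fit one two-class model (label vs. the rest), one chunk at a time.
    keys = sorted(table)
    n_features = len(table[keys[0]][0])
    w = [0.0] * (n_features + 1)  # w[0] is the bias term
    for _ in range(max_passes):  # loop control threshold
        w_before = list(w)
        for i in range(0, len(keys), chunk_size):
            start = keys[i]
            if i + chunk_size < len(keys):
                stop = keys[i + chunk_size]
            else:
                stop = keys[-1] + "\x00"  # just past the last row key
            chunk = scan(table, start, stop)  # one fetch per chunk
            for _ in range(iters_per_chunk):  # reuse the chunk several times
                grad = [0.0] * len(w)
                for _, (x, y) in chunk:
                    xb = [1.0] + list(x)
                    p = sigmoid(sum(wj * xj for wj, xj in zip(w, xb)))
                    err = p - (1.0 if y == label else 0.0)
                    for j, xj in enumerate(xb):
                        grad[j] += err * xj
                # Batch gradient step averaged over the current chunk only.
                w = [wj - lr * gj / len(chunk) for wj, gj in zip(w, grad)]
        # Stop once a full pass over all chunks barely moves the coefficients.
        if max(abs(a - b) for a, b in zip(w, w_before)) < tol:
            break
    return w

def train_one_vs_rest(table, **kwargs):
    # One set of coefficients per class, each trained as class vs. rest.
    labels = sorted({y for _, y in table.values()})
    return {c: chunk_bgd(table, c, **kwargs) for c in labels}

def predict(models, x):
    # Pick the class whose two-class model gives the highest probability.
    xb = [1.0] + list(x)
    return max(models, key=lambda c: sigmoid(
        sum(wj * xj for wj, xj in zip(models[c], xb))))

# Toy data: three well-separated 2-D clusters, interleaved by row key
# so that every chunk contains samples of every class.
centers = {"a": (0.0, 0.0), "b": (4.0, 0.0), "c": (0.0, 4.0)}
table = {}
for i in range(12):
    label = "abc"[i % 3]
    cx, cy = centers[label]
    d = 0.1 * (i // 3)
    table[f"row{i:03d}"] = ((cx + d, cy - d), label)

models = train_one_vs_rest(table)
print(predict(models, (0.15, -0.15)), predict(models, (3.9, 0.0)),
      predict(models, (0.1, 3.9)))
```

In this sketch each chunk is fetched once and then used for `iters_per_chunk` gradient steps, which mirrors the stated goal of trading extra local iterations for fewer client-to-server round trips. The row keys are deliberately interleaved by class; if the keys were sorted by class, each chunk would hold a single class and the updates would likely oscillate between chunks.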
Keywords/Search Tags:Chunk BGD, Multi-Classification, Logistic Regression, Big Data, HBase