Maximal Uncorrelated Multinomial Logistic Regression And Its Application In Large-scale Text Classification

Posted on: 2020-03-09
Degree: Master
Type: Thesis
Country: China
Candidate: H Y Zhang
GTID: 2428330590471728
Subject: Computer Science and Technology
Abstract/Summary:
As data scale grows, standard logistic regression no longer meets the demands of big-data applications. One reason is that large-scale data usually contains more redundant information: in large-scale classification, different classes share many similar features, and these shared features cause the corresponding classes to be misclassified. Another reason is that the standard logistic regression algorithm is inefficient on large-scale data. To address the redundancy problem, this thesis proposes a Maximal Uncorrelated Multinomial Logistic Regression (MUMLR) classification model. The main idea is to add a maximal uncorrelated regularization term that reduces the weight of information common to several classes and retains more of the discriminative information in the data. In addition, in view of the relationship between multinomial logistic regression and neural networks, this thesis carries the “maximal uncorrelated” result over from multinomial logistic regression to neural networks and proposes Maximal Uncorrelated Neural Networks (MUNN). MUNN combines the high robustness of MUMLR with the strong fitting ability of neural network models, which gives the algorithm broad application prospects. For data whose size exceeds the processing limit of a single machine, this thesis proposes, according to the characteristics of the data, a Global Consensus Maximal Uncorrelated Multinomial Logistic Regression (GC-MUMLR) algorithm and a Sharing Maximal Uncorrelated Multinomial Logistic Regression (S-MUMLR) algorithm. GC-MUMLR addresses the case where the number of samples is too large for serial gradient optimization to handle effectively. S-MUMLR addresses the dimension explosion caused by excessively high data dimensionality. Finally, based on this research, the thesis designs and implements a large-scale text classification system and successfully applies the proposed algorithms to large-scale text classification.
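The abstract does not state the form of the maximal uncorrelated regularization, so the following is only an illustrative sketch of the general idea, not the thesis's actual formulation. It assumes that "uncorrelated" can be approximated by penalizing the squared cosine similarity between the weight vectors of different classes, added to the usual multinomial logistic regression loss. The function names (`decorrelation_penalty`, `loss`) and the strength parameter `lam` are hypothetical.

```python
import numpy as np

def softmax(z):
    # Subtract the row maximum for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def decorrelation_penalty(W):
    """Hypothetical regularizer: sum over distinct class pairs of the
    squared cosine similarity between their weight vectors.
    W has shape (num_classes, num_features)."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    Wn = W / np.maximum(norms, 1e-12)      # unit-length rows (zero rows stay zero)
    G = Wn @ Wn.T                          # pairwise cosine similarities
    off_diag = G - np.diag(np.diag(G))     # ignore each class's similarity to itself
    return 0.5 * np.sum(off_diag ** 2)     # each unordered pair counted once

def loss(W, X, y, lam):
    """Mean multinomial logistic loss plus lam times the assumed penalty.
    X has shape (num_samples, num_features); y holds integer class labels."""
    P = softmax(X @ W.T)
    nll = -np.mean(np.log(P[np.arange(len(y)), y] + 1e-12))
    return nll + lam * decorrelation_penalty(W)
```

Under this assumed penalty, class weight vectors that point in the same direction (that is, classes relying on the same shared features) are penalized, while mutually orthogonal weight vectors incur no penalty. With `lam = 0` the function reduces to the standard multinomial logistic regression loss.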
Keywords/Search Tags:Multinomial Logistic Regression, Maximal Uncorrelated, Distributed, Big Data, Neural Networks