
Studies on Classification Learning and Its Distributed Computing Methods and Theories

Posted on: 2024-07-02
Degree: Doctor
Type: Dissertation
Country: China
Candidate: G M Sun
Full Text: PDF
GTID: 1527307301959079
Subject: Statistics

Abstract/Summary:
Classification is a typical supervised learning task in machine learning and is widely used in real-world applications. With the rapid development of information technology, massive data sets have become commonplace, which poses unprecedented challenges to machine learning tasks, including classification learning. Most notably, massive data sets place higher demands on the computing power and storage space available for learning. Motivated by these practical problems, this dissertation studies classification learning and its distributed computing methods and theories under large-scale data.

First, since massive data sets are increasingly common in multicategory classification problems and are often stored in distributed environments, this dissertation proposes a distributed estimation method under the multicategory angle-based classification (MAC) model and derives an upper bound on the excess risk of the distributed estimator. Further, under various robustness settings, we develop two robust distributed algorithms that provide robust estimators for multicategory classification. The first algorithm exploits the median-of-means (MOM) principle and is built on a MOM-based gradient estimate; the second is implemented by constructing a weighted gradient estimate. Theoretical guarantees for both algorithms are established via non-asymptotic error bounds for the iterative estimators. Numerical simulations demonstrate that our methods effectively reduce the impact of outliers.

Second, noting the efficient classification performance of the multicategory angle-based support vector machine (SVM) model, this dissertation investigates the statistical properties of parameter estimation in the multicategory SVM model and establishes the Bahadur representation and asymptotic normality of its parameter estimator. In view of the new challenges posed by the widespread presence of distributed data, we further develop a distributed smoothed estimator for the multicategory SVM and establish its theoretical guarantees. The derived asymptotic properties show that the distributed smoothed estimator achieves the same statistical efficiency as the global estimator. Numerical studies demonstrate the highly competitive performance of the proposed distributed smoothed method.

Finally, to address the optimization difficulties caused by the non-smoothness of the hinge loss in the standard SVM model, we propose a convolution-type smoothed SVM model for large-scale data sets. Convolution-type smoothing yields a twice-differentiable convex objective function, which can then be minimized with classical algorithms such as gradient descent. We prove that the theoretically optimal parameters of the convolution-type smoothed SVM model converge to the true model parameters. Moreover, we establish the non-asymptotic error upper bound, Bahadur representation, and asymptotic normality of the smoothed estimator. In addition, we propose a communication-efficient distributed algorithm for parameter estimation in distributed scenarios. Numerical results verify the effectiveness of the proposed distributed computing method.
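To make the MOM-based gradient estimate concrete: the median-of-means principle partitions the per-sample gradients into blocks, averages within each block, and takes the coordinate-wise median of the block means, so a few outlying samples cannot drag the aggregate gradient far from its true value. The sketch below is a generic illustration of this principle (the function name, block scheme, and parameters are illustrative, not the dissertation's actual algorithm):

```python
import numpy as np

def mom_gradient(sample_grads, n_blocks=11, seed=None):
    """Median-of-means estimate of a gradient (illustrative helper).

    sample_grads: (n, d) array of per-sample gradients.
    Randomly partition the n samples into n_blocks blocks, average the
    gradients within each block, then take the coordinate-wise median
    of the block means. As long as fewer than half the blocks contain
    outliers, the median ignores the contaminated block means.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(sample_grads.shape[0])
    blocks = np.array_split(idx, n_blocks)
    block_means = np.stack([sample_grads[b].mean(axis=0) for b in blocks])
    return np.median(block_means, axis=0)
```

With, say, 5 wildly corrupted gradients among 100 and 11 blocks, at most 5 block means are contaminated, so the median (the 6th of 11 sorted values) remains clean, whereas the plain sample mean is dragged far off.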
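The effect of convolution-type smoothing can be illustrated in the binary linear SVM case. Convolving the hinge loss max(0, 1-u) with a logistic kernel of bandwidth h gives the closed form h*log(1+exp((1-u)/h)), which is infinitely differentiable and decreases to the hinge loss as h shrinks, so plain gradient descent applies. The following sketch is a minimal illustration under these assumptions (logistic kernel, ridge penalty, fixed step size); it is not the dissertation's implementation, which treats the multicategory and distributed settings:

```python
import numpy as np

def smoothed_hinge(u, h=0.1):
    """Hinge loss max(0, 1-u) convolved with a logistic kernel of
    bandwidth h; the convolution has the closed form
    h*log(1+exp((1-u)/h)), computed stably via logaddexp."""
    return h * np.logaddexp(0.0, (1.0 - u) / h)

def fit_smoothed_svm(X, y, h=0.1, lam=0.01, lr=0.1, n_iter=500):
    """Linear SVM with the convolution-smoothed hinge loss, trained by
    plain gradient descent (illustrative sketch). y takes values in
    {-1, +1}; the objective is mean smoothed-hinge loss on the margins
    plus a ridge penalty lam/2 * ||w||^2."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        u = y * (X @ w)  # margins
        # d/du of the smoothed hinge is -sigmoid((1-u)/h);
        # written via tanh to avoid overflow for large margins
        g_margin = -0.5 * (1.0 + np.tanh((1.0 - u) / (2.0 * h)))
        grad = (X * (y * g_margin)[:, None]).mean(axis=0) + lam * w
        w -= lr * grad
    return w
```

Because the smoothed objective is twice differentiable, second-order methods (e.g. Newton-type updates) could be substituted for the gradient step without changing the setup.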
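For readers unfamiliar with the distributed setting these estimators target: the simplest baseline is one-shot divide-and-conquer averaging, where each machine fits an estimator on its local shard and a coordinator averages the local estimates in a single communication round. The dissertation's algorithms are more refined (iterative, robust, and communication-efficient); the sketch below, using ordinary least squares as a stand-in estimator, only illustrates the distributed scenario itself, and both function names are hypothetical:

```python
import numpy as np

def one_shot_average(local_shards, fit_fn):
    """Naive divide-and-conquer baseline: fit on each local shard,
    then average the local estimates at the coordinator. One round
    of communication; each machine sends only a d-dimensional vector.

    local_shards: list of (X, y) pairs, one per machine.
    fit_fn: estimator applied to a single shard, returning a vector.
    """
    local_estimates = [fit_fn(X, y) for X, y in local_shards]
    return np.mean(local_estimates, axis=0)

def ols_fit(X, y):
    """Ordinary least squares on one shard (stand-in estimator)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]
```

Averaging reduces the variance of the local estimates, but for nonlinear or non-smooth estimators (such as SVM-type problems) a single round of averaging generally loses statistical efficiency relative to the global estimator, which is precisely what motivates the smoothed and iterative distributed methods studied here.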
Keywords/Search Tags: Classification Learning, Distributed Computing, Multicategory Angle-based Classification, Multicategory Support Vector Machine, Robustness, Bahadur Representation, Asymptotic Normality, Kernel Smoothing Method, Convolution-type Smoothing Method