Decentralized LDA Algorithm Based On Network

Posted on:2024-05-09

Degree:Master

Type:Thesis

Country:China

Candidate:Z Q Gao

Full Text:PDF

GTID:2557307067491554

Subject:Statistics

Abstract/Summary:

As data grow ever larger and higher-dimensional, they pose new challenges for applying machine learning. Multi-class classification problems arise in many practical settings, including image recognition, natural language processing, and loan risk analysis. Linear discriminant analysis (LDA) is one of the commonly used algorithms for multi-class classification. With massive data, running multi-class LDA on a single machine can hardly meet the needs of practical applications, so there is considerable research value in recasting it in a distributed framework that can handle large-scale data. Traditional distributed systems are centralized, which is unfavorable for data privacy protection and system stability, so researchers have turned to decentralized distributed algorithms, which offer higher security and reliability.

High-dimensional sparse data also affect the LDA algorithm in two ways. First, processing such data places very high demands on computing resources, which may reduce the computational efficiency of LDA. Second, the classification performance of LDA suffers, so selecting the right features is very important. However, a multi-class LDA algorithm for sparse data in the decentralized distributed setting has yet to be studied.

To solve the multi-class classification problem for high-dimensional sparse data in the big-data setting, and to respond to calls for data privacy protection, this thesis designs a data sharing mechanism between decentralized nodes that incorporates the gradient tracking method, and proposes a network-based decentralized multi-class sparse LDA algorithm (Network-based Decentralized Multiclass Sparse LDA, NDMSLDA), which yields a discriminant analysis model on each node. It is also proved theoretically that, under certain conditions and with an appropriately chosen penalty coefficient, the parameter estimates converge.

In the experimental part, simulated-data experiments examine how well the block coordinate descent method selects effective variables, as well as the consistency and convergence of the parameter estimates at each node in the network. They verify that the estimates at the nodes tend toward agreement during the iterations and that each node's parameters converge within a finite number of iterations. In addition, NDMSLDA is compared with the multiclass sparse LDA (Multiclass Sparse Discriminant Analysis, MSDA) algorithm running on a single machine and with a centralized distributed MSDA algorithm. The results show that, under normal conditions, the proposed model achieves efficiency and accuracy close to those of the single-machine and centralized distributed systems. Finally, real-data experiments on the handwritten digits (Digits) dataset verify that the algorithm can solve problems in real data scenarios.
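
To make the role of gradient tracking concrete, the following is a minimal sketch of the generic technique on a toy problem, not the NDMSLDA algorithm itself. It assumes a ring network of five nodes, a local least-squares loss at each node in place of the sparse LDA objective, and no penalty term. The function names, the mixing matrix, the step size, and the simulated data are all illustrative assumptions rather than details taken from the thesis. Each node keeps its own parameter estimate and a tracker of the network-average gradient, and exchanges both only with its two neighbours.

```python
import numpy as np


def ring_mixing_matrix(m):
    """Symmetric, doubly stochastic weights for a ring of m nodes (m >= 3)."""
    W = 0.5 * np.eye(m)
    for i in range(m):
        W[i, (i - 1) % m] += 0.25
        W[i, (i + 1) % m] += 0.25
    return W


def local_gradients(X, A, b):
    """Row i is the gradient of f_i(x) = ||A_i x - b_i||^2 / (2 n) at X[i]."""
    # Shapes: A is (m, n, d), b is (m, n), X is (m, d).
    residuals = np.einsum("mnd,md->mn", A, X) - b
    return np.einsum("mnd,mn->md", A, residuals) / A.shape[1]


def gradient_tracking(A, b, W, alpha=0.05, iters=3000):
    """Generic gradient tracking; returns one parameter estimate per node."""
    m, _, d = A.shape
    X = np.zeros((m, d))            # row i: estimate held by node i
    G = local_gradients(X, A, b)    # row i: local gradient at node i
    Y = G.copy()                    # row i: tracker of the average gradient
    for _ in range(iters):
        # Average estimates with neighbours, then step along the tracker.
        X_next = W @ X - alpha * Y
        G_next = local_gradients(X_next, A, b)
        # Average trackers with neighbours, then add the local gradient change.
        Y = W @ Y + G_next - G
        X, G = X_next, G_next
    return X


# Simulated data (assumption): 5 nodes, 20 samples each, 3 features.
rng = np.random.default_rng(0)
m, n, d = 5, 20, 3
A = rng.standard_normal((m, n, d))
x_true = np.array([1.0, -2.0, 0.0])
b = A @ x_true + 0.1 * rng.standard_normal((m, n))

X = gradient_tracking(A, b, ring_mixing_matrix(m))

# Reference: least squares on the pooled data, i.e. the single-machine answer.
x_star = np.linalg.lstsq(A.reshape(m * n, d), b.reshape(m * n), rcond=None)[0]
consensus_gap = np.abs(X - X.mean(axis=0)).max()
pooled_gap = np.abs(X - x_star).max()
print(consensus_gap, pooled_gap)
```

The two printed quantities correspond loosely to the properties examined in the simulated-data experiments: `consensus_gap` measures how closely the node estimates agree with one another, and `pooled_gap` measures how far they are from the single-machine solution. No raw data leave a node in this sketch; only estimates and gradient trackers are shared.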

Keywords/Search Tags:

Distributed Computation, Decentralization, Linear Discriminant Analysis, High-dimensional Sparse Data, Gradient Tracking

Related items

1	Estimation Of An Adaptive Double-sparse High-dimensional Generalized Linear Model Based On Overlap Group Lasso
2	Research On Variable Selection In High Dimensional Data
3	Research Of Correlation Measurement Theory On High-dimensional Sparse Data
4	Local Linear Embedded LLE Method For Nonlinear Dimension Reduction Based On High Dimensional Space
5	High Dimensional Discriminant Analysis Of Two General Populations
6	Testing Serial Correlation In Linear Model With High Dimensional Data
7	Statistical Inference And Computation Of Change-points In Linear Models
8	Discriminant Analysis Of High-Dimensional And Multi-Population Based On Features Annealed Independence Rules
9	Hypothesis Tests Of Mean Vectors And Covariance Matrices In High-dimensional Data
10	Primary School And Secondary School Website Education Informationization Topic Discovery And Trend Analysis