| With the rapid growth of data, datasets have become massive and high-dimensional, posing new challenges for machine learning applications. Multi-classification problems arise in many practical scenarios, including image recognition, natural language processing, and loan risk analysis. Linear discriminant analysis (LDA) is one of the commonly used algorithms for solving multi-classification problems. With massive data, running multi-classification LDA on a single machine can hardly meet the needs of practical applications, so it is of great research value to design a distributed framework for it that can handle large-scale data. Traditional distributed systems are centralized, which is not conducive to data privacy protection or system stability, so researchers have turned their attention to decentralized distributed algorithms, which offer higher security and reliability. In addition, high-dimensional sparse data affects the LDA algorithm in two ways. On the one hand, processing high-dimensional sparse data places very high demands on computing resources, which may reduce the computational efficiency of the LDA algorithm. On the other hand, the classification performance of the LDA algorithm is affected, so selecting the correct features is very important. However, multi-classification LDA for sparse data in the decentralized distributed setting remains to be studied. To solve the multi-classification problem for high-dimensional sparse data in the big data context, and to respond to calls for data privacy protection, this paper designs a data sharing mechanism between decentralized nodes based on the gradient tracking method and proposes a network-based decentralized multiclass sparse LDA (NDMSLDA) algorithm, which obtains a discriminant analysis model on each node. It is also proved theoretically that
under certain conditions and with an appropriately chosen penalty coefficient, the parameter estimates converge. In the experimental part of this paper, simulation experiments are designed to explore the effectiveness of the block coordinate descent method in selecting effective variables, as well as the consistency and convergence of the parameter estimates at each node in the network. The experiments verify that the parameter estimates of the nodes tend toward consistency during the iterations and that the parameters at each node converge within a finite number of iterations. In addition, the NDMSLDA algorithm is compared with the multiclass sparse discriminant analysis (MSDA) algorithm running on a single machine and with the centralized distributed MSDA algorithm. The results show that, under normal conditions, the proposed model achieves efficiency and accuracy close to those of the single-machine and centralized distributed systems. Finally, real-data experiments on the handwritten digit (Digits) dataset verify that the algorithm can solve problems in real data scenarios. |
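
For readers unfamiliar with gradient tracking, the following is a minimal sketch of the generic technique on a toy problem. It is not the NDMSLDA algorithm: the least-squares local objectives, the five-node ring network, the equal mixing weights, the step size, and all names (`ring_weights`, `gradient_tracking`, `x_nodes`, `x_star`) are assumptions chosen for illustration. Each node holds its own data, exchanges only its current estimate and gradient tracker with its two neighbors, and no central server is involved.

```python
import numpy as np


def ring_weights(n):
    """Doubly stochastic mixing matrix for a ring: each node averages
    itself and its two neighbors with weight 1/3 (illustrative choice)."""
    W = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i, j % n] = 1.0 / 3.0
    return W


def gradient_tracking(grads, W, x0, alpha=0.05, iters=2000):
    """Generic gradient tracking.

    grads : list of callables, grads[i](x) is the local gradient at node i
    W     : (n, n) doubly stochastic mixing matrix of the network
    x0    : common starting point, shape (d,)
    Returns an (n, d) array whose i-th row is node i's final estimate.
    """
    n = W.shape[0]
    x = np.tile(np.asarray(x0, dtype=float), (n, 1))
    g = np.stack([grads[i](x[i]) for i in range(n)])
    y = g.copy()  # tracker starts at the local gradients
    for _ in range(iters):
        # Mix estimates with neighbors, then step along the tracked gradient.
        x_new = W @ x - alpha * y
        g_new = np.stack([grads[i](x_new[i]) for i in range(n)])
        # Mix trackers and add the change in the local gradient, so that the
        # average of y always equals the average of the local gradients.
        y = W @ y + g_new - g
        x, g = x_new, g_new
    return x


# Toy data (assumed): 5 nodes, each with 20 private samples in 3 dimensions.
rng = np.random.default_rng(0)
n_nodes, m, d = 5, 20, 3
x_true = np.array([1.0, -2.0, 0.5])
A = [rng.normal(size=(m, d)) for _ in range(n_nodes)]
b = [A_i @ x_true + 0.1 * rng.normal(size=m) for A_i in A]

# Local least-squares gradients; default arguments bind each node's own data.
grads = [lambda x, A_i=A_i, b_i=b_i: A_i.T @ (A_i @ x - b_i) / len(b_i)
         for A_i, b_i in zip(A, b)]

x_nodes = gradient_tracking(grads, ring_weights(n_nodes), np.zeros(d))

# Reference: the solution a single machine would get from the pooled data.
x_star = np.linalg.lstsq(np.vstack(A), np.concatenate(b), rcond=None)[0]
```

In this toy setting every row of `x_nodes` should agree with the pooled single-machine solution `x_star` up to numerical tolerance, which mirrors on a small scale the consistency and convergence properties described in the abstract.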