Distributed Graph-Based Semi-Supervised Learning Algorithm Based On Matrix Completion And Its Application

Posted on: 2024-06-03
Degree: Master
Type: Thesis
Country: China
Candidate: S Q Tan
Full Text: PDF
GTID: 2568306932955919
Subject: Information and Communication Engineering
Abstract/Summary:
In recent years, with the rapid development of information technology and growing digitization, massive amounts of data continue to emerge. In this era of big data, how to efficiently use and mine data stored across dispersed locations has become a common problem for many networked systems. As a key technology for distributed data processing, distributed machine learning is widely used to analyze and predict global information from distributed datasets. Depending on the learning method, existing algorithms can be classified as supervised, unsupervised, or semi-supervised. Graph-based semi-supervised learning, a mainstream semi-supervised approach, offers parallelism, scalability, and ease of solution.

However, when data is stored in a distributed fashion, the enormous system overhead of building a global data graph is a major obstacle to applying distributed graph-based semi-supervised learning. Acquiring a complete measurement matrix at low cost is therefore a bottleneck that must be resolved before the technique becomes practical. Matrix completion is a potential solution, but most current matrix completion algorithms are centralized and serial, which makes them inefficient and difficult to apply to low-cost network systems. Further research is thus needed to extend existing matrix completion algorithms to distributed parallel processing and to apply them to distributed graph construction. In addition, straggling nodes and uneven data distribution in distributed computing systems are challenges that must be addressed when extending matrix-completion-based graph construction to distributed parallel settings.

In this thesis, we first propose a distributed graph-based semi-supervised learning algorithm based on matrix completion. The work covers the parallel design of the algorithm and an evaluation of system latency and energy consumption. By balancing running time against accuracy, we give the optimal choice of matrix completion parameters, which allows the algorithm to maintain accuracy comparable to state-of-the-art methods while reducing cost. Second, from the perspective of practical distributed systems, we consider the influence of system noise and propose a parallel distributed graph-based semi-supervised learning algorithm based on coded computation. Built on maximum distance separable (MDS) codes, it includes a coded distributed graph construction algorithm. We give the optimal choice of encoding parameters, which improves robustness to system noise while introducing only a small amount of computational redundancy. Finally, we apply the proposed coded distributed graph-based semi-supervised learning algorithm to travel mode recognition from GPS trajectories and propose data preprocessing methods to address the fact that raw GPS data cannot be fed directly into learning models. In addition, we adjust the computational load assigned to each node by modeling communication and computation latency, so that the algorithm adapts to heterogeneous systems.
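
The abstract does not spell out how maximum distance separable codes are used, so the following is only a minimal sketch of the general coded-computation idea, not the thesis's algorithm. It assumes the quantity needed for graph construction is a Gram matrix of pairwise inner products `X @ X.T`, that the data matrix is split into `k` row blocks, and that a real-valued Vandermonde matrix serves as the MDS generator. The function names (`encode_blocks`, `worker_task`, `decode`) and all parameter values are hypothetical.

```python
import numpy as np

def encode_blocks(X, k, n):
    """Split X into k row blocks and mix them into n coded blocks.

    Assumption: the number of rows of X is divisible by k.
    """
    blocks = np.split(X, k, axis=0)
    # Vandermonde generator with distinct evaluation points: any k of its
    # n rows form an invertible k x k matrix, which is the MDS property.
    points = np.arange(1, n + 1, dtype=float)
    G = np.vander(points, k, increasing=True)  # shape (n, k)
    coded = [sum(G[i, j] * blocks[j] for j in range(k)) for i in range(n)]
    return coded, G

def worker_task(coded_block, X):
    """Work done by one node: inner products of its coded rows with all samples."""
    return coded_block @ X.T

def decode(results, worker_ids, G, k):
    """Recover the uncoded Gram matrix from the results of any k workers."""
    G_sub = G[worker_ids, :]                   # (k, k), invertible
    stacked = np.stack(results)                # (k, rows_per_block, N)
    flat = stacked.reshape(k, -1)
    decoded = np.linalg.solve(G_sub, flat)     # undo the linear mixing
    return decoded.reshape(k * stacked.shape[1], -1)

# Toy run: 5 workers, any 3 of them suffice.
rng = np.random.default_rng(0)
X = rng.standard_normal((6, 4))                # 6 samples, 4 features
k, n = 3, 5
coded, G = encode_blocks(X, k, n)
finished = [0, 2, 4]                           # workers 1 and 3 are stragglers
results = [worker_task(coded[i], X) for i in finished]
gram = decode(results, finished, G, k)
```

In this toy setting the master can ignore the two slowest workers and still reconstruct the full Gram matrix, at the price of running `n - k` redundant tasks. Real-valued Vandermonde matrices become ill-conditioned as `n` grows, so a practical system would likely need a different generator or a finite-field construction.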
Keywords/Search Tags: Graph semi-supervised learning, Distributed learning, Matrix completion, Coded computation