Parallel Estimation of Gaussian Graphical Model Structure Based on Distributed Data

Posted on: 2018-05-13
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Li
Full Text: PDF
GTID: 2310330518498018
Subject: Systems Science
Abstract/Summary:
The Gaussian graphical model is an undirected graphical model based on the assumption that the data follow a Gaussian distribution. The nodes in the graph represent variables, and the edges between nodes represent the dependencies between variables. Learning such models from high-dimensional data with complex structure is an active and difficult problem in graphical model research. Because of its research and application value, the Gaussian graphical model is widely used in statistical machine learning, data mining, computer vision, bioinformatics, and other fields. With the advent of the big data era, data collection and mining capabilities continue to improve, and the amount of available data has increased sharply. Traditional algorithms for estimating graphical model structure are designed for a single stand-alone computing system, and existing equipment cannot meet the demands of structure estimation on massive data.

To solve these problems, this thesis proposes two parallel distributed algorithms, each built on a representative traditional algorithm for Gaussian graphical model structure estimation, so that existing computers can estimate higher-dimensional and more complex graphical model structures. First, we propose a parallel distributed algorithm based on the neighborhood selection algorithm (DCD-NS). While guaranteeing the accuracy and efficiency of the solution, this algorithm obtains the complete graphical model structure by having the computing nodes in an MPI cluster each compute on part of the data and then summarizing the results of all nodes. However, this parallel distributed strategy is not general, because the most basic solving step cannot be parallelized. To overcome this limitation, this thesis proposes a parallel distributed block coordinate descent method (PDBCD), which is applicable to the MapReduce framework.
With an appropriate strategy for reducing (aggregating) the results from each computing node, this algorithm can be used to solve a class of optimization problems. The convergence, efficiency, and accuracy of the algorithm are analyzed in detail, and its validity is proved. This parallel distributed optimization method is then combined with the GLasso algorithm, a representative traditional algorithm for graphical model estimation, to obtain a parallel distributed GLasso algorithm (DBCD-GLasso), which is implemented in the Spark framework. Each computing node in the Spark cluster works on only part of the data, and together the nodes complete the estimation of the graphical model structure. The experimental results show that the algorithm is suitable for computing clusters running under the MapReduce parallel framework. At the same time, the algorithm effectively reduces the memory usage of a single computing node during computation while preserving the accuracy and efficiency of the GLasso algorithm.
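The partial-data idea behind the neighborhood selection approach described in the abstract can be illustrated with a small single-process sketch. It is not the thesis's DCD-NS or DBCD-GLasso implementation, whose details are not given here. It assumes that the rows of a column-centered data matrix are split into blocks, that each "worker" computes only a local Gram matrix, and that the summed Gram matrix is then used for a Lasso regression of each variable on the others, solved by coordinate descent. The function names (`local_gram`, `lasso_cd_from_gram`, `neighborhood_selection`), the penalty `lam`, and the OR rule for combining neighborhoods are all illustrative choices, and plain NumPy stands in for an MPI or Spark cluster.

```python
import numpy as np


def local_gram(block):
    """Map step (hypothetical): statistics one worker computes from its own rows."""
    return block.T @ block, block.shape[0]


def soft_threshold(z, lam):
    """Soft-thresholding operator used by Lasso coordinate descent."""
    return np.sign(z) * max(abs(z) - lam, 0.0)


def lasso_cd_from_gram(S, j, lam, n_iter=200, tol=1e-8):
    """Lasso regression of variable j on all others, using only S = X^T X / n.

    Minimizes 0.5 * b^T A b - c^T b + lam * ||b||_1, where A and c are the
    sub-blocks of S that exclude / select column j.
    """
    p = S.shape[0]
    others = [k for k in range(p) if k != j]
    A = S[np.ix_(others, others)]
    c = S[others, j]
    beta = np.zeros(p - 1)
    for _ in range(n_iter):
        max_change = 0.0
        for k in range(p - 1):
            # Correlation with the partial residual that leaves out coordinate k.
            r = c[k] - A[k] @ beta + A[k, k] * beta[k]
            new = soft_threshold(r, lam) / A[k, k]
            max_change = max(max_change, abs(new - beta[k]))
            beta[k] = new
        if max_change < tol:
            break
    full = np.zeros(p)
    full[others] = beta  # entry j stays 0: no self-loop
    return full


def neighborhood_selection(blocks, lam):
    """Estimate the graph from row blocks; blocks are assumed column-centered."""
    # Reduce step: sum the local Gram matrices and row counts.
    grams, counts = zip(*(local_gram(b) for b in blocks))
    S = sum(grams) / sum(counts)
    p = S.shape[0]
    B = np.vstack([lasso_cd_from_gram(S, j, lam) for j in range(p)])
    support = B != 0
    # OR rule (an assumption): keep edge (i, j) if either regression selects it.
    return support | support.T
```

Calling `neighborhood_selection(np.array_split(X, 3), lam=0.1)` on a centered matrix `X` returns a symmetric boolean adjacency matrix. Because the Gram matrix is additive over row blocks, the result should match what a single machine holding all rows would compute, up to floating-point rounding.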
Keywords/Search Tags: Gaussian model, Distributed, Parallel, Neighborhood Selection, GLasso, Lasso, (Block) Coordinate Descent