MapReduce-based Graph Mining Research

Posted on: 2017-03-19
Degree: Master
Type: Thesis
Country: China
Candidate: K Guan
Full Text: PDF
GTID: 2308330503478550
Subject: Computer Science and Technology
Abstract/Summary:
Graph data mining is an important method of data analysis with many real-world applications. Applying graph mining techniques to scientific data makes it possible to extract useful information quickly. In chemistry, for example, molecules have a standard graph structure with characteristic rings and chains, so graph theory can be used to mine molecular structure data. In biology, graph mining algorithms are used to find protein communities, that is, to locate communities of interest within protein-protein interaction networks. In literature archives such as the VIP journal database, there are algorithms that mine citation relationships from a literature graph. To measure the similarity between documents, such algorithms build a graph in which documents are nodes and citations are the links between them; the links reflect how documents relate to one another, and the number of times a document is cited determines how much authority it is given.
At present, the graph data used in graph mining algorithms is usually small enough to be loaded into memory all at once and processed there. As data sizes keep growing, however, traditional platforms show many shortcomings when faced with massive graph data and cannot guarantee high operating efficiency. Platforms represented by Hadoop and MapReduce can address these problems. This thesis studies and analyzes three parallel data mining algorithms based on the Apriori idea, the CD, DD, and CaD algorithms, together with their concrete implementation on Hadoop and MapReduce.
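
To make the idea of an Apriori-style parallel algorithm more concrete, the following is a minimal single-process sketch of the count-and-merge pattern, assuming that "CD" refers to the Count Distribution scheme, in which every worker counts the same candidates on its own data partition and the partial counts are then summed. The function names `map_partition` and `reduce_counts`, the toy transactions, and the support threshold are all hypothetical and are not taken from the thesis; no Hadoop API is used.

```python
from collections import Counter
from itertools import combinations


def map_partition(transactions, candidates):
    """Map step (hypothetical): count candidate occurrences in one partition."""
    local = Counter()
    for transaction in transactions:
        items = set(transaction)
        for candidate in candidates:
            if candidate <= items:  # candidate is contained in the transaction
                local[candidate] += 1
    return local


def reduce_counts(partial_counts, min_support):
    """Reduce step (hypothetical): sum partial counts, keep frequent candidates."""
    total = Counter()
    for partial in partial_counts:
        total.update(partial)  # Counter.update adds counts together
    return {c: n for c, n in total.items() if n >= min_support}


# Toy data: two partitions, as if the data were split across two workers.
partitions = [
    [("a", "b", "c"), ("a", "c")],
    [("a", "b"), ("b",), ("a", "b", "c")],
]
# Every worker receives the same candidate 2-itemsets.
candidates = [frozenset(pair) for pair in combinations("abc", 2)]

frequent = reduce_counts(
    (map_partition(p, candidates) for p in partitions), min_support=3
)
```

With this toy data, `{a, b}` and `{a, c}` each reach a support of 3 and are kept, while `{b, c}` reaches only 2 and is dropped.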
Although these three algorithms reach a certain level of performance when implemented on the MapReduce programming model, they have shortcomings during mining. For example, in the Map phase, the loop iterations generate many unnecessary duplicate keys and unnecessary memory operations. This slows processing, fails to take full advantage of the MapReduce programming model, and adds unnecessary workload. This thesis therefore proposes an improved algorithm, MapReduce_Edge_Extend, a frequent subgraph mining algorithm built on the MapReduce programming model. Its main idea is still based on Apriori: when extending edges to generate new frequent subgraphs, it uses the frequent subgraphs of order K that have already been found to generate the frequent subgraphs of order K + 1, which reduces unnecessary duplicate key-value pairs and improves mining efficiency. In the experimental part, the algorithms above are compared in two settings, a traditional stand-alone environment and the Hadoop MapReduce platform. The results show that, with the correctness of the algorithms preserved, running on the Hadoop MapReduce platform greatly improves run-time efficiency, and that the improved MapReduce_Edge_Extend algorithm is relatively more efficient.
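
The level-wise edge extension described above can be illustrated with a small sketch. This is not the thesis's MapReduce_Edge_Extend implementation: it runs in a single process, and it assumes that every vertex label is unique, so a graph can be stored as a set of labeled edges and subgraph containment becomes a plain subset test instead of a subgraph isomorphism check. The names `extend_by_edge`, `support`, and `mine`, and the toy graphs, are hypothetical.

```python
from collections import Counter


def extend_by_edge(frequent_k, frequent_edges):
    """Build (K+1)-edge candidates by adding one adjacent frequent edge."""
    candidates = set()  # a set removes duplicate candidates automatically
    for sub in frequent_k:
        vertices = {v for edge in sub for v in edge}
        for edge in frequent_edges:
            touches = edge[0] in vertices or edge[1] in vertices
            if edge not in sub and touches:
                candidates.add(sub | {edge})
    return candidates


def support(candidate, graphs):
    """Number of graphs containing the candidate (subset test, see assumption)."""
    return sum(1 for g in graphs if candidate <= g)


def mine(graphs, min_support):
    """Return frequent subgraphs level by level: 1-edge, 2-edge, and so on."""
    edge_counts = Counter(edge for g in graphs for edge in g)
    frequent_edges = {e for e, n in edge_counts.items() if n >= min_support}
    level = {frozenset([e]) for e in frequent_edges}
    levels = []
    while level:
        levels.append(level)
        candidates = extend_by_edge(level, frequent_edges)
        level = {c for c in candidates if support(c, graphs) >= min_support}
    return levels


# Toy graph database: each graph is a frozenset of (label, label) edges.
graphs = [
    frozenset([("A", "B"), ("B", "C"), ("C", "D")]),
    frozenset([("A", "B"), ("B", "C")]),
    frozenset([("A", "B"), ("B", "C"), ("A", "D")]),
]
levels = mine(graphs, min_support=2)
```

Because candidates of order K + 1 are built only from subgraphs already known to be frequent at order K, and are collected in a set, each candidate is counted once per level. In a MapReduce setting the `support` computation would be the part distributed across mappers, with the reducer summing the counts.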
Keywords/Search Tags: Apriori, Hadoop, graph data mining, MapReduce