Font Size: a A A

Research On The Key Technology Of Graph Mining

Posted on:2018-08-22Degree:MasterType:Thesis
Country:ChinaCandidate:H Y LiuFull Text:PDF
GTID:2348330518993490Subject:Electronics and Communications Engineering
Abstract/Summary:PDF Full Text Request
Graph is one of the most common used data structure and is often used to describe the complex relationships between things. Graph mining is an important research direction in the field of data mining. More and more researches have managed to make understanding of multiple networks by identifying the dense parts of the graph, from the early analysis of social networks to the now well-known Web, financial markets, and biological systems. A variety of graph theory models have been developed to describe the dense component of the graph. In this paper, the quasi-clique model is taken as the starting point. With the focus on the maximal quasi-clique enumeration algorithm, the following research work is carried out.(1) With the network of practical application has grown to a rather large scale, the size of the memory has become the bottleneck of the existing maximal quasi-clique enumeration algorithm. Aiming at the above problem, a graph partition method has been proposed. The original graph is divided into multiple small-scale subgraphs, therefore the size of each subgraph does not exceed the limits of computer memory. It is proved that the result of the general maximal quasi-clique enumeration algorithm for all subgraphs is equivalent to the execution result of the original graph.Based on the graph partition method, an algorithm for maximal quasi-clique enumeration with limited memory (LM-MQCE) is proposed.(2) Give the LM-MQCE algorithm program. The experimental results show that the LM-MQCE algorithm can significantly reduce the memory cost compared to the existing maximal quasi-clique enumeration algorithm,and can restrict the memory cost to a controllable range. Observe the effect of each parameter variable on the memory cost, make qualitative analysis and determine the approximate relationship function,and provide a rough guidance for setting the algorithm parameter.(3) Point out the parallel optimization space of LM-MQCE algorithm,and propose a parallel algorithm called parallel-LM-MQCE based on multi-thread on single machine, which improves the time efficiency of maximal quasi-clique enumeration by several times. Furthermore, a MapReduce solution based on Hadoop distributed platform is presented,which breaks through the limitation of CPU cores and memory in single machine environment.
Keywords/Search Tags:graph mining, quasi-clique, enumeration algorithm, limited memory
PDF Full Text Request
Related items