Research Of Efficient Frequent Subgraph Mining On Big Graph

Posted on:2016-05-07

Degree:Master

Type:Thesis

Country:China

Candidate:G Fan

Full Text:PDF

GTID:2308330464956803

Subject:Computer application technology

Abstract/Summary:

PDF Full Text Request

In modern life, graph is used to describe the normal things more and more frequently, such as the composition of molecule structure, protein networks, social networks, semantic networks and the description of the biological information network. This all need graph, the graph can show the things more vividly and people can understand it better. Along with the development of the time, the development of the computer industry is faster, and the data scale is more and more big, the scale of the graph data is the same at the same time, maybe bigger. In the environment of the large scale of graph data nowadays, the way to find the information you need efficiently has great commercial value and scientific value, and it is always a contribution to the computer industry.Data mining has been a popular research topics in the world, especially when the coming of the era of big data, related technologies and algorithms of data mining is to get the attention and the research of many scholars, frequent subgraph mining is a basic knowledge in data mining, and it is mostly studied.Data statistics and analysis, information retrieval, machine learning and pattern recognition are the main Implementations. The data mining methods based on Apriori algorithm and Fp- growth are the main research direction and they are also classical. This essay is based on g Span and it is an improvement of g Span, g Span algorithm is applying on Hadoop, so it can accomplish graph mining distributedly.Firstly, in this paper, k-medodis algorithm is improved and it is called k-medodis++ algorithm. k- medodis++ algorithm selects initial center, improve the stability of the k-medodis++ algorithm and then we get a k cluster.Secondly,implementing the distributed storage of each cluster and setting copies for every data block. Through computing the balanced value, we judge whether a copy can be placed in one node, avoid the case that one’s balanced value is bigger than the given threshold.Thirdly, establishing a four layer index structure named MADI to replace the adjacency list. The subgraph extension and support calculation is based on the retrieval of the MADI index structure, so the complexity of algorithm and the number of disk IO is reduced.Finally, we can make sure some candidate subgraphs are min- DFS-code through extending subgraph orderly, so the steps of subgraph isomorphism and the complexity of g Span++ algorithm are reduced.

Keywords/Search Tags:

Data Mining, Frequent subgraph, k-medodis clustering, Hadoop, MapReduce, gSpan

PDF Full Text Request

Related items

1	Frequent Subgraph Mining In Graph Databases Based On MapReduce
2	Research On Algorithms For Frequent Subgraph Mining
3	Frequent Subgraph Mining And Its Application In Compound Property Prediction
4	Research And Application Of Frequent Subgraph Mining Algorithm
5	Research On Differentially Private Frequent Subgraph Mining
6	Algorithms For Mining Frequent Subgraphs And Its Applications In Biological Network
7	Research And Application Of Mining Graph Structure Algorithm
8	Research Of Frequent Itemsets Mining Algorithm Based On MapReduce Calculation Model
9	Research On Algorithms For Maximal Frequent Subgraph Mining
10	Study On Mining Closed Frequent Itemset Based On Hadoop