Research On Big Data Mining Algorithm Based On Top-k Subgraph Pattern Matching

Posted on:2014-09-20

Degree:Master

Type:Thesis

Country:China

Candidate:J He

Full Text:PDF

GTID:2268330425480039

Subject:Communication and Information System

Abstract/Summary:

With the era of big data arriving early in the Internet industry, a range of emerging information extraction technologies designed for big data mining has begun to attract researchers' attention. The amount of data generated each year keeps growing explosively, and the change is not only quantitative but also qualitative. Traditional data mining methods were largely developed on top of relational databases, so they do not apply well when data types are more diverse and the structural relationships among data are more complex. Graph mining techniques, by contrast, can model and mine big data with graph-theoretic methods such as graph query, graph traversal, and graph isomorphism. They are mainly applied to library management systems (such as book information retrieval), social networks (such as matching relationships between people), and bioinformatics (such as PPI and genetic engineering), all of which involve large amounts of data. The Top-k Subgraph Pattern Matching (GPM) algorithm studied in this thesis is a typical big graph data mining technique; it is based on the isomorphism principle of graph theory and is applied to a single-source data graph whose nodes and edges carry specific label attributes.
The purpose of this kind of mining algorithm is to retrieve a large number of matching results from a massive source data graph, subject to the label conditions and the path-based structural conditions of the query graph. It mainly applies to Resource Description Framework (RDF) data. The author's main research work is as follows.
(1) The thesis introduces the basic concepts and related algorithms of graph pattern matching in big data mining, focusing on the Top-k GPM Join Algorithm, which serves as a reference and a baseline for comparison.
(2) The thesis proposes an efficient and general Top-k Subgraph Pattern Matching Plan. For usability, the plan handles query graphs both with and without cycles and returns accurate Top-k matching results. When the query graph contains cycles, the optimal spanning tree, chosen by the Spanning Trees' Cost Forecast Matching Plan, is used as the query subtree, and its matches are developed into Top-k matching results by expanding the remaining edges.
(3) In the performance testing phase, the RDF source graph data come from real DBLP data. The proposed matching plan not only obtained all accurate Top-k matching results but also verified that choosing the optimal spanning tree with the Spanning Trees' Cost Forecast Matching Plan is viable. The algorithm was then compared with the Top-k GPM Joining Algorithm in terms of both time and space performance. The test results showed that its time performance improved greatly, at the cost of consuming additional memory.
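The tree-then-expand idea in item (2) can be illustrated with a small sketch. It is not the thesis's algorithm: the abstract does not give the cost forecast model or the scoring function, so this sketch assumes a plain BFS spanning tree in place of the cost-based choice, assumes a match's score is the sum of its matched edge weights (higher is better), and enumerates all matches before taking the top k instead of pruning. All names (`spanning_tree`, `top_k_matches`, the toy DBLP-like graph) are hypothetical.

```python
import heapq
from collections import deque


def spanning_tree(query_edges, root):
    """Split an undirected, connected query graph into BFS tree edges
    and the remaining (cycle-closing) edges."""
    adj = {}
    for u, v in query_edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    seen, tree, queue = {root}, [], deque([root])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                tree.append((u, v))  # u is always discovered before v
                queue.append(v)
    tree_set = {frozenset(e) for e in tree}
    rest = [e for e in query_edges if frozenset(e) not in tree_set]
    return tree, rest


def top_k_matches(data_labels, data_weights, query_labels, query_edges, k):
    """Return the k highest-scoring (score, mapping) pairs.
    data_weights maps frozenset({a, b}) -> edge weight (no self-loops)."""
    root = next(iter(query_labels))
    tree, rest = spanning_tree(query_edges, root)
    nbrs = {}
    for edge in data_weights:
        a, b = tuple(edge)
        nbrs.setdefault(a, set()).add(b)
        nbrs.setdefault(b, set()).add(a)
    results = []

    def extend(i, mapping, score):
        if i == len(tree):
            # Tree fully matched: expand by checking the non-tree edges.
            for u, v in rest:
                edge = frozenset((mapping[u], mapping[v]))
                if edge not in data_weights:
                    return  # cycle does not close in the data graph
                score += data_weights[edge]
            results.append((score, dict(mapping)))
            return
        u, v = tree[i]
        for cand in nbrs.get(mapping[u], ()):
            # Label condition plus injectivity (subgraph isomorphism).
            if data_labels[cand] == query_labels[v] and cand not in mapping.values():
                mapping[v] = cand
                w = data_weights[frozenset((mapping[u], cand))]
                extend(i + 1, mapping, score + w)
                del mapping[v]

    for node, label in data_labels.items():
        if label == query_labels[root]:
            extend(0, {root: node}, 0)
    return heapq.nlargest(k, results, key=lambda r: r[0])


# Toy DBLP-like data graph (made-up nodes and weights).
data_labels = {"A1": "author", "A2": "author", "P1": "paper",
               "P2": "paper", "P3": "paper", "V1": "venue"}
data_weights = {frozenset(e): w for e, w in [
    (("A1", "P1"), 3), (("P1", "V1"), 2), (("A1", "V1"), 1),
    (("A2", "P2"), 5), (("P2", "V1"), 4), (("A2", "V1"), 2),
    (("A1", "P2"), 2), (("A1", "P3"), 9),  # P3 has no venue edge
]}
# Cyclic query: author - paper - venue - author.
query_labels = {"a": "author", "p": "paper", "v": "venue"}
query_edges = [("a", "p"), ("p", "v"), ("v", "a")]

for score, mapping in top_k_matches(data_labels, data_weights,
                                    query_labels, query_edges, 2):
    print(score, mapping)
```

In this toy graph the tree edges are `a-p` and `a-v`, and `p-v` is the edge that closes the cycle. The candidate that maps `p` to `P3` matches the tree but is rejected during edge expansion because no `P3`-`V1` edge exists. Queries with symmetric structure would report the same subgraph under several mappings; the sketch does not deduplicate them.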

Keywords/Search Tags:

Big Data Mining, Top-k Subgraph Pattern Matching, Top-k GPM Joining Algorithm, Spanning Trees' Cost Forecast

Related items

1	Research On Graph Data Based Sequential Pattern Mining
2	Research On Key Techniques Of Query Processing And Mining Analysis Over Large Graph Data
3	The Research Of Subgraph Matching On Large-Scale RDF Graph
4	Research On Distributed Subgraph Matching Algorithm For Large Scale Graph Data
5	Research And Application Of Frequent Subgraph Mining Algorithm
6	Research And Implementation Of Backtracking-based Distributed Subgraph Enumeration On Large-scale Data Graphs
7	Anomalous Connected Subgraph Pattern Mining Of Multi-source Data
8	Study On Material Cost Forecast Of Coal Mine Based On Data Mining
9	Research And Implementation On Subgraph Pattern Mining Algorithms Oriented Uncertain Graph Data
10	K-means Clustering Method Improved Based On Minimum Cost Spanning Tree And Its Applications In Seismic Data