
Research on Big Data Mining Algorithm Based on Top-k Subgraph Pattern Matching

Posted on: 2014-09-20
Degree: Master
Type: Thesis
Country: China
Candidate: J He
GTID: 2268330425480039
Subject: Communication and Information System
Abstract/Summary:
With the arrival of the Big Data era in the Internet industry, a series of emerging information extraction techniques designed specifically for big data mining has begun to attract broad attention from researchers. The amount of data generated each year continues to grow explosively, and this growth is not only quantitative but also qualitative: data types are becoming more diverse and the structural relationships among data are becoming more complex. Traditional data mining methods were developed largely on top of relational databases, so they cannot readily handle this situation. Graph mining techniques, by contrast, can model and mine big data by drawing on graph-theoretic methods such as graph queries, graph traversal, and graph isomorphism. They are mainly applied in domains that involve large amounts of data, such as library management systems (e.g., book information retrieval), social networks (e.g., matching relationships between people), and bioinformatics (e.g., protein-protein interaction (PPI) networks and genetic engineering). The Top-k subgraph pattern matching (graph pattern matching, GPM) algorithm studied in this thesis is a typical graph-based big data mining technique. It is grounded in the graph-theoretic principle of isomorphism and is applied to a single source data graph whose nodes carry specific label attributes.
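To make the matching semantics concrete, here is a minimal brute-force sketch. The abstract does not spell out the scoring function, so the sketch assumes a simple rule. Each query node maps to a distinct data node with the same label. Each query edge (u, v) must be realized by a directed path from the image of u to the image of v. A match is scored by the sum of those shortest-path lengths, with lower scores ranking higher. The function names and the toy author/paper/venue graph are hypothetical illustrations, not the thesis's implementation. Exhaustive enumeration is only practical for tiny graphs.

```python
import heapq
from collections import deque
from itertools import product


def bfs_distances(adj, src):
    """Hop distance from src to every node reachable along directed edges."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for w in adj.get(u, ()):
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def topk_gpm_bruteforce(labels, adj, q_labels, q_edges, k):
    """Return the k lowest-scoring matches of a labeled query graph.

    labels: data node -> label; adj: data node -> list of successors.
    q_labels: query node -> label; q_edges: directed query edges (u, v).
    Scoring (sum of shortest-path lengths) is an ASSUMED rule for illustration.
    """
    q_nodes = list(q_labels)
    cands = [[n for n in labels if labels[n] == q_labels[u]] for u in q_nodes]
    dist = {n: bfs_distances(adj, n) for n in labels}  # all-pairs hop distances
    scored = []
    for combo in product(*cands):
        if len(set(combo)) < len(combo):  # query nodes need distinct data nodes
            continue
        f = dict(zip(q_nodes, combo))
        total = 0
        for u, v in q_edges:
            d = dist[f[u]].get(f[v])
            if d is None:  # no directed path realizes this query edge
                break
            total += d
        else:
            scored.append((total, f))
    return heapq.nsmallest(k, scored, key=lambda t: t[0])


# Hypothetical toy data graph: authors write papers, papers appear in venues.
labels = {"a1": "author", "a2": "author", "p1": "paper", "p2": "paper", "v1": "venue"}
adj = {"a1": ["p1"], "a2": ["p2"], "p2": ["p1"], "p1": ["v1"]}
query = {"A": "author", "P": "paper", "V": "venue"}
top = topk_gpm_bruteforce(labels, adj, query, [("A", "P"), ("P", "V")], k=2)
print([s for s, _ in top])  # [2, 3]
```

Real Top-k GPM algorithms avoid this exhaustive enumeration; the sketch only fixes what a "match" and its "score" mean under the stated assumption.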
The purpose of the Top-k GPM algorithm is to find, in a massive source data graph, the matching results that satisfy both the label conditions and the (path-based) structural conditions of a query graph. It is mainly applied to data in the Resource Description Framework (RDF) format. The main work of this thesis is as follows.
(1) It introduces the basic concepts of graph pattern matching for big data mining, along with related algorithms. It focuses on the Top-k GPM Join algorithm, which serves as a reference and a point of comparison.
(2) It proposes an efficient, general-purpose Top-k subgraph pattern matching plan. For generality, the plan handles both acyclic and cyclic query graphs and returns exact Top-k matching results. For a cyclic query graph, it selects an optimal spanning tree using the Spanning Trees' Cost Forecast matching plan. It then matches that tree as the query subtree and expands the remaining edges to obtain the Top-k matching results.
(3) In the performance evaluation, the source RDF graph is built from real DBLP data. The proposed matching plan not only returned all exact Top-k matching results but also confirmed that choosing the optimal spanning tree by Spanning Trees' Cost Forecast is a viable strategy. The algorithm was compared with the Top-k GPM Join algorithm in terms of both time and space. The tests showed that its running time improved substantially at the cost of additional memory.
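The cyclic-query strategy in point (2) can be sketched as follows. The abstract does not give the cost model behind the Spanning Trees' Cost Forecast plan. `forecast_cost` below therefore uses a hypothetical stand-in: the number of candidate node pairs each tree edge must test. All function names and the sample matches and distances are illustrative assumptions.

The sketch works in three steps:
- It enumerates the spanning trees of a small query graph and keeps the cheapest one.
- It extends matches of that tree with the remaining non-tree edges.
- It stops early once possible. The extra edges can only add path length, so when tree matches arrive in ascending score order, the scan can end as soon as a tree score reaches the current k-th best total.

```python
import heapq
from itertools import combinations


def _find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def spanning_trees(nodes, edges):
    """Yield each spanning tree of a small connected query graph as an edge list.

    Edge direction is ignored when testing for cycles.
    """
    for subset in combinations(edges, len(nodes) - 1):
        parent = {v: v for v in nodes}
        acyclic = True
        for u, v in subset:
            ru, rv = _find(parent, u), _find(parent, v)
            if ru == rv:
                acyclic = False
                break
            parent[ru] = rv
        if acyclic:  # n - 1 acyclic edges on n nodes form a spanning tree
            yield list(subset)


def forecast_cost(tree, cand_count):
    """HYPOTHETICAL cost model: candidate pairs each tree edge must test."""
    return sum(cand_count[u] * cand_count[v] for u, v in tree)


def choose_spanning_tree(nodes, edges, cand_count):
    """Return the tree with the lowest forecast cost and the leftover edges."""
    best = min(spanning_trees(nodes, edges), key=lambda t: forecast_cost(t, cand_count))
    return best, [e for e in edges if e not in best]


def expand_nontree_edges(tree_matches, non_tree, dist, k):
    """Extend tree matches with non-tree edges and keep the k lowest totals.

    tree_matches must be sorted by ascending tree score; dist[x][y] is the
    shortest-path length from x to y (a missing entry means unreachable).
    """
    if k <= 0:
        return []
    best = []  # max-heap of the current top k: (-total, seq, mapping)
    for seq, (tree_score, f) in enumerate(tree_matches):
        if len(best) == k and tree_score >= -best[0][0]:
            break  # extra edges only add length, so no later match can improve the top k
        total = tree_score
        for u, v in non_tree:
            d = dist.get(f[u], {}).get(f[v])
            if d is None:  # this non-tree edge cannot be realized by a path
                break
            total += d
        else:
            entry = (-total, seq, f)
            if len(best) < k:
                heapq.heappush(best, entry)
            elif total < -best[0][0]:
                heapq.heapreplace(best, entry)
    return sorted(((-neg, f) for neg, _, f in best), key=lambda t: t[0])


nodes = ["A", "P", "V"]
edges = [("A", "P"), ("P", "V"), ("A", "V")]  # a cyclic (triangle) query
tree, rest = choose_spanning_tree(nodes, edges, {"A": 2, "P": 3, "V": 1})
print(tree, rest)  # [('P', 'V'), ('A', 'V')] [('A', 'P')]

tree_matches = [  # hypothetical tree matches, already sorted by tree score
    (2, {"A": "a1", "P": "p1", "V": "v1"}),
    (3, {"A": "a2", "P": "p2", "V": "v1"}),
    (5, {"A": "a3", "P": "p3", "V": "v2"}),
]
dist = {"a1": {"p1": 1}, "a2": {"p2": 4}}  # hypothetical path lengths for edge (A, P)
print(expand_nontree_edges(tree_matches, rest, dist, 2))  # totals 3 and 7
```

Enumerating every spanning tree is exponential in the number of query edges. This is usually tolerable because query graphs are far smaller than the data graph.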
Keywords/Search Tags: Big Data Mining, Top-k Subgraph Pattern Matching, Top-k GPM Joining Algorithm, Spanning Trees' Cost Forecast