Font Size: a A A

Research On Techniques For Mining Discriminative Subgraph Patterns

Posted on:2017-04-09Degree:DoctorType:Dissertation
Country:ChinaCandidate:Z H WangFull Text:PDF
GTID:1318330542489651Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
Graphs can be used to represent complex relationships between data objects in many scientific researches.With the development of computer science and information technology,applications from several areas such as bioinformatics,computational chemistry,social networks and computer vision make use of large amounts of data encoded as graphs.Discovering useful knowledge from large-scale graph databases has become one of the important research topics in the data mining domain.Graph pattern mining is the important research branch in graph mining,since many applications including graph clustering,classification and query rely on graph patterns to analyze and manage graph data.In order to cope with the scalability problem of graph pattern mining algorithms,the interest in graph pattern mining has shifted from finding all frequent subgraphs to obtaining a small subset of subgraph patterns that are representative,significant or discriminative,which could enhance the usefulness of the mining results.In this dissertation,we focus on the techniques for mining discriminative subgraph patterns,review the current research results and propose some new research problems and approaches for mining discriminative subgraph patterns.The main contributions of the dissertation are as follows:(1)We propose the non-redundant synergy discriminative subgraph mining problem.By guaranteeing the property that the discriminative powers of synergy graph patterns are much higher than all their subgraphs,mining non-redundant synergy graph patterns can dramatically reduce the number of results and still capture the strong discriminative power graph patterns.In addition,through studying the properties of non-redundant synergy graph patterns,the corresponding pruning techniques are proposed.Moreover,two fast synergy graph pattern detection methods are proposed based on the bound estimation of the discriminative powers.Based on those techniques,an efficient depth-first algorithm GINS is presented for this mining problem.Extensive experiments are conducted on a series of real-life and synthetic datasets.The results show that GINS is more efficient than two representative competitors.Besides,when the non-redundant synergy graph patterns are considered,the classification accuracy is improved much.(2)We propose the Top-K discriminative subgraph mining problem based on diversity measure.By exploiting the graph structure similarity and support set similarity restraints,we introduce the criterion of graph pattern diversity measure.Two efficient algorithms,Greedy-TopK and Leap-TopK,are proposed for the problem.Greedy-TopK algorithm employs two-step strategies to incrementally and greedily mine K discriminative subgraphs.By limiting the structure similarity graph pattern extension in the discriminative subgraph mining process,Leap-TopK algorithm could leap the graph pattern searching space.Extensive experimental results demonstrate that Leap-TopK algorithm is more efficient than Greedy-TopK algorithm.Besides,when the mining result discriminative subgraphs are considered,the classification accuracies of the two algorithms are almost the same.However,they are all superior to the traditional mining algorithm in terms of usefulness.(3)For discriminative subgraph mining problem in large-scale graph databases,we propose a generic framework for mining large-scale discriminative subgraphs.Based on the density and class graph partition techniques,the parallel MapReduce framework can reduce effectively the computation space by dealing with balanced graph databases.By filtering the low discriminative subgraph sets using the ELM algorithm,the framework could enhance the usefulness of the mining results.Meanwhile,we use the Support Graph Vector Model and ELM algorithm to build graph classifiers.Extensive experimental results on both real and synthetic datasets show that the usefulness of the large-scale discriminative subgraph mining framework and the efficiency of the ELM algorithm are better.(4)For discriminative subgraph mining problem in large-scale graph databases,we propose an efficient method,MRGAGC,to process discriminative subgraph mining.MRGAGC employs the iterative MapReduce framework to mine discriminative subgraphs.Each Map step applies the evolutionary computation and three evolutionary strategies to generate a set of locally optimal discriminative subgraphs,and the Reduce step aggregates all the discriminative subgraphs and outputs the result.The iteration loop terminates until the stopping condition satisify.In the end,we employ subgraph coverage rules to build graph classifiers using the discriminative subgraphs mined by MRGAGC.Extensive experimental results on both real and synthetic datasets show that MRGAGC obviously outperforms the other approaches in terms of both classification accuracy and runtime efficiency.
Keywords/Search Tags:graph mining, graph pattern, discriminative subgraph, diversity, MapReduce, graph classification
PDF Full Text Request
Related items