Font Size: a A A

Research And Implementation On Subgraph Pattern Mining Algorithms Oriented Uncertain Graph Data

Posted on:2014-02-15Degree:MasterType:Thesis
Country:ChinaCandidate:B L QiFull Text:PDF
GTID:2308330473453793Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
As an important data structure, graph has been used more and more widely in many fields. For example, when researchers analyze the compound, social networking and other data, graph is often used to be a model, and then obtain the certain data graph. However, in real world, data is often noisy, incomplete and imprecise due to collection, transmission and other technical limitations. Therefore, it is more appropriate to choose uncertain graphs as a model.With the amount of uncertain graph data increasing dramatically, the study of how to mine useful information efficiently from these rich structures and semantic information contained in the uncertain graph data is one of the research focus in the fields of database and data mining. The study will have important theoretical and practical value. However, since the introduction of uncertainty, the traditional graph data mining algorithms cannot be directly used for uncertain graph data. Based on this, the research on subgraph pattern mining from uncertain graph data is carried out as follows:First of all, we learn the related knowledge about uncertain graph data mining, master the core ideas of the existing algorithms and analyze the applicable scenario, the pros and cons of each algorithm.Secondly, according to the problem of frequent subgraph pattern mining on uncertain graph data, an algorithm named FSMWT is proposed which follows the framework of "candidate generator-candidate judgment". In this algorithm, we adopt the well-known DFS encoding enumeration framework and improve it by cereating the corresponding data structure GEindex for each node of the enumeration framework. In the phase of candidate generation, we can extend the subgraph according to its GEindex only in those graphs which contain the subgraph rather than scan the entire database, which can save a lot of time. In the stage of judgment, the determination efficiency will be improved by using GEindex which can achieve a direct calculation of expected support without the subgraph isomorphism operations. In addition, deterministic and probabilistic pruning techniques are proposed in the paper, which can further improve the efficiency of the algorithm. Experimental results show that FSMWT algorithm has a higher mining efficiency than other algorithms.Finally, MTKUG algorithm is proposed in this paper for the problem of mining fc-truss cohesive subgraph patterns from uncertain graph data. The paper formally defines the problem of mining Top-K k-truss cohesive subgraph patterns from uncertain graph data and gives the concept of expected support. The algorithm puts forward the method for calculating subgrah’s expected support and improves the computational efficiency through the distributed parallel computing system. In addition, the efficiency of the algorithm is further improved combined with the pruning strategy proposed in this paper. The algorithm reduces the size of the result set effectively by only mining Top-K k-truss. A large number of experiments show that the algorithm is effective and efficient.
Keywords/Search Tags:data mining, uncertain graph, frequent subgraph, k-truss, pruning
PDF Full Text Request
Related items