Font Size: a A A

Research On Graph Similarity Search Based On Frequent Sub-patterns

Posted on:2010-03-15Degree:MasterType:Thesis
Country:ChinaCandidate:J T LvFull Text:PDF
GTID:2178360278960083Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
From the end of last century, as a new and effective methodology for information extraction, data mining has been getting more and more attention and researched by more and more scientists and scholars. Graph data mining, as a new cross subject, was presented in 2000, which aims to extend the traditional data mining methods to those scientific research of domain related graph databases and to develop some new mining technologies which are suit for these areas and to promote the scientific research and product finally.Two important application fields of graph data mining are chem-informatics and bioinformatics. By modeling field related information such as molecule, protein-link using graph models, some valuable patterns can be mined out from corresponding graph datasets to serve scientific research. Frequent sub-graphs mining and Graph similarity search are two important research fields of graph data mining.Frequent sub-graphs mining is also called frequent sub-patterns or sub-structure mining, that is, to mine those frequent sub-graphs or graphs from given graph datasets under some support, and these frequent sub-graphs (patterns) usually are important information for related fields.Graph similarity search is to find graphs which satisfy the given similarity constraints in field related model graph dataset. It was introduced by Yan etc in 2004. This kind of search has gained broad application in many scientific researches, such as the development of new drugs and forecast of compound toxicity etc.Similar graph containment search, as a type of graph similarity search, was mentioned by Chen the first time in his paper "Towards Graph Containment Search and Indexing" in 2007, but there is still no detailed research on this so far. All the study about graph similarity search before our research focused only on exact match and traditional substructure similarity search that retrieves all the graphs containing query graph exactly or similarly. However, similar graph containment search has a contrary containing relation, which is to find all the graphs that are contained or similarly contained by query graph. This kind of search also has a lot of applications in chemical characteristic identification, pattern recognition and computer vision and so on. As the contrary containing manner, the indexing technologies designed for traditional substructure similarity search are not fit for similar graph containment search any more. After having conducted a series of detailed research on some typical frequent sub-graphs mining algorithms and indexing strategies for graph similarity search, we propose an indexing algorithms just for similar graph containment search, which is named csIndex(coverage and support based Index). The kernel idea of csIndex is based on the support and average coverage ratio of the frequent sub-graphs in the graph dataset. The general framework of csIndex is to mine all the frequent sub-graphs from given model graph dataset first, and then, to compute the general filtering ability(GFA) of them based on average coverage ratio and support of a frequent pattern, finally, csIndex organizes these patterns into a index matrix. The experimental results on some typical datasets of this field show that except for finishing a high effective similar graph containment search, csIndex can also avoid sub-graph isomorphism test effectively, which has been proved a NP complete problem.
Keywords/Search Tags:graph mining, frequent sub-graph, similar search, similar graph containment search
PDF Full Text Request
Related items