Font Size: a A A

The Research On Frequent Sub-graph Mining Algorithm

Posted on:2012-06-16Degree:MasterType:Thesis
Country:ChinaCandidate:L N ChenFull Text:PDF
GTID:2218330368487209Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
Data mining object could be structured data in relational database and also be half-structured data or non-structured data like text, image or graphic. Sub-graph mining is becoming a research hotspot with the development of non-structured frequent pattern mining. Graph is one of complex data structures, which makes it more difficult to extract interesting structures and frequent patterns of sub-graphs than general data and requires the overall use of relevant knowledge about graph and technology of data mining. For the moment, the amount of generated candidate sub-graphs is too large and so much time is needed to execute sub-graph matching and to scan data set repeatedly for calculating the support, both of which are the main problems in the field of sub-graph mining. The dissertation gives a primary study about how to reduce the amount of generated candidate sub-graphs for greater efficiency and the major works are as follows:1. An improved approach, based on AGM (Apriori-based Graph Mining) Algorithm is presented in this dissertation with the integration of the knowledge of graph theory. This method adds labels to vertexes and edges of initial graphs and sorts orders according to the value of these labels to determine the structure of adjacent matrix of graphs before the generation of candidate graphs happens. This method can reduce the amounts of redundant sub-graphs in a certain degree and make it more efficient in computing time. What's more, this improved method can determine whether the same sub-matrix is available or not between two adjacent matrixes more effectively.2. Another improved method is also presented, which is used for frequent pattern mining of connected graph towards the characteristic of connected graph. By graph theory, an in-depth analysis to node adjacent matrix and the judgment criterion of connected graph is performed. What's more, a stricter mathematical description to the conception of path and node adjacent matrix is given, which is given into the use of the improved method based on AGM Algorithm. Some limited conditions are added before the combination of adjacent matrixes happens. The improved method can reduce the amounts of candidate graphs and the computing time effectively.
Keywords/Search Tags:frequent pattern, data mining, AGM Algorithm, adjacency matrix, candidate sub-graph, connected graph
PDF Full Text Request
Related items