Font Size: a A A

Mining Algorithm Based On Frequent Sub-graph Of The Multi-layer Index Structure

Posted on:2009-01-09Degree:MasterType:Thesis
Country:ChinaCandidate:J LiuFull Text:PDF
GTID:2208360272972982Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
With the rapid development of information technology and the soaring amount of data, effectively using massive data to solve the problem can bring us great business opportunities and benefit of scientific research. Particularly in recent decades, the improving of database technology, statistics and artificial intelligence greatly promotes data mining's development. Data mining refers to automatically mine the potential modes from large databases or data warehouses which are unknown, complete and valuable to guide business activities or support scientific researches. Data mining technology and its algorithms are studied internationally by the current researchers in databases and information decision areas.Mining frequent subgraph has been widely used in many domains. Algorithms that based on Apriori and FP-Growth algorithms have been the two major research issues and have been applied widely. At the beginning, unstructured data mining model is mainly focused on, but with in-depth study its limitations increasingly shows. Frequent subgraph structures can express more extensive information and minding them are more difficult. Mining graph structures has much wider application in fields such as social networks, molecular structures and bioinformatics, etc. In recent years, graph mining has become a hotspot. Frequent subgraph mining is the basic problem of graph mining, but the core operation of subgraph isomorphism testing is of high complexity and the database must be repeatedly scanned to obtain the support value of every edge, which will consume much time. How to mine frequent subgraphs efficiencily has become one of the hot issues in the field of data mining.This paper reviews the current frequent subgraph mining algorithms. Firstly it briefly introduce AGM algorithm, which is based on Apriori and uses adjacent matrix and min(code) to refer the graph. Secondly, it analyzes gSpan algorithm which uses depth-first pattern-growth method to avoid the cost of testing in some degree, but when faced massive graph sets, it would have a large number of I/O operations that will reduces its performance. Thirdly, it introduces ADI-Mine algorithm, which uses a graph index structure to reduce the unit time of testing subgraph isomorphism and improve the efficiency. Lastly, MARGIN algorithm is introduced which limits the search space to a certain edge region.The paper analyzes the merits and defects of the above-mentioned algorithms. It proposes a new algorithm for mining frequent subgraphs in undirected labeled graphs—TG-Mine which uses a multi-storey graph index structure called GDI. Firstly, the algorithm uses the DFS tree search space, which loses no frequent subgraph and can ensure a complete set of results. Secondly, it uses the GDI index structure that completely replaces the original graph database, simplifying the operations of many data access. TG-Mine algorithm runs in two steps, In the first step, it mines frequent trees in depth-first search, and in the second step, it expands the tree with back edge only and no new node is expanded. It generates the candidate subgraphs efficiently and reduces the time of testing subgraph isomorphism and judging DFS's min code. The paper compares TG-Mine with gSpan and ADI-Mine algorithms and shows that TG-Mine relatively has the higher efficiency.
Keywords/Search Tags:Data Mining, frequent subgraph mining, min DFS code, GDI, TG-Mine
PDF Full Text Request
Related items