Font Size: a A A

Algorithms For Mining Frequent Subgraphs And Its Applications In Biological Network

Posted on:2010-08-07Degree:MasterType:Thesis
Country:ChinaCandidate:A G DongFull Text:PDF
GTID:2178360278955036Subject:Traffic Information Engineering & Control
Abstract/Summary:PDF Full Text Request
The major completion of human genome plan indicate that the age of post-genome is finally coming. The abundant bio-information data people accumulated in recent years provide the basic reference for revealing the mystery of lives, and the focus of research in biology have changed from the partial research of individual genes or proteins in cells to the system research based on the entire genes and proteins as well as the metabolic products. The research on the structure and the detection technique on gene regulatory networks, protein- protein interaction networks and metabolic networks have promoted the molecule-biology age to the system-biology age stage by stage. Genes and proteins can generate superiority functional modules by netty interactions, therefore, using mathematic modeling to design efficient algorithms for mining and analyzing the modules is help to study the functions of the organism itself and the evolvement relations between different species, and could provide warrants for analyzing and understanding the basic disciplines of lives. In this paper, some works are done in the frequent subgraph mining.Some classical graph-based algorithms for mining frequent subgraphs are systematically studied and comprehensively summarized in this paper. based on this, a novel algorithm for mining frequent subgraphs is proposed, which consists of subgraph searching and graph isomorphism testing. For subgraph searching, a definition of ring destribution is introduced, and an algorithm based on ring distribution - ESR(Enumerate Subgraphs based on Ring) is presented. For graph isomorphism, by using degree sequences and eigenvalues of a graph, two algorithms are proposed for directed graphs and undirected ones, respectively. Graph isomorphism algorithm is used to categorize the searched subgraphs, and frequent subgraphs are obtained according to the result of the categorizing. When the size of a network is large, there are a large number of subgraphs, which leads to high computational effort. As a result, we put forward a random categorizing algoritm and a Hamilton subgraph mining algorithm. The former selects at random a certain quantity of subgraphs for isomorphism testing, which is an approximate algorithm. And the latter aims at finding only certain type of subgraphs, such as subgraphs with Hamilton cycle, which reduces the number of searched subgraphs. Experimental evaluation on five real biological networks shows that our algorithm is superior to the algorithms available.
Keywords/Search Tags:data mining, frequent subgraph, graph isomorphism, motif, algorithm, Hamilton subgraph
PDF Full Text Request
Related items