Font Size: a A A

Research And Application Of Mining Graph Structure Algorithm

Posted on:2006-05-29Degree:MasterType:Thesis
Country:ChinaCandidate:Y R KangFull Text:PDF
GTID:2178360182977263Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
In face of the soaring amount of information, people are intimidated by"Data Bomb"while they fall into the fear of"shortage of knowledge". KDD coning for the need has become one of the strongest weapons that people can use to solve the paradoxical problem, Data Mining is the nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data.Algorithm is the key part in KDD, for this reason to research for efficient. On one hand, Data Mining is used to process Large database, and so the efficiency of algorithm is the most important; On the other hand the computer in use is not satisfying to the processing of Larag database. Third, with the fast development of bioinformatics and web exploration, more and more people realize that graph can describe the complexed relations between the things better, then graph mining can find more useful information. Consequently, we should modify present algorithm to fit the need above. On account of all above, this paper chooses algorithm of graph structure data mining as the research.This paper deeply researches the graph structure data mining Algorithm. Jiawei Han et. puts forward the algorithm gSpan and CloseGraph. Comparing with the similar Apriori (such as FSG, AGM, AcGM), the success of gSpan and CloseGraph is based on the development of the novel concepts of DFS code, equivalent occurrence and early termination that help gSpan and Closegraph prune the search space substantially with small additional cost. But they still exist the following problem: The mining results of above-mentioned methods centred on graph labels. i.e., graphs with different labels are considered as different ones.Based on the research of above methods, this paper proposed two new algorithms-MaxcodeFMCG(Finding Maximal Completed Subgraph) and FSA(Finding Frequent Subgraph Structure Algorithm), based on MaxcodeFMCG algorithm and FP-growth algorithm which has been proposed, we also put forward a new algorithm-MaxcodeFP-Tree to find frequent patterns. The advantage of MaxcodeFP-Tree is saving memory, and FSA resolves the restriction of"graphs with different labels and same structures are deemed as different".
Keywords/Search Tags:Data Mining, Graph Data Mining, Maximal Completed Subgraph, Frequent Subgraph, Frequent Subgraph Structure
PDF Full Text Request
Related items