Font Size: a A A

Key Techniques Of Map Database Frequent Pattern Mining

Posted on:2013-08-10Degree:MasterType:Thesis
Country:ChinaCandidate:S QuFull Text:PDF
GTID:2248330374954328Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
Mining frequent patterns from the graph database has wide range of applicationssuch as Cheminformatics, Computational Biology, WEB information management,Social network analysis. In this paper, we focus on the techniques of finding frequentsubgraphs from graph databases in order to address the problem, such as scalability, theexponential number of frequent patterns, high complex, redundant, etc. We introduce arandomized approach RMPM for mining maximal frequent patterns, and propose twoalgorithms, FRSM and InRSM which mines a summary representation of the set offrequent graphs. The main contributions of this paper are as follows:1. Traditional or na ve randomized algorithms have a high effectiveness in frequentpatterns mining, but they will obtain the same pattern many times, thereby inevitablycauses multiple and useless subgraph isomorphism computing. In order to improve theefficiency of the existing randomized algorithms, RMPM can use the patterns minedalready to generate maximal frequent subgraphs. The extensive experiments on real andsynthetic datasets verify the effectiveness and efficiency of our algorithms; experimentalso shows that RMPM offers very good scalability to large graph databases. Therandomized algorithms that we proposes is generic and is equally applicable to differentkinds of patterns, such as free trees, item set, etc. The extensive experiments of miningmaximal frequent free trees on real graph databases show that the RMPM algorithm isgeneric.2. The FRSM algorithm aims to mine a relative small set of representativesubgraphs to summarization of the large set of frequent subgraphs. In the first stage ofFRSM, the algorithms RMPM is used to mine a set of frequent subgraphs; in the secondstage, some form of clustering is adopted to obtain the representative patterns set.FRSM consider the similarity in both the transaction space and the pattern space, thusthe high quality of the clustering is obtained. The extensive experiments verify that the quality of representative patterns obtained by FRSM is better than other previoustechniques focus on the similarity either in the transaction or the pattern space.3. The integrated algorithm InRSM mines the frequent representatives subgraphsdirectly from the graph databases. It is more efficient than the FRSM algorithm. InRSMalso considers the similarity in both the transaction space and the pattern space. Theextensive experiments verify that the effectiveness, efficiency, and scalability of ouralgorithms.
Keywords/Search Tags:Data mining, Frequent subgraphs, Maximal frequent subgraph, Representative patterns
PDF Full Text Request
Related items