Key Techniques Of Map Database Frequent Pattern Mining

Posted on:2013-08-10

Degree:Master

Type:Thesis

Country:China

Candidate:S Qu

Full Text:PDF

GTID:2248330374954328

Subject:Computer software and theory

Abstract/Summary:

PDF Full Text Request

Mining frequent patterns from the graph database has wide range of applicationssuch as Cheminformatics, Computational Biology, WEB information management,Social network analysis. In this paper, we focus on the techniques of finding frequentsubgraphs from graph databases in order to address the problem, such as scalability, theexponential number of frequent patterns, high complex, redundant, etc. We introduce arandomized approach RMPM for mining maximal frequent patterns, and propose twoalgorithms, FRSM and InRSM which mines a summary representation of the set offrequent graphs. The main contributions of this paper are as follows:1. Traditional or na ve randomized algorithms have a high effectiveness in frequentpatterns mining, but they will obtain the same pattern many times, thereby inevitablycauses multiple and useless subgraph isomorphism computing. In order to improve theefficiency of the existing randomized algorithms, RMPM can use the patterns minedalready to generate maximal frequent subgraphs. The extensive experiments on real andsynthetic datasets verify the effectiveness and efficiency of our algorithms; experimentalso shows that RMPM offers very good scalability to large graph databases. Therandomized algorithms that we proposes is generic and is equally applicable to differentkinds of patterns, such as free trees, item set, etc. The extensive experiments of miningmaximal frequent free trees on real graph databases show that the RMPM algorithm isgeneric.2. The FRSM algorithm aims to mine a relative small set of representativesubgraphs to summarization of the large set of frequent subgraphs. In the first stage ofFRSM, the algorithms RMPM is used to mine a set of frequent subgraphs; in the secondstage, some form of clustering is adopted to obtain the representative patterns set.FRSM consider the similarity in both the transaction space and the pattern space, thusthe high quality of the clustering is obtained. The extensive experiments verify that the quality of representative patterns obtained by FRSM is better than other previoustechniques focus on the similarity either in the transaction or the pattern space.3. The integrated algorithm InRSM mines the frequent representatives subgraphsdirectly from the graph databases. It is more efficient than the FRSM algorithm. InRSMalso considers the similarity in both the transaction space and the pattern space. Theextensive experiments verify that the effectiveness, efficiency, and scalability of ouralgorithms.

Keywords/Search Tags:

Data mining, Frequent subgraphs, Maximal frequent subgraph, Representative patterns

PDF Full Text Request

Related items

1	Study On Frequent Pattern Mining Algorithms And Pruning Strategies
2	Research On Mining Maximal Frequent Subgraphs Approach
3	Algorithms For Mining Frequent Subgraphs And Its Applications In Biological Network
4	Research On Algorithms For Maximal Frequent Subgraph Mining
5	Research Of Neglected Conditions Defects Discovery Method Based On Graph Data Mining
6	The Research And Relization Of Mining Frequent Patterns On Business Data Straems
7	Research And Application Of Data Mining Algorithm Based On Graph Pattern
8	Research On Data Mining Technology For Very Large Databases
9	Frequent Subgraph Mining And Its Application In Compound Property Prediction
10	Research On Algorithms For Mining Maximal Frequent Itemsets