The Research On The Algorithms Of Mining Distributed Maximal Frequent Patterns

Posted on: 2012-04-26
Degree: Master
Type: Thesis
Country: China
Candidate: L Yang
Full Text: PDF
GTID: 2178330341950150
Subject: Computer software and theory
Abstract/Summary:
Association rule mining is an important problem in data mining, and frequent pattern mining plays a crucial role in it. Many algorithms have been proposed for mining frequent patterns. Because mining all frequent patterns is time-consuming, mining maximal frequent patterns has been proposed to improve efficiency. Every frequent pattern is a subset of some maximal frequent pattern, so the maximal frequent patterns implicitly cover all frequent patterns, and in some applications the maximal frequent patterns alone are adequate. Moreover, with the development of network and distributed database technologies, algorithms that mine maximal frequent patterns in a distributed way have become important. In addition, in practical applications users may adjust the minimum support threshold dynamically to find useful information, and massive data continues to emerge, so efficient algorithms for updating mined maximal frequent patterns need to be studied. The main contributions of this thesis are as follows.
First, an algorithm, DMFP, is proposed for mining global maximal frequent patterns in a distributed way. To compress the transactions in each local database, the algorithm stores the dataset in an improved frequent pattern tree (HSFP-tree). It takes full advantage of the characteristics of the HSFP-tree and uses a depth-first search strategy to mine local maximal frequent patterns without generating candidate patterns or scanning the databases. Finally, global maximal frequent patterns are obtained from all the local maximal frequent patterns through communication among the sites. Consequently, the time and space costs of mining are markedly reduced. Experiments on a number of real and synthetic databases show that the algorithm outperforms previous methods in most cases.
Second, we present an updating algorithm, UDMFP, for mining global maximal frequent patterns when the user changes the minimum support threshold. Because it makes full use of the information provided by previous results, the algorithm can efficiently mine global maximal frequent patterns under the new minimum support threshold. The experimental results indicate that the algorithm is highly efficient for update mining problems.
Finally, we present an incremental algorithm, CDMFP, for mining global maximal frequent patterns when the database changes. The algorithm only needs to scan the records added to the database, and it reuses the existing HSFP-tree and the previously mined results. It can therefore efficiently mine the new global maximal frequent patterns after the database changes. The experimental analysis indicates that the algorithm is especially efficient for updating mined global maximal frequent patterns.
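To make the notions of local and global maximal frequent patterns concrete, the following is a minimal Python sketch. It is not the DMFP algorithm from the thesis: it uses no HSFP-tree, enumerates itemsets by brute force, and is only suitable for tiny examples. All function names (count, keep_maximal, local_maximal, global_maximal) and the sample data are hypothetical. The sketch assumes every site applies the same relative minimum support, in which case any globally frequent itemset must be frequent at one site at least, and is therefore a subset of some local maximal frequent pattern.

```python
from itertools import combinations


def count(db, itemset):
    """Number of transactions in db that contain every item of itemset."""
    return sum(1 for t in db if itemset <= t)


def keep_maximal(itemsets):
    """Drop every itemset that is a proper subset of another one."""
    unique = set(itemsets)
    return {s for s in unique if not any(s < other for other in unique)}


def nonempty_subsets(itemset):
    """Yield every non-empty subset of itemset as a frozenset."""
    items = sorted(itemset)
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)


def local_maximal(db, min_ratio):
    """Brute-force local maximal frequent patterns (illustration only)."""
    items = frozenset().union(*db)
    frequent = [s for s in nonempty_subsets(items)
                if count(db, s) >= min_ratio * len(db)]
    return keep_maximal(frequent)


def global_maximal(sites, min_ratio):
    """Combine local results into global maximal frequent patterns.

    Candidates are the subsets of the local maximal frequent patterns;
    each candidate is then checked against the summed counts of all sites.
    """
    total = sum(len(db) for db in sites)
    candidates = set()
    for db in sites:
        for pattern in local_maximal(db, min_ratio):
            candidates.update(nonempty_subsets(pattern))
    frequent = [c for c in candidates
                if sum(count(db, c) for db in sites) >= min_ratio * total]
    return keep_maximal(frequent)


# Hypothetical two-site example; each transaction is a set of item labels.
site_a = [frozenset("abc"), frozenset("ab"), frozenset("ac"), frozenset("bc")]
site_b = [frozenset("abd"), frozenset("ad"), frozenset("bd"), frozenset("ab")]
result = global_maximal([site_a, site_b], 0.5)
```

In this example each site has three local maximal frequent patterns at a minimum support of 50%, but only the itemset {a, b} reaches the global threshold of 4 out of 8 transactions, so it is the single global maximal frequent pattern. A real distributed implementation would exchange the local patterns and counts between sites rather than read every local database directly.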
Keywords/Search Tags: Distributed Association Rules, Global Maximal Frequent Pattern, Updating Mining, Frequent Pattern Tree