Research And Improvement Of Association Rules Algorithm

Posted on: 2016-03-04
Degree: Master
Type: Thesis
Country: China
Candidate: P Zhang
Full Text: PDF
GTID: 2348330488472881
Subject: Engineering
Abstract/Summary:
With the development of computer and information technology, data can be collected more conveniently than ever, and the rapid growth of data creates an urgent need for techniques to process it. Data mining is one such technique; it has become a focus of attention and is widely used in various industries. The main tasks of data mining are classification and prediction, clustering, association rules, sequence analysis, and anomaly detection. This thesis focuses on association rules and studies the Apriori algorithm and the FP-growth algorithm in detail. The main work is as follows:

1. Based on a study of the Apriori algorithm, this thesis implements its frequent item set mining in the MFC framework. The program connects to an Access database, lets the user set the number of items and the minimum support, mines the frequent item sets, and reports the time overhead of the mining process.

2. By analyzing each step of the Apriori algorithm, an optimization method is proposed and verified through case analysis and experimental results. The optimized algorithm improves on Apriori in two ways: (1) In the connection step, Apriori must check whether the first k-1 items of two frequent k-item sets are the same; this process produces a large number of candidate item sets and greatly increases the time cost. The optimized algorithm adds a connection pretreatment step, which reduces the number of comparisons in the connection step and avoids generating many unnecessary candidate item sets. (2) In the support counting step, the wider a transaction is, the more traversals of the candidate hash tree it requires. The optimized algorithm adds a transaction pruning step, which reduces the width of the transactions and thereby the time overhead of support counting.

3. For large data sets, the FP_tree built by the FP-growth algorithm is very complex, and the algorithm must frequently (1) convert the FP_tree into prefix paths and (2) convert the prefix paths into conditional FP_trees. This thesis presents an optimization method based on the idea of compressing the data set. The optimized algorithm compresses the data set before mining and generates a simple sub-FP_tree structure. When mining frequent item sets that end with specific item sets, the optimized algorithm simplifies the mining procedure and improves mining efficiency. Finally, worked examples and experimental results comparing the two algorithms verify the improvement in both time and space.
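
The Apriori steps named in the abstract (connection, support counting, transaction pruning) can be illustrated with a minimal Python sketch. This is not the thesis's MFC implementation or its specific optimizations, which the abstract does not spell out. It assumes item sets are stored as sorted tuples, uses a plain scan instead of a candidate hash tree for support counting, and uses one common textbook form of transaction pruning: drop items that appear in no frequent k-item set, then drop transactions too short to hold a (k+1)-item set. The names `join_step` and `apriori` are hypothetical.

```python
from itertools import combinations


def join_step(frequent_prev):
    """Textbook connection step: merge two sorted (k-1)-item sets that
    share their first k-2 items, then keep only candidates whose
    (k-1)-subsets are all frequent."""
    prev = sorted(frequent_prev)
    prev_set = set(prev)
    candidates = []
    for i in range(len(prev)):
        for j in range(i + 1, len(prev)):
            a, b = prev[i], prev[j]
            if a[:-1] != b[:-1]:
                # Sorted order keeps equal prefixes adjacent, so no later
                # item set can share a's prefix: stop comparing early.
                break
            cand = a + (b[-1],)
            if all(s in prev_set for s in combinations(cand, len(cand) - 1)):
                candidates.append(cand)
    return candidates


def apriori(transactions, min_support):
    """Return {item set (sorted tuple): support count} for all frequent
    item sets. min_support is an absolute count (an assumption here)."""
    transactions = [frozenset(t) for t in transactions]
    counts = {}
    for t in transactions:
        for item in t:
            counts[(item,)] = counts.get((item,), 0) + 1
    frequent = {c: n for c, n in counts.items() if n >= min_support}
    result = dict(frequent)
    k = 1
    while frequent:
        # Transaction pruning (assumed form): narrow each transaction to
        # items still present in some frequent k-item set, then discard
        # transactions that cannot contain any (k+1)-item set.
        live = {item for itemset in frequent for item in itemset}
        transactions = [t & live for t in transactions]
        transactions = [t for t in transactions if len(t) >= k + 1]
        candidates = join_step(frequent.keys())
        counts = {c: 0 for c in candidates}
        for t in transactions:  # support counting by a plain scan
            for c in candidates:
                if t.issuperset(c):
                    counts[c] += 1
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        result.update(frequent)
        k += 1
    return result


# Hypothetical market-basket data, for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "diaper", "beer", "eggs"},
    {"milk", "diaper", "beer", "cola"},
    {"bread", "milk", "diaper", "beer"},
    {"bread", "milk", "diaper", "cola"},
]
frequent_sets = apriori(baskets, min_support=3)
```

Sorting the item sets lets the inner loop stop at the first prefix mismatch, which is one simple way to cut comparisons in the connection step. Narrowing the transactions is safe because every item of a frequent (k+1)-item set must also occur in some frequent k-item set.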
Keywords/Search Tags: Association rule mining, Apriori algorithm, FP-growth algorithm, Connection pretreatment step, Transaction pruning step