Mining Of Maximal Frequent Item Sets Based On AFOPT

Posted on: 2015-01-12
Degree: Master
Type: Thesis
Country: China
Candidate: H Wang
GTID: 2268330428465490
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of the information industry, and of the Internet industry in particular, people's ability to acquire and store data keeps improving, and the volume of data stored in databases is growing exponentially. Yet within this huge volume of data, knowledge that is genuinely valuable for real decision making is relatively scarce. Association rule mining reveals relationships among the different items or attributes of a data set, uncovering valuable associations and connections between multiple attributes. The set of maximal frequent itemsets implicitly contains all frequent itemsets, since every frequent itemset is a subset of some maximal one, and it takes up less memory. Because only the maximal frequent itemsets need to be mined, the number of recursive calls and memory allocations can be reduced effectively, and some data mining applications need only the maximal frequent itemsets in the first place; research on mining them is therefore of considerable significance. On large, dense data sets, however, superset checking has gradually become one of the most time-consuming steps of maximal frequent itemset mining algorithms and has become the bottleneck of their efficiency. Moreover, most existing maximal frequent itemset mining algorithms traverse the search-space tree using the FP-tree model, which is inefficient under a top-down traversal strategy.
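To make the definitions concrete, the following Python sketch enumerates all frequent itemsets by brute force and then keeps only the maximal ones. It is a hypothetical illustration, not code from the thesis: the function names `frequent_itemsets` and `maximal_only` are invented for this example, and `min_support` is assumed to be an absolute transaction count. The final filter compares every frequent itemset with every other one, which is the kind of pairwise superset checking that becomes a bottleneck on large, dense data sets.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Enumerate all frequent itemsets level by level (naive Apriori-style search).

    Hypothetical helper; min_support is an absolute number of transactions.
    """
    tx = [frozenset(t) for t in transactions]
    items = sorted({i for t in tx for i in t})
    frequent = []
    level = [frozenset([i]) for i in items]
    while level:
        # Keep the candidates contained in at least min_support transactions.
        survivors = [c for c in level if sum(c <= t for t in tx) >= min_support]
        frequent.extend(survivors)
        # Join pairs of k-itemsets that differ in one item to form (k+1)-candidates.
        next_level = {a | b for a, b in combinations(survivors, 2) if len(a | b) == len(a) + 1}
        level = list(next_level)
    return frequent


def maximal_only(frequent):
    """Naive superset check: drop every itemset that has a frequent proper superset."""
    return [s for s in frequent if not any(s < other for other in frequent)]


transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c", "d"}]
mfi = maximal_only(frequent_itemsets(transactions, 2))
print([sorted(s) for s in mfi])
```

With `min_support = 2`, seven itemsets are frequent ({a}, {b}, {c}, {a, b}, {a, c}, {b, c} and {a, b, c}), but only {a, b, c} is maximal, because every other frequent itemset is one of its subsets.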
To address these two problems, and after consulting a large number of relevant papers and documents from China and abroad, this thesis improves the projection-based superset checking method, proposes A-MFI, a maximal frequent itemset mining algorithm based on the AFOPT-tree, and on that basis implements A-MFI in a distributed fashion on the Hadoop platform. The main work is summarized as follows:
1. This thesis first introduces the theory, characteristics, and mainstream algorithms of data mining, association rule mining, and maximal frequent itemset mining, together with relevant background on cloud computing and the Hadoop cloud computing platform.
2. Because existing maximal frequent itemset mining algorithms build on the FP-tree, which is inefficient under a top-down traversal strategy, this thesis uses the AFOPT-tree model to construct the search-space tree. To make superset checking more efficient, it proposes an optimized projection-based superset checking method: the traditional MFI-tree is modified according to the AFOPT-tree model, the MFI-tree traversal used in projection-based superset checking is changed from bottom-up to top-down, and a linked-list field connecting the nodes that carry the same item is added to the MFI-tree, which improves look-ahead and pruning efficiency.
3. Building on these improvements, this thesis proposes A-MFI, a maximal frequent itemset mining algorithm based on the AFOPT-tree. Experiments on several different data sets compare A-MFI with similar algorithms and verify both the benefit of the optimized superset checking and the overall efficiency of the algorithm.
4. Since a single-machine maximal frequent itemset mining algorithm can improve efficiency only to a limited extent on massive data sets, this thesis, after studying cloud computing and the Hadoop platform in depth, develops an improved distributed version of A-MFI and thereby implements distributed mining of maximal frequent itemsets.
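The idea of restricting superset checks to a projection of the already-found maximal itemsets can be sketched in a few dozen lines of Python. This is a hypothetical illustration under simplifying assumptions, not the thesis's A-MFI algorithm: vertical tidsets stand in for the AFOPT-tree, a plain list of found itemsets stands in for the MFI-tree, and the name `mine_mfi` is invented. It borrows two ideas described above: items are explored in ascending order of support, as in AFOPT, and each recursive call checks supersets only against the known maximal itemsets that contain its current prefix (the head).

```python
def mine_mfi(transactions, min_support):
    """Depth-first maximal frequent itemset search with projected superset checking.

    Hypothetical sketch: tidsets replace the AFOPT-tree and a plain list of
    found itemsets replaces the MFI-tree. min_support is an absolute count.
    """
    tx = [frozenset(t) for t in transactions]
    counts = {}
    for t in tx:
        for item in t:
            counts[item] = counts.get(item, 0) + 1
    # AFOPT-style ordering: frequent items by ascending support (ties broken by item).
    order = sorted((i for i, c in counts.items() if c >= min_support),
                   key=lambda i: (counts[i], i))
    # Vertical layout: each item paired with the ids of transactions containing it.
    root_tail = [(i, frozenset(k for k, t in enumerate(tx) if i in t)) for i in order]

    def search(head, tail, relevant):
        """Return the maximal itemsets found below `head`.

        tail:     (item, tidset of head + item) pairs for later items that keep head frequent.
        relevant: projection of the known maximal itemsets onto head, i.e. only
                  those that contain head; superset checks scan this list alone.
        """
        # Look-ahead pruning: head plus every tail item is already covered.
        if any((head | {i for i, _ in tail}) <= m for m in relevant):
            return []
        if not tail:
            # No frequent extension by a later item and no known superset: maximal.
            return [head] if head else []
        found = []
        for pos, (item, item_tids) in enumerate(tail):
            # Conditional tail: later items still frequent together with head + item.
            new_tail = []
            for other, other_tids in tail[pos + 1:]:
                joint = item_tids & other_tids
                if len(joint) >= min_support:
                    new_tail.append((other, joint))
            # Project the known maximal itemsets onto the extended head.
            projected = [m for m in relevant + found if item in m]
            found.extend(search(head | {item}, new_tail, projected))
        return found

    return search(frozenset(), root_tail, [])


transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c", "d"}]
print([sorted(s) for s in mine_mfi(transactions, 3)])
```

Each recursive call receives only the maximal itemsets that contain its head, so the lists it scans shrink as the search deepens, and the look-ahead test prunes a whole subtree when the head together with all remaining candidate items is already covered. For the five transactions in the example with `min_support = 3`, the call returns the three pairs {a, b}, {a, c} and {b, c}, because {a, b, c} occurs in only two transactions.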
Experiments show that, on large-scale dense data sets, the distributed maximal frequent itemset mining method is clearly more efficient than running on a single machine. Finally, the thesis summarizes its content, points out the shortcomings of the current research, and suggests directions for future work.
Keywords/Search Tags: maximal frequent item set, superset checking, maximal frequent item set projection, distributed mining