Association Rules Incremental Updating Research And Application Based On MapReduce

Posted on: 2015-09-09
Degree: Master
Type: Thesis
Country: China
Candidate: Y J Zheng
Full Text: PDF
GTID: 2298330467987023
Subject: Computer application technology
Abstract/Summary:
With the rapid development of Internet technology, Web information is growing explosively, and obtaining useful information efficiently has become a popular research problem; within it, association rule mining is an important topic in the data mining field. However, traditional association rule algorithms can no longer keep up with the growth of data. Real-time, efficient updating of association rules is therefore significant for trend analysis, decision-making, and information recommendation.

Most existing incremental updating methods evolved from the traditional association rule algorithms. They can address problems such as distributed computing and parallel I/O communication, but they still cannot remove redundant computation to achieve efficient computation. Motivated by this, we propose a parallel association rule algorithm. As data grows rapidly, the proposed algorithm integrates new data items with existing data items and uses parallel computation to trade off memory consumption against computation cost, with the aim of removing redundant computation. The work of this dissertation focuses mainly on the following aspects:

(1) We propose an incremental association rule updating algorithm based on MapReduce. To avoid large memory overheads during storage and reading, the algorithm saves the hash-map patterns of frequent subtrees. The algorithm also exploits the parallel computing ability of MapReduce: it removes redundant computation and generates independent data for parallel computation, so that association rules can be updated efficiently. We present the parallel design of the algorithm and its implementation in Hadoop.

(2) We describe the design of our Web log data mining system in detail. Built on the Hadoop platform, our system provides the following functions: data collection, data update, data preprocessing, data statistics and analysis, incremental updating of association rules, and display of results. We also propose a parallel preprocessing algorithm for Web data in the prototype system. Our algorithm reduces the number of scans needed when updating association rules. Finally, our system shows that the incremental association rule updating algorithm based on MapReduce is more effective in data processing and application. ...
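The general idea of incremental updating in a map/reduce style can be illustrated with a minimal Python sketch. This is not the thesis's algorithm, whose details (including the frequent-subtree hash-map structure) are not given in the abstract. It assumes a simpler scheme: support counts for all itemsets up to a small size are kept in a hash map, only the new transactions are scanned, and the new counts are merged into the stored ones. All function names, the `MAX_SIZE` cap, and the sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations

MAX_SIZE = 2  # hypothetical cap on itemset size, to bound memory use


def map_transaction(transaction, max_size=MAX_SIZE):
    """Map step: emit (itemset, 1) for every itemset of up to max_size items."""
    items = sorted(set(transaction))
    for size in range(1, max_size + 1):
        for itemset in combinations(items, size):
            yield itemset, 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted counts for each itemset."""
    counts = Counter()
    for itemset, n in pairs:
        counts[itemset] += n
    return counts


def count_batch(transactions):
    """Run the map and reduce steps over one batch of transactions."""
    pairs = (pair for t in transactions for pair in map_transaction(t))
    return reduce_counts(pairs)


def incremental_update(stored_counts, stored_total, new_transactions, min_support):
    """Merge counts from the new batch into the stored counts.

    Only new_transactions are scanned; the old data is represented
    solely by stored_counts and stored_total.
    """
    merged = stored_counts + count_batch(new_transactions)
    total = stored_total + len(new_transactions)
    frequent = {s: c for s, c in merged.items() if c / total >= min_support}
    return merged, total, frequent


if __name__ == "__main__":
    old = [["a", "b"], ["a", "c"], ["a", "b", "c"]]
    new = [["b", "c"], ["a", "b"]]
    stored = count_batch(old)
    merged, total, frequent = incremental_update(stored, len(old), new, 0.5)
    print(total, sorted(frequent))
```

Keeping counts for infrequent itemsets as well costs extra memory, but it lets the merge give the same counts as a full recount without rescanning the old data, which is one simple way to trade memory for computation.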
Keywords/Search Tags: Association rules, Incremental update, Map/Reduce pattern, Web log data mining