Distributed Association Rule Mining Method And Application Research

Posted on: 2016-11-05
Degree: Master
Type: Thesis
Country: China
Candidate: J K Zheng
GTID: 2348330503986898
Subject: Computer Science and Technology

Abstract/Summary:
With the rapid development of the Internet, online shopping has become increasingly popular, and recommendation algorithms that suggest suitable products to users are becoming increasingly important. Frequent patterns, i.e., patterns that appear frequently in a data set, can serve as one component of a recommendation system. This research builds on association rule mining and combines it with other machine learning algorithms to form a personalized recommendation system for Baidu Tieba users. Our aim is to generate rules with an association rule algorithm optimized for distributed deployment, and then to derive suitable recommendations by processing those rules.

Because a single-core computer is limited by CPU performance, memory size, and other factors, association rule algorithms run into two bottlenecks. First, single-point storage: the local data structure built from the transaction data set may overflow memory. Second, low support: when the original data set has a large volume of records and many attribute columns, and the support threshold is set low, the output rule set can be much larger than the original data set. We therefore deploy the association rule algorithm on an open-source distributed computing framework. Traditional distributed association rule mining algorithms are constrained by strong dependencies among the data held on different nodes, which leads to high communication cost. This thesis takes the PFP-Growth algorithm as its target algorithm. Through data slicing, PFP-Growth ensures that nodes do not need to exchange data while generating frequent patterns. The improved algorithm in this research is based on the distributed PFP-Growth algorithm, which also serves as the baseline for comparison.
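To make the data-slicing idea concrete, here is a minimal single-process Python sketch of group-dependent sharding in the style commonly described for PFP-Growth. It is not the thesis's implementation and does not run on any distributed framework. The helper names (`build_flist`, `assign_groups`, `shard`, `local_support`) are hypothetical, and the round-robin grouping is an assumed placeholder for whatever slicing scheme is actually used. Each frequent item is assigned to a group; each transaction, with its frequent items sorted in F-list order, sends to every group the longest prefix that ends at one of that group's items. Any itemset can then be counted using only the shard of the group that owns its last item in F-list order, which is why the nodes need not exchange data.

```python
from collections import Counter, defaultdict


def build_flist(transactions, min_support):
    """Count item supports; keep frequent items, most frequent first."""
    counts = Counter(item for t in transactions for item in set(t))
    frequent = [(item, c) for item, c in counts.items() if c >= min_support]
    frequent.sort(key=lambda ic: (-ic[1], ic[0]))  # ties broken by item name
    return [item for item, _ in frequent]


def assign_groups(flist, num_groups):
    """Hypothetical grouping scheme: round-robin over the F-list."""
    return {item: idx % num_groups for idx, item in enumerate(flist)}


def shard(transactions, flist, group_of):
    """Emit group-dependent transactions (the PFP-style map step)."""
    rank = {item: r for r, item in enumerate(flist)}
    shards = defaultdict(list)
    for t in transactions:
        # Keep only frequent items, ordered as in the F-list.
        items = sorted((i for i in set(t) if i in rank), key=rank.get)
        emitted = set()
        # Scan right to left so each group receives, at most once per
        # transaction, the longest prefix ending at one of its items.
        for j in range(len(items) - 1, -1, -1):
            g = group_of[items[j]]
            if g not in emitted:
                emitted.add(g)
                shards[g].append(items[: j + 1])
    return dict(shards)


def local_support(shards, flist, group_of, itemset):
    """Count an itemset using only the shard of its last F-list item."""
    rank = {item: r for r, item in enumerate(flist)}
    last = max(itemset, key=rank.get)
    target = set(itemset)
    return sum(1 for row in shards.get(group_of[last], []) if target <= set(row))


transactions = [["a", "b", "c"], ["a", "c"], ["b", "c", "d"],
                ["a", "b", "c", "d"], ["a", "d"]]
flist = build_flist(transactions, min_support=2)  # ['a', 'c', 'b', 'd']
groups = assign_groups(flist, num_groups=2)
shards = shard(transactions, flist, groups)
print(local_support(shards, flist, groups, ["c", "d"]))  # prints 2
```

The sketch only shows why each shard is self-contained; it does not build the per-group conditional FP-trees. A fixed grouping such as round-robin can also leave groups with very different amounts of mining work, which relates to the uneven task load discussed in the abstract.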
The main direction of optimization is load balancing: after data slicing, PFP-Growth makes the data on each node independent, but the task load is not distributed evenly across the nodes, so the overall execution time is longer than the time the algorithm actually needs. In the recommendation system, we do not use the originally generated frequent itemsets directly. Instead, we generate frequent closed itemsets and maximal frequent itemsets alongside the frequent itemsets; then, when reducing the strong association rules, we use several filtering indexes to remove strong rules with low reliability or high redundancy. Finally, the remaining association rules are clustered and representative rules are selected to obtain the final recommendation results.

Experiments with several slicing schemes compare factors such as running time, which allows the effectiveness of the proposed scheme to be evaluated. However, because of the diversity of the chosen distributed computing framework and its computation principles, different scheduling schemes may lead to differences in running time, so the research also compares the experimental results theoretically. This research applies the distributed association rule algorithm to real Baidu Tieba user data and produces recommendations for individual users. Adding ordered filtering of the recommendation rules to the algorithm makes the recommendation information more concise.
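The post-processing steps described above (deriving closed and maximal frequent itemsets, then pruning strong rules) can be sketched on a toy data set. This is a brute-force Python illustration under stated assumptions, not the thesis's method: the abstract does not name its filtering indexes, so confidence is assumed to stand for "reliability", a lift threshold is added as an assumed second index, and redundancy is assumed to mean that another strong rule with the same consequent, a strictly smaller antecedent, and at least the same confidence exists. The function names (`frequent_itemsets`, `closed_itemsets`, `maximal_itemsets`, `filter_rules`) are hypothetical, and the rule-clustering step is not shown.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Brute-force Apriori-style enumeration: fine for toy data only."""
    sets = [frozenset(t) for t in transactions]
    items = sorted(set().union(*sets))
    freq = {}
    for k in range(1, len(items) + 1):
        found = False
        for combo in combinations(items, k):
            s = frozenset(combo)
            support = sum(1 for t in sets if s <= t)
            if support >= min_support:
                freq[s] = support
                found = True
        if not found:
            break  # no frequent k-itemsets implies no frequent (k+1)-itemsets
    return freq


def closed_itemsets(freq):
    """Closed: no frequent proper superset has the same support."""
    return {s: c for s, c in freq.items()
            if not any(s < t and freq[t] == c for t in freq)}


def maximal_itemsets(freq):
    """Maximal: no frequent proper superset at all."""
    return [s for s in freq if not any(s < t for t in freq)]


def filter_rules(freq, n, min_conf, min_lift):
    """Keep strong rules, then drop redundant ones (assumed criterion)."""
    strong = {}
    for s, support in freq.items():
        for k in range(1, len(s)):
            for ante in combinations(sorted(s), k):
                a = frozenset(ante)
                c = s - a
                conf = support / freq[a]     # confidence of a -> c
                lift = conf / (freq[c] / n)  # confidence divided by P(c)
                if conf >= min_conf and lift > min_lift:
                    strong[(a, c)] = (conf, lift)
    # Drop a rule if another strong rule has the same consequent, a strictly
    # smaller antecedent, and at least the same confidence.
    return {rule: stats for rule, stats in strong.items()
            if not any(a2 < rule[0] and c2 == rule[1] and s2[0] >= stats[0]
                       for (a2, c2), s2 in strong.items())}


transactions = [["a", "b", "c"], ["a", "c"], ["b", "c", "d"],
                ["a", "b", "c", "d"], ["a", "d"]]
freq = frequent_itemsets(transactions, min_support=2)
closed = closed_itemsets(freq)
maximal = maximal_itemsets(freq)
rules = filter_rules(freq, len(transactions), min_conf=0.7, min_lift=1.0)
print(sorted("".join(sorted(s)) for s in maximal))  # ['abc', 'ad', 'bcd']
```

On this toy data, the rule {a, b} → {c} is strong but is dropped as redundant, because {b} → {c} reaches the same confidence with a smaller antecedent. The closed and maximal sets are much smaller than the full frequent-itemset collection, which is the motivation for not using the raw frequent itemsets directly.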
Keywords/Search Tags: frequent itemsets, association rules, distributed, recommendation system