
Mining Association Rules Algorithm Analysis Based On Hadoop

Posted on: 2016-05-27
Degree: Master
Type: Thesis
Country: China
Candidate: J Huang
GTID: 2308330473954444
Subject: Applied Mathematics
Abstract/Summary:
With the development of information technology and innovation on the Internet, big data research has become a hot topic. Association rule mining is an important method in data mining and plays a key role in research on data correlation. Its central problems are how to obtain frequent itemsets and how to find correlations among different items based on those frequent itemsets. Hadoop is a popular core framework for distributed computing, with advantages such as high efficiency and scalability, and it has become one of the leading computing models. This thesis proposes an improved Hadoop parallelization of the Apriori algorithm, building on the classical Apriori and FP-Growth association algorithms and on an analysis of their advantages and disadvantages when run in parallel on Hadoop. In addition, a Hadoop parallel FP-Growth algorithm is applied to a search engine. The main research contents are as follows:

(1) The improved Apriori algorithm compresses the transaction data, simplifies candidate set generation, and reduces the number of data scans. Transactions are represented as a Boolean matrix of "0" and "1" entries, and a weight dimension is introduced to compress the size of the transaction matrix. Candidate sets are generated by dynamically pruning the matrix and applying the "and" operation. The improved algorithm is combined with the Hadoop framework to achieve parallelism. Experiments show that the algorithm is suitable for large-scale data mining and has good scalability and effectiveness.

(2) Based on the principles of Hadoop-parallelized FP-Growth and of search engines, we designed a search engine application scenario for the improved algorithm by analyzing user behavior, and used the improved Hadoop parallel FP-Growth algorithm to mine the web logs of the Sogou Laboratory. Experiments show that frequent itemsets of query words and clicked links satisfying the minimum support are prevalent in the logs. The experiments also show that the algorithm's performance improves greatly as the number of Hadoop nodes increases.
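The abstract describes the Boolean-matrix Apriori of item (1) only at a high level. The following is a minimal single-machine sketch of that idea under stated assumptions: each frequent item is stored as a column bitmask over transaction rows, identical rows are merged into one row carrying a weight (one reading of the "weight dimension"), rows with fewer than k items are skipped at level k (one reading of "dynamic pruning"), and a candidate's support comes from AND-ing its columns. The function name `frequent_itemsets` and all of these encoding choices are hypothetical illustrations, not the thesis's implementation, and the Hadoop parallelization is not shown.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise Apriori over a weighted 0/1 transaction matrix (illustrative sketch).

    Assumed encoding, not taken from the thesis:
    - columns are the frequent single items, stored as int bitmasks over rows;
    - identical pruned transactions share one row whose weight is their count;
    - a candidate's row set is the bitwise AND of its items' columns.
    Returns {sorted item tuple: absolute support}.
    """
    # First pass: count single items; only frequent items become matrix columns.
    item_counts = Counter(item for t in transactions for item in set(t))
    frequent_items = {i for i, c in item_counts.items() if c >= min_support}

    # Weight compression: identical rows (after dropping infrequent items) are merged.
    row_weights = Counter(frozenset(t) & frequent_items for t in transactions)
    rows = [row for row in row_weights if row]  # rows left empty can support nothing
    weights = [row_weights[row] for row in rows]

    # Bit j of an item's column is 1 when row j contains that item.
    columns = {item: 0 for item in frequent_items}
    for j, row in enumerate(rows):
        for item in row:
            columns[item] |= 1 << j

    def support(mask, live_rows):
        # Weighted count of the surviving rows whose bit is set in `mask`.
        return sum(weights[j] for j in live_rows if mask >> j & 1)

    all_rows = range(len(rows))
    level = {(item,): columns[item] for item in frequent_items}
    result = {itemset: support(mask, all_rows) for itemset, mask in level.items()}

    k = 2
    while level:
        # Dynamic pruning (assumed rule): a row holding fewer than k frequent
        # items cannot contain a k-itemset, so it is not scanned at this level.
        live_rows = [j for j in all_rows if len(rows[j]) >= k]
        next_level = {}
        for a, b in combinations(sorted(level), 2):
            if a[:-1] != b[:-1]:
                continue  # join only (k-1)-itemsets that share their first k-2 items
            candidate = a + (b[-1],)
            # Apriori pruning: every (k-1)-subset must already be frequent.
            if any(sub not in level for sub in combinations(candidate, k - 1)):
                continue
            mask = level[a] & level[b]  # the "and" operation on the columns
            count = support(mask, live_rows)
            if count >= min_support:
                next_level[candidate] = mask
                result[candidate] = count
        level = next_level
        k += 1
    return result


if __name__ == "__main__":
    baskets = [
        ["bread", "milk"],
        ["bread", "diaper", "beer", "eggs"],
        ["milk", "diaper", "beer", "cola"],
        ["bread", "milk", "diaper", "beer"],
        ["bread", "milk", "diaper", "cola"],
        ["bread", "milk", "eggs"],  # same row as the first after pruning "eggs"
    ]
    for itemset, count in sorted(frequent_itemsets(baskets, 3).items()):
        print(itemset, count)
```

Storing each column as an integer bitmask turns the "and" step into a single bitwise operation per candidate, and the row weights let merged duplicates count correctly without rescanning them. On Hadoop, one plausible split would give each mapper a block of rows and let reducers sum the partial weighted counts per candidate; that distribution step is an assumption here, not a detail given in the abstract.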
Keywords/Search Tags: Data mining, Association rules, Hadoop, Apriori, FP-Growth