
Research For Association Rules Algorithm On Big Data

Posted on: 2017-01-26
Degree: Master
Type: Thesis
Country: China
Candidate: W Zhou
Full Text: PDF
GTID: 2348330509962617
Subject: Computer application technology
Abstract/Summary:
The rapid development of the Internet has led to rapid growth in data: many enterprises now generate terabytes or even petabytes of data every day. Faced with such huge data sets, the data mining process is constrained by algorithmic complexity, the operating platform, and other issues, so the desired results often cannot be achieved. Association rule mining is an important branch of data mining with a wide range of applications. The rise of Hadoop, an open-source platform with strong fault tolerance and scalability, offers a new direction for data mining algorithms: it can relieve the heavy computational and I/O burden incurred when association rule algorithms are executed.

This thesis focuses on two classical association rule algorithms, Apriori, which operates on a horizontal data layout, and Eclat, which operates on a vertical data layout, and presents improved versions of both on the Hadoop platform. Apriori requires iterative computation and repeated scans of the database, which does not suit the Hadoop platform well; we therefore propose a strategy that reduces the data set, determines the highest order Km, and uses pruning based on the 2-frequent itemsets to generate the 3- to Km-itemsets. For Eclat, we present two parallel algorithms on Hadoop: D-MREclat and A-MREclat. In the first, the data is partitioned into small blocks according to the range of the data collection, which reduces the number of intersection operations in the next phase and improves efficiency. The second introduces the Apriori idea: the search space is divided by prefix to parallelize Eclat, a method that is more efficient on large data sets. Eclat generates a large number of candidate sets during execution and therefore consumes a great deal of memory.
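To make the vertical layout and prefix-based search-space division concrete, the following is a minimal sketch of sequential Eclat on a toy transaction database. The function names (`vertical`, `eclat`) and the toy data are illustrative assumptions, not the thesis implementation; in A-MREclat each prefix-rooted subtree of this recursion would be assigned to a different worker.

```python
def vertical(db):
    """Convert horizontal transactions to item -> tidset (vertical layout)."""
    tidsets = {}
    for tid, items in enumerate(db):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)
    return tidsets

def eclat(prefix, items, min_support, out):
    """Depth-first Eclat: extend `prefix` with each item, intersecting tidsets.

    `items` is a list of (item, tidset) pairs sharing `prefix`; support of an
    extended itemset is simply the size of the intersected tidset.
    """
    for i, (item, tids) in enumerate(items):
        if len(tids) >= min_support:
            itemset = prefix + (item,)
            out[itemset] = len(tids)
            # Candidates extending this itemset: intersect with later tidsets.
            suffix = [(other, tids & other_tids)
                      for other, other_tids in items[i + 1:]]
            eclat(itemset, suffix, min_support, out)
    return out

db = [{'a', 'b', 'c'}, {'a', 'b'}, {'a', 'c'}, {'b', 'c'}, {'a', 'b', 'c'}]
frequent = eclat((), sorted(vertical(db).items()), min_support=3, out={})
# e.g. frequent[('a', 'b')] == 3
```

Because every itemset beginning with, say, prefix `('a',)` is generated entirely inside the recursive call rooted at that prefix, the prefix classes can be mined independently, which is what makes the search-space division parallelizable.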
In this thesis, the computed results are compressed before being stored. This reduces network traffic and improves efficiency. Finally, the three improved parallel association rule algorithms are implemented, and their performance is evaluated on a Hadoop cluster using data sets of different types and sizes. Experimental results show that the improved algorithms exhibit better performance.
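The support-counting step that these algorithms distribute can be sketched as a single simulated MapReduce round: mappers emit `(candidate, 1)` for each candidate itemset contained in a transaction, and reducers sum the counts per key. This is a toy, in-process simulation assuming a given candidate set; the names `map_phase` and `reduce_phase` are illustrative, not the thesis code or the Hadoop API.

```python
from collections import defaultdict

def map_phase(transactions, candidates):
    """Mapper: emit (candidate, 1) whenever a candidate occurs in a transaction."""
    for t in transactions:
        for c in candidates:
            if c <= t:  # candidate itemset is a subset of the transaction
                yield (tuple(sorted(c)), 1)

def reduce_phase(pairs):
    """Reducer: sum the emitted counts for each candidate key."""
    counts = defaultdict(int)
    for key, one in pairs:
        counts[key] += one
    return dict(counts)

transactions = [{'a', 'b', 'c'}, {'a', 'b'}, {'b', 'c'}, {'a', 'c'}]
candidates = [frozenset('ab'), frozenset('bc'), frozenset('ac')]
support = reduce_phase(map_phase(transactions, candidates))
# e.g. support[('a', 'b')] == 2
```

In a real Hadoop job the shuffle phase groups the mapper output by key across the cluster before the reducers run; the in-memory dictionary above stands in for that step.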
Keywords/Search Tags: Hadoop, MapReduce, association rules, frequent itemsets, parallel computing