
Research Of Frequent Itemsets Mining Algorithm Based On MapReduce Calculation Model

Posted on: 2016-10-16
Degree: Master
Type: Thesis
Country: China
Candidate: S J Liu
GTID: 2298330467487309
Subject: Computer software and theory

Abstract/Summary:
With the rapid development of the Internet, its user base is growing ever larger, and many Internet service companies generate terabytes or more of data every day. Big Data, Cloud Computing, and Data Mining have become hot topics, and data mining algorithms are widely used in many fields.

In this thesis, we modify the traditional Apriori algorithm to improve its execution efficiency, since Apriori's computation time increases dramatically as the data size grows. When the transaction database is large and a single host can no longer handle the volume of operations, connecting multiple computers into a cluster to speed up computation becomes essential. Several parallel Apriori algorithms based on the MapReduce framework have been proposed. Using Hadoop and the MapReduce framework, transaction records are stored across different machines and multiple computers perform the computation in parallel, making the efficiency gains over the sequential Apriori algorithm increasingly significant.

In this thesis, we propose two algorithms: "A High Efficient Frequent Itemsets Mining Algorithm Based on MapReduce Calculation Model" and "Improvement of Apriori Algorithm Key/Value Pair Based on Hadoop Framework with TID". Both exploit the characteristics of the MapReduce framework on the Hadoop platform and the parallel computing capacity of multiple machines to improve the execution model of the traditional Apriori algorithm and achieve better operational efficiency.

Using the AprioriTID algorithm to preprocess the original data, we present an improved Apriori algorithm that uses the length of each transaction to bound the size of the largest merged candidate itemsets.
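One round of this parallel counting scheme can be illustrated with a minimal single-process sketch (not the thesis's actual Hadoop implementation): the map function emits a `(candidate, 1)` pair for each candidate itemset contained in a transaction, skipping any candidate longer than the transaction itself (the transaction-length pruning described above), and the reduce function sums the counts and keeps itemsets that meet the minimum support. All names and the toy data below are illustrative.

```python
from itertools import combinations
from collections import defaultdict

def map_phase(transaction, candidates):
    """Map: emit (candidate, 1) for each candidate contained in the transaction.
    Candidates longer than the transaction are skipped up front, which is the
    transaction-length pruning described in the abstract."""
    t = set(transaction)
    for c in candidates:
        if len(c) <= len(t) and set(c) <= t:
            yield c, 1

def reduce_phase(pairs, min_support):
    """Reduce: sum the counts per candidate and keep those meeting min_support."""
    counts = defaultdict(int)
    for itemset, n in pairs:
        counts[itemset] += n
    return {k: v for k, v in counts.items() if v >= min_support}

# Toy transaction database, as if split across multiple mappers.
transactions = [("a", "b", "c"), ("a", "b"), ("a", "c"), ("b", "c"), ("a", "b", "c")]
candidates = [tuple(sorted(c)) for c in combinations("abc", 2)]

pairs = [kv for t in transactions for kv in map_phase(t, candidates)]
frequent = reduce_phase(pairs, min_support=3)
print(frequent)
```

On a real cluster, the shuffle phase routes all pairs sharing a key to the same reducer, so each candidate's global count is assembled without any node scanning the whole database.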
By reducing the production of low-frequency itemsets in the Map function, memory exhaustion is mitigated and execution efficiency is greatly improved. Furthermore, by changing the storage type of the key/value pairs, the memory footprint and the communication cost of each node during each MapReduce phase are reduced, which improves the efficiency of the overall computation.
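The TID-based key/value idea can be sketched as follows (a hypothetical illustration of the AprioriTID-style representation, not the thesis's exact implementation): instead of repeatedly rescanning the raw transactions, each itemset is associated with the set of transaction IDs containing it, so support is simply the TID-set size and candidate counting becomes set intersection.

```python
def tid_sets(transactions):
    """Build a TID set per single item: itemset -> set of transaction ids.
    Support of an itemset is just the size of its TID set."""
    tids = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tids.setdefault((item,), set()).add(tid)
    return tids

def support_of_pair(tids, a, b):
    """Support of {a, b} is the size of the intersection of the two TID sets,
    computed without rescanning the raw transaction database."""
    return len(tids[(a,)] & tids[(b,)])

transactions = [("a", "b", "c"), ("a", "b"), ("a", "c"), ("b", "c"), ("a", "b", "c")]
tids = tid_sets(transactions)
print(support_of_pair(tids, "a", "b"))  # -> 3
```

Shipping compact TID sets as the values in each MapReduce round, rather than re-emitting raw transactions, is one way the per-node memory footprint and shuffle traffic described above can be reduced.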
Keywords/Search Tags:Big Data, Apriori, Hadoop, MapReduce