
Research Of Frequent Itemsets Mining Algorithm Based On MapReduce Calculation Model

Posted on: 2016-10-16
Degree: Master
Type: Thesis
Country: China
Candidate: S J Liu
GTID: 2298330467487309
Subject: Computer software and theory

Abstract/Summary:
With the rapid development of the Internet, its user base is growing ever larger, and many Internet service companies generate terabytes or more of data every day. Big Data, Cloud Computing, and Data Mining have become hot topics, and data mining algorithms are widely used in many fields.

In this thesis, we modify the traditional Apriori algorithm to improve its execution efficiency, since Apriori's computation time increases dramatically as the data size grows. When the transaction database is large and a single host can no longer handle the volume of operations, connecting multiple computers into a cluster to speed up computation becomes essential. Several parallel Apriori algorithms based on the MapReduce framework have been proposed. Using Hadoop and the MapReduce framework, transaction records are stored across different machines and multiple computers perform the computation in parallel, making the efficiency gains over the sequential Apriori algorithm increasingly significant.

In this thesis, we propose two algorithms: "A High Efficient Frequent Itemsets Mining Algorithm Based on MapReduce Calculation Model" and "Improvement of Apriori Algorithm Key/Value Pair Based on Hadoop Framework with TID". Both exploit the characteristics of the MapReduce framework on the Hadoop platform and the parallel computing capacity of multiple machines to improve the execution model of the traditional Apriori algorithm and achieve better operational efficiency.

Using the AprioriTID algorithm to preprocess the original data, we present an improved Apriori algorithm that uses the length of each transaction to bound the size of the largest merged candidate itemsets.
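One round of this parallel counting scheme can be illustrated with a minimal single-process sketch (not the thesis's actual Hadoop implementation): the map function emits a `(candidate, 1)` pair for each candidate itemset contained in a transaction, skipping any candidate longer than the transaction itself (the transaction-length pruning described above), and the reduce function sums the counts and keeps itemsets that meet the minimum support. All names and the toy data below are illustrative.

```python
from itertools import combinations
from collections import defaultdict

def map_phase(transaction, candidates):
    """Map: emit (candidate, 1) for each candidate contained in the transaction.
    Candidates longer than the transaction are skipped up front, which is the
    transaction-length pruning described in the abstract."""
    t = set(transaction)
    for c in candidates:
        if len(c) <= len(t) and set(c) <= t:
            yield c, 1

def reduce_phase(pairs, min_support):
    """Reduce: sum the counts per candidate and keep those meeting min_support."""
    counts = defaultdict(int)
    for itemset, n in pairs:
        counts[itemset] += n
    return {k: v for k, v in counts.items() if v >= min_support}

# Toy transaction database, as if split across multiple mappers.
transactions = [("a", "b", "c"), ("a", "b"), ("a", "c"), ("b", "c"), ("a", "b", "c")]
candidates = [tuple(sorted(c)) for c in combinations("abc", 2)]

pairs = [kv for t in transactions for kv in map_phase(t, candidates)]
frequent = reduce_phase(pairs, min_support=3)
print(frequent)
```

On a real cluster, the shuffle phase routes all pairs sharing a key to the same reducer, so each candidate's global count is assembled without any node scanning the whole database.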
By reducing the production of low-frequency itemsets in the Map function, memory exhaustion is mitigated and execution efficiency is greatly improved. Furthermore, by changing the storage type of the key/value pairs, the memory footprint and the communication cost of each node during each MapReduce phase are reduced, which improves the efficiency of the overall computation.
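The TID-based key/value idea can be sketched as follows (a hypothetical illustration of the AprioriTID-style representation, not the thesis's exact implementation): instead of repeatedly rescanning the raw transactions, each itemset is associated with the set of transaction IDs containing it, so support is simply the TID-set size and candidate counting becomes set intersection.

```python
def tid_sets(transactions):
    """Build a TID set per single item: itemset -> set of transaction ids.
    Support of an itemset is just the size of its TID set."""
    tids = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tids.setdefault((item,), set()).add(tid)
    return tids

def support_of_pair(tids, a, b):
    """Support of {a, b} is the size of the intersection of the two TID sets,
    computed without rescanning the raw transaction database."""
    return len(tids[(a,)] & tids[(b,)])

transactions = [("a", "b", "c"), ("a", "b"), ("a", "c"), ("b", "c"), ("a", "b", "c")]
tids = tid_sets(transactions)
print(support_of_pair(tids, "a", "b"))  # -> 3
```

Shipping compact TID sets as the values in each MapReduce round, rather than re-emitting raw transactions, is one way the per-node memory footprint and shuffle traffic described above can be reduced.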
Keywords/Search Tags:Big Data, Apriori, Hadoop, MapReduce