Research Of Parallelized Distributed Association Rules Mining Algorithm Based On Hadoop

Posted on:2018-08-20

Degree:Master

Type:Thesis

Country:China

Candidate:Y Yu

Full Text:PDF

GTID:2348330515996657

Subject:Engineering

Abstract/Summary:

With the rapid development of science and technology in recent years, the everyday actions people perform on computers, mobile phones, and other terminal platforms produce large amounts of data. Not only the volume of data but also the volume of data access is growing rapidly. In the big data era, all kinds of data are growing faster than anticipated, and the business data generated every day by large-scale networks can reach hundreds of terabytes or even petabytes. How to extract information from such large databases quickly, efficiently, and accurately is one of the active research topics in computer science. Parallel distributed data mining algorithms are an important method for analyzing massive data that may be spread across regions, and they have significant research and practical value. Association rule mining is one of the classical data mining algorithms and has strong instructional and reference value. Traditional parallel association rule mining algorithms exchange candidate sets between nodes, which generates heavy network traffic once the algorithm is parallelized. Moreover, with large amounts of data the number of candidate itemsets can grow sharply, which easily exhausts machine memory and degrades the efficiency of the algorithm.

This thesis proposes an optimized algorithm, Y-IDA, which merges counts directly in memory instead of following the traditional approach of outputting candidate sets one by one. It also modifies the Hadoop interface to change the MapReduce read model, and uses the frequent 1-itemsets to clean the database, reducing memory consumption and CPU time and improving execution efficiency. The main work of this thesis includes:

1) Implementing the basic serial Apriori algorithm as a foundation for the subsequent parallel work;
2) Proposing Y-IDA, an optimized version of the parallel Apriori algorithm. It merges counts in memory instead of outputting candidate sets one by one, changes the traditional MapReduce read mode to reduce the amount of traffic during execution, and cleans the data using the 1-itemsets to remove invalid data;
3) Implementing the association rule algorithms on the Hadoop platform. Under the available experimental conditions, an experimental scheme is designed to verify that Y-IDA produces results consistent with the classical algorithm, and the two are compared in detail in terms of execution time, memory consumption, disk reads and writes, and CPU usage.

Experiments on a fully distributed Hadoop cluster using discrete test datasets for data mining show that the improved algorithm reduces execution time, memory consumption, CPU usage, and disk I/O, and performs better overall. This supports the conclusion that the improved algorithm is feasible and broadly applicable.
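The serial Apriori algorithm that the thesis takes as its baseline can be sketched as follows. This is a minimal Python illustration, not the thesis's implementation; the function name `apriori`, the use of an absolute support count `min_support`, and holding all transactions in an in-memory list are assumptions made for the example. Each level joins frequent (k-1)-itemsets into k-item candidates, prunes any candidate that has an infrequent (k-1)-subset, and then scans the database once to count support.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Serial Apriori. min_support is an absolute transaction count (assumption).

    Returns {frozenset(itemset): support_count} for every frequent itemset.
    """
    db = [frozenset(t) for t in transactions]

    # Level 1: count single items and keep the frequent 1-itemsets.
    counts = {}
    for t in db:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    level = {s: n for s, n in counts.items() if n >= min_support}
    frequent = dict(level)

    k = 2
    while level:
        prev = list(level)
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]  # join step
                if len(union) != k:
                    continue
                # Prune step: every (k-1)-subset must already be frequent.
                if all(frozenset(sub) in level for sub in combinations(union, k - 1)):
                    candidates.add(union)

        # Count step: one full scan of the database per level.
        counts = {c: 0 for c in candidates}
        for t in db:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        level = {s: n for s, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
    return frequent


db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
result = apriori(db, min_support=3)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

On these five toy transactions with `min_support=3`, the single items a, b, and c (count 4 each) and the pairs ab, ac, and bc (count 3 each) are frequent, while {a, b, c} (count 2) is discarded. The repeated full scan per level is the cost that parallel versions distribute across nodes.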
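The abstract does not detail how Y-IDA merges counts in memory. One common way to realize the idea it describes is in-mapper combining: each map task accumulates candidate counts in a local hash table and emits one (itemset, count) record per distinct candidate, instead of one (itemset, 1) record per occurrence. The Python sketch below simulates this outside Hadoop; the function names `map_emit_each`, `map_merge_in_memory`, and `reduce_counts` and the two-split layout are hypothetical, and the code is not claimed to be the thesis's Y-IDA implementation.

```python
from collections import Counter


def map_emit_each(partition, candidates):
    """Baseline mapper: one (candidate, 1) record per occurrence in the split."""
    return [(c, 1) for t in partition for c in candidates if c <= t]


def map_merge_in_memory(partition, candidates):
    """In-mapper combining: merge counts in a local table, emit one record per candidate seen."""
    local = Counter()
    for t in partition:
        for c in candidates:
            if c <= t:
                local[c] += 1
    return list(local.items())


def reduce_counts(records, min_support):
    """Reducer: sum partial counts per candidate and keep the frequent ones."""
    totals = Counter()
    for itemset, n in records:
        totals[itemset] += n
    return {s: n for s, n in totals.items() if n >= min_support}


# Two hypothetical input splits, one per map task.
partitions = [
    [frozenset("abc"), frozenset("ab")],
    [frozenset("ac"), frozenset("bc"), frozenset("abc")],
]
candidates = [frozenset("ab"), frozenset("ac"), frozenset("bc")]

baseline = [r for part in partitions for r in map_emit_each(part, candidates)]
merged = [r for part in partitions for r in map_merge_in_memory(part, candidates)]
print(len(baseline), len(merged))  # 9 6: fewer records to shuffle
assert reduce_counts(baseline, 3) == reduce_counts(merged, 3)
```

Both variants give the reducer identical totals, but the merged mapper emits 6 records instead of 9. On real data, where the same candidate occurs in many transactions of a split, the reduction in shuffled records (and hence network traffic) grows accordingly, at the cost of holding one count per candidate in mapper memory.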
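Cleaning the database with the frequent 1-itemsets relies on the Apriori property: any itemset that contains an infrequent item is itself infrequent. After the first pass, infrequent items can therefore be removed from every transaction, and transactions left with fewer than two items can be dropped, without changing the support of any candidate of size two or more. The function below is a hypothetical Python sketch of that idea (the name `clean_database` and the two-item threshold are assumptions), not the thesis's code.

```python
from collections import Counter


def clean_database(transactions, min_support):
    """Drop infrequent items and transactions that can no longer hold a 2-itemset."""
    # First pass: support count of each single item (set() ignores duplicates).
    counts = Counter(item for t in transactions for item in set(t))
    frequent_items = {i for i, n in counts.items() if n >= min_support}

    cleaned = []
    for t in transactions:
        kept = frozenset(t) & frequent_items  # remove infrequent items
        if len(kept) >= 2:  # shorter rows cannot contain any k>=2 candidate
            cleaned.append(kept)
    return frequent_items, cleaned


items, rows = clean_database([{"a", "b", "x"}, {"a", "y"}, {"a", "b"}, {"b", "z"}], min_support=2)
print(sorted(items), [sorted(r) for r in rows])  # ['a', 'b'] [['a', 'b'], ['a', 'b']]
```

Here half of the transactions disappear and the surviving ones shrink, so every later counting pass reads less data and tests fewer candidates per row.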

Keywords/Search Tags:

Association rule mining algorithm, Parallelized data mining, Apriori, Hadoop

Related items

1	Research On Association Rules Mining Methods Of Mass Engineering Data Based On Hadoop
2	Research On Apriori Algorithm In Association Rules Mining
3	Improvement For Apriori Algorithm Of Association Rule Mining
4	Association Rule Mining Algorithms, Data Mining Technology
5	Research And Application Of Apriori Algorithm In Association Rules Mining
6	Research On Web Text Mining Based On XML And Association Rule Mining Algorithm
7	Research On The System Of Drilling-Geology Design Based On Association Rules Mining
8	Research On Web Log Mining Based On Association Rule
9	Research On Association Rule Mining Algorithm For Bank Credit Analysis
10	The Research Of Association Rules Mining Algorithm