Research And Implementation On Efficient Parallel Frequent Itemsets Mining Algorithm Based On Spark

Posted on:2019-11-21

Degree:Master

Type:Thesis

Country:China

Candidate:F Zhang

Full Text:PDF

GTID:2428330563992488

Subject:Computer software and theory

Abstract/Summary:

With the rapid growth of data volume in real life, data mining has attracted great attention in many research fields, especially in the game field. Frequent itemset mining is a widely used data mining technique and plays an important role in many data mining tasks. However, with the rapid development of Big Data, the demand for valuable information in data keeps increasing. For example, people want to discover the growth paths of users in large amounts of game data in order to give users a better gaming experience, but the available hardware cannot meet the need for fast information mining. In other words, when neither the hardware nor the data volume can be changed, existing frequent itemset mining algorithms can no longer deliver useful information within an acceptable time. Studying and implementing an efficient parallel frequent itemset mining algorithm has therefore become an important direction in the field of data mining.

An efficient parallel frequent itemset mining algorithm, named PNPFI, is proposed. The algorithm is built on the PrePost algorithm and the Spark platform. PNPFI runs in parallel on Spark, with the nodes working independently of one another. It introduces a novel N-list intersection algorithm that stops an intersection early by judging ahead of time whether the result meets the threshold, which greatly reduces memory and time consumption. To further cut redundant N-list intersections, PNPFI proposes a new concept based on N-lists, P-Subsume. Through P-Subsume, PNPFI can combine it directly with items to generate some frequent itemsets without intersecting N-lists, which greatly reduces the runtime. In addition, for the sake of practicality, PNPFI proposes a load-balancing strategy that partitions transactions by predicting item loads, so that the cluster is load-balanced.

The experimental results show that, compared with the classical parallel algorithm and the recently proposed parallel algorithm, PNPFI has a clear advantage in both performance and memory overhead: performance improves by up to 70% and by 39% on average, and memory consumption is reduced by up to 90% and by 71% on average.
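The early-stopping N-list intersection mentioned in the abstract can be illustrated with a small sketch. The abstract does not give the exact stopping rule, so the code below is an assumption rather than PNPFI's actual method: it uses the standard PrePost representation of an N-list as (pre-order, post-order, count) codes sorted by pre-order, and it abandons the intersection as soon as an upper bound on the resulting support falls below the minimum support. The function name `intersect_nlists`, the `min_support` parameter, and the sample N-lists are hypothetical.

```python
from typing import List, Optional, Tuple

# One PP-code of an N-list: (pre_order, post_order, count).
PPCode = Tuple[int, int, int]


def intersect_nlists(
    ancestor: List[PPCode],
    descendant: List[PPCode],
    min_support: int,
) -> Optional[List[PPCode]]:
    """Intersect two N-lists, giving up early when the result cannot be frequent.

    Both lists must be sorted by pre_order. Returns the N-list of the combined
    itemset, or None once its support is known to be below min_support.
    The stopping rule is an illustrative assumption, not taken from the thesis.
    """
    # Best case: every descendant code finds an ancestor code.
    upper_bound = sum(count for _, _, count in descendant)
    if upper_bound < min_support:
        return None

    result: List[PPCode] = []
    i = j = 0
    while i < len(ancestor) and j < len(descendant):
        a_pre, a_post, _ = ancestor[i]
        d_pre, d_post, d_count = descendant[j]
        if a_pre < d_pre and a_post > d_post:
            # ancestor[i] is a tree ancestor of descendant[j]: keep its count,
            # merging with the previous entry if it has the same ancestor node.
            if result and result[-1][0] == a_pre:
                result[-1] = (a_pre, a_post, result[-1][2] + d_count)
            else:
                result.append((a_pre, a_post, d_count))
            j += 1
        elif a_pre < d_pre:
            # ancestor[i] lies entirely before descendant[j]; try the next one.
            i += 1
        else:
            # descendant[j] has no ancestor in the list: its count is lost.
            upper_bound -= d_count
            if upper_bound < min_support:
                return None  # early stop
            j += 1

    # Descendant codes left over after the ancestor list ran out are unmatched.
    upper_bound -= sum(count for _, _, count in descendant[j:])
    return result if upper_bound >= min_support else None


# Hypothetical N-lists for transactions {a,b,c}, {a,b}, {a,c}, {b,c}.
nl_a = [(1, 3, 3)]
nl_b = [(2, 1, 2), (5, 5, 1)]
nl_c = [(3, 0, 1), (4, 2, 1), (6, 4, 1)]

nl_ab = intersect_nlists(nl_a, nl_b, min_support=2)
nl_bc_strict = intersect_nlists(nl_b, nl_c, min_support=3)
```

In this toy data the itemset {a, b} occurs in two transactions, so `nl_ab` holds a single code whose count is 2, while the intersection for {b, c} is abandoned partway through when the threshold is 3. The saving comes from never finishing, or storing, N-lists of itemsets that cannot be frequent.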

Keywords/Search Tags:

Data Mining, Frequent Itemsets Mining Algorithm, Parallel, Load Balance

Related items

1	Research On Multi-stream Frequent Item Set Mining Algorithm
2	Research On Parallel Frequent Itemsets Mining Algorithm
3	Study On Frequent Pattern Mining Algorithms And Pruning Strategies
4	Research On Frequent Itemsets Mining Parallel Algorithm
5	Research On Mining Frequent Itemsets In Cloud Computing Environment
6	Research On Frequent Itemsets Mining Algorithm Based On Matrix
7	Frequent Itemsets Mining Algorithm And Its Application In Data Flow
8	Research On Key Algorithms For Mining Frequent Patterns In Data Streams And Their Application In Simulation System
9	Research On Key Algorithms For Mining Frequent Patterns In Data Streams And Their Application
10	Research On Algorithm For Mining Frequent Itemsets Of Uncertain Data