Research On Mining Frequent Itemsets In Cloud Computing Environment

Posted on:2015-08-18

Degree:Master

Type:Thesis

Country:China

Candidate:Z G Zhang

GTID:2298330467464517

Subject:Computer application technology

Abstract/Summary:

Frequent itemset mining is a fundamental research topic in data mining, with applications in areas such as electronic commerce, web query analysis, network intrusion detection, and disease diagnosis. As the Internet industry changes rapidly and the volume of data produced by society grows explosively, "Big Data" has emerged as a research topic, and how to store and process such data has become an urgent problem. Cloud computing offers a way to address it. This thesis presents an in-depth study of how to mine frequent itemsets in a cloud computing environment. The main contributions are as follows:

1. Two parallel frequent itemset mining algorithms based on iterative Map/Reduce, MRApriori and TR_MRApriori, are proposed. In each iteration, MRApriori first computes the candidate frequent k-itemsets on every computing node and then merges the results to obtain the frequent ones. To improve the efficiency of MRApriori, the improved algorithm TR_MRApriori records the identifiers of the transactions that will be useful in the next iteration, which reduces the number of transactions that must be scanned. Experimental results show that TR_MRApriori is more efficient than the other algorithms considered.

2. A parallel frequent itemset mining algorithm, FPPM, based on FP-Growth, is presented. A local frequent pattern tree is first built on each computing node; these local trees are mined to obtain local frequent itemsets, which are then merged into global frequent itemsets. Once the counts of the local frequent itemsets have been aggregated, the complete result is obtained. Experimental results show that FPPM is highly scalable and avoids the problem of massive communication between computing nodes.

3. A parallel frequent itemset mining algorithm, SBPFP, also based on FP-Growth, is introduced. SBPFP first estimates the running time for each item from a sample and uses that time as a weight representing the workload. It then partitions the items into groups according to their weights and distributes the data to the computing nodes in a balanced way. Finally, each computing node mines the frequent itemsets for the items in its group. Experimental results show that SBPFP is effective and scalable, and that its load-balancing strategy outperforms the other method considered.

4. A parallel frequent itemset mining algorithm, MREclat, based on Eclat, is put forward. MREclat first converts the horizontal database into a vertical one and then redistributes the converted database across the computing nodes, taking load balance into account during redistribution. The thesis introduces the idea behind MREclat and studies its performance. Experimental results show that MREclat has high scalability and good speedup.
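
To make the iterative Map/Reduce idea behind MRApriori and TR_MRApriori more concrete, here is a minimal single-process Python sketch. It is not the thesis's implementation: the function names (map_phase, reduce_phase, next_candidates, mine) are hypothetical, the "computing nodes" are simulated as plain lists of transactions, and the pruning rule is an assumption (a transaction is kept for the next iteration only if it contained at least one itemset found frequent in the current one).

```python
from itertools import combinations


def map_phase(partition, candidates, active_tids):
    """Map step for one partition: record which transactions contain each candidate."""
    hits = {}
    for tid, items in partition:
        # Skip transactions pruned in an earlier iteration (assumed pruning rule).
        if active_tids is not None and tid not in active_tids:
            continue
        for cand in candidates:
            if cand <= items:
                hits.setdefault(cand, set()).add(tid)
    return hits


def reduce_phase(partials, min_support):
    """Reduce step: merge per-partition results and keep the frequent candidates."""
    merged = {}
    for hits in partials:
        for cand, tids in hits.items():
            merged.setdefault(cand, set()).update(tids)
    frequent = {c: len(t) for c, t in merged.items() if len(t) >= min_support}
    # Transaction identifiers that may still matter in the next iteration.
    useful = set().union(*(merged[c] for c in frequent)) if frequent else set()
    return frequent, useful


def next_candidates(frequent, k):
    """Join frequent (k-1)-itemsets into k-itemsets whose (k-1)-subsets are all frequent."""
    prev = set(frequent)
    out = set()
    for a, b in combinations(prev, 2):
        union = a | b
        if len(union) == k and all(
            frozenset(s) in prev for s in combinations(union, k - 1)
        ):
            out.add(union)
    return out


def mine(partitions, min_support):
    """Run map/reduce rounds for k = 1, 2, ... until no candidates remain."""
    all_items = {i for part in partitions for _, items in part for i in items}
    candidates = {frozenset([i]) for i in all_items}
    active = None  # None means "scan every transaction" in the first round
    result = {}
    k = 1
    while candidates:
        partials = [map_phase(p, candidates, active) for p in partitions]
        frequent, active = reduce_phase(partials, min_support)
        result.update(frequent)
        k += 1
        candidates = next_candidates(frequent, k)
    return result


# Two simulated nodes, each holding (transaction id, item set) pairs.
partitions = [
    [(1, frozenset("abc")), (2, frozenset("ab"))],
    [(3, frozenset("ac")), (4, frozenset("bc")), (5, frozenset("d"))],
]
result = mine(partitions, min_support=2)
```

In this toy data set, transaction 5 contains no frequent item, so it is dropped from the active set after the first round and never scanned again. A real deployment would run map_phase on separate nodes and would likely ship only counts plus a compact record of useful identifiers rather than full identifier sets; the sketch keeps full sets purely for readability.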

Keywords/Search Tags:

frequent itemsets, parallel mining algorithm, data mining, Map/Reduce, cloud computing

Related items

1	Research And Implementation On Efficient Parallel Frequent Itemsets Mining Algorithm Based On Spark
2	Research On Frequent Itemsets Mining Parallel Algorithm
3	Research On Multi-stream Frequent Item Set Mining Algorithm
4	Research On Parallel Frequent Itemsets Mining Algorithm
5	The Research Of Cloud Frequent Itemsets Mining Algorithms Which Based On Sample
6	Study On Frequent Pattern Mining Algorithms And Pruning Strategies
7	Research On Frequent Itemsets Mining Algorithm Based On Matrix
8	Frequent Itemsets Mining Algorithm And Its Application In Data Flow
9	Research On Key Algorithms For Mining Frequent Patterns In Data Streams And Their Application In Simulation System
10	Research On Key Algorithms For Mining Frequent Patterns In Data Streams And Their Application