Research On Mining Frequent Itemsets In Cloud Computing Environment

Posted on: 2015-08-18 | Degree: Master | Type: Thesis
Country: China | Candidate: Z G Zhang
GTID: 2298330467464517 | Subject: Computer application technology
Abstract/Summary:
Frequent itemset mining is a fundamental and important research topic in data mining. It is used in many application areas, such as electronic commerce, web query analysis, network intrusion detection, and disease diagnosis. As the information society develops, the Internet industry is changing quickly and the volume of data produced by human society is growing explosively, which has given rise to the "Big Data" research topic. How to store and process such data is a problem that demands a prompt solution, and the emergence of cloud computing provides one. This thesis presents an in-depth study of how to mine frequent itemsets in a cloud computing environment. The main contributions are as follows:

1. The parallel frequent itemset mining algorithms MRApriori and TR_MRApriori, both based on iterative Map/Reduce, are proposed. In each iteration, MRApriori first computes the candidate frequent k-itemsets on every computing node and then merges the results to obtain the frequent ones. To improve the efficiency of MRApriori, the improved algorithm TR_MRApriori is proposed. TR_MRApriori records the identifiers of the transactions that are still useful in the next iteration, which reduces the number of transactions that need to be scanned. Experimental results show that TR_MRApriori is more efficient than the other algorithms it is compared with.

2. The parallel frequent itemset mining algorithm FPPM, which is based on FP-Growth, is presented. First, a local frequent pattern tree is built on each computing node and mined to obtain local frequent itemsets; the local frequent itemsets are then merged into global frequent itemsets. Once the local frequent itemsets have been counted, the complete result is obtained. Experimental results show that FPPM is highly scalable and overcomes the problem of massive communication between computing nodes.

3. The parallel frequent itemset mining algorithm SBPFP, which is based on FP-Growth, is introduced. SBPFP first estimates the running time of each item from a sample and uses that time as a weight denoting the workload. It then separates the items into groups according to their weights and distributes the data to the computing nodes in a balanced way. Finally, on every computing node, SBPFP mines the frequent itemsets for the items of the corresponding group. Experimental results show that SBPFP is effective and scalable, and that its load-balancing strategy outperforms the other method it is compared with.

4. The parallel frequent itemset mining algorithm MREclat, which is based on Eclat, is put forward. MREclat first converts the horizontal database into a vertical one and then redistributes the converted database to the distributed computing nodes, taking load balance into consideration during redistribution. In this thesis, the idea of MREclat is introduced and the performance of the algorithm is studied. Experimental results show that MREclat has high scalability and good speedup.
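The iterative scheme described for MRApriori and TR_MRApriori can be illustrated with a small single-process sketch. This is not the thesis's implementation: the function names (`map_partition`, `generate_candidates`, `tr_apriori`), the data layout, and the use of plain Python lists to stand in for computing nodes are all assumptions made for illustration. Each partition plays the role of one node's map task, the merge of per-partition results plays the role of the reduce step, and transactions that contain no frequent k-itemset are dropped before the next iteration.

```python
from collections import defaultdict
from itertools import combinations

def map_partition(partition, candidates):
    """Map step (one 'node'): for each candidate, collect ids of transactions containing it."""
    hits = defaultdict(set)
    for tid, items in partition:
        for cand in candidates:
            if cand <= items:
                hits[cand].add(tid)
    return hits

def generate_candidates(frequent, k):
    """Build k-itemsets whose (k-1)-subsets are all frequent."""
    items = sorted({i for itemset in frequent for i in itemset})
    return {
        frozenset(c)
        for c in combinations(items, k)
        if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
    }

def tr_apriori(partitions, min_support):
    """Hypothetical sketch: iterative counting with transaction reduction.

    Assumes transaction ids are unique across all partitions.
    """
    candidates = {frozenset([i]) for part in partitions for _, items in part for i in items}
    result, k = {}, 1
    while candidates:
        merged = defaultdict(set)
        for part in partitions:                      # one map task per partition
            for cand, tids in map_partition(part, candidates).items():
                merged[cand] |= tids                 # reduce step: merge local results
        frequent = {c: len(t) for c, t in merged.items() if len(t) >= min_support}
        result.update(frequent)
        # Transaction reduction: keep only transactions that contain a frequent k-itemset.
        useful = set()
        for cand in frequent:
            useful |= merged[cand]
        partitions = [[(tid, items) for tid, items in part if tid in useful]
                      for part in partitions]
        k += 1
        candidates = generate_candidates(set(frequent), k)
    return result

partitions = [
    [(1, {"a", "b", "c"}), (2, {"a", "b"})],
    [(3, {"a", "c"}), (4, {"b", "c"}), (5, {"d"})],
]
frequent_itemsets = tr_apriori(partitions, min_support=2)
```

In this toy run, transaction 5 is discarded after the first iteration because it contains no frequent 1-itemset, so later passes scan fewer transactions. A real Map/Reduce job would run the map tasks in parallel and shuffle the per-candidate results by key.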
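The weight-based grouping described for SBPFP can be sketched as follows. The abstract does not state the exact grouping rule, so this example substitutes a common greedy heuristic (assign the heaviest remaining item to the currently lightest group); the function name `balance_groups` and the sample weights are hypothetical, standing in for running times estimated from a sample.

```python
import heapq

def balance_groups(weights, n_groups):
    """Greedy heuristic (an assumption, not the thesis's rule):
    place items, heaviest first, into the group with the smallest current load."""
    heap = [(0.0, g) for g in range(n_groups)]       # (current load, group index)
    heapq.heapify(heap)
    groups = [[] for _ in range(n_groups)]
    # Sort by descending weight; break ties by item name for determinism.
    for item, w in sorted(weights.items(), key=lambda kv: (-kv[1], kv[0])):
        load, g = heapq.heappop(heap)
        groups[g].append(item)
        heapq.heappush(heap, (load + w, g))
    loads = [0.0] * n_groups
    for load, g in heap:
        loads[g] = load
    return groups, loads

# Hypothetical per-item running times measured on a sample.
sample_weights = {"a": 8.0, "b": 7.0, "c": 6.0, "d": 5.0, "e": 4.0}
groups, loads = balance_groups(sample_weights, n_groups=2)
```

Each group would then be assigned to one computing node, which mines the frequent itemsets for the items in its group.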
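The horizontal-to-vertical conversion that MREclat starts from can be sketched on a single machine. The names `to_vertical` and `eclat` are illustrative, and the redistribution of the vertical database across nodes is not shown; the sketch only assumes the standard vertical representation, in which each item maps to the set of transaction ids containing it and the support of an itemset is the size of the intersection of its items' id sets.

```python
from collections import defaultdict

def to_vertical(transactions):
    """Convert (tid, items) rows into an item -> set-of-tids mapping."""
    vertical = defaultdict(set)
    for tid, items in transactions:
        for item in items:
            vertical[item].add(tid)
    return dict(vertical)

def eclat(prefix, tidlists, min_support, out):
    """Depth-first search: extend the prefix item by item, intersecting tid sets."""
    for i, (item, tids) in enumerate(tidlists):
        if len(tids) < min_support:
            continue
        itemset = prefix + (item,)
        out[itemset] = len(tids)
        # Restrict the remaining items to transactions that contain the new itemset.
        suffix = [(other, tids & other_tids) for other, other_tids in tidlists[i + 1:]]
        eclat(itemset, suffix, min_support, out)
    return out

transactions = [
    (1, {"a", "b", "c"}),
    (2, {"a", "b"}),
    (3, {"a", "c"}),
    (4, {"b", "c"}),
    (5, {"d"}),
]
vertical = to_vertical(transactions)
supports = eclat((), sorted(vertical.items()), min_support=2, out={})
```

Because each prefix's search touches only its own id sets, the branches for different prefixes can be handed to different nodes, which is where a load-aware redistribution such as the one described above would apply.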
Keywords/Search Tags: frequent itemsets, parallel mining algorithm, data mining, Map/Reduce, cloud computing