Research On Rapid Mining Algorithm For Massive Data

Posted on:2013-05-18

Degree:Master

Type:Thesis

Country:China

Candidate:X F Zhu

Full Text:PDF

GTID:2248330377455256

Subject:Computer software and theory

Abstract/Summary:

Data mining is the process of extracting implicit, previously unknown, but potentially useful information and knowledge from large volumes of incomplete, noisy, fuzzy, and random data. With the rapid development of IT, people have accumulated hundreds of terabytes of data, and extracting useful information from such volumes has become a problem that must be addressed. For mining massive data, distributed parallel processing and incremental processing are effective solutions.

Cloud computing is an emerging computational model built on shared infrastructure. It specializes in large-scale data and large-scale computation and is an extension of distributed computing; parallel and distributed processing is its key. Combining cloud computing with the incremental mining of association rules as a starting point, this thesis puts forward new ideas for the rapid mining of massive data.

This thesis describes the definition, functions, steps, and challenges of data mining and analyzes association rule mining algorithms. It also describes the concept, features, forms, and key technologies of cloud computing, focusing on the Hadoop Distributed File System (HDFS) and the implementation principles of the MapReduce parallel programming model on Hadoop, a typical cloud computing platform. The research focuses on parallel algorithms for mining large (frequent) itemsets in association rule mining. We propose a fast incremental association rule mining algorithm based on cloud computing, named C-FUP. To improve the efficiency of parallelization, we improve the data set allocation method of HDFS and design a method named DAMBNP, which allocates the data set according to the computing performance of the heterogeneous nodes in the cluster.
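The abstract does not spell out how DAMBNP computes its allocation, so the following is only a minimal sketch of the general idea of performance-proportional allocation, not the thesis's method. The function name `allocate_by_performance`, the per-node scores, and the largest-remainder rounding are all assumptions made for illustration.

```python
def allocate_by_performance(total_blocks, node_scores):
    """Split total_blocks among nodes in proportion to their scores.

    node_scores maps a node name to a positive performance score
    (hypothetical benchmark values; how the thesis measures node
    performance is not stated in the abstract).
    """
    if total_blocks < 0:
        raise ValueError("total_blocks must be non-negative")
    if not node_scores or any(s <= 0 for s in node_scores.values()):
        raise ValueError("node_scores must be non-empty and positive")

    total_score = sum(node_scores.values())
    # Ideal (fractional) share of blocks for each node.
    shares = {n: total_blocks * s / total_score for n, s in node_scores.items()}
    # Start from the integer part of each share.
    alloc = {n: int(share) for n, share in shares.items()}
    # Hand out the blocks lost to rounding, largest fractional part first.
    leftover = total_blocks - sum(alloc.values())
    by_remainder = sorted(node_scores, key=lambda n: shares[n] - alloc[n], reverse=True)
    for n in by_remainder[:leftover]:
        alloc[n] += 1
    return alloc
```

For example, `allocate_by_performance(10, {"a": 1, "b": 2})` assigns roughly one third of the blocks to the slower node and the rest to the faster one, so that both would be expected to finish at about the same time.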
From an analysis of Hadoop's performance, we find that it handles large numbers of small files poorly, so we design a method to address this problem.

In addition, we design experiments to evaluate the proposed algorithm and methods. The experimental results show that the C-FUP algorithm performs well in incremental association rule mining on massive data and has good scalability and extensibility, and that DAMBNP effectively improves the efficiency of the C-FUP algorithm on the cloud computing platform. This work makes a useful contribution to the rapid mining of massive data.
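The abstract gives no details of C-FUP itself. Its name suggests the FUP (Fast UPdate) family of incremental algorithms, so the sketch below illustrates only that general idea, restricted to single items and run on one machine: previously frequent items are updated by scanning the increment alone, and the old database is rescanned only for items that are frequent within the increment. The function names and data layout are hypothetical.

```python
from collections import Counter


def count_items(transactions):
    """Count, for each item, the number of transactions containing it."""
    counts = Counter()
    for t in transactions:
        counts.update(set(t))
    return counts


def fup_update_items(old_db, old_frequent, increment, min_support):
    """FUP-style update of frequent single items after new transactions arrive.

    old_frequent maps each item that was frequent in old_db to its count there.
    Returns a dict mapping each item frequent in old_db + increment to its count.
    """
    total = len(old_db) + len(increment)
    threshold = min_support * total
    inc_counts = count_items(increment)

    updated = {}
    # Previously frequent items: only the increment has to be scanned.
    for item, old_count in old_frequent.items():
        new_count = old_count + inc_counts.get(item, 0)
        if new_count >= threshold:
            updated[item] = new_count

    # An item that was infrequent in old_db can only become frequent overall
    # if it is frequent within the increment itself.
    candidates = {
        item
        for item, c in inc_counts.items()
        if item not in old_frequent and c >= min_support * len(increment)
    }
    if candidates:
        # Rescan the old database, but count the candidates only.
        old_counts = Counter()
        for t in old_db:
            old_counts.update(candidates.intersection(t))
        for item in candidates:
            new_count = old_counts[item] + inc_counts[item]
            if new_count >= threshold:
                updated[item] = new_count
    return updated
```

A full algorithm would repeat this step level by level for larger itemsets, and a cloud version would distribute the counting passes across nodes (for instance as MapReduce jobs); both are left out of this sketch.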

Keywords/Search Tags:

Massive Data, Incremental Mining of Association Rules, Cloud Computing

Related items

1	Research And Application Of Incremental Association Rules Algorithm Based On An Improved FP-tree
2	The Research And Application Of Data Mining In Mining Rules Of Medical Diagnosis
3	The Research And Implementation Of Parallel Association Rules Algorithm Based On Cloud Environment Data Mining
4	Research And Application For Association Rules Mining Based On Distributed Computing
5	Research And Improvement Of Algorithm For Incremental Updating Association Rule In Retail Business Intelligence System
6	Research On Incremental Updating Association Rules Mining Based On Apriori Algorithm
7	Parallel Association Rules Algorithm Based On Hadoop
8	The Research And Application Of Association Rules Incremental Mining Algorithm
9	Research On Massive Data Mining Algorithm Based On Cloud Computing Cotton Storage
10	Research About Data Mining Technologies Based On Cloud Computing