Parallel Dynamic Association Rules Mining In Big Data Environment

Posted on:2016-03-09

Degree:Master

Type:Thesis

Country:China

Candidate:M F Tian

Full Text:PDF

GTID:2308330464474211

Subject:Computer application technology

Abstract/Summary:

Big data technology now plays a major role at every level and in every field of life, and its importance continues to grow. Integrating and making good use of big data has become key to the success of governments and economies. As a new type of strategic resource, big data has distinctive features: large volume, variety, rapid generation, strict real-time requirements, and low value density. Traditional serial, single-machine algorithms therefore cannot meet the demands of big data processing, and parallel, distributed processing across multiple machines is becoming increasingly important.

Data mining is the key technology for discovering potential value in data. Association rule mining, a major topic in the data mining field, has attracted much attention. Traditional association rule mining treats rules as static: once mined, they never change. In reality, however, rules change over time. To better describe the characteristics of such rules, this thesis introduces the support vector (SV) and the confidence vector (CV), and focuses on parallel hierarchical dynamic association rule mining.

First, building on a study of parallel association rule mining algorithms and dynamic association rule mining algorithms, the thesis proposes an efficient parallel algorithm for mining association rules: the parallel hierarchical association rule mining algorithm based on partitioning (PHARM). The basic idea is to randomly divide the whole dataset D into several disjoint sub-datasets, each of which can be further divided into smaller datasets. The algorithm then mines local frequent itemsets in parallel, layer by layer. In the final scan, it counts the actual support of every candidate itemset and determines the global frequent itemsets. Model analysis and experiments show that the algorithm is highly efficient, with a clear advantage on large volumes of data.

Second, based on a study of the two-stage ITS algorithm, the thesis applies the parallel hierarchical mining idea to the first stage of dynamic association rule mining. It then proposes two efficient ITS-based algorithms for dynamic association rule mining: the parallel hierarchical dynamic association rule mining algorithm based on division (PDMD) and the parallel hierarchical dynamic association rule mining algorithm based on building a candidate matrix (PDMC). To find the global frequent itemsets L and their frequency vectors FV, PDMD scans the whole dataset twice, deriving L and FV from the local frequent itemsets. PDMC instead uses the local frequent itemsets to build a candidate matrix, generating L and FV without scanning the database again.
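
The partition-then-verify idea described in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's PHARM, PDMD, or PDMC implementation: the function names, the brute-force local mining step (used here in place of a real Apriori-style miner), the max_size limit, and the choice to treat each partition as one time segment are all assumptions made for illustration. The frequency vector is taken to be the per-partition count of an itemset, and the support vector is assumed to be those counts divided by the total number of transactions.

```python
from collections import Counter
from itertools import combinations


def local_frequent_itemsets(partition, min_support, max_size=3):
    """Return itemsets (as sorted tuples) that are frequent within one partition.

    Brute-force enumeration stands in for a real local miner (assumption).
    """
    counts = Counter()
    for transaction in partition:
        items = sorted(set(transaction))
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    threshold = min_support * len(partition)
    return {itemset for itemset, count in counts.items() if count >= threshold}


def mine_partitioned(partitions, min_support, max_size=3):
    """Hypothetical two-phase miner over disjoint partitions.

    Returns {itemset: (frequency_vector, support_vector)} for every
    globally frequent itemset of at most max_size items.
    """
    # Phase 1: each partition is mined independently, so this loop could be
    # handed to parallel workers. Any globally frequent itemset is locally
    # frequent in at least one partition, so the union is a safe candidate set.
    candidates = set()
    for partition in partitions:
        candidates |= local_frequent_itemsets(partition, min_support, max_size)

    # Phase 2: one verification scan counts every candidate per partition.
    total = sum(len(partition) for partition in partitions)
    result = {}
    for itemset in candidates:
        needed = set(itemset)
        frequency_vector = [
            sum(1 for transaction in partition if needed <= set(transaction))
            for partition in partitions
        ]
        if sum(frequency_vector) >= min_support * total:
            support_vector = [count / total for count in frequency_vector]
            result[itemset] = (frequency_vector, support_vector)
    return result


# Toy data: two partitions, read here as two time segments (assumption).
partitions = [
    [["a", "b"], ["a", "b", "c"], ["a"]],
    [["b", "c"], ["a", "b"], ["c"]],
]
result = mine_partitioned(partitions, min_support=0.5)
for itemset, (fv, sv) in sorted(result.items()):
    print(itemset, fv, [round(s, 3) for s in sv])
```

In this sketch the entries of a support vector sum to the itemset's overall support, so the vector shows how that support is distributed across segments rather than reporting a single static value.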

Keywords/Search Tags:

Big Data, Data Mining, Parallel Algorithm, Dynamic Association Rule

Related items

1	Parallel Dynamic Association Rules Mining In Big Data Environment
2	Data Warehouse-based Association Rule Mining Algorithm
3	Research Of Algorithm Of Mining Association Rule Based On Data Warehouse
4	Research On Association Rule Mining Based On Adaptive Algorithm And Parallel Computing
5	Research On Updated Algorithm Of Parallel Association Rules
6	The Research On The Parallel Algorithm Of Association Rule Mining
7	Research On Association Mining Algorithms With Dynamic Database Based On Big Data
8	Research On Association Rules Mining Methods Of Mass Engineering Data Based On Hadoop
9	Research On The Parallel Mining Algorithms For Association Rules
10	Arithmetic Of Association Rules Mining Based On Dynamic Data