Parallel Dynamic Association Rules Mining In Big Data Environment

Posted on: 2016-03-09
Degree: Master
Type: Thesis
Country: China
Candidate: M F Tian
GTID: 2308330464474211
Subject: Computer application technology
Abstract/Summary:
Big data technology now plays a major role across every level and field of society, and its importance continues to grow. Integrating and making good use of big data has become key to success for governments and economies. As a new type of strategic resource, big data has distinctive features: large volume, great variety, rapid generation, strict real-time requirements, and low value density. As a result, traditional serial, single-machine algorithms cannot meet the demands of big data processing, and parallel, distributed processing across multiple machines is becoming increasingly important.
Data mining is the key technology for discovering the latent value in data, and association rule mining, a central topic in the data mining field, has attracted wide attention. Traditional association rule mining treats rules as static: once mined, they never change. In reality, however, rules change over time. To describe such rules better, this thesis introduces the support vector (SV) and the confidence vector (CV), and focuses on parallel hierarchical dynamic association rule mining.
First, building on a study of existing parallel and dynamic association rule mining algorithms, the thesis proposes an efficient parallel algorithm for mining association rules: the parallel hierarchical association rule mining algorithm based on partitioning (PHARM). PHARM randomly divides the whole dataset D into several disjoint sub-datasets, each of which can be further divided into smaller sub-datasets. It then mines local frequent itemsets in parallel, level by level. In a final scan, it counts the actual support of every candidate itemset and determines the global frequent itemsets.
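The partition-then-merge idea behind PHARM can be sketched in Python. This is a minimal sketch under stated assumptions, not the thesis's implementation: the function names (`local_frequent`, `count_and_filter`, `pharm_sketch`), the `fanout` and `depth` parameters, Apriori as the local miner, and the choice to verify candidates at every intermediate level are all hypothetical. The sketch relies on the standard partition property: with disjoint partitions and the same relative minimum support, an itemset that is frequent in a partition must be frequent in at least one of its children, so the union of the children's local results never misses a frequent itemset.

```python
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def count_and_filter(data, candidates, minsup):
    """Count each candidate's support in `data`; keep those with relative support >= minsup."""
    if not data:
        return set()
    threshold = minsup * len(data)
    counts = Counter()
    for t in data:
        for c in candidates:
            if c <= t:
                counts[c] += 1
    return {c for c in candidates if counts[c] >= threshold}


def local_frequent(data, minsup):
    """Apriori-style level-wise mining of every locally frequent itemset in one partition."""
    level = {frozenset([i]) for t in data for i in t}
    frequent = set()
    while level:
        current = count_and_filter(data, level, minsup)
        frequent |= current
        # Join frequent k-itemsets into (k+1)-candidates whose k-subsets are all frequent.
        level = set()
        for a, b in combinations(current, 2):
            u = a | b
            if len(u) == len(a) + 1 and all(
                frozenset(s) in current for s in combinations(u, len(a))
            ):
                level.add(u)
    return frequent


def pharm_sketch(D, minsup, fanout=2, depth=2, seed=0):
    """Hypothetical PHARM-style miner: random disjoint partitions, parallel local mining, final scan."""
    rng = random.Random(seed)
    # levels[0] = [D]; each partition at level k is split randomly into `fanout` disjoint children.
    levels = [[list(D)]]
    for _ in range(depth):
        children = []
        for part in levels[-1]:
            shuffled = part[:]
            rng.shuffle(shuffled)
            children.extend(shuffled[i::fanout] for i in range(fanout))
        levels.append(children)
    with ThreadPoolExecutor() as pool:
        # Mine the leaf partitions concurrently.
        results = list(pool.map(lambda p: local_frequent(p, minsup), levels[-1]))
        # Walk up the hierarchy: a parent's candidates are the union of its children's
        # local frequent itemsets, verified by counting in the parent. At the root this
        # is the final scan of D that yields the global frequent itemsets.
        for k in range(depth - 1, -1, -1):
            parents = levels[k]
            candidates = [
                set().union(*results[i * fanout:(i + 1) * fanout]) for i in range(len(parents))
            ]
            results = list(
                pool.map(count_and_filter, parents, candidates, [minsup] * len(parents))
            )
    return results[0]


D = [frozenset(s) for s in ["abc", "ab", "ac", "bc", "abd", "b", "acd", "abc"]]
print(sorted("".join(sorted(x)) for x in pharm_sketch(D, 0.375)))
```

Because CPython threads do not run pure-Python code in parallel, the thread pool here only illustrates where the parallelism goes; a real deployment would run the leaf mining in separate processes or on separate machines.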
Model analysis and experiments show that PHARM is highly efficient, with a clear advantage on large datasets.
Second, after studying the two-stage ITS algorithm, this thesis applies the parallel hierarchical mining idea to the first stage of dynamic association rule mining and proposes two efficient ITS-based algorithms for dynamic association rule mining: the parallel hierarchical dynamic association rule mining algorithm based on division (PDMD) and the parallel hierarchical dynamic association rule mining algorithm based on building a candidate matrix (PDMC). Both must find the global frequent itemsets L and their frequency vectors FV. PDMD does so by scanning the whole dataset twice, obtaining L and FV from the local frequent itemsets in separate passes. PDMC instead uses the local frequent itemsets to build a candidate matrix, from which it generates L and FV without scanning the database again.
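To make the dynamic-rule quantities concrete, the following sketch assumes the data are split into time periods D_1, ..., D_n and takes FV[i] as an itemset's transaction count in D_i, SV[i] = FV[i] / |D_i|, and CV[i] for a rule X -> Y as FV(X ∪ Y)[i] / FV(X)[i]. These definitions and the function names (`fv_matrix`, `support_vector`, `confidence_vector`) are assumptions for illustration; the sketch does not reproduce PDMD or PDMC, and its one-pass matrix (one row per candidate itemset, one column per period) is not necessarily the thesis's candidate matrix.

```python
from fractions import Fraction


def fv_matrix(periods, candidates):
    """Hypothetical FV matrix: one row per candidate itemset, one column per period, one pass."""
    candidates = set(candidates)  # deduplicate so no itemset is counted twice
    matrix = {c: [0] * len(periods) for c in candidates}
    for j, period in enumerate(periods):
        for t in period:
            for c in candidates:
                if c <= t:
                    matrix[c][j] += 1
    return matrix


def support_vector(periods, itemset):
    """SV[i] = FV[i] / |D_i|; None for an empty period."""
    fv = fv_matrix(periods, [itemset])[itemset]
    return [Fraction(n, len(p)) if p else None for n, p in zip(fv, periods)]


def confidence_vector(periods, antecedent, consequent):
    """CV[i] = FV(X ∪ Y)[i] / FV(X)[i] for the rule X -> Y; None where X never occurs."""
    both = antecedent | consequent
    m = fv_matrix(periods, [antecedent, both])
    return [Fraction(xy, x) if x else None for xy, x in zip(m[both], m[antecedent])]


periods = [
    [frozenset("ab"), frozenset("a"), frozenset("abc"), frozenset("b")],  # period 1
    [frozenset("ab"), frozenset("ab"), frozenset("c")],                   # period 2
]
print(support_vector(periods, frozenset("ab")))                   # [Fraction(1, 2), Fraction(2, 3)]
print(confidence_vector(periods, frozenset("a"), frozenset("b")))  # [Fraction(2, 3), Fraction(1, 1)]
```

In this toy data the rule a -> b holds with confidence 2/3 in the first period and 1 in the second, which is exactly the kind of change over time that a single static support and confidence value would hide.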
Keywords/Search Tags: Big Data, Data Mining, Parallel Algorithm, Dynamic Association Rule