Association Rules Mining And Applications On Clusters

Posted on: 2018-12-13
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y L Xun
Full Text: PDF
GTID: 1318330536467899
Subject: Industrial Engineering
Abstract/Summary:
Big data has driven rapid development across many industries. Fields are innovating at remarkable speed, producing new products, technologies, services, and development patterns. The strategic significance of big data lies not in the data resource itself but in improving the "processing capacity" applied to it and in adding value to the data through that processing. Data mining is an effective means of knowledge discovery from big data, and data mining techniques can reveal the deeper value behind the data. As a major research topic in data mining, association rule mining can effectively find interesting associations among large numbers of data items when the association functions or models of the data cannot be determined in advance. Existing association rule mining algorithms are poorly suited to big data analysis and processing because of their high space and time complexity and their I/O cost. This dissertation uses the data processing capability of MapReduce cluster systems to study association rule mining methods and performance optimization techniques for big data, and then applies the resulting algorithms to quality analysis of cold roll processing. The main research tasks are as follows:

(1) Two parallel frequent itemset mining algorithms for Hadoop cluster environments, FiDoop and FiDoop-HD, are proposed. FiDoop uses MapReduce to avoid recursively building conditional pattern bases, which improves the efficiency of parallel mining. FiDoop-HD, an extension of FiDoop, adapts to high-dimensional datasets by reducing the cost of decomposing itemsets. Experiments on a Hadoop cluster verify the feasibility and validity of both parallel algorithms.

(2) To address the data non-locality problem in parallel frequent itemset mining algorithms (e.g., FiDoop), a data partitioning strategy, FiDoop-DP, is proposed. The strategy uses a Voronoi diagram and locality-sensitive hashing (LSH) to reduce network transmission and mining cost by placing highly correlated data in the same partition as far as possible. Experiments on a Hadoop cluster verify the validity of the data partitioning strategy.

(3) A parallel frequent itemset mining algorithm based on Spark in-memory computation is proposed. The algorithm takes advantage of the support that Spark clusters provide for iterative data processing. A novel model for predicting the computation required at each node is proposed to balance the computing load among nodes. Experiments on a Spark cluster platform verify the effectiveness of the proposed algorithm.

(4) A prototype system for cold roll processing quality analysis is designed and implemented in a cluster environment. Motivated by the quality analysis requirements of cold roll processing at a steel enterprise, the prototype combines the frequent itemset mining algorithms and the data partitioning strategy described above. Its data preprocessing, software architecture, and module functions are described in detail. The results show that the prototype system offers manufacturing enterprises a new approach to quality control of key processing steps.
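
For readers unfamiliar with frequent itemset mining, the following is a minimal single-machine sketch of the general map/reduce pattern that the abstract refers to: a map step emits candidate itemsets per transaction, and a reduce step sums their counts and keeps those meeting a minimum support. It is a naive enumeration written for illustration only, not FiDoop, FiDoop-HD, or any other algorithm from the dissertation. The function names (`map_phase`, `reduce_phase`, `frequent_itemsets`) and the parameters `min_support` and `max_size` are assumptions introduced here.

```python
from collections import Counter
from itertools import combinations


def map_phase(transaction, max_size):
    """Emit (itemset, 1) for every itemset of 1..max_size items in one transaction."""
    items = sorted(set(transaction))  # canonical order so equal itemsets share a key
    for k in range(1, max_size + 1):
        for itemset in combinations(items, k):
            yield itemset, 1


def reduce_phase(pairs, min_support):
    """Sum the counts per itemset and keep those with count >= min_support."""
    counts = Counter()
    for itemset, n in pairs:
        counts[itemset] += n
    return {s: c for s, c in counts.items() if c >= min_support}


def frequent_itemsets(transactions, min_support, max_size=2):
    """Run the map step over all transactions, then a single reduce step."""
    pairs = (pair for t in transactions for pair in map_phase(t, max_size))
    return reduce_phase(pairs, min_support)


if __name__ == "__main__":
    # Hypothetical toy transactions, not data from the dissertation.
    data = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
    for itemset, count in sorted(frequent_itemsets(data, min_support=3).items()):
        print(itemset, count)
```

Enumerating every candidate itemset grows combinatorially with transaction length, which is one reason practical parallel algorithms rely on more compact structures and careful data partitioning.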
Keywords/Search Tags: Big data, Intelligent manufacturing, Cluster systems, Association rules, Cold Roll, Processing quality