
Research On Association Rules Mining Methods Of Mass Engineering Data Based On Hadoop

Posted on: 2017-03-12
Degree: Master
Type: Thesis
Country: China
Candidate: B Zhou
Full Text: PDF
GTID: 2308330482979316
Subject: Mechanical Manufacturing and Automation
Abstract/Summary:
In recent years, the rapid development of high-speed Electric Multiple Unit (EMU) trains in China has produced a massive volume of historical maintenance and fault data. Using data mining to extract useful knowledge from these data, and thereby provide effective decision support for EMU fault diagnosis and maintenance, has become an urgent practical requirement. With the aim of exploiting EMU trains' historical maintenance and fault data to guide fault diagnosis, this thesis studies association rule mining methods for mass engineering data.

Traditional association rule mining algorithms hit a performance bottleneck when applied to massive, multi-dimensional data sets. This thesis adopts Hadoop as the underlying technology and improves the traditional Frequent Pattern Growth (FP-Growth) algorithm and the traditional Apriori algorithm so that they process data in parallel. Hadoop is an open-source distributed computing platform whose core components are the Hadoop Distributed File System (HDFS) and the MapReduce parallel programming framework. Developers can write distributed programs conveniently without understanding Hadoop's internal architecture.

The thesis analyzes the existing association rule mining algorithms and their disadvantages. Based on the requirements of EMU fault diagnosis, the FP-Growth algorithm and the Apriori algorithm are chosen as the base algorithms for mining association rules from the EMU trains' mass fault data. First, an improved mining algorithm is proposed that uses local frequent pattern trees instead of a global frequent pattern tree. It applies parallel processing in every data processing step and also improves the frequent pattern search strategy. Second, an improved parallel Apriori algorithm for multi-dimensional association rule mining is proposed, which works iteratively and parallelizes the mining of candidate itemsets. The proposed algorithms greatly improve the efficiency of association rule mining and reduce the space cost of the computation. The mined rules preserve the relationships between fault information and state information, while invalid rules are removed.

The improved algorithms are applied to acquire the association rule knowledge hidden in the EMU trains' historical maintenance and fault data. A prototype platform for processing EMU trains' mass maintenance and fault data is also designed and implemented. It includes an authentication module, a data transmission module, a data mining module, and a file management module, among others. Analysis and experimental tests indicate that the improved parallel algorithms are fast, efficient, and accurate in knowledge acquisition for EMU fault diagnosis.
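
The general idea of an iterative, parallel Apriori pass can be illustrated with a small sketch. This is not the thesis's algorithm: it is a plain-Python imitation of the MapReduce pattern, assuming that each data partition counts candidate itemsets locally (map) and that the partial counts are then summed (reduce). All function names, item names, and thresholds below are hypothetical.

```python
from collections import Counter
from itertools import combinations


def map_count(partition, candidates):
    """Map step: count, within one partition, every candidate contained in a transaction."""
    counts = Counter()
    for transaction in partition:
        for cand in candidates:
            if cand <= transaction:  # candidate is a subset of the transaction
                counts[cand] += 1
    return counts


def reduce_counts(partial_counts):
    """Reduce step: sum the per-partition counts into global support counts."""
    total = Counter()
    for counts in partial_counts:
        total.update(counts)
    return total


def next_candidates(frequent_prev, k):
    """Build k-itemsets whose (k-1)-subsets are all frequent (Apriori pruning)."""
    items = sorted({item for itemset in frequent_prev for item in itemset})
    return [
        frozenset(combo)
        for combo in combinations(items, k)
        if all(frozenset(sub) in frequent_prev for sub in combinations(combo, k - 1))
    ]


def parallel_apriori(partitions, min_count):
    """Iterate level by level; each level is one map/reduce round over all partitions."""
    candidates = list({frozenset([i]) for p in partitions for t in p for i in t})
    frequent, k = {}, 1
    while candidates:
        totals = reduce_counts(map_count(p, candidates) for p in partitions)
        level = {c: n for c, n in totals.items() if n >= min_count}
        frequent.update(level)
        k += 1
        candidates = next_candidates(set(level), k)
    return frequent


def mine_rules(frequent, min_conf):
    """Derive rules lhs -> rhs with confidence = support(itemset) / support(lhs)."""
    rules = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                conf = count / frequent[lhs]
                if conf >= min_conf:
                    rules.append((lhs, itemset - lhs, conf))
    return rules


# Hypothetical fault/state records, split into two partitions as HDFS blocks might be.
partitions = [
    [frozenset({"speed_high", "axle_temp_high", "bearing_fault"}),
     frozenset({"speed_high", "axle_temp_high", "bearing_fault"})],
    [frozenset({"speed_low", "axle_temp_high"}),
     frozenset({"speed_high", "axle_temp_high", "bearing_fault"})],
]
frequent = parallel_apriori(partitions, min_count=3)
found_rules = mine_rules(frequent, min_conf=1.0)
```

On a real Hadoop cluster the map and reduce steps would run as distributed tasks over HDFS blocks; here they are ordinary function calls, which is enough to show how the candidate counting at each level can be split across partitions.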
Keywords/Search Tags: Association rule, Data mining, Parallel FP-Growth algorithm, Parallel Apriori algorithm, Hadoop