Research on Key Technologies for Mining EMU Fault Data

Posted on: 2015-03-13  Degree: Master  Type: Thesis
Country: China  Candidate: M H Hou  Full Text: PDF
GTID: 2298330434450274  Subject: Computer technology
Abstract/Summary:
China has mastered the core technologies of high-speed trains, including traction, braking, and steering, and has accumulated a large volume of high-speed train failure data with the help of sensor technology, advanced data collection instruments, and computer storage devices. This data is growing at the terabyte scale. Traditional data mining solutions cannot handle the analysis of such massive data sets. The emergence of cloud computing, with its powerful computing capacity and large storage capacity, opens a new research direction for data mining. Hadoop is a relatively mature open-source framework for reliable, scalable, distributed computing that allows large data sets to be processed across clusters of computers using simple programming models, and it has been widely used for big data processing in various fields. Taking the classical Apriori algorithm as an example, this paper designs and implements a data mining solution based on Hadoop and describes the solution in detail. The paper comprises the following parts: (1) The core technologies of Hadoop and their operating principles are analyzed, including the distributed computing framework MapReduce and the distributed file system HDFS. (2) A basic framework for a Hadoop-based data mining system is proposed, and each functional module of the system is briefly described. (3) The Apriori algorithm is parallelized on Hadoop, and the execution flow of the improved algorithm in MapReduce is given. (4) The collected data on complex EMU equipment is analyzed, and the mining results are used to examine the conditions present when EMU failures occur and the relationships among them. (5) Using the EMU failure data set, single-machine and cluster tests of the improved algorithm are carried out on the Hadoop platform, and the results are analyzed in terms of the algorithm's efficiency and scalability.
The experimental results confirm that the improved algorithm achieves good speedup, portability, and efficiency, and that the data mining system designed in this paper meets the specified requirements. The research provides a reference for the continued improvement of data mining algorithms, and has practical, theoretical, and economic value for achieving proactive EMU maintenance, improving EMU safety and operational efficiency, and reducing maintenance costs.
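The abstract does not spell out how the Apriori algorithm is parallelized, so the following is only a minimal sketch of one common scheme, written in plain Python as a single-process simulation rather than against the Hadoop API. It assumes that each pass over the data is one MapReduce job: mappers emit a count of 1 for every candidate itemset found in a transaction of their split, and a reducer sums those counts before the minimum-support filter is applied. All function names and the fault-condition items used with it are hypothetical and are not taken from the thesis.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(split, candidates):
    """Mapper: emit (candidate, 1) for every candidate contained in a transaction."""
    for transaction in split:
        items = set(transaction)
        for candidate in candidates:
            if candidate <= items:
                yield candidate, 1


def reduce_phase(pairs):
    """Reducer: sum the counts emitted for each candidate itemset."""
    totals = defaultdict(int)
    for candidate, count in pairs:
        totals[candidate] += count
    return dict(totals)


def generate_candidates(frequent, k):
    """Build k-itemsets whose (k-1)-subsets are all frequent (Apriori property)."""
    items = sorted({item for itemset in frequent for item in itemset})
    result = set()
    for combo in combinations(items, k):
        if all(frozenset(sub) in frequent for sub in combinations(combo, k - 1)):
            result.add(frozenset(combo))
    return result


def apriori_mapreduce(splits, min_support):
    """Run one simulated map/reduce job per pass until no candidates remain.

    `splits` stands in for the input splits of a distributed file system;
    `min_support` is an absolute transaction count (an assumption of this sketch).
    """
    candidates = {frozenset([item]) for split in splits for t in split for item in t}
    frequent_all = {}
    k = 1
    while candidates:
        pairs = (pair for split in splits for pair in map_phase(split, candidates))
        counts = reduce_phase(pairs)
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        frequent_all.update(frequent)
        k += 1
        candidates = generate_candidates(set(frequent), k)
    return frequent_all
```

Calling `apriori_mapreduce(splits, min_support)` with a list of splits, each a list of transactions such as `["overheat", "brake_fault"]` (hypothetical condition labels), returns a dictionary that maps each frequent itemset to its support count. On a real cluster the candidate set would have to be distributed to every mapper, and the candidate generation step here, which enumerates all combinations of the surviving items, would need a more economical join for large item vocabularies.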
Keywords/Search Tags: EMU, Data Mining, Hadoop, Apriori Algorithm