Research And Application Of Data Mining Outlier Detection Algorithm In Power Equipment Fault Detection

Posted on:2018-07-27

Degree:Master

Type:Thesis

Country:China

Candidate:Y W Feng

Full Text:PDF

GTID:2382330518995558

Subject:Electronics and Communications Engineering

Abstract/Summary:

We live in an era of explosive data growth: people generate huge amounts of data in daily social life while the economy and the Internet develop rapidly. The purpose of data mining is to extract valuable information from this mass of data, and the recent rise of cloud computing has opened a new field of development for data mining theory. The electric power industry is a basic industry that underpins the national economy, so the demand for mining electric power data is urgent, yet research on applying data mining and cloud computing in the power industry is still at an early stage. Against this background, this thesis studies data mining and its application in the power industry. For power equipment fault detection, the operating state of a piece of equipment differs markedly from normal when it fails, so outlier mining of power equipment data is proposed. First, the thesis introduces the definition and classification of outliers, reviews related outlier detection algorithms and compares their advantages and disadvantages, and then describes the architecture and core concepts of the distributed computing frameworks used in this work, Hadoop and Spark. Next, it analyzes the clustering-based outlier detection algorithm in depth and optimizes its two stages, clustering and outlier detection, separately. In the clustering stage, the Canopy pre-clustering algorithm is combined with the K-Means clustering algorithm to avoid two drawbacks of K-Means, namely that its input parameters must be specified manually and that its initial cluster centers are selected randomly; this improves the stability and efficiency of the algorithm. In the outlier detection stage, the K-nearest neighbors of the cluster center are introduced into the outlier calculation to optimize the FindCBLOF algorithm, which reduces accidental error and improves stability. The distributed implementation of the outlier detection algorithm is then studied: the algorithm is implemented with Hadoop HDFS and the Spark RDD programming interface, and its design and pseudocode are given. Finally, experiments are carried out on actual power equipment data in a distributed platform environment, and data sets of different sizes are tested and compared. The experimental results show that the outlier detection algorithm can effectively detect power equipment faults and, through its distributed implementation on a distributed computing framework, can effectively process large data sets with significantly reduced processing latency. The research provides a fast, efficient, and highly scalable detection scheme for power equipment faults, with great practical value and broad application prospects.
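
The two-stage idea in the abstract (Canopy seeding for K-Means, then a CBLOF-style score that uses the nearest neighbors of each cluster center) can be illustrated with a minimal single-machine Python sketch. It is not the thesis's Hadoop/Spark implementation, and several details are assumptions made only for illustration: the function names canopy_centers, kmeans and cblof_scores, the parameters t2, alpha and k, the single-threshold form of Canopy, and in particular the way the k nearest members of a cluster center replace the center in the distance term, which the abstract does not specify.

```python
import math

def canopy_centers(points, t2):
    """Simplified single-threshold Canopy (assumption): take the first remaining
    point as a centre and drop every point within distance t2 of it."""
    remaining = list(points)
    centers = []
    while remaining:
        c = remaining[0]
        centers.append(c)
        remaining = [p for p in remaining if math.dist(p, c) > t2]
    return centers

def nearest(p, centers):
    """Index of the centre closest to point p."""
    return min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))

def kmeans(points, centers, n_iter=20):
    """Plain K-Means; the number of clusters and the seeds come from Canopy."""
    centers = list(centers)
    for _ in range(n_iter):
        labels = [nearest(p, centers) for p in points]
        for j in range(len(centers)):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the old centre if a cluster becomes empty
                centers[j] = tuple(sum(x) / len(members) for x in zip(*members))
    labels = [nearest(p, centers) for p in points]
    return centers, labels

def cblof_scores(points, centers, labels, alpha=0.9, k=3):
    """CBLOF-style score: cluster size times distance to a large cluster.
    Assumption: the distance is the mean distance to the k members nearest
    the cluster centre, rather than the distance to the centre itself."""
    sizes = [labels.count(j) for j in range(len(centers))]
    order = sorted(range(len(centers)), key=lambda j: -sizes[j])
    # Large clusters: the biggest ones that together cover alpha of the data.
    large, covered = [], 0
    for j in order:
        if covered >= alpha * len(points):
            break
        large.append(j)
        covered += sizes[j]
    # Reference points: the k members nearest to each large-cluster centre.
    refs = {}
    for j in large:
        members = [p for p, l in zip(points, labels) if l == j]
        refs[j] = sorted(members, key=lambda p: math.dist(p, centers[j]))[:k]
    scores = []
    for p, l in zip(points, labels):
        # Points in small clusters are compared with the nearest large cluster.
        candidates = [l] if l in refs else large
        d = min(sum(math.dist(p, r) for r in refs[j]) / len(refs[j])
                for j in candidates)
        scores.append(sizes[l] * d)
    return scores

# Toy data (hypothetical): two tight groups and one isolated reading.
grid = [(0.1 * i, 0.1 * j) for i in range(5) for j in range(4)]
points = grid + [(x + 10.0, y + 10.0) for x, y in grid] + [(30.0, -20.0)]
centers, labels = kmeans(points, canopy_centers(points, t2=3.0))
scores = cblof_scores(points, centers, labels)
suspect = max(range(len(points)), key=scores.__getitem__)
print(points[suspect], round(scores[suspect], 2))
```

On this toy data the isolated reading receives the highest score. In a distributed setting the assignment and scoring steps are per-point computations, which is the kind of work that maps naturally onto the Spark RDD interface mentioned in the abstract; the exact partitioning used in the thesis is not described here.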

Keywords/Search Tags:

data mining, outlier algorithm, distributed computing, equipment fault detection, Spark framework

Related items

1	Research On Cluster Fault-Tolerant Techniques Of Distributed Stream Computing For Electrified Railway Monitoring Big Data
2	Research On Big Data Platform For Engineering Machinery Equipment Monitoring Based On Spark
3	Parallel Diagnosing Method For Monitoring Big Data Of Electric Power Equipment Based On Spark Framework
4	QAR Data Outlier Detection And Fault Location Algorithm Research
5	Research On Running Gear Vibration Data Processing Of The High-Speed Rail Based On Spark Parallel Computing Framework
6	Parallel Data Processing Technology For Fault Warning Of Wind Turbine Research
7	Research And Application Of Power Theft Detection Based On Data Mining
8	The Research Of Architecture Of Distributed Data Mining Based On Grid
9	Fault Diagnosis On Industrial Equipment Based On Data Correlations
10	Design And Implementation Of Massive AIS Message Data Mining System Based On Cloud Computing And Distributed Technology