Research and Application of Data Mining Outlier Detection Algorithm in Power Equipment Fault Detection

Posted on: 2018-07-27
Degree: Master
Type: Thesis
Country: China
Candidate: Y W Feng
Full Text: PDF
GTID: 2382330518995558
Subject: Electronics and Communications Engineering
Abstract/Summary:
We live in an era of explosive data growth: people generate huge amounts of data in daily social life as the economy and the Internet develop rapidly. The purpose of data mining is to extract valuable information from these massive data sets, and in recent years the rise of cloud computing has opened a new field of development for data mining theory. The electric power industry is a basic industry that supports the national economy, so the demand for mining electric power data is urgent, yet research on applying data mining and cloud computing in the power industry is still at an early stage. Against this background, this thesis studies data mining and its application in the power industry. For power equipment fault detection, the operating state of a piece of equipment differs markedly from normal when it fails, so outlier mining on power equipment data is proposed.

First, the thesis introduces the definition and classification of outliers, surveys related outlier detection algorithms and compares their advantages and disadvantages, and then describes the architecture and core concepts of Hadoop and Spark, the distributed computing frameworks used in this work. It then analyzes clustering-based outlier detection in depth and optimizes its two stages, clustering and outlier detection, separately. In the clustering stage, the Canopy pre-clustering algorithm is combined with the K-Means clustering algorithm to avoid two drawbacks of K-Means, namely that its input parameters must be specified manually and that its initial cluster centers are selected randomly; this improves the stability and efficiency of the algorithm. In the outlier detection stage, the K nearest neighbors of the cluster center are introduced into the outlier calculation to optimize the FindCBLOF algorithm, which reduces accidental error and improves stability.

Next, a distributed implementation of the outlier detection algorithm is studied: the algorithm is implemented with Hadoop HDFS and the Spark RDD programming interface, and its design and pseudocode are given. Finally, experiments are carried out on a distributed platform using actual power equipment data, and data sets of different sizes are tested and compared. The experimental results show that the outlier detection algorithm can effectively detect power equipment faults, and that the distributed implementation on a distributed computing framework can process large data sets effectively, which significantly reduces processing latency. This research provides a fast, efficient, and highly scalable detection scheme for power equipment faults and has practical value and broad application prospects.
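
The two-stage idea described in the abstract can be illustrated with a small single-machine sketch. It is not the thesis's algorithm: the abstract gives neither the formulas nor the Spark code, so everything here is an assumption made for illustration. The Canopy pass uses a single removal threshold `t2`, a cluster counts as "large" when it holds at least `min_frac` of the points, and the score is the mean distance from a point to the `k` members closest to the center of its nearest large cluster. The function names, parameters, and sample data are all hypothetical.

```python
import math

def canopy_centers(points, t2):
    """Simplified Canopy pass (assumption: single threshold t2).
    Pick a point as a center, drop every point within t2 of it, repeat."""
    remaining = list(points)
    centers = []
    while remaining:
        c = remaining[0]
        centers.append(c)
        remaining = [p for p in remaining if math.dist(p, c) > t2]
    return centers

def assign(points, centers):
    """Group points by their nearest center."""
    clusters = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
        clusters[nearest].append(p)
    return clusters

def kmeans(points, centers, max_iter=50):
    """Plain K-Means seeded with the Canopy centers, so k is not set by hand."""
    centers = list(centers)
    for _ in range(max_iter):
        clusters = assign(points, centers)
        updated = [
            tuple(sum(col) / len(c) for col in zip(*c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if updated == centers:
            break
        centers = updated
    return centers, assign(points, centers)

def outlier_scores(points, centers, clusters, k=3, min_frac=0.2):
    """Hypothetical score: mean distance from a point to the k members
    closest to the center of its nearest large cluster.
    Points must be hashable (tuples); duplicate points share one score."""
    n = len(points)
    large = [i for i, c in enumerate(clusters) if len(c) >= min_frac * n]
    if not large:  # fall back to every non-empty cluster
        large = [i for i, c in enumerate(clusters) if c]
    core = {
        i: sorted(clusters[i], key=lambda p: math.dist(p, centers[i]))[:k]
        for i in large
    }
    scores = {}
    for p in points:
        i = min(large, key=lambda j: math.dist(p, centers[j]))
        scores[p] = sum(math.dist(p, q) for q in core[i]) / len(core[i])
    return scores

# Made-up readings: two tight groups plus one reading far from both.
points = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5),
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0), (10.5, 10.5),
    (5.0, 30.0),
]
centers, clusters = kmeans(points, canopy_centers(points, t2=3.0))
scores = outlier_scores(points, centers, clusters)
suspect = max(scores, key=scores.get)
```

Seeding K-Means from the Canopy centers fixes both the number of clusters and the starting positions, which is the stability argument the abstract makes. Averaging over several members near the center, rather than measuring against the center alone, is one plausible reading of how the K nearest neighbors could damp accidental error.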
Keywords/Search Tags:data mining, outlier algorithm, distributed computing, equipment fault detection, Spark framework