Research On Abnormal Identification Method Of Household Users' Electricity Consumption In Cloud Environment

Posted on: 2019-04-08  Degree: Master  Type: Thesis
Country: China  Candidate: Z Wang  Full Text: PDF
GTID: 2428330545470249  Subject: Computer Science and Technology
Abstract/Summary:
Clustering algorithms are widely used in data mining, and both batch clustering algorithms and graph-theory-based clustering algorithms have seen broad application. Cluster analysis groups similar data into the same cluster and separates dissimilar data into different clusters. This paper applies pre-processing and feature engineering to household electricity consumption data, and then adopts the SMK-means (Mini Batch K-means based on Simulated annealing) algorithm and the SDM-clustering (Spectral clustering based on Distance function and Mini Batch K-means) algorithm to identify abnormal electricity consumption data of household users. The SM-RF (Random Forest based on Similarity Matrix) algorithm then performs classification analysis of the anomalies. The paper focuses on optimizing clustering algorithms on Hadoop and studying their performance, and then employs the random forest algorithm to classify the identified anomalies. The research content is as follows:
(1) The initial cluster centers of the Mini Batch K-means algorithm are generated randomly, which makes the algorithm unstable. An SMK-means algorithm based on simulated annealing is proposed to address this, and it is parallelized on MapReduce (a distributed computing framework). SMK-means is then run on household electricity consumption data and evaluated by clustering accuracy, runtime, and accuracy of anomaly detection. The experimental results show that SMK-means is superior to the algorithm it is compared against in stability and operating efficiency.
(2) Spectral clustering uses the K-means algorithm in its clustering step, so it retains some shortcomings of the standard clustering algorithm. An SDM-clustering algorithm based on graph theory is therefore proposed, which samples the data hierarchically. First, the first k eigenvectors are selected by solving for the eigenvalues and eigenvectors of the matrix, which realizes the first level of sampling. Second, the batch algorithm SMK-means is adopted to realize the second level of sampling and complete the clustering analysis of the data. Experiments show that SDM-clustering performs better than SMK-means in operational efficiency and accuracy of anomaly recognition.
(3) Based on the characteristics of the similarity matrix in the random forest algorithm, the SM-RF algorithm is proposed to address the shortcomings of that matrix. It introduces the concept of path distance, so that samples with higher similarity are more reliably assigned to the same class, which improves classification accuracy.
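As a rough illustration of the idea in item (1), the following Python sketch uses simulated annealing to choose initial cluster centers and then refines them with mini-batch K-means updates. It is not the thesis's SMK-means implementation: the cost function, the swap move, the cooling schedule, the parameter defaults, and all function names are assumptions made for illustration, and the MapReduce parallelization is omitted.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cost(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def anneal_initial_centers(points, k, rng, t0=1.0, cooling=0.95, steps=200):
    """Pick k data points as initial centers by simulated annealing.

    Assumed design (not taken from the thesis): a state is a set of k
    point indices, and a move swaps one index for an unused one.
    Requires len(points) > k.
    """
    current = rng.sample(range(len(points)), k)
    current_cost = cost(points, [points[i] for i in current])
    best, best_cost = list(current), current_cost
    t = t0
    for _ in range(steps):
        candidate = list(current)
        unused = [i for i in range(len(points)) if i not in current]
        candidate[rng.randrange(k)] = rng.choice(unused)
        cand_cost = cost(points, [points[i] for i in candidate])
        delta = cand_cost - current_cost
        # Always accept improvements; accept worse states with
        # probability exp(-delta / t), which shrinks as t cools.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current, current_cost = candidate, cand_cost
            if current_cost < best_cost:
                best, best_cost = list(current), current_cost
        t *= cooling
    return [list(points[i]) for i in best]


def mini_batch_kmeans(points, centers, rng, batch_size=10, iterations=50):
    """Refine centers with mini-batch updates and per-center learning rates."""
    centers = [list(c) for c in centers]
    counts = [0] * len(centers)
    for _ in range(iterations):
        batch = rng.sample(points, min(batch_size, len(points)))
        for p in batch:
            # Assign the sample to its nearest center.
            j = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            counts[j] += 1
            eta = 1.0 / counts[j]  # step size decays with samples seen
            centers[j] = [(1 - eta) * c + eta * x for c, x in zip(centers[j], p)]
    return centers
```

Calling `anneal_initial_centers(points, k, random.Random(0))` and passing the result to `mini_batch_kmeans` gives a seeded, repeatable run. Replacing the annealing step with `rng.sample(points, k)` gives the purely random initialization that the abstract describes as unstable.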
Keywords/Search Tags: Clustering Algorithm, Data Mining, Anomaly Recognition, Random Forest, Cloud Environment