Research On Abnormal Identification Method Of Household Users' Electricity Consumption In Cloud Environment

Posted on:2019-04-08

Degree:Master

Type:Thesis

Country:China

Candidate:Z Wang

Full Text:PDF

GTID:2428330545470249

Subject:Computer Science and Technology

Abstract/Summary:

Clustering algorithms are frequently used in data mining, and both batch clustering algorithms and graph-theory-based clustering algorithms have been widely applied. Cluster analysis groups similar data into the same cluster and separates dissimilar data into different clusters. This paper first applies pre-processing and feature engineering to household electricity consumption data, and then adopts the SMK-means (Mini Batch K-means based on Simulated annealing) algorithm and the SDM-clustering (Spectral clustering based on Distance function and Mini Batch K-means) algorithm to identify abnormal electricity consumption by household users. The SM-RF (Random Forest based on Similarity Matrix) algorithm is then used to classify the anomalies. The paper focuses on optimizing the clustering algorithms on Hadoop and studying their performance, and then employs the random forest algorithm to classify the identified anomalies. The research content is as follows:

(1) The initial cluster centers of the Mini Batch K-means algorithm are generated randomly, which makes the algorithm unstable. To address this, the SMK-means algorithm, based on simulated annealing, is proposed. The algorithm is parallelized on MapReduce (a distributed computing framework). The paper then runs SMK-means on household electricity consumption data and evaluates it by clustering accuracy, runtime, and accuracy of anomaly detection. The experimental results show that SMK-means is superior to the compared algorithm in stability and operating efficiency.

(2) Spectral clustering relies on the K-means algorithm in its clustering step and therefore inherits some shortcomings of the standard clustering algorithm. To address this, the graph-theory-based SDM-clustering algorithm is proposed, which samples the data hierarchically. First, the eigenvalues and eigenvectors of the matrix are computed and the first k eigenvectors are selected, which realizes the first level of sampling. Second, the batch algorithm SMK-means is adopted to perform the second level of sampling and complete the cluster analysis of the data. Experiments show that SDM-clustering outperforms SMK-means in operating efficiency and accuracy of anomaly recognition.

(3) Based on the characteristics of the similarity matrix in the random forest algorithm, the SM-RF algorithm is proposed to address the shortcomings of that matrix. It introduces the concept of path distance, so that samples with higher similarity are more reliably assigned to the same class, which improves classification accuracy.
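
The two-level scheme in item (2) can be illustrated with a minimal, single-machine sketch. It is not the thesis implementation: the abstract does not specify the distance function, the matrix whose eigenvectors are taken, or the simulated-annealing step, so the code below assumes a Gaussian affinity on Euclidean distances, the symmetric normalized graph Laplacian, and a deterministic farthest-first initialization as a stand-in for the annealing-based choice of initial centers. The function names `spectral_embedding` and `mini_batch_kmeans` are hypothetical, and nothing here runs on MapReduce.

```python
import numpy as np


def spectral_embedding(X, k, sigma=1.0):
    """Level 1: map each sample to the first k eigenvectors of a graph Laplacian."""
    # Pairwise squared Euclidean distances (assumed distance function).
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    # Gaussian affinity matrix (assumption); no self-loops.
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Symmetric normalized Laplacian L = I - D^{-1/2} W D^{-1/2} (assumption).
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(W.sum(axis=1), 1e-12))
    L = np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order; keep the k smallest.
    _, vecs = np.linalg.eigh(L)
    U = vecs[:, :k]
    # Normalize rows so that samples of one cluster land close together.
    return U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)


def mini_batch_kmeans(X, k, batch_size=32, n_iter=100, seed=0):
    """Level 2: mini-batch K-means with per-center learning rates."""
    rng = np.random.default_rng(seed)
    # Farthest-first initialization: a stand-in (assumption) for the
    # simulated-annealing selection of initial centers.
    centers = [X[0]]
    for _ in range(1, k):
        diff = X[:, None, :] - np.array(centers)[None, :, :]
        dist = np.min((diff ** 2).sum(axis=2), axis=1)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers, dtype=float)
    counts = np.zeros(k)
    size = min(batch_size, len(X))
    for _ in range(n_iter):
        batch = X[rng.choice(len(X), size=size, replace=False)]
        diff = batch[:, None, :] - centers[None, :, :]
        nearest = np.argmin((diff ** 2).sum(axis=2), axis=1)
        for x, c in zip(batch, nearest):
            counts[c] += 1
            eta = 1.0 / counts[c]  # step size shrinks as a center sees more samples
            centers[c] = (1.0 - eta) * centers[c] + eta * x
    diff = X[:, None, :] - centers[None, :, :]
    labels = np.argmin((diff ** 2).sum(axis=2), axis=1)
    return labels, centers
```

Under these assumptions, a call such as `labels, _ = mini_batch_kmeans(spectral_embedding(X, k), k)` clusters the rows of `X`. Samples that end up in very small clusters or far from every center are natural anomaly candidates, although the abstract does not state which criterion the thesis uses.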

Keywords/Search Tags:

Clustering algorithm, Data Mining, Anomaly Recognition, Random Forest, Cloud Environment

Related items

1	Application Of Random Forest In Cloud Computing Anomaly Detection
2	Telecom Customer Churn Prediction And Analysis Based On Improved Random Forest Algorithm
3	Design And Implementation Of Anomaly Detection System For Mesos Cloud Platform
4	Isolated Forest Algorithm Based On Qualitative Data Clustering
5	Parallel Research And Application Of Machine Learning Algorithm Based On Cloud Platform
6	Research On Anomaly Detection Algorithm Of Time Series Data In Cloud Environment
7	The Study Of Forest Ecological Station Data Clustering Based On Big Data
8	Research Of Clustering Algorithm Based On Random Fuzziness
9	Large-scale Network Anomaly Detection Based On Data Mining
10	Research On KPI Anomaly Detection For Intelligent Operation And Maintenance Under Cloud Environment