Non-technical loss in a power system refers to the electric energy that is consumed by users but not billed by the power company. Apart from a small share caused by meter failure, it results mostly from electricity theft by users. Non-technical losses harm both the economy and the reliability of the power system, so screening power users for abnormal behavior is necessary. With the development of power big data technology, power companies can obtain not only users' final electricity charges but also large volumes of fine-grained process data on their power consumption. These data provide the source for data-driven algorithms that detect abnormal power consumption. Compared with traditional door-to-door inspection, identifying abnormal users with an anomaly detection algorithm is more efficient and has practical research value.

Firstly, this paper introduces the research background of power user anomaly detection. Data-driven detection methods are classified as supervised, unsupervised, or semi-supervised, and the advantages and disadvantages of each class are discussed. On this basis, the anomaly detection model in this paper is built on an unsupervised clustering algorithm. The paper then describes the four parts of a clustering-based power load anomaly detection model: data preprocessing, feature engineering, the clustering algorithm, and model evaluation.

Secondly, a PCA-DPeaks anomaly detection model based on principal component analysis (PCA) and density peaks (DPeaks) clustering is proposed. The model follows the pipeline of data preprocessing, feature construction, dimension reduction, clustering, and anomaly detection. First, the original data set is preprocessed to obtain the daily and monthly power consumption data sets of all power users. Then, drawing on the physical meaning of the power load curve, 15 features are defined and selected to form the feature set, and then the
dimension of the feature set is reduced by PCA to obtain a new feature set. This condenses the high-dimensional daily and monthly power consumption data, which have low information density, into a low-dimensional feature set with high information density, improving the computational efficiency of the model and allowing the detection results to be visualized. Finally, the new feature set is fed into the anomaly detection model, which outputs the users predicted to be abnormal, and model evaluation indices are used to assess the detection performance. A case study on Irish smart meter data shows that the PCA-DPeaks model is feasible but that its detection performance is poor.

Finally, to address the poor detection performance of the PCA-DPeaks model and the general difficulty of selecting parameters for unsupervised models of this kind, an LDA-DPeaks double-criterion model that takes user location information into account is proposed. First, the model is still based on the DPeaks clustering algorithm, so it can first group all power users into categories and then detect anomalies category by category, which suits scenarios with many user types and large data volumes. Second, in the dimensionality reduction step, the model uses linear discriminant analysis (LDA) to condense the user's station area code, cell number, and other location information into the new feature set, which improves the detection rate and accuracy of the model. Third, the model judges outliers by a double criterion. This makes the judgment stricter, which widens the reasonable range of the threshold parameter for each single criterion and thereby reduces both the difficulty of selecting model parameters and the sensitivity of the model to parameter perturbation. The case study on Irish smart meter data shows that introducing user location information improves the accuracy of the model, and that the double criterion can reduce the
difficulty of selecting model parameters.
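
The density peaks step and the idea of a double criterion can be illustrated with a minimal sketch. It assumes the user features have already been reduced to a few dimensions (for example by PCA or LDA). The cutoff distance `d_c`, the thresholds `rho_max` and `delta_min`, the function names, and the specific form of the double criterion used here (low local density together with a large distance to any denser point) are illustrative assumptions, not the criteria defined in this paper.

```python
import math


def density_peaks_scores(points, d_c):
    """Return (rho, delta) for each point, following the density peaks idea.

    rho[i]   : number of other points closer than the cutoff distance d_c
    delta[i] : distance to the nearest point with a higher rho
               (for a point with no denser neighbour: its largest distance)
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    rho = [
        sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
        for i in range(n)
    ]
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta


def flag_anomalies(points, d_c, rho_max, delta_min):
    """Hypothetical double criterion: flag a point only if it is BOTH
    sparse (rho <= rho_max) AND isolated (delta >= delta_min)."""
    rho, delta = density_peaks_scores(points, d_c)
    return [
        i for i in range(len(points))
        if rho[i] <= rho_max and delta[i] >= delta_min
    ]


# Toy data (made up): five users in one tight cluster and one isolated user.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05), (5.0, 5.0)]
print(flag_anomalies(points, d_c=0.5, rho_max=1, delta_min=1.0))  # -> [5]
```

In this sketch, requiring both conditions means that neither threshold has to separate normal from abnormal users on its own, which is one way to read the claim that a double criterion relaxes the choice of each single threshold.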