Against the background of rapid domestic economic development, meeting the electricity demand of industry and households is of great significance. Power system losses fall into two categories: technical losses and non-technical losses. Technical losses are mainly caused by the internal resistance of the lines, while non-technical losses are mainly caused by electricity theft. In recent years, frequent electricity theft has not only severely damaged the economic interests of power grid companies, but theft methods such as unauthorized wiring have also created safety hazards for normal electricity consumption. Combating electricity theft is therefore necessary to maintain orderly electricity consumption. Traditional electricity theft detection relies mainly on manual on-site inspection. This approach is narrow, poorly targeted, and inefficient because inspectors are in short supply. To overcome the shortcomings of traditional detection methods and ensure normal electricity supply, this thesis focuses on data from low-voltage users who steal electricity and makes improvements in three areas: feature engineering, the detection model, and data imbalance. The thesis proposes a data-driven electricity theft detection network based on WGAN-Stacking. The specific work is as follows:

(1) To address the problem that the user data collected for low-voltage electricity theft detection has few features, this thesis draws on the experience of on-site experts and preliminarily identifies users suspected of stealing electricity by constructing indicators from users' electricity consumption data and the line loss rate of the station area. For the remaining users, feature engineering is used to mine the characteristics of their electricity consumption, including splitting daily electricity consumption by year, quarter
and working day to build statistics such as range and variance, and bucketing the period-over-period changes in daily, weekly and monthly electricity consumption. An embedded feature selection method lets the model choose effective features by itself, which provides richer data dimensions for the electricity theft detection model and reduces the risk of feature redundancy, so that the model can better learn the electricity consumption behavior of low-voltage customers and improve detection performance.

(2) To address the low accuracy of a single classification model, this thesis introduces the Stacking ensemble learning framework to further improve detection accuracy. First, XGBoost, LightGBM and CatBoost are trained, with their parameters tuned by Bayesian optimization to make tuning more efficient. The models are then integrated in the Stacking framework, with logistic regression as the final classifier. Second, to verify the effect of the Stacking ensemble framework, the Boosting and Bagging ensemble learning frameworks are used in comparative experiments. Finally, in line with the actual on-site electricity theft inspection process, mean average precision (MAP) is constructed as the evaluation index, with F1 score and AUC as auxiliary indices for evaluating the model. The accuracy and effectiveness of the Stacking ensemble model are verified on the 2016 CCF competition data.

(3) To address the imbalance of samples from electricity theft customers, this thesis proposes a data augmentation method based on WGAN. On the one hand, this method expands the electricity theft sample data and balances the dataset, which addresses the sample imbalance problem. On the other hand, the generated high-quality simulated samples are mixed with real samples to train the Stacking electricity theft detection model, which can further improve the
accuracy of the model. The Wasserstein GAN (WGAN) based data augmentation method proposed in this thesis is compared with traditional data augmentation methods such as oversampling and undersampling, and the effectiveness of the proposed method is verified on the 2016 CCF competition data.
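
The feature construction described in (1) can be illustrated with a minimal sketch. It is not the thesis implementation: the function names, the 7-day week blocks, the bucket edges and the sample readings are all assumptions chosen for illustration, and public holidays are ignored when identifying working days.

```python
from datetime import date, timedelta
from statistics import pvariance


def split_by_working_day(start, daily):
    """Split readings into (working-day, weekend) lists by calendar weekday.

    Assumption: Monday to Friday are working days; holidays are ignored.
    """
    work, rest = [], []
    for offset, kwh in enumerate(daily):
        day = start + timedelta(days=offset)
        (work if day.weekday() < 5 else rest).append(kwh)
    return work, rest


def range_and_variance(values):
    """Return (range, population variance) of a list of readings."""
    return max(values) - min(values), pvariance(values)


def weekly_totals(daily):
    """Sum consecutive 7-day blocks, dropping an incomplete trailing block."""
    return [sum(daily[i:i + 7]) for i in range(0, len(daily) - 6, 7)]


def period_over_period(totals):
    """Relative change of each period total versus the previous period."""
    changes = []
    for prev, cur in zip(totals, totals[1:]):
        # Guard against division by zero when the previous period had no usage.
        changes.append((cur - prev) / prev if prev else 0.0)
    return changes


def bucketize(value, edges):
    """Index of the first edge that value is below; len(edges) if none."""
    for i, edge in enumerate(edges):
        if value < edge:
            return i
    return len(edges)


# Hypothetical two weeks of daily consumption (kWh) starting on a Monday.
daily = [10.0, 11.0, 9.0, 10.0, 12.0, 4.0, 5.0,
         5.0, 5.5, 4.5, 5.0, 6.0, 2.0, 2.5]
start = date(2016, 1, 4)

work, rest = split_by_working_day(start, daily)
work_range, work_var = range_and_variance(work)

weekly = weekly_totals(daily)
change = period_over_period(weekly)
# Hypothetical bucket edges for the week-over-week change.
bucket = bucketize(change[0], [-0.3, -0.1, 0.1, 0.3])
```

In this hypothetical series the second week's total is half of the first, so the week-over-week change falls into the lowest bucket. A sudden, sustained drop of this kind is the sort of pattern such features are meant to expose to the detection model.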
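
The MAP evaluation index mentioned in (2) can be sketched in the same spirit. The abstract does not give the exact definition used in the thesis, so the code below assumes the standard ranking formulation: users are sorted by predicted theft score, average precision (AP) is the mean of precision@k over the positions that hold a confirmed theft user, and MAP is the mean of AP over several ranked lists. All function names, scores and labels are hypothetical.

```python
def rank_by_score(scores, labels):
    """Return labels reordered from the highest to the lowest score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [labels[i] for i in order]


def average_precision(ranked_labels):
    """Mean of precision@k over the positions k that hold a positive label."""
    hits = 0
    precisions = []
    for k, label in enumerate(ranked_labels, start=1):
        if label:
            hits += 1
            precisions.append(hits / k)
    # A list with no positives contributes 0.0 by convention here.
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(ranked_lists):
    """Mean of AP over several ranked lists (e.g. one list per inspection batch)."""
    if not ranked_lists:
        return 0.0
    return sum(average_precision(r) for r in ranked_lists) / len(ranked_lists)


# Hypothetical model scores and ground-truth labels (1 = confirmed theft).
scores = [0.9, 0.2, 0.75, 0.4, 0.6]
labels = [1, 0, 0, 1, 1]

ranked = rank_by_score(scores, labels)
ap = average_precision(ranked)
```

A ranking metric of this kind rewards placing true theft users near the top of the list, which is relevant when only the highest-scoring users can be inspected on site. With a single ranked list, MAP reduces to AP.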