
Research And Application Of Imbalanced Data Classification Algorithm

Posted on: 2022-08-15  Degree: Master  Type: Thesis
Country: China  Candidate: Q Yang  Full Text: PDF
GTID: 2518306536463634  Subject: Control Science and Engineering
Abstract/Summary:
Data imbalance arises in many fields. Most machine learning algorithms are designed for balanced data, so on imbalanced data they tend to misclassify minority-class samples as the majority class. This creates a gap between the theory and the practical use of machine learning, and it must be addressed before such methods can be applied in practice. This thesis studies the problem in theory and in application, and applies the proposed methods to power outage prediction in distribution networks (DN). The DN plays an important role in daily life and social development. If the data collected by the DN information system can be used effectively to build a power outage warning model, operation and maintenance can be carried out in time, which is of great significance for improving the reliability of the DN power supply and ensuring power quality for users.

At the data preprocessing level, most of the literature does not study imbalance within a class and treats it as imbalance between classes, ignoring the effect of within-class imbalance on model performance. Therefore, this thesis proposes a cluster-based generative adversarial network (K-Means-GAN) data generation method, which balances the distribution of minority samples and reduces the imbalance of the whole data set to a certain degree.

At the algorithm level, scholars often combine cost-sensitive learning with other algorithms for imbalanced data classification. This changes the optimization goal of the model by taking the total misclassification cost as the global objective, which makes the model overfit the minority class. Therefore, this thesis proposes a random cost-sensitive convolutional neural network (Random Cost-CNN) classification algorithm that combines cost-sensitive learning with random theory, alleviating minority-class overfitting and improving the generalization of the model.

Finally, based on actual production data, this thesis establishes an early warning model for power failures and outages on 10 kV lines, which allows workers to carry out maintenance and repair in time. This is of great significance for promoting the stable development of the national economy, ensuring the quality of users' electricity consumption, and improving the satisfaction of society as a whole. In addition, this thesis takes a data-driven perspective to carry out preliminary data mining research on the power grid, laying a foundation for the efficient and reasonable use of grid data in the future.
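The within-class balancing idea behind the cluster-based generation step can be illustrated with a small sketch. The abstract does not state how many samples are generated per cluster, so the rule below (raise every minority cluster to the size of the largest one) is an assumption, as are the function name `generation_quotas`, the cluster sizes, and the majority count. The clustering itself and the GAN that would produce the synthetic samples are omitted; only the bookkeeping is shown.

```python
from collections import Counter


def generation_quotas(cluster_labels, target_per_cluster=None):
    """Return how many synthetic samples to generate for each minority cluster.

    cluster_labels: cluster id of every minority-class sample (for example,
        the output of a K-Means run on the minority class).
    target_per_cluster: desired size of each cluster after generation.
        Defaults to the size of the largest cluster (an assumption, not a
        rule taken from the thesis).
    """
    sizes = Counter(cluster_labels)
    if not sizes:
        return {}
    if target_per_cluster is None:
        target_per_cluster = max(sizes.values())
    # Clusters already at or above the target receive no synthetic samples.
    return {c: max(0, target_per_cluster - n) for c, n in sorted(sizes.items())}


# Hypothetical data: 100 minority samples in three uneven clusters,
# against 900 majority samples.
minority_clusters = [0] * 60 + [1] * 25 + [2] * 15
n_majority = 900

quotas = generation_quotas(minority_clusters)
n_minority_before = len(minority_clusters)
n_minority_after = n_minority_before + sum(quotas.values())

# Majority-to-minority ratio before and after generation.
ratio_before = n_majority / n_minority_before
ratio_after = n_majority / n_minority_after

print(quotas)                     # {0: 0, 1: 35, 2: 45}
print(ratio_before, ratio_after)  # 9.0 5.0
```

In this toy case the three minority clusters end up with 60 samples each, so the within-class imbalance is removed, while the between-class ratio only drops from 9:1 to 5:1. That matches the abstract's wording that the overall imbalance is reduced "to a certain degree" rather than eliminated.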
Keywords/Search Tags: Imbalanced data classification, In-class imbalance, K-Means-GAN, RandomCost-CNN, Power outage prediction for 10 kV line