Research On Oversampling Ensemble Learning Algorithm For Fault Detection Of Unbalanced Data

Posted on:2022-02-22

Degree:Master

Type:Thesis

Country:China

Candidate:Z Y Wang

Full Text:PDF

GTID:2491306602455444

Subject:Control Engineering

Abstract/Summary:

As modern industrial processes grow more complex, keeping production stable becomes increasingly important, and fault detection and diagnosis algorithms have become a research hotspot. Production processes, however, generate large volumes of unbalanced data. When the samples are unbalanced, traditional detection and diagnosis algorithms tend to be biased toward the majority class, so the minority class is underreported, which affects the safe and stable operation of industrial production. Detection and diagnosis methods designed for unbalanced data are therefore particularly important. Research on unbalanced data detection focuses mainly on the data level and the algorithm level. Building on these two aspects, this thesis studies an oversampling method and an ensemble learning method for fault detection on unbalanced data.

At the data level, a weighted oversampling and sample screening method based on kernel probability density estimation is proposed. First, a balance threshold is set so that the new dataset meets the balance requirement. Second, a weighted probability density function of the minority samples is constructed from the kernel probability density estimate to strengthen the classification boundary. Third, acceptance-rejection sampling is introduced for sample screening to ensure the quality of the generated samples. As a result, sample generation is better grounded theoretically than in classic oversampling methods. In addition, kernel distance, instead of Euclidean distance, is used to compute the k-nearest neighbors, which is more advantageous for high-dimensional nonlinear classification problems.

At the algorithm level, the weak classifiers in the Boosting method differ too little from one another, which leaves the strong classifier with poor generalization ability. Support vector machines are therefore introduced as weak classifiers to increase the difference between the weak classifiers and obtain a strong classifier with high generalization ability. In addition, a G-mean based SVM-boosting method is proposed, which assigns weights to the weak classifiers by maximizing the G-mean value. This improves classification accuracy on both the majority and the minority data and reduces the safety threat caused by false negatives in the fault detection algorithm.

Finally, the proposed method is applied to an agglomeration fault detection system for a fluidized bed reactor, where it is shown to be faster and more accurate than related classic algorithms. UCI benchmark data are also used to test the generality of the proposed method. The experimental results show that the improved oversampling method and ensemble learning algorithm achieve higher classification accuracy on both majority and minority samples than related classical methods.
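Two quantities named in the abstract, the kernel distance used for the k-nearest-neighbor search and the G-mean used to weight the weak classifiers, can be illustrated with a minimal sketch. The abstract does not give the thesis's formulas, so everything here is an assumption: the RBF kernel and its `gamma` value, the feature-space distance d(x, y)^2 = k(x, x) + k(y, y) - 2 k(x, y), the usual definition of G-mean as the geometric mean of sensitivity and specificity, and all function names.

```python
import math


def rbf_kernel(x, y, gamma=0.5):
    """RBF kernel k(x, y) = exp(-gamma * ||x - y||^2). The kernel choice is an assumption."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def kernel_distance(x, y, kernel=rbf_kernel):
    """Distance in the kernel-induced feature space:
    d(x, y)^2 = k(x, x) + k(y, y) - 2 k(x, y)."""
    d2 = kernel(x, x) + kernel(y, y) - 2.0 * kernel(x, y)
    return math.sqrt(max(d2, 0.0))  # clamp tiny negative rounding errors


def k_nearest(query, points, k, kernel=rbf_kernel):
    """Indices of the k points closest to `query` under the kernel distance."""
    order = sorted(range(len(points)),
                   key=lambda i: kernel_distance(query, points[i], kernel))
    return order[:k]


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity (minority recall) and specificity (majority recall)."""
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        else:
            if p == positive:
                fp += 1
            else:
                tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


if __name__ == "__main__":
    # A classifier that always predicts the majority class is 75% accurate here,
    # but its G-mean is zero because every minority sample is missed.
    y_true = [1, 0, 0, 0]
    y_pred = [0, 0, 0, 0]
    print(g_mean(y_true, y_pred))
    print(k_nearest([0.0, 0.0], [[3.0, 3.0], [0.1, 0.0], [1.0, 1.0]], k=2))
```

With an RBF kernel the kernel distance increases monotonically with the Euclidean distance, so the neighbor ranking is the same for both; the two can only differ for other kernels (for example a polynomial kernel), which can be passed in through the hypothetical `kernel` argument.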

Keywords/Search Tags:

unbalanced data failure detection, kernel probability density estimation, weighted oversampling, sample screening, integrated learning algorithm, G-mean value

Related items

1	Research And Modeling Application Of VSG Method Based On Monte Carlo And Kernel Density Estimation
2	Research On Air Quality Prediction Based On Ensemble Learning And Nonparametric Method
3	Application Of Weighted Machine Learning Method In Parameter Inversion And Subsidence Prediction In Huainan Mining Area
4	Research On Mixed Gas Detection Method Based On Unbalanced Sample
5	Study On Estimation Algorithm Of Optical Fiber Probe Void Fraction Based On Data Fusion And Implementation
6	Research And Application Of Imbalanced Data Classification Based On Oversampling And Ant Colony Optimization Resampling
7	The Study Of Land Use/Cover Change In The Nature Reserve Based On Kernel Density Estimation
8	Prediction Of Coal And Gas Outburst Under The Strategy Of Unbalanced Data
9	Research On Pedestrian Detection Technology Of Mine Locomotive
10	Sewage Treatment Fault Diagnosis And Software Development Based On Weighted Extreme Learning Machine Ensemble Algorithm