| As modern industrial processes grow more complex, improving production stability becomes increasingly important, and fault detection and diagnosis algorithms have become a research hotspot. However, production processes generate large volumes of imbalanced data. When the samples are imbalanced, traditional detection and diagnosis algorithms tend to be biased towards the majority class, which leads to missed detections in the minority class and threatens the safe and stable operation of industrial production. Detection and diagnosis methods for imbalanced data are therefore particularly important. Research on imbalanced data detection focuses mainly on the data level and the algorithm level. Based on these two aspects, this paper studies an oversampling method and an ensemble learning method for fault detection on imbalanced data. At the data level, a weighted oversampling and sample screening method based on kernel probability density estimation is proposed. First, a balance threshold is set so that the new dataset meets the balance requirement. Second, a weighted probability density function of the minority samples is constructed with kernel probability density estimation to strengthen the classification boundary. Furthermore, acceptance-rejection sampling is introduced for sample screening to ensure the quality of the generated samples. As a result, sample generation is better grounded theoretically than in classic oversampling methods. In addition, the kernel distance, instead of the Euclidean distance, is used to find the k nearest neighbors, which is more advantageous for high-dimensional nonlinear classification problems. At the algorithm level, because low diversity among the weak classifiers in Boosting leads to poor generalization of the strong classifier, support vector machines are introduced as weak classifiers to increase the diversity among them and obtain a strong classifier with high generalization ability. In addition, a G-mean based SVM-boosting method is proposed, which assigns weights to the weak classifiers by maximizing the G-mean value. Classification accuracy can thus be improved for both the majority and the minority data, and the security threat caused by false negatives in fault detection is reduced. Finally, the proposed method is applied to an agglomeration fault detection system for a fluidized bed reactor, where it is shown to be faster and more accurate than related classic algorithms. Data from the UCI repository are also used to test the generality of the proposed method. The experimental results show that the improved oversampling method and ensemble learning algorithm achieve higher classification accuracy on both majority and minority samples than related classical methods. |
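
Two quantities named in the abstract have standard definitions that a short sketch can make concrete: the G-mean (the geometric mean of the recall on the minority class and the recall on the majority class) and the kernel-induced distance used in place of the Euclidean distance for the k-nearest-neighbor search. The Python below is only an illustration under stated assumptions. The RBF kernel, the value of `gamma`, and the function names `g_mean`, `rbf_kernel`, `kernel_distance` and `k_nearest` are hypothetical choices, not taken from the paper, and the sketch does not reproduce the paper's weighting scheme or its boosting procedure.

```python
import math


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity (minority recall) and specificity (majority recall)."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


def rbf_kernel(x, y, gamma=0.5):
    """RBF kernel; the kernel type and gamma are assumptions made for this sketch."""
    squared = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared)


def kernel_distance(x, y, kernel=rbf_kernel):
    """Distance in the kernel feature space: sqrt(k(x,x) - 2*k(x,y) + k(y,y))."""
    d2 = kernel(x, x) - 2.0 * kernel(x, y) + kernel(y, y)
    return math.sqrt(max(d2, 0.0))  # clamp tiny negative values from rounding


def k_nearest(query, samples, k=3, kernel=rbf_kernel):
    """Indices of the k samples closest to `query` under the kernel distance."""
    order = sorted(
        range(len(samples)),
        key=lambda i: kernel_distance(query, samples[i], kernel),
    )
    return order[:k]


if __name__ == "__main__":
    # 2 fault samples (label 1) among 10: predicting "normal" everywhere is
    # 80% accurate but misses every fault, which the G-mean exposes.
    y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    always_normal = [0] * 10
    print(g_mean(y_true, always_normal))
    print(k_nearest([0.0, 0.0], [[5.0, 5.0], [0.1, 0.0], [1.0, 1.0]], k=2))
```

The first call shows why accuracy alone is misleading on imbalanced data: a detector that never reports a fault scores a G-mean of zero despite being correct on most samples, which is the bias the abstract describes.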