Fault Classification Based On Modified Active Learning And Semi-Supervised Learning

Posted on: 2018-05-29
Degree: Master
Type: Thesis
Country: China
Candidate: D Y Zhu
GTID: 2348330515984727
Subject: Control Science and Engineering

Abstract/Summary:
As a significant part of process systems engineering, process monitoring plays a fundamental role, both theoretically and practically, in maintaining the stability and reliability of complex industrial processes, improving product quality, and addressing other key issues. Thanks to the rapid progress and wide application of distributed control systems (DCS), massive amounts of industrial data can now be stored, so data-driven fault diagnosis and process monitoring methods have drawn widespread attention in both industry and academia. However, when traditional pattern recognition methods are applied to fault diagnosis in the process industry, the characteristics of the collected data are easily ignored. These include data scarcity, caused by the high cost of obtaining labeled fault samples, and data imbalance, both between normal and fault data and among different fault types. Training on such datasets often degrades the precision of the classification model. To address this problem, this thesis proposes a new fault classification method for small, imbalanced datasets based on modified active learning and semi-supervised learning, combined with resampling and cost-sensitive learning. The main research and results are as follows.

Firstly, a new method based on modified active learning and a weighted support vector machine is presented for fault classification in real-world industrial processes. It targets the problems that large-scale labeled fault samples are hard to acquire, labeling is expensive, and datasets are usually imbalanced and contaminated with outliers. By jointly measuring the informativeness and representativeness of unlabeled instances and reducing the impact of outliers, an improved Best versus Second-Best (BvSB) selection method is proposed to iteratively select the most valuable samples and query their labels. Meanwhile, a Weighted-SVM is introduced to counter the effect of imbalanced class distribution on active learning and classification accuracy, using different weight factors for classes and for individual samples, and a new, efficient method of determining the penalty coefficient is also presented. A case study on the Tennessee Eastman (TE) process verifies that the proposed approach achieves superior classification accuracy while reducing the labeling cost.

Secondly, because labeling by experts is expensive, semi-supervised learning is introduced on top of active learning to improve classification accuracy while reducing manual effort. To limit the performance degradation that unlabeled data can cause in certain cases of semi-supervised learning, a new labeling algorithm based on modified Bayesian decision fusion over an ensemble of multiple classifiers is proposed to improve the stability of the labeling process. The sufficient condition for updating the learning model under PAC theory is also analyzed in detail. A noise-filtering method based on the nearest neighbor rule is adopted to further ensure the purity of the new training data. In addition, the SMOTE resampling method is used to handle class imbalance in each iteration of the training process. The experimental results verify the effectiveness of the proposed semi-supervised learning algorithm, which achieves superior classification accuracy and stability.

Finally, considering the structural similarities and complementary characteristics of active learning and semi-supervised learning, this thesis studies how to combine the two to further improve classification precision. After using the most valuable unlabeled samples, traditional active learning makes no further use of the remaining unlabeled samples, which still carry rich information. Conversely, because the initial training set is very small, semi-supervised learning is unlikely to assign correct labels to the most uncertain unlabeled samples, so errors accumulate over the iterations. An integrated method combining active learning and semi-supervised learning for fault classification is therefore proposed to further improve the performance of the diagnosis model. The experimental results demonstrate the effectiveness and superiority of the proposed algorithm.
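
As an illustration of the Best versus Second-Best (BvSB) criterion named in the abstract, the following minimal Python sketch ranks unlabeled samples by the gap between their two highest class probabilities and selects those with the smallest gap for labeling. It shows only the plain BvSB rule; the representativeness measure and outlier handling of the improved method are not specified in this abstract and are not reproduced here. The function names and the probability values are hypothetical.

```python
from typing import List, Sequence


def bvsb_margin(probs: Sequence[float]) -> float:
    """Gap between the two largest class probabilities of one sample."""
    if len(probs) < 2:
        raise ValueError("need at least two classes")
    best, second = sorted(probs, reverse=True)[:2]
    return best - second


def select_queries(prob_matrix: Sequence[Sequence[float]], n_queries: int) -> List[int]:
    """Indices of the n_queries samples with the smallest BvSB margin.

    A small margin means the classifier can barely separate its top two
    candidate classes, so the sample is treated as the most informative.
    """
    margins = [bvsb_margin(p) for p in prob_matrix]
    ranked = sorted(range(len(margins)), key=lambda i: margins[i])
    return ranked[:n_queries]


# Hypothetical class-probability estimates for four unlabeled samples
# over three fault classes (illustrative values, not from the thesis).
unlabeled_probs = [
    [0.90, 0.05, 0.05],  # confident prediction, large margin
    [0.40, 0.38, 0.22],  # ambiguous between the first two classes
    [0.60, 0.30, 0.10],
    [0.34, 0.33, 0.33],  # nearly uniform, smallest margin
]

print(select_queries(unlabeled_probs, 2))  # prints [3, 1]
```

In an active learning loop, the selected indices would be sent to an expert for labeling, the labeled samples added to the training set, and the classifier retrained before the next round.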
Keywords/Search Tags: Fault Diagnosis, Active Learning, Semi-supervised Learning, Fault Classification for Small Imbalanced Dataset, Decision Fusion