| As industrial processes mature, they grow larger and more complex, and detecting and resolving faults accurately and promptly becomes key to ensuring process safety and preventing catastrophic accidents. In recent years, advances in data acquisition and storage have caused the volume of recorded process data to grow explosively, and data-driven fault diagnosis methods have therefore been widely adopted. However, data extracted from complex industrial processes in real systems are often imbalanced in three respects. First, imbalance in sample values: the original data usually contain many feature variables, but these variables carry different amounts of useful information and therefore contribute unequally to the diagnostic model. Second, imbalance in sample labels: samples are plentiful but only a few are labeled, so the unsupervised information carried by the cheap and easily obtained unlabeled samples is lost and the structural information of the samples is wasted. Third, imbalance in sample classes: the system usually runs under normal operating conditions, so most of the collected data are normal samples and only a few are fault samples, and the two classes provide unequal amounts of information for model construction. (1) To address fault diagnosis with imbalanced sample values, a fault diagnosis method based on global and local information, Local Reconstructed Kernel Principal Component Analysis (LRKPCA), is proposed. First, the global low-dimensional features of the original data are extracted with Kernel Principal Component Analysis (KPCA), and at the same time the local low-dimensional features of the data 
are obtained using the t-distributed Stochastic Neighbor Embedding (t-SNE) algorithm. Then, the neighborhood error of the local low-dimensional features is calculated and inversely mapped to the global feature space. Through coordinate reconstruction, the global-local low-dimensional reconstruction features of the data are obtained, that is, a small number of features that are more valuable to the fault diagnosis model. Finally, the reconstructed global-local low-dimensional features are classified with the AdaBoost algorithm to achieve fault diagnosis. For data with many feature variables, strong nonlinearity, and unequal feature contributions, the LRKPCA-based method considers both the global and local information of the samples. It maps the data from a high-dimensional space to a low-dimensional space, extracts the features that are more valuable to the fault diagnosis model, and overcomes the poor accuracy and loss of local structure information of traditional KPCA. It thereby addresses the uneven learning of sample feature information and achieves better diagnostic ability. (2) To address fault diagnosis with imbalanced sample labels, a method combining active learning and semi-supervised learning, Density Ratio Batch Active Learning Adaptive Laplacian Graph Trimming (DRBAL-ALGT), is proposed. First, a Density Ratio (DR) index is constructed to select samples located in sparse regions with higher uncertainty. Second, in each iteration, multiple samples are selected in batches according to the DR index for manual labeling, and the data set is updated. Third, an Adaptive Laplacian Graph Trimming (ALGT) semi-supervised classifier is constructed. Finally, fault diagnosis is performed using the DRBAL-ALGT classification results. The DRBAL-ALGT-based method treats samples in sparse regions with higher uncertainty as more valuable for model 
building, which saves manual labeling cost and learns from unlabeled training information more effectively. In addition, a large number of cheap unlabeled samples are fully used to assist the small number of labeled samples in training the classification model. This addresses the model's insufficient use of sample information and improves diagnostic ability. (3) To address fault diagnosis with imbalanced sample classes, a cost-sensitive classification method, Bias Weights AdaBoost (BW-AdaBoost), is proposed. First, the majority-class samples are undersampled with the k-Nearest Neighbor (KNN) algorithm, and multiple bias data sets with different degrees of balance are constructed around the neighborhood of each minority-class sample. Then, a BW-AdaBoost model is built on the bias data sets, in which a higher cost is assigned to the minority class, following the idea of cost-sensitive learning, so that the original minority samples receive more attention. Finally, the cost-sensitive BW-AdaBoost classifier classifies the test samples for fault diagnosis. Diagnosis of multiple faults is achieved through multi-class classification methods. The BW-AdaBoost-based method treats the region where the two classes neighbor each other as more valuable. It obtains the bias data sets and, through cost-sensitive learning, achieves balanced learning for both majority and minority samples. This addresses the skew of the diagnostic model toward the majority class and improves diagnostic ability. |
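
The undersampling step in (3) can be illustrated with a minimal sketch. It is not the BW-AdaBoost implementation: the abstract does not say how neighborhoods are combined or how costs are set, so the sketch assumes Euclidean distance, assumes that a majority sample is kept if it is among the k nearest majority neighbors of any minority sample, and assumes a single scalar `minority_cost`. The function names `knn_undersample`, `build_bias_datasets`, and `initial_cost_weights` are hypothetical.

```python
import math


def knn_undersample(majority, minority, k):
    """Keep majority samples that are among the k nearest majority
    neighbors (Euclidean distance, an assumption) of any minority sample."""
    keep = set()
    for m in minority:
        # Rank majority indices by distance to this minority sample.
        ranked = sorted(range(len(majority)),
                        key=lambda i: math.dist(majority[i], m))
        keep.update(ranked[:k])
    return [majority[i] for i in sorted(keep)]


def build_bias_datasets(majority, minority, ks):
    """Build one (majority subset, minority set) pair per neighborhood size.
    Larger k keeps more majority samples, giving a less balanced data set."""
    return {k: (knn_undersample(majority, minority, k), list(minority))
            for k in ks}


def initial_cost_weights(n_majority, n_minority, minority_cost):
    """Normalized starting sample weights in which each minority sample
    counts minority_cost times as much as a majority sample."""
    raw = [1.0] * n_majority + [minority_cost] * n_minority
    total = sum(raw)
    return [w / total for w in raw]


# Toy data: six normal (majority) samples and two fault (minority) samples.
majority = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0),
            (1.1, 0.9), (5.0, 5.0), (6.0, 5.5)]
minority = [(1.2, 1.0), (0.9, 1.3)]

datasets = build_bias_datasets(majority, minority, ks=[1, 3])
weights = initial_cost_weights(len(datasets[3][0]), len(minority),
                               minority_cost=3.0)
```

In this toy example, majority samples far from every fault sample are dropped for both values of k, so only the region where the two classes meet is retained. The weights could then serve as the starting sample distribution of a boosting classifier trained on each bias data set.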