Research On Data-driven Modeling And Data Augmentation Method For Industrial Processes Based On Machine Learning

Posted on: 2023-09-20
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X H Zhang
GTID: 1528306794988589
Subject: Control Science and Engineering

Abstract/Summary:
In modern process industries, accurate and reliable data-driven models of industrial processes play a vital role in early-stage planning by decision-makers. Model performance and data quality are the two key elements of industrial data modeling. As industrial process data grow more complex, improving the accuracy and robustness of data-driven process models has become an important research direction in intelligent computing. In addition, samples are scarce in the modeling of complex petrochemical processes, because some events in the system occur with low probability and some data are costly to acquire under physical constraints. The effective samples are few in number, unevenly distributed, strongly nonlinear, noisy, incomplete, and uncertain, which seriously restricts the performance of data-driven models and makes optimal operation of the process system difficult to achieve. Improving the performance of data-driven models and generating effective virtual samples to expand the training sample set therefore have both theoretical significance and industrial application value.

Building on a survey of related research, this dissertation further studies the structure and learning algorithms of neural networks and proposes a new machine-learning-based method for data-driven modeling of industrial processes. To address the shortage of samples in supervised learning, it also studies virtual sample generation methods that expand the training sample set and thereby improve the performance of industrial process models. The main contents of this research are as follows:

1. An ensemble neural network model of multiple activation function extreme learning machines based on partial least squares regression (PLSR-MAFELM). To deal effectively with complex, highly nonlinear industrial process data, extreme learning machine models with multiple activation functions are trained to recognize or approximate the input samples. Following an ensemble learning strategy, the outputs of the individual extreme learning machine models are aggregated by partial least squares regression to obtain the final output of the ensemble network. The proposed model has strong nonlinear processing ability and handles noise in the data effectively. PLSR-MAFELM addresses the low accuracy and poor robustness that the traditional extreme learning machine shows as data complexity increases, and it offers high generalization accuracy, fast learning speed, and strong robustness.

2. A new virtual sample generation method based on manifold learning. To address the uneven distribution and shortage of training samples in data-driven modeling of industrial processes, a new virtual sample generation method based on the concept of a topological manifold is proposed for data augmentation. As a data preprocessing step in process modeling, data augmentation directly affects the accuracy and generalization ability of the model. The proposed method uses Isometric Mapping or Locally Linear Embedding to recover the low-dimensional manifold structure from the high-dimensional sampled data, which yields the corresponding embedding mapping and reduces the dimension of the data. The visualized low-dimensional manifold can then be used to locate the truly sparse regions of the high-dimensional industrial process data. Interpolation is used to generate virtual samples in the sparse regions of the low-dimensional data space and so augment the original sample information. The generated virtual samples are then screened: a triangular membership function is constructed from the attribute characteristics of the data, the asymmetric extensible region of the samples is found, and virtual samples outside this region are eliminated to ensure that the remaining virtual samples are reasonable.

3. A new virtual sample generation method based on Quantile Regression and a Variational Generative Adversarial Network (QRVAE-GAN). To give the generative model the ability to learn complex probability distributions and to generate labeled virtual samples, a deep generative framework, QRVAE-GAN, is proposed. Because QRVAE-GAN generates labeled samples, it can be used for sample augmentation in regression prediction problems. The model combines a Variational Auto-Encoder with a Generative Adversarial Network. The Encoder maps a real sample to a latent vector; the Generator reconstructs the original sample so that its characteristics match the given latent vector; and the Discriminator judges whether an input sample belongs to the real sample distribution. The mapping provided by the Encoder makes the Generator easier to train and speeds up training of the model. QRVAE-GAN also embeds the Quantile Regression output y of the sample into the generative adversarial structure as an additional condition, which influences the generation of the input variable x and gives the model better generation and prediction ability. QRVAE-GAN can improve the quality of the generated virtual samples and increase sample diversity.

4. The generated virtual samples are added to the original data set, and the expanded training data set is used to train the proposed neural network model, so as to improve the accuracy and robustness of intelligent data-driven modeling.

Multivariable benchmark functions with complex nonlinear relationships are used to verify the effectiveness of the proposed methods. The methods are also applied to the data-driven modeling of two practical industrial processes: high-density polyethylene and purified terephthalic acid. The results on multiple benchmark function data sets and the two industrial process data sets show that the proposed neural network model learns faster and generalizes better, and that the proposed data augmentation methods further improve the performance of data-driven models.
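
The ensemble idea in item 1 can be illustrated with a minimal sketch: several extreme learning machines, each with a different activation function, whose predictions are combined by a linear aggregator. This is not the dissertation's implementation. The activation set, the hidden-layer size, and all function names are assumptions made for illustration, and ordinary least squares is swapped in for partial least squares regression to keep the example dependent on NumPy only.

```python
import numpy as np

# Hypothetical activation set; the abstract does not list the functions used.
ACTIVATIONS = {
    "sigmoid": lambda z: 1.0 / (1.0 + np.exp(-z)),
    "tanh": np.tanh,
    "sine": np.sin,
}

def train_elm(X, y, n_hidden, activation, rng):
    """Train one ELM: random hidden layer, output weights by pseudoinverse."""
    W = rng.normal(size=(X.shape[1], n_hidden))  # random input weights, never trained
    b = rng.normal(size=n_hidden)                # random hidden biases
    H = activation(X @ W + b)                    # hidden-layer output matrix
    beta = np.linalg.pinv(H) @ y                 # least-squares output weights
    return W, b, beta

def predict_elm(model, X, activation):
    W, b, beta = model
    return activation(X @ W + b) @ beta

def member_predictions(models, X):
    """One column per ensemble member, plus an intercept column."""
    cols = [predict_elm(models[name], X, f) for name, f in ACTIVATIONS.items()]
    return np.column_stack(cols + [np.ones(len(X))])

def train_ensemble(X, y, n_hidden=30, seed=0):
    rng = np.random.default_rng(seed)
    models = {name: train_elm(X, y, n_hidden, f, rng)
              for name, f in ACTIVATIONS.items()}
    P = member_predictions(models, X)
    # Stand-in aggregator: ordinary least squares (the abstract uses PLSR here).
    weights, *_ = np.linalg.lstsq(P, y, rcond=None)
    return models, weights

def predict_ensemble(models, weights, X):
    return member_predictions(models, X) @ weights
```

Calling `train_ensemble(X, y)` on an `(n, d)` input array and an `(n,)` target returns the member models and the aggregation weights, which `predict_ensemble` then applies to new inputs. Because the member predictions are strongly correlated, a regression method that tolerates collinearity, such as PLSR, is a natural choice for the aggregation step, which may be why the dissertation uses it.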
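
The generate-then-screen procedure in item 2 can be sketched roughly as follows, under several stated assumptions. The low-dimensional coordinates `Z` are taken as given (for example from Isomap or LLE) and are not computed here. Sparse regions are approximated by the largest nearest-neighbor gaps in `Z`, and the corresponding original samples are interpolated directly, which sidesteps the inverse mapping from the embedding. The rule in `extensible_region` is a hypothetical asymmetric bound, not the formula used in the dissertation.

```python
import numpy as np

def interpolate_sparse(Z, X, n_new, rng):
    """Create up to n_new virtual rows of X across the widest gaps found in Z.

    Z: (n, d_low) low-dimensional coordinates, aligned row by row with X.
    X: (n, d) original samples (inputs, optionally with the output appended).
    """
    D = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)                 # ignore self-distances
    nn = D.argmin(axis=1)                       # nearest neighbor of each point
    gap = D[np.arange(len(Z)), nn]              # distance to that neighbor
    sparse = np.argsort(gap)[::-1][:n_new]      # points with the largest gaps
    t = rng.uniform(0.25, 0.75, size=(len(sparse), 1))  # interpolation weights
    return (1 - t) * X[sparse] + t * X[nn[sparse]]

def triangular_membership(v, lower, center, upper):
    """Membership is 1 at center and falls linearly to 0 at lower and upper."""
    left = (v - lower) / (center - lower)
    right = (upper - v) / (upper - center)
    return np.clip(np.minimum(left, right), 0.0, 1.0)

def extensible_region(x, margin=0.1):
    """Hypothetical asymmetric bounds for one attribute (illustrative only).

    The lower bound extends in proportion to the share of samples below the
    center, the upper bound in proportion to the share at or above it.
    """
    lo, hi = x.min(), x.max()
    center = 0.5 * (lo + hi)
    frac_low = np.mean(x < center)
    span = hi - lo
    lower = lo - margin * span * frac_low
    upper = hi + margin * span * (1.0 - frac_low)
    return lower, center, upper

def screen(virtual, X, threshold=0.0):
    """Keep virtual rows whose membership exceeds threshold in every attribute."""
    keep = np.ones(len(virtual), dtype=bool)
    for j in range(X.shape[1]):
        lower, center, upper = extensible_region(X[:, j])
        mu = triangular_membership(virtual[:, j], lower, center, upper)
        keep &= mu > threshold
    return virtual[keep]
```

With `threshold=0.0`, screening only removes virtual samples that fall outside the extensible region of some attribute; a higher threshold would keep only samples closer to the center of each attribute's range. The sketch assumes every attribute takes at least two distinct values, since a constant attribute would give a zero-width region.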
Keywords/Search Tags: Data-Driven Modeling, Neural Network, Data Augmentation, Virtual Sample Generation, Manifold Learning, Deep Generative Model