| Pulp and paper production consumes large volumes of water and consequently discharges large amounts of wastewater. Statistics show that the total wastewater discharge of China's pulp and paper industry ranks among the top three of all industries in the country. To ensure that pulp and paper wastewater is discharged in compliance with the standards, the effluent indicators must be monitored online. The effluent concentrations of chemical oxygen demand (COD), biochemical oxygen demand (BOD), suspended solids (SS), and nitrate are key indicators of wastewater treatment quality. Most existing online instruments rely on chemical reactions that take time to complete, which makes real-time monitoring of the effluent indicators difficult. Moreover, the sensors in online measurement devices are prone to losing accuracy through corrosion in the harsh operating environment. It is therefore of great significance to build mechanism-driven or data-driven soft measurement models for the effluent indicators of pulp and paper wastewater. Because the treatment process involves a large number of biochemical reactions, its mechanism is difficult to express as a model, so data-driven soft measurement models are the better choice. Focusing on the characteristics of data from the pulp and paper wastewater treatment process, this paper carries out the following research on data-driven soft measurement models for effluent indicators:
1. We propose a random forest (RF) model based on dynamic slow feature analysis (DSFA). Pulp and paper wastewater data are time-varying and high-dimensional, so slow feature analysis is used to extract latent long-term features from the data and to reduce its dimensionality, mitigating the negative effects of information redundancy. To account for the lag in the data measurement process, we use an augmented matrix to construct the dynamic slow feature analysis model. Because the variables in the data are related nonlinearly, an RF model performs the final nonlinear prediction. The proposed DSFA-RF model is validated on actual data from a pulp and paper wastewater treatment plant in Dongguan, China. Compared with the traditional partial least squares (PLS) model, the DSFA-RF model increases R² by 35% and reduces the RMSE by 30.09%.
2. To further enhance the model's ability to extract deep features and to handle high-dimensional data, we combine long short-term memory (LSTM) networks with an autoencoder (AE) to build a long short-term memory autoencoder (LSTMAE) deep latent variable model, and use an XGBoost model for regression prediction. LSTM networks capture time-series features well, and combining them with an autoencoder strengthens this ability further. As a boosting model, XGBoost trains each new tree to reduce the remaining error of the ensemble built so far, which yields accurate predictions. The resulting LSTMAE-XGBoost model exploits the LSTM's ability to capture time-series features, the autoencoder's strong feature extraction ability, and XGBoost's strong nonlinear regression ability. The model is validated on data covering the entire wastewater treatment process, with a total of 38 variables, and the LSTMAE-XGBoost model performs well on this dataset.
3. We propose a long short-term memory network model based on SHAP analysis and attention-based temporal convolution (S-TCAN-LSTM). First, SHAP analysis is used to assess the importance of the input variables, and low-importance variables are removed with the help of correlation analysis to reduce the impact of information redundancy on model accuracy. Next, a temporal convolutional network with an attention mechanism extracts latent variables from the input data. The convolutional layers of the temporal convolutional network have local receptive fields, which help the model capture local patterns and reduce its computational complexity and storage requirements. In addition, its dilated causal convolutions enlarge the receptive field without increasing the kernel size, making the network better at handling long-term dependencies in the data. The attention mechanism can reduce the risk of exploding or vanishing gradients in the temporal convolutional network and makes the model focus on the more important variables, improving the quality of the extracted latent variables. The subsequent LSTM prediction model handles complex nonlinear relationships and complements the temporal convolutional network, so that both long- and short-term dependencies in the data are captured better. The S-TCAN-LSTM model achieves excellent performance on both the full-process wastewater dataset and the simulated dataset from the Benchmark Simulation Model No. 1 (BSM1) platform. |
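The DSFA step in item 1 (an augmented matrix followed by slow feature analysis) can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the thesis implementation: the function names, the number of lags, the tolerance used to drop collinear directions, and the synthetic test signal are all hypothetical. Slowness is measured in the standard way, as the mean squared first-order difference of unit-variance features.

```python
import numpy as np


def augment(X: np.ndarray, lags: int) -> np.ndarray:
    """Augmented (time-lagged) matrix: row t is [x_t, x_{t-1}, ..., x_{t-lags}]."""
    n = X.shape[0]
    return np.hstack([X[lags - k : n - k] for k in range(lags + 1)])


def dynamic_sfa(X: np.ndarray, lags: int, n_features: int, rel_tol: float = 1e-10):
    """Linear slow feature analysis on the augmented matrix (hypothetical helper).

    Returns the slow features (slowest first), the projection matrix W, the
    column means needed to project new augmented samples, and the slowness
    (mean squared first difference) of each returned feature.
    """
    Xa = augment(X, lags)
    mean = Xa.mean(axis=0)
    Xc = Xa - mean
    # 1) Whitening: rotate onto the principal axes and scale each to unit variance.
    #    Near-zero-variance directions (collinear lagged columns) are discarded.
    var, axes = np.linalg.eigh(np.cov(Xc, rowvar=False))
    keep = var > rel_tol * var.max()
    white = axes[:, keep] / np.sqrt(var[keep])
    Z = Xc @ white
    # 2) Second-moment matrix of the first-order time differences.
    dZ = np.diff(Z, axis=0)
    dcov = dZ.T @ dZ / dZ.shape[0]
    # 3) eigh sorts eigenvalues in ascending order, so the leading eigenvectors
    #    are the directions whose projections change most slowly over time.
    slowness, rot = np.linalg.eigh(dcov)
    W = white @ rot[:, :n_features]
    return Xc @ W, W, mean, slowness[:n_features]


# Synthetic example (assumed data): a slow sinusoid mixed with a fast one and noise.
rng = np.random.default_rng(0)
t = np.linspace(0, 4 * np.pi, 500)
slow, fast = np.sin(t), np.sin(40 * t)
X = np.column_stack([slow + fast, slow - fast, rng.normal(size=t.size)])
S, W, mean, slowness = dynamic_sfa(X, lags=2, n_features=2)
print(S.shape, slowness)
```

New samples would be projected with `(augment(X_new, lags) - mean) @ W`, and the resulting slow features passed to a nonlinear regressor such as scikit-learn's `RandomForestRegressor` for the final prediction. Whitening first turns the slowness objective into an ordinary symmetric eigenproblem, which is why two `eigh` calls suffice.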
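The error-correcting behaviour attributed to XGBoost in item 2 comes from boosting: during training, each new learner is fitted to what the current ensemble still gets wrong. The sketch below is plain gradient boosting with depth-one trees (stumps) on a single input under squared loss. It deliberately omits XGBoost's second-order updates and tree regularization, and all names and settings (`fit_stump`, `gradient_boost`, `n_rounds`, `learning_rate`, the sine target) are illustrative assumptions.

```python
import numpy as np


def fit_stump(x, r):
    """Least-squares regression stump on a 1-D input: one threshold, a constant per side."""
    order = np.argsort(x)
    xs, rs = x[order], r[order]
    best_sse = np.inf
    best = (np.inf, r.mean(), r.mean())  # threshold = inf means a single leaf
    for k in range(1, len(xs)):
        if xs[k] == xs[k - 1]:
            continue  # no threshold separates equal x values
        left, right = rs[:k], rs[k:]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            best_sse = sse
            best = ((xs[k - 1] + xs[k]) / 2, left.mean(), right.mean())
    threshold, left_value, right_value = best
    return lambda z: np.where(z <= threshold, left_value, right_value)


def gradient_boost(x, y, n_rounds=100, learning_rate=0.1):
    """Plain gradient boosting for squared loss: each stump is fitted to the current residuals."""
    base = y.mean()
    pred = np.full(len(y), base)
    stumps, history = [], [np.mean((y - pred) ** 2)]
    for _ in range(n_rounds):
        residual = y - pred  # negative gradient of 0.5 * (y - pred) ** 2
        stump = fit_stump(x, residual)
        pred = pred + learning_rate * stump(x)
        stumps.append(stump)
        history.append(np.mean((y - pred) ** 2))

    def predict(z):
        # Ensemble prediction: base value plus the shrunken sum of all stumps.
        return base + learning_rate * sum(s(z) for s in stumps)

    return predict, history


x = np.linspace(0, 2 * np.pi, 200)
y = np.sin(x)
predict, history = gradient_boost(x, y)
print(history[0], history[-1])
```

For squared loss the residual is exactly the negative gradient, and each stump is a least-squares projection of the residual, so the training error cannot increase from one round to the next when the learning rate lies between 0 and 2. In practice the regression stage would use a library implementation such as `XGBRegressor` from the `xgboost` Python package, trained on the LSTMAE latent features.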
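The variable-screening step in item 3 rests on Shapley values. The sketch below computes them exactly by enumerating coalitions, with absent variables fixed at a baseline (here the sample mean). This is feasible only for a few variables; a 38-variable dataset would need an approximation such as Kernel SHAP or Tree SHAP from the `shap` package. The screening rule (drop a variable whose share of total importance is small, or that is strongly correlated with a more important variable already kept), its thresholds, and the stand-in model are hypothetical choices, not the thesis procedure.

```python
import itertools
import math

import numpy as np


def shapley_values(f, x, baseline):
    """Exact Shapley values of model f at sample x.

    A variable absent from a coalition is set to its baseline value, so
    v(S) = f(x on S, baseline elsewhere). Cost is n * 2**(n-1) coalition pairs.
    """
    n = len(x)
    phi = np.zeros(n)

    def value(subset):
        z = baseline.copy()
        for j in subset:
            z[j] = x[j]
        return f(z)

    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for subset in itertools.combinations(others, size):
                phi[i] += weight * (value(subset + (i,)) - value(subset))
    return phi


def shap_importance(f, X, baseline):
    """Global importance: mean absolute Shapley value of each variable over the samples."""
    return np.mean([np.abs(shapley_values(f, x, baseline)) for x in X], axis=0)


def screen_variables(importance, X, min_share=0.05, corr_threshold=0.9):
    """Hypothetical rule: visit variables from most to least important; skip one whose
    importance share is below min_share or that correlates strongly with a kept variable."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    share = importance / importance.sum()
    kept = []
    for i in np.argsort(-importance):
        if share[i] < min_share or any(corr[i, j] > corr_threshold for j in kept):
            continue
        kept.append(int(i))
    return sorted(kept)


rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
X[:, 1] = X[:, 0] + 0.05 * rng.normal(size=200)  # x1 nearly duplicates x0


def model(z):
    # Stand-in for a trained regressor (hypothetical), with one interaction term.
    return 3 * z[0] + 2 * z[1] + 1.5 * z[2] + 0.01 * z[3] + 0.5 * z[0] * z[2]


baseline = X.mean(axis=0)
importance = shap_importance(model, X, baseline)
print(importance.round(3), screen_variables(importance, X))
```

In this toy example x1 nearly duplicates x0 and x3 barely affects the output, so the rule keeps variables 0 and 2. For a purely linear model the Shapley value of variable i reduces to w_i (x_i - b_i), which is a convenient sanity check.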
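The effect of dilated causal convolution described in item 3 can be checked numerically. The sketch below implements a single-channel dilated causal convolution and stacks four levels with kernel size 3 and dilations 1, 2, 4, 8. The all-ones weights, the single channel, and one convolution per level are simplifying assumptions: common TCN residual blocks contain two convolutions per dilation level, which doubles the (k - 1) * d contribution of each level, and the attention mechanism is not modelled here.

```python
import numpy as np


def dilated_causal_conv1d(x, w, dilation):
    """y[t] = sum_j w[j] * x[t - j * dilation]; the input is zero-padded on the left,
    so y[t] never depends on x[t + 1], x[t + 2], ... (causality)."""
    pad = (len(w) - 1) * dilation
    xp = np.concatenate([np.zeros(pad), x])
    y = np.zeros(len(x))
    for j, wj in enumerate(w):
        start = pad - j * dilation  # index of x[t - j * dilation] in xp when t = 0
        y += wj * xp[start : start + len(x)]
    return y


def receptive_field(kernel_size, dilations):
    """Number of past inputs (including the current one) that one output can see."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def tcn_stack(x, kernel_size, dilations):
    # One convolution per level with all-ones weights; real TCN layers use learned
    # weights, nonlinearities and residual connections.
    for d in dilations:
        x = dilated_causal_conv1d(x, np.ones(kernel_size), d)
    return x


impulse = np.zeros(64)
impulse[10] = 1.0
out = tcn_stack(impulse, kernel_size=3, dilations=[1, 2, 4, 8])
support = np.nonzero(out)[0]
print(support.min(), support.max(), receptive_field(3, [1, 2, 4, 8]))  # 10 40 31
```

An impulse at t = 10 influences outputs from t = 10 to t = 40 only: nothing earlier (causality) and nothing beyond the 31-step receptive field. Four undilated layers with the same kernel size would see only 9 steps; because the dilation doubles at each level, the receptive field grows exponentially with depth while the number of weights grows only linearly.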