| Thermal power generation is a major power generation method in China and holds an important position in the country’s energy structure. In recent years, as national requirements for energy saving and emission reduction have become increasingly strict, it has become necessary to accurately model and optimize the operating state of coal-fired power plants. However, despite a large body of mechanistic research, coal-fired power plants remain difficult to model because of their structural complexity, large scale, diverse operating conditions, and variable coal quality. Over their operating history, coal-fired power plants have accumulated large volumes of operational data covering the entire production process. These data support the application and development of big data technology in coal-fired power plants. As a result, data-driven modeling techniques are increasingly used in parameter prediction, fault diagnosis, safety monitoring, and performance optimization. However, current data-driven thermal power modeling mainly builds models under the guidance of a mechanism model, using a small number of monitored system parameters. The special requirements of modeling high-dimensional, time-varying time-series data are rarely considered, which makes it difficult to maintain high model accuracy under complicated operating conditions. At the same time, the lack of research on time delay leaves the models poorly transparent, so interpretability problems also arise when the models are deployed in real systems. Targeting the high-dimensional time-series industrial big data of thermal power plants, this paper constructs a data-driven modeling method for thermal power units in order to improve the modeling efficiency and accuracy of thermal power systems. This paper takes the most closely watched and representative parameters of a thermal power plant as examples. The prediction models
of main steam temperature, reheated steam temperature, and output power are each established using the data-driven reverse modeling method proposed in this paper. The main research contents are as follows: 1) To address the high dimensionality of thermal power industrial big data, this paper designs and implements a fast feature selection method that combines filter and embedded methods. Statistical correlation analysis is performed and a GBM model is trained; the feature weights are then extracted and the important features related to the prediction targets are selected. This avoids the drawbacks of purely mechanism-guided modeling, which considers only the causal relationships between data, selects few modeling parameters, and neglects high-value correlations in the data, making it more difficult to find the secondary influence factors of the system. 2) Given the time-series nature of industrial thermal power data and the complex time-delay relationships between related features and prediction targets, this paper proposes TD-CORT, a data-mining-based algorithm built on first-order time-series correlation, to calculate the delay between features; the algorithm can accurately determine the delay order. Calculating the time delay between features can be regarded as a form of feature selection in the time dimension. Accurate time-delay feature engineering improves model accuracy while keeping model complexity low. 3) To cope with the complexity of operating conditions in thermal power plants, feature preprocessing, feature selection, and model ensembling are used to greatly reduce human intervention while preserving model accuracy. A novel ensemble scheme based on LightGBM and a deep neural network is proposed. It achieves high prediction accuracy for the target features under complex conditions while preserving the model's generalization ability. This work has
been implemented on the TensorFlow machine learning system suite, which is deployed on the Docker container-based Rancher platform. The resulting prediction system has been deployed at a 1000 MW ultra-supercritical thermal power plant in China to model and predict multiple key parameters of the thermal power system. On-site tests and ten months of continuous service show that the algorithm maintains high prediction accuracy under the various operating conditions of the thermal power unit and has provided effective guidance for the actual operation and scheduling of the power plant. The accuracy of the models built with this data-driven scheme verifies the feasibility of the modeling method, which offers a new approach to thermal power unit modeling. |
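
The abstract does not spell out how TD-CORT works, so the following is only a minimal sketch of one plausible reading: compute the standard first-order temporal correlation coefficient (CORT, the cosine similarity of first differences) between a feature and the target at each candidate lag, and keep the lag with the largest absolute value. The function names `cort` and `estimate_delay`, the `max_lag` parameter, and the lag-scan strategy are assumptions made for illustration, not the paper's implementation.

```python
import math


def cort(x, y):
    """First-order temporal correlation between two equal-length series.

    This is the cosine similarity of the first differences of x and y.
    """
    dx = [b - a for a, b in zip(x, x[1:])]
    dy = [b - a for a, b in zip(y, y[1:])]
    num = sum(p * q for p, q in zip(dx, dy))
    den = math.sqrt(sum(p * p for p in dx)) * math.sqrt(sum(q * q for q in dy))
    # A constant series has no increments; treat the correlation as zero.
    return num / den if den > 0 else 0.0


def estimate_delay(feature, target, max_lag):
    """Assumed lag scan: return (lag, score) maximizing |CORT|.

    For each lag, feature[t] is compared with target[t + lag], i.e. the
    feature is assumed to lead the target by `lag` samples.
    """
    if len(feature) != len(target):
        raise ValueError("feature and target must have the same length")
    best_lag, best_score = 0, 0.0
    for lag in range(max_lag + 1):
        n = len(feature) - lag
        if n < 3:  # need at least two increments to correlate
            break
        score = cort(feature[:n], target[lag:])
        if abs(score) > abs(best_score):
            best_lag, best_score = lag, score
    return best_lag, best_score
```

Under this reading, each selected feature would be shifted by its estimated lag before being passed to the prediction model, which is one way to treat delay estimation as feature selection in the time dimension.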