| The power industry is a cornerstone of the country's national economic development, and thermal power accounts for the vast majority of total power generation. However, at a time when the country strongly advocates energy conservation and emission reduction, thermal power generation produces pollutants, chiefly NOx, which are the most harmful to the environment and the most difficult to treat. Establishing a NOx emission model for thermal power plants can therefore help enterprises strictly implement environmental protection policies and achieve economical, efficient, and sustainable development. Among existing NOx emission modeling studies, a series of methods based on support vector machines have achieved excellent results. The support vector machine is a machine learning algorithm built on statistical learning theory. It has a complete theoretical foundation and strong adaptability, so it remains in active use even amid the rapid development of deep learning and artificial intelligence. Because the volume of data collected from the distributed control system (DCS) of a thermal power plant is very large, and serial support vector machine algorithms cannot cope with modeling tasks on large-scale data, this paper studies the parallelization of the support vector machine in depth. Parallelization based on big data platforms is developing rapidly, with Hadoop and Apache Spark as the representative platforms. However, Hadoop's high computing latency makes it unsuitable for real-time, fast, iterative computation, so Apache Spark, which is efficient, easy to use, and versatile, is used to parallelize the algorithm. In summary, this paper improves traditional support vector regression for the problem of NOx emission modeling on large-scale data in thermal power plants and implements it in parallel on Apache Spark, which further improves the algorithm's ability to process big data. The main work and innovations of this paper are as follows: (1) A semiparametric support vector regression is proposed by constructing an adjustable predefined value vector for the weight vector of the support vector regression machine and using the fuzzy C-means algorithm to determine the basis vectors in the predefined value vector. A solution strategy for the semiparametric support vector regression based on iteratively reweighted least squares is also given. Numerical experiments demonstrate the effectiveness of the algorithm, which can reduce computational cost and control complexity while preserving accuracy. (2) To further improve the ability of the semiparametric support vector regression to process large-scale data, a parallel implementation is given that builds on Apache Spark and the idea of dataset partitioning. (3) Taking the 600 MW boiler of a power plant as the research object, a NOx emission model is successfully established with the proposed algorithm. Variable importance in projection (VIP) is used to select the input variables, and Pearson correlation analysis is then used to analyze the time delay of the selected variables. Finally, the proposed algorithm is used to establish the NOx emission model of the thermal power plant. The experimental results show that the parallel semiparametric support vector regression has high modeling efficiency, and its advantage becomes more pronounced as the amount of data increases. |
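
The abstract states that Pearson analysis is used to determine the time delay of each selected input variable but does not say how. One common reading, assumed here and not confirmed by the source, is to shift each input against the NOx signal and keep the lag with the largest absolute Pearson correlation. The following is a minimal sketch of that idea on synthetic data; the function name `estimate_delay`, the parameter `max_lag`, and the test signal are all hypothetical and do not come from the paper.

```python
import numpy as np

def estimate_delay(x, y, max_lag):
    """Return (lag, r): the lag in 0..max_lag (in samples) at which x[t - lag]
    has the largest absolute Pearson correlation with y[t], and that correlation.
    Assumption: delay is chosen by maximum |r|; the paper may use another rule."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    best_lag, best_r = 0, 0.0
    for lag in range(max_lag + 1):
        # Pair x[t - lag] with y[t] by trimming the unmatched ends.
        xs = x[: len(x) - lag]
        ys = y[lag:]
        r = np.corrcoef(xs, ys)[0, 1]
        if abs(r) > abs(best_r):
            best_lag, best_r = lag, r
    return best_lag, best_r

# Synthetic stand-in for one DCS input and the NOx output:
# y is x delayed by 7 samples plus a small amount of noise.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = np.roll(x, 7) + 0.05 * rng.normal(size=500)

lag, r = estimate_delay(x, y, max_lag=20)
print(lag, round(r, 3))
```

On this synthetic signal the recovered lag is the 7-sample shift that was built in. With real plant data each input variable would be tested separately against the measured NOx concentration, and `max_lag` would be set from the sampling period and the longest physically plausible delay.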