Among organic contaminants in groundwater, dense non-aqueous phase liquids (DNAPLs) have high density, low water solubility, and high interfacial tension, so they can penetrate the aquifer and accumulate at its bottom. Because groundwater is itself hidden from direct observation, DNAPL contamination is difficult to detect and difficult to remediate. Developing a scientific and efficient remediation plan for groundwater DNAPL contamination sources therefore has considerable theoretical and practical significance. Groundwater contamination source identification means solving, in inverse form, the mathematical simulation model that describes contaminant migration and transformation in groundwater. It uses existing field monitoring data (groundwater levels and contaminant concentrations) together with supporting information such as collected and analyzed data, field investigation, and expert experience to determine the spatial location, number, and release history of the contamination sources. Because contaminant migration and transformation in a groundwater system is an irreversible process, source identification is a nonlinear inverse problem of the governing mathematical equations, and its ill-posedness and nonlinearity make it difficult to solve. With rapid advances in computing, such inverse problems have become an important and active research direction with practical significance and broad application prospects. This paper combines hypothetical and practical examples and applies several theories and methods, including forward and inverse mathematical-physical modeling, simulation-optimization, deep learning, and optimization algorithms, to the identification of groundwater DNAPL contamination sources. The methods developed are first applied to hypothetical examples to verify their validity and applicability, and then to a petrochemical contaminated site in Jilin City to identify the groundwater DNAPL contamination source information of the actual site and the values of selected parameters of the simulation model.

First, based on the geological and hydrogeological conditions of the site, a conceptual hydrogeological model of the study area and a numerical multiphase-flow simulation model of groundwater DNAPL contamination are established. When the simulation-optimization method is used to solve the optimization model for source identification, every newly generated candidate solution of the optimization model must be substituted into the simulation model for iterative calculation, which greatly increases the computational cost. Surrogate models are typically fast-running black-box models whose outputs closely approximate those of the simulation model for the same inputs; building a surrogate model of the DNAPL multiphase-flow numerical simulation therefore substantially reduces the computational load of repeatedly calling the simulation model. To reduce the input dimension of the surrogate model, the sensitivity of the parameters of the multiphase-flow model is calculated with a local sensitivity analysis method. Parameters that strongly influence the simulation output are selected and, together with the contamination source information, used as input variables of the surrogate model; these are the variables to be identified in the final source identification. Parameters that have little influence on the output and whose values can be approximated from auxiliary information (field survey, data collection, professional experience, etc.) are treated as background variables and entered into the multiphase-flow model with known values. Latin hypercube sampling is then applied to sample the feasible domain of the variables to be identified; each sample is substituted into the multiphase-flow model, which is run forward, yielding a set of input-output samples that is divided into training and test sets for building and testing the surrogate model.

Five methods are used to build surrogate models of the simulation model: two shallow learning methods (a back-propagation neural network, BPNN, and a BPNN improved with the sparrow search algorithm, SSA-BPNN), two deep neural network methods (long short-term memory, LSTM, and bidirectional LSTM, BiLSTM), and a deep learning method that is not a deep neural network (deep forest). The accuracy of the five surrogate models is compared using the root mean square error (RMSE), mean relative error (MRE), and coefficient of determination (R²) to assess the feasibility and advantages of the deep forest surrogate model. Next, the optimization model for groundwater DNAPL contamination source identification is established. To improve its solution accuracy, the Sobol sequence and the vertical and horizontal crossover strategy are introduced into the traditional sparrow search algorithm (SSA), giving an improved sparrow search algorithm (ISSA) whose effectiveness and applicability are analyzed through hypothetical examples. Finally, the simulation-optimization method based on the ISSA and the deep forest surrogate model is applied to the contaminated site in Jilin City to identify the groundwater DNAPL contamination source information and selected parameters of the site's multiphase-flow numerical simulation model.

The study leads to the following conclusions:

(1) A deep forest approach is used to build the surrogate model of the simulation model. Compared with the BPNN and SSA-BPNN surrogate models, which are based on shallow learning, the deep forest surrogate model has higher approximation accuracy. Compared with the LSTM and BiLSTM surrogate models, which are based on deep neural networks, it is not only more accurate but also easier and more efficient to train, simpler in structure, and requires fewer hyperparameters to tune. At the three monitoring wells of the problem studied in this paper, the deep forest surrogate model has the highest mean R² (0.9915), the lowest mean RMSE (71.65 μg/L), and the lowest mean MRE (6.86%).

(2) The ISSA (a sparrow search algorithm based on the Sobol sequence and the vertical and horizontal crossover strategy) is applied to solve the nonlinear programming optimization model for groundwater DNAPL contamination source identification. Compared with the traditional SSA, the ISSA initializes the sparrow population with a Sobol sequence, which distributes individuals more uniformly and improves convergence speed and search efficiency, while the vertical and horizontal crossover strategy improves global search capability and solution accuracy. When the ISSA was used to solve the optimization model, the average relative error was only 7.24%, 2.32% lower than that of the traditional SSA. Applying the ISSA to solve the source identification optimization model of actual contaminated sites is therefore effective.
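The local sensitivity analysis step used to screen parameters can be illustrated with a one-at-a-time perturbation. This is a minimal sketch, assuming a normalized sensitivity coefficient (relative change in output divided by relative change in one parameter) and a screening threshold of 0.1; `toy_model`, its parameter names, base values, and the threshold are hypothetical placeholders, not the multiphase-flow model or the screening criterion used in the study.

```python
import math


def toy_model(p):
    # Hypothetical stand-in for one run of the multiphase-flow simulator:
    # returns a concentration (ug/L) at a single monitoring well.
    return 1000.0 * p["permeability"] ** 0.8 / p["porosity"] + 5.0 * math.log(p["shape_param"])


def local_sensitivity(model, base, rel_step=0.1):
    """One-at-a-time normalized sensitivity S_i = (dY / Y0) / (dP_i / P_i)."""
    y0 = model(base)
    coeffs = {}
    for name, value in base.items():
        perturbed = dict(base)
        perturbed[name] = value * (1.0 + rel_step)  # perturb a single parameter
        coeffs[name] = ((model(perturbed) - y0) / y0) / rel_step
    return coeffs


base = {"permeability": 2.0, "porosity": 0.3, "shape_param": 2.5}  # hypothetical base values
s = local_sensitivity(toy_model, base)
ranked = sorted(s, key=lambda name: abs(s[name]), reverse=True)
surrogate_inputs = [n for n in ranked if abs(s[n]) >= 0.1]  # variables to identify
background = [n for n in ranked if abs(s[n]) < 0.1]         # fixed at known values
print(surrogate_inputs, background)  # ['porosity', 'permeability'] ['shape_param']
```

Because the coefficients are normalized, parameters with very different units can be ranked on one scale; a local method only describes behavior near the chosen base point.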
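The sampling step that produces the surrogate training data can be sketched as below. It assumes a standard Latin hypercube design (each variable's range split into equal strata, one draw per stratum, strata randomly permuted per variable) and an 80/20 train/test split; `toy_simulator` and the bounds are hypothetical, and in the study each sample would instead be one forward run of the multiphase-flow simulation model.

```python
import random


def latin_hypercube(n_samples, bounds, rng):
    """Each variable's range is split into n_samples equal strata and
    every stratum is used exactly once per variable."""
    columns = []
    for lo, hi in bounds:
        strata = list(range(n_samples))
        rng.shuffle(strata)  # independent random pairing of strata across variables
        width = (hi - lo) / n_samples
        columns.append([lo + (s + rng.random()) * width for s in strata])
    return [list(point) for point in zip(*columns)]


def train_test_split(inputs, outputs, test_fraction, rng):
    idx = list(range(len(inputs)))
    rng.shuffle(idx)
    n_test = round(test_fraction * len(idx))
    test, train = idx[:n_test], idx[n_test:]
    take = lambda ids: ([inputs[i] for i in ids], [outputs[i] for i in ids])
    return take(train), take(test)


def toy_simulator(x):
    # Hypothetical placeholder for one forward run of the simulation model.
    src_x, src_y, rate, perm_factor = x
    return rate * 100.0 * perm_factor / (1.0 + 0.01 * ((src_x - 50) ** 2 + (src_y - 20) ** 2))


rng = random.Random(42)
bounds = [(0.0, 100.0), (0.0, 50.0), (0.5, 5.0), (0.1, 1.0)]  # hypothetical feasible domain
X = latin_hypercube(100, bounds, rng)
Y = [toy_simulator(x) for x in X]
(train_X, train_Y), (test_X, test_Y) = train_test_split(X, Y, 0.2, rng)
print(len(train_X), len(test_X))  # 80 20
```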
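The three accuracy measures used to compare the surrogate models can be written as follows, assuming their usual definitions: MRE is expressed in percent and is relative to the reference (simulation) value, which must be nonzero. The example numbers are made up for illustration.

```python
import math


def rmse(obs, pred):
    """Root mean square error, in the units of the data (e.g. ug/L)."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mre(obs, pred):
    """Mean relative error in percent; assumes no reference value is zero."""
    return 100.0 * sum(abs(o - p) / abs(o) for o, p in zip(obs, pred)) / len(obs)


def r2(obs, pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_o = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_o) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


obs = [100.0, 200.0, 300.0, 400.0]   # simulation-model outputs (hypothetical)
pred = [110.0, 190.0, 310.0, 390.0]  # surrogate-model outputs (hypothetical)
print(f"{rmse(obs, pred):.2f} {mre(obs, pred):.2f} {r2(obs, pred):.4f}")  # 10.00 5.21 0.9920
```

Per-well values computed this way can then be averaged over the monitoring wells to obtain mean R², RMSE, and MRE figures of the kind reported in the conclusions.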
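In the simulation-optimization method, the optimizer searches for source information whose surrogate-predicted concentrations best match the monitoring data. A minimal sketch, assuming an unweighted sum of squared concentration residuals as the objective and the mean absolute relative error over identified variables as the accuracy measure for hypothetical examples; the study's exact objective, weights, and constraints are not stated here, and `toy_surrogate` is a hypothetical placeholder for the deep forest surrogate.

```python
def objective(candidate, surrogate, observed):
    """Sum of squared deviations between surrogate-predicted and observed
    concentrations; the optimizer minimizes this value."""
    predicted = surrogate(candidate)
    return sum((p - o) ** 2 for p, o in zip(predicted, observed))


def average_relative_error(identified, true_values):
    """Mean of |identified - true| / |true| over all identified variables, in percent."""
    return 100.0 * sum(abs(x - t) / abs(t) for x, t in zip(identified, true_values)) / len(true_values)


def toy_surrogate(candidate):
    # Hypothetical surrogate: concentrations at three wells scale with the release rate.
    (rate,) = candidate
    return [rate * w for w in (1.0, 0.5, 0.25)]


observed = [100.0, 50.0, 25.0]
print(objective([100.0], toy_surrogate, observed))  # 0.0
print(objective([90.0], toy_surrogate, observed))   # 131.25
print(round(average_relative_error([10.5, 48.0, 2.2], [10.0, 50.0, 2.0]), 2))  # 6.33
```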
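For reference, the traditional sparrow search algorithm that the improved version builds on can be sketched in plain Python. This follows the producer, scrounger, and vigilant-sparrow update rules as commonly described for SSA; the default fractions (20% producers, 10% vigilant sparrows), the safety threshold of 0.8, the exponent cap that prevents overflow, and the shifted-sphere test function are assumptions of this sketch rather than settings taken from the study.

```python
import math
import random


def ssa_minimize(f, bounds, pop_size=30, iters=100, pd=0.2, sd=0.1, st=0.8, seed=0):
    """Minimize f over box bounds with a basic sparrow search algorithm."""
    rng = random.Random(seed)
    dims = len(bounds)

    def clip(x):
        return [min(max(v, lo), hi) for v, (lo, hi) in zip(x, bounds)]

    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fit = [f(x) for x in pop]
    b = min(range(pop_size), key=fit.__getitem__)
    best_x, best_f = list(pop[b]), fit[b]
    history = [best_f]
    n_prod = max(1, int(pd * pop_size))
    n_vig = max(1, int(sd * pop_size))

    for _ in range(iters):
        order = sorted(range(pop_size), key=fit.__getitem__)
        pop, fit = [pop[i] for i in order], [fit[i] for i in order]
        worst_x = list(pop[-1])

        # Producers (best-ranked sparrows) search for food.
        r2 = rng.random()  # alarm value
        for i in range(n_prod):
            if r2 < st:  # no danger: contract the position
                alpha = rng.random() + 1e-12
                pop[i] = [v * math.exp(-(i + 1) / (alpha * iters)) for v in pop[i]]
            else:  # danger: random normal step in every dimension
                q = rng.gauss(0.0, 1.0)
                pop[i] = [v + q for v in pop[i]]
            pop[i] = clip(pop[i])
            fit[i] = f(pop[i])
        xp = list(pop[min(range(n_prod), key=fit.__getitem__)])

        # Scroungers follow the best producer or, if starving, fly elsewhere.
        for i in range(n_prod, pop_size):
            if i + 1 > pop_size / 2:
                q = rng.gauss(0.0, 1.0)
                # Exponent capped at 50 to avoid overflow (a guard added in this sketch).
                pop[i] = [q * math.exp(min((w - v) / (i + 1) ** 2, 50.0))
                          for v, w in zip(pop[i], worst_x)]
            else:
                signs = [rng.choice((-1.0, 1.0)) for _ in range(dims)]
                step = sum(abs(v - p) * s for v, p, s in zip(pop[i], xp, signs)) / dims
                pop[i] = [p + step for p in xp]
            pop[i] = clip(pop[i])
            fit[i] = f(pop[i])

        # Vigilant sparrows (a random subset) react to danger.
        w = max(range(pop_size), key=fit.__getitem__)
        worst_f, worst_x = fit[w], list(pop[w])
        for i in rng.sample(range(pop_size), n_vig):
            if fit[i] > best_f:  # at the edge: move toward the global best
                beta = rng.gauss(0.0, 1.0)
                pop[i] = [g + beta * abs(v - g) for v, g in zip(pop[i], best_x)]
            else:  # at the center: move randomly away from the worst
                k = rng.uniform(-1.0, 1.0)
                denom = (fit[i] - worst_f) + 1e-50
                pop[i] = [v + k * abs(v - wx) / denom for v, wx in zip(pop[i], worst_x)]
            pop[i] = clip(pop[i])
            fit[i] = f(pop[i])

        i = min(range(pop_size), key=fit.__getitem__)
        if fit[i] < best_f:
            best_x, best_f = list(pop[i]), fit[i]
        history.append(best_f)
    return best_x, best_f, history


shifted_sphere = lambda x: sum((v - 1.0) ** 2 for v in x)  # toy objective
best_x, best_f, history = ssa_minimize(shifted_sphere, [(-5.0, 5.0)] * 3, seed=1)
print(round(best_f, 6))
```

In the study the objective would be the surrogate-based misfit function, and the decision variables would be the contamination source information and the sensitive parameters.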
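Sobol-sequence initialization replaces uniform random initialization of the sparrow population so that initial individuals cover the search space more evenly. The sketch below generates an unscrambled Sobol sequence in Gray-code order and tabulates direction numbers for only the first five dimensions, as `(s, a, m)` entries in the style of the Joe-Kuo table (an assumption to verify against the published table before reuse). `init_population` and its bounds are hypothetical; a full implementation is available as SciPy's `scipy.stats.qmc.Sobol`.

```python
BITS = 30  # resolution of each coordinate: 2**-30

# (s, a, [m_1..m_s]) for dimensions 2-5; dimension 1 uses m_k = 1 for all k.
DIRECTION_TABLE = [
    (1, 0, [1]),
    (2, 1, [1, 3]),
    (3, 1, [1, 3, 1]),
    (3, 2, [1, 1, 1]),
]


def direction_numbers(j):
    """Direction numbers V[1..BITS], scaled by 2**BITS, for dimension index j (0-based)."""
    v = [0] * (BITS + 1)  # index 0 unused
    if j == 0:
        for k in range(1, BITS + 1):
            v[k] = 1 << (BITS - k)
        return v
    s, a, m = DIRECTION_TABLE[j - 1]
    for k in range(1, s + 1):
        v[k] = m[k - 1] << (BITS - k)
    for k in range(s + 1, BITS + 1):
        v[k] = v[k - s] ^ (v[k - s] >> s)
        for l in range(1, s):
            if (a >> (s - 1 - l)) & 1:
                v[k] ^= v[k - l]
    return v


def sobol_points(n, dims):
    """First n points of an unscrambled Sobol sequence in [0, 1)^dims."""
    if not 1 <= dims <= len(DIRECTION_TABLE) + 1:
        raise ValueError("this sketch only tabulates the first 5 dimensions")
    vs = [direction_numbers(j) for j in range(dims)]
    x = [0] * dims
    points = [[0.0] * dims]
    for i in range(1, n):
        c, value = 1, i - 1
        while value & 1:  # c = position of the rightmost zero bit of i - 1
            value >>= 1
            c += 1
        x = [xj ^ vs[j][c] for j, xj in enumerate(x)]
        points.append([xj / (1 << BITS) for xj in x])
    return points


def init_population(pop_size, bounds):
    """Map Sobol points onto the feasible domain of the decision variables."""
    pts = sobol_points(pop_size, len(bounds))
    return [[lo + u * (hi - lo) for u, (lo, hi) in zip(p, bounds)] for p in pts]


print(sobol_points(4, 2))  # [[0.0, 0.0], [0.5, 0.5], [0.75, 0.25], [0.25, 0.75]]
pop = init_population(16, [(0.0, 100.0), (0.0, 50.0), (0.5, 5.0)])  # hypothetical bounds
```

The first Sobol point is the corner of the domain; some implementations skip it or apply scrambling, which this sketch does not do.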
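The vertical and horizontal crossover strategy can be sketched with the formulation common in crisscross optimization: horizontal crossover blends randomly paired individuals dimension by dimension (with an expansion term), vertical crossover blends two normalized dimensions within one individual, and greedy selection keeps a child only if it improves the objective. The crossover probability `pv = 0.6`, the bounds, and the toy objective are assumptions, and the operators used in the study may differ in detail.

```python
import random


def clip(x, bounds):
    return [min(max(v, lo), hi) for v, (lo, hi) in zip(x, bounds)]


def horizontal_crossover(pop, fit, f, bounds, rng):
    """Pair individuals at random and blend them dimension by dimension."""
    order = list(range(len(pop)))
    rng.shuffle(order)
    for k in range(0, len(order) - 1, 2):
        i, j = order[k], order[k + 1]
        ci, cj = [], []
        for a, b in zip(pop[i], pop[j]):
            r1, r2 = rng.random(), rng.random()
            c1, c2 = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
            ci.append(r1 * a + (1 - r1) * b + c1 * (a - b))
            cj.append(r2 * b + (1 - r2) * a + c2 * (b - a))
        for idx, child in ((i, clip(ci, bounds)), (j, clip(cj, bounds))):
            fc = f(child)
            if fc < fit[idx]:  # greedy selection keeps the better of parent and child
                pop[idx], fit[idx] = child, fc


def vertical_crossover(pop, fit, f, bounds, rng, pv=0.6):
    """Within one individual, blend two normalized dimensions to help a
    stagnant dimension escape a local optimum."""
    dims = len(bounds)
    if dims < 2:
        return
    for idx in range(len(pop)):
        if rng.random() >= pv:
            continue
        d1, d2 = rng.sample(range(dims), 2)
        (lo1, hi1), (lo2, hi2) = bounds[d1], bounds[d2]
        u1 = (pop[idx][d1] - lo1) / (hi1 - lo1)  # normalize to [0, 1]
        u2 = (pop[idx][d2] - lo2) / (hi2 - lo2)
        r = rng.random()
        child = list(pop[idx])
        child[d1] = lo1 + (r * u1 + (1 - r) * u2) * (hi1 - lo1)
        child = clip(child, bounds)
        fc = f(child)
        if fc < fit[idx]:
            pop[idx], fit[idx] = child, fc


rng = random.Random(1)
bounds = [(-5.0, 5.0), (0.0, 100.0), (1.0, 2.0)]  # hypothetical decision-variable ranges
f = lambda x: (x[0] - 1) ** 2 + ((x[1] - 40) / 10) ** 2 + (x[2] - 1.5) ** 2  # toy objective
pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(20)]
fit = [f(x) for x in pop]
before = list(fit)
for _ in range(10):
    horizontal_crossover(pop, fit, f, bounds, rng)
    vertical_crossover(pop, fit, f, bounds, rng)
print(min(before), "->", min(fit))
```

In an ISSA-style loop, these two operators would typically be applied to the population after each SSA position update.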