| With the continuing development of science and technology, industrial enterprises keep growing in scale, and industrial production leads to the accumulation of pollutants on site, which aggravates environmental damage. The investigation, assessment, and remediation of contaminated sites face several challenges. Investigations accumulate large volumes of site environmental data that are diverse in type, large in scale, and low in value density, and these data urgently need to be put to effective use. Borehole sampling and analysis, the most common method for accurately investigating the contamination status of a site, is costly and has a long lead time. To address these shortcomings, this paper first proposes a method for constructing samples for a machine learning model of site contamination prediction, based on environmental big data from typical contaminated sites in China. It then proposes a feature reconstruction machine learning model, LightGBM-CatBoost, based on the idea of Boosting ensemble learning. The model predicts the risk level of a site accurately and efficiently from streamlined, easily accessible environmental information, and thus provides information support for decisions on site remediation. Finally, to make full use of site environmental big data, the relevant data and results are integrated into a visualization platform, which helps improve the investigation, management, and remediation of contaminated sites. The main results of this paper are as follows:
(1) A method for constructing samples for a machine learning model of site contamination prediction was investigated. A set of pollution characteristic indicators was constructed from the relevant standards and specifications for contaminated site investigation. The single factor index method, the geo-accumulation index method, and the potential ecological risk index method were integrated to classify the risk level of a site as low, medium, or high. The assigned values of the characteristic indicators and the risk level of each site grid plot then serve as the input and output of the prediction model, which completes the learning samples for the pollution prediction model.
(2) A machine learning model for pollution prediction based on site environmental big data was implemented. Three algorithms, XGBoost, LightGBM, and CatBoost, were compared, and a feature reconstruction model, LightGBM-CatBoost, with LightGBM and CatBoost at its core, was proposed. The results show that the method can predict the risk level of contaminated site grid plots using only part of the preliminary survey data. The best training results were obtained with eight input features: area, hardened area, production and operation time, sewage, groundwater burial depth, soil permeability in the saturated zone, subsurface anti-seepage measures, and high-density resistivity. In a demonstration on a contaminated site, the recognition accuracy for low-, medium-, and high-risk plots was 80%, 72.73%, and 84%, respectively, which indicates good application value.
(3) A visualization platform for contaminated sites was designed and developed. It builds on the site environmental big data and the other results of this paper, together with system development technologies. The platform combines several forms of visualization and consists of three modules: home page display, data management, and pollution prediction. These modules visually present contaminated site information, pollutant data, relevant laws and regulations, and pollution predictions. The platform provides reference and support for site remediation and treatment, and helps the relevant departments manage contaminated sites efficiently through informatization and digitalization. |
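
The three indices named in result (1) are commonly written as $P_i = C_i / S_i$ (single factor index), $I_{geo} = \log_2\big(C_i / (k B_i)\big)$ (geo-accumulation index), and $RI = \sum_i T_i \, C_i / C_{ref,i}$ (potential ecological risk index). The following is a minimal sketch of these textbook forms only. The abstract does not state how the three indices are combined into the low, medium, and high risk levels, so no combination rule is shown. The function names, the sample concentrations, and the choice of $k = 1.5$ are assumptions made for illustration, not values taken from the paper.

```python
import math


def single_factor_index(concentration, standard):
    """P_i = C_i / S_i: measured concentration over the evaluation standard."""
    return concentration / standard


def geoaccumulation_index(concentration, background, k=1.5):
    """I_geo = log2(C_i / (k * B_i)); k = 1.5 is the conventional correction factor."""
    return math.log2(concentration / (k * background))


def potential_ecological_risk(concentrations, references, toxic_factors):
    """Return (per-element E_i, RI) with E_i = T_i * C_i / C_ref_i and RI = sum of E_i."""
    per_element = {
        name: toxic_factors[name] * concentrations[name] / references[name]
        for name in concentrations
    }
    return per_element, sum(per_element.values())


if __name__ == "__main__":
    # Hypothetical concentrations and reference values (mg/kg), for illustration only.
    conc = {"Cd": 0.9, "Pb": 120.0}
    ref = {"Cd": 0.3, "Pb": 40.0}
    # Toxic-response factors as commonly cited for this index (Cd = 30, Pb = 5).
    tox = {"Cd": 30.0, "Pb": 5.0}

    for name in conc:
        p = single_factor_index(conc[name], ref[name])
        igeo = geoaccumulation_index(conc[name], ref[name])
        print(f"{name}: P = {p:.2f}, I_geo = {igeo:.2f}")

    per_element, ri = potential_ecological_risk(conc, ref, tox)
    print("E_i:", per_element, "RI:", round(ri, 2))
```

In a sample-construction pipeline of the kind the abstract describes, values like these would be computed per grid plot and then mapped to a risk label that serves as the model output; the thresholds for that mapping would have to come from the standards the paper relies on.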