| Due to human activities and climate change, aquatic ecosystem degradation caused by eutrophication and cyanobacterial blooms in inland waters, together with the resulting threats to drinking water safety, has become a global concern. Numerous methods for detecting cyanobacterial blooms have been proposed. Phycocyanin (PC) is the signature pigment of cyanobacteria, and its concentration indicates cyanobacterial biomass, which makes it important for the detection and early warning of cyanobacterial blooms. However, accurately estimating phycocyanin concentration in inland waters by remote sensing remains challenging because of the large variability of the optically active constituents, the complex optical properties of inland waters, and the extremely weak optical signal of phycocyanin. To address this problem, three strategies are adopted in this study. The first is to collect as much in situ water sample data, with synchronous satellite imagery, as possible across different climate zones, trophic levels and water types. Accordingly, the study area covers the vast region east of the Hu Huanyong Line, which includes three lake regions. Phycocyanin concentrations and synchronous Ocean and Land Colour Instrument (OLCI) surface reflectance were collected as 640 matched pairs from 25 lakes in the study area from 2020 to 2022. Second, based on this large sample data set, a remote sensing inversion framework for phycocyanin concentration was developed that integrates a water optical classification algorithm with three candidate algorithms for phycocyanin modeling: a baseline height algorithm, a band ratio algorithm and a three-band algorithm. In the water optical classification algorithm, the reflectance of four bands, Rrs(560), Rrs(620), Rrs(674) and Rrs(709), and the corresponding baseline band reflectance combinations were used to divide the samples into five types, each with a specific spectral shape and specific water quality characteristics. Compared
with modeling without optical classification, this framework improves the accuracy of phycocyanin estimation, especially at low phycocyanin concentrations. Third, given the efficiency and reliability of current machine learning methods, a remote sensing inversion model of phycocyanin concentration was constructed from this large sample data set using multi-band combinations and a random forest algorithm, and it showed good estimation ability. This model was then used to map phycocyanin concentration in all lakes larger than 20 km² across the study area in 2021, and in four typical lakes (Chagan Lake, Taihu Lake, Dianchi Lake and Xingkai Lake) from 2017 to 2022. In addition, the annual and monthly mean phycocyanin concentrations of each lake were calculated. The spatiotemporal patterns of phycocyanin concentration in the four typical lakes and the river basins were obtained and related to environmental and human factors. The main factors driving these spatiotemporal variations were analyzed with a generalized linear model (GLM), and the contribution rate of each factor was quantified. The specific conclusions are as follows: (1) Based on the correlation and fitting slope between 558 pairs of in situ remote sensing reflectance and OLCI surface reflectance corrected with the ACOLITE (DSF) atmospheric correction method, this study shows that ACOLITE (DSF) is reliable and effective for large-scale application to inland waters. The OLCI-ACOLITE remote sensing reflectance agreed well with the in situ reflectance in the blue, green, red and near-infrared bands: the correlation was 0.74 in the red band, and the correlation of
the other three bands was at least 0.61. Moreover, the fitted slopes were close to 1 in all bands except the near-infrared. Therefore, OLCI-ACOLITE reflectance can be used directly for remote sensing inversion of phycocyanin concentration in lakes; this paper also develops an optical classification algorithm for water bodies based on water color remote sensing and inherent optical property principles combined with a spectral shape algorithm. The algorithm classifies water bodies with a multi-index decision tree. Before classification, the floating algae index (FAI) is used to exclude waters covered by algal bloom scum. The ratio of the line (baseline) heights at two chlorophyll-a-related reflectance peaks, 560 nm (LH B6) and 709 nm (LH B11), is used to distinguish clear from turbid waters: a line height ratio > 0.58 indicates turbid water (high ISM or high algae), and a ratio < 0.58 indicates clear water. In turbid water, the line height at 674 nm (LH B9) is used to distinguish high-ISM from high-algae waters. If LH B9 > 0.003, the water is considered highly turbid, with a high concentration of inorganic suspended matter and a low concentration of algae; otherwise, it is considered moderately turbid water containing both ISM and algae. In this case, if LH 620 > 0.0013, the water is considered a high-phycocyanin water body; otherwise, it is a low-phycocyanin water body. For clear water, medium-algae and low-algae waters are distinguished by the green peak height: if LH 560 > 0.02, the water is considered a medium-phycocyanin water body; otherwise, it is a low-phycocyanin water body. The threshold at each node (spectral index) of the decision tree was determined empirically from measured ISM, Chla and PC data; based on this optical classification algorithm, the waters in the study area were divided into five types: Type Ⅰ (high turbidity water), Type Ⅱ (high algae
water), Type Ⅲ (medium turbidity-low algae water), Type Ⅳ (clean-medium algae water) and Type Ⅴ (super clean water), with significant differences in water quality among the types. Type Ⅰ had the highest ISM concentration (58.63 ± 41.54 mg L-1), and Type Ⅱ had the highest PC:Chla ratio (9.02 ± 6.35), strong evidence of cyanobacteria-dominated water. The PC:Chla ratio of Type Ⅲ was 0.27 ± 0.21, whereas that of Type Ⅳ was higher, at 0.63 ± 0.70. Type Ⅴ had the lowest TSM, Chla and PC concentrations of all types. Based on the PC:Chla ratio, this paper defined cyanobacterial risk levels: ratios of 9.02, 0.63 and 0.27 correspond to high, medium and low risk, respectively; (2) The optical-classification-based remote sensing inversion model of phycocyanin concentration achieved better estimation accuracy, with an overall RMSE of 72.6 μg L-1 and a MAPE of 80.4%. For the low-concentration Type Ⅴ waters, the RMSE was 0.55 μg L-1 (MAPE = 177%), and for Type Ⅳ, the RMSE was 6.29 μg L-1 (MAPE = 22.2%). This indicates that classification modeling is an effective strategy over large spatial scales with highly complex water types. The study also compared the performance of the three candidate empirical algorithms across the five water types: the band ratio algorithm was the most widely applicable and suited medium-turbidity and clear waters, the three-band algorithm suited only medium-turbidity water, and the line height algorithm suited only waters with high phycocyanin content; without optical classification, the study compared the ability of the band ratio algorithm, a multiple linear regression algorithm and a multi-band combination random forest algorithm to estimate phycocyanin concentration. The band ratio algorithm had R² = 0.61, RMSE = 304.8 μg L-1 and MAPE = 73.57%. The goodness of fit of
the multiple linear regression algorithm was R² = 0.72, with RMSE = 233.49 μg L-1 and MAPE = 48.54%. The multi-band combination random forest algorithm performed best and was the most robust, with the lowest RMSE (117.1 μg L-1) and MAPE (13.3%). Therefore, the random forest algorithm was selected to construct the remote sensing inversion model of phycocyanin concentration, and the model was used to map phycocyanin concentration in water bodies at the scales of the four typical lakes and their watersheds; (3) Based on phycocyanin concentration maps produced with the multi-band combination random forest algorithm, this study statistically analyzed the spatiotemporal distribution of phycocyanin concentration at three scales and found significant differences in spatiotemporal patterns between lakes and between river basins. At the lake-region scale, mean annual phycocyanin concentration ranked Northeast Lake Region (NLR, 74.56 ± 62.52 μg L-1) < Eastern Lake Region (ELR, 86.59 ± 84.07 μg L-1) < Yungui Lake Region (YGR, 298.87 ± 133.23 μg L-1); annual mean phycocyanin concentrations differed significantly among the 38 basins, with the highest annual means in the Songhua Lake basin (367.48 ± 62.13 μg L-1) in the Northeast Lake Region, the Hengshui Lake basin (356.28 ± 72.53 μg L-1) in the Eastern Lake Region, and the Yungui Lake basin (441.53 ± 45.27 μg L-1). The 2017-2022 phycocyanin retrievals for the four typical lakes, based on the multi-band combination random forest algorithm, showed significant monthly variation in Chagan Lake and Taihu Lake, with peaks in August and May, respectively, whereas the monthly mean phycocyanin concentration in Dianchi Lake showed no significant change and remained high throughout the year. The annual mean phycocyanin concentration in Chagan Lake, Taihu Lake and Dianchi
Lake showed significant interannual changes: from 2017 to 2022, mean phycocyanin concentration increased in Chagan Lake and decreased in Taihu Lake and Dianchi Lake; (4) The factors influencing phycocyanin variation differed significantly between lakes and between basins. In the Northeast Lake Region, rainfall was the dominant factor; in the Eastern Lake Region, wind speed was dominant; and in the Yungui Lake Region, air pressure and wind speed were dominant. In the Chagan Lake basin, the largest contributors to phycocyanin variation were wind direction and speed, and impervious surface, which contributed 12.55% and 12.47%, respectively, followed by rainfall at 11.11%. In the Taihu Lake basin, temperature was the largest contributor (18.64%), followed by NDVI (16.84%), rainfall (9.17%) and population density (6.24%). In the Dianchi Lake basin, NDVI was the largest contributor (18.6%), followed by forest (9.66%), wind speed (9.13%) and impervious surface (7.43%). In the Xingkai Lake basin, population density (14.97%), the proportion of arable land (13.67%) and the proportion of impervious surface (13.53%) were the largest contributors, followed by rainfall (8.71%), indicating that phycocyanin variation there was dominated by human activities. Overall, phycocyanin variation in inland lakes was influenced by both natural factors and human activities; natural climatic conditions
in the first three typical basins (Chagan Lake, Taihu Lake and Dianchi Lake) were more important, whereas human activities were more important in the last (Xingkai Lake). In summary, this paper estimated phycocyanin concentration in inland waters by remote sensing using both classification-based and non-classification modeling, which provides a reference for the optical classification of water bodies. The classification modeling framework proposed here improved the accuracy of phycocyanin estimation in optically complex waters and offers a new approach to water quality retrieval in inland waters. In addition, the PC:Chla method proposed in this paper provides a reference for classifying cyanobacterial risk, and the contribution rate analysis of factors influencing phycocyanin provides a theoretical basis for controlling cyanobacterial blooms. |
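The decision tree described in conclusion (1) can be sketched as follows. Line height is computed as the reflectance at a centre band minus a straight baseline interpolated between two neighbouring bands. The thresholds (0.58, 0.003, 0.0013, 0.02) come from the text. Everything else is an assumption, flagged in the comments:

- The baseline wavelengths in the hypothetical `BASELINES` table.
- The direction of the line height ratio, taken here as LH B11 / LH B6.
- The handling of values exactly at a threshold.
- The FAI cut-off, which the abstract does not give and which is therefore a required parameter.

Reflectance units must match whatever the source's thresholds were derived for.

```python
from typing import Dict, Tuple

# Hypothetical baseline band pairs (nm) for each line height. The abstract
# names the centre bands (560, 620, 674, 709 nm) but not the baseline bands.
BASELINES: Dict[int, Tuple[int, int]] = {
    560: (490, 620),  # LH B6, green peak
    620: (560, 665),  # LH 620, phycocyanin band
    674: (665, 709),  # LH B9
    709: (674, 754),  # LH B11, red-edge peak
}


def line_height(rrs: Dict[int, float], center: int) -> float:
    """Reflectance at `center` minus a linear baseline between its two neighbours."""
    left, right = BASELINES[center]
    weight = (center - left) / (right - left)
    baseline = rrs[left] + weight * (rrs[right] - rrs[left])
    return rrs[center] - baseline


def classify(rrs: Dict[int, float], fai: float, fai_threshold: float) -> str:
    """Assign one of the five optical water types using the abstract's thresholds."""
    if fai > fai_threshold:
        return "scum"  # algal bloom scum, excluded from classification
    lh560 = line_height(rrs, 560)
    lh709 = line_height(rrs, 709)
    # Assumption: ratio = LH B11 / LH B6; a non-positive green peak is treated
    # as turbid so the division is always defined.
    ratio = lh709 / lh560 if lh560 > 0 else float("inf")
    if ratio > 0.58:  # turbid: high ISM or high algae (ratio == 0.58 goes to clear)
        if line_height(rrs, 674) > 0.003:
            return "Type I"  # high turbidity (high ISM, low algae)
        if line_height(rrs, 620) > 0.0013:
            return "Type II"  # high algae (high phycocyanin)
        return "Type III"  # medium turbidity, low algae
    if lh560 > 0.02:  # clear water: split by green peak height
        return "Type IV"  # clean, medium algae
    return "Type V"  # super clean


if __name__ == "__main__":
    # Illustrative spectrum with a strong green peak; fai_threshold=0.0 is a placeholder.
    spectrum = {490: 0.01, 560: 0.05, 620: 0.02, 665: 0.01, 674: 0.01, 709: 0.01, 754: 0.005}
    print(classify(spectrum, fai=-0.01, fai_threshold=0.0))  # Type IV
```

Ordering the tests as in the text (scum first, then turbid versus clear, then the per-branch splits) keeps each sample on exactly one path, so the five types are mutually exclusive.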
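Conclusion (2) reports accuracy as RMSE and MAPE. The sketch below uses the standard definitions, assuming the study did too; the abstract gives no formulas. MAPE divides each error by the measured concentration, so small absolute errors at low concentrations inflate it. This is consistent with Type Ⅴ pairing a small RMSE (0.55 μg L-1) with a large MAPE (177%).

```python
import math


def rmse(measured, estimated):
    """Root-mean-square error, in the units of the inputs (e.g. μg L-1)."""
    if len(measured) != len(estimated) or not measured:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((e - m) ** 2 for m, e in zip(measured, estimated)) / len(measured))


def mape(measured, estimated):
    """Mean absolute percentage error (%); every measured value must be non-zero."""
    if len(measured) != len(estimated) or not measured:
        raise ValueError("need two non-empty sequences of equal length")
    if any(m == 0 for m in measured):
        raise ValueError("MAPE is undefined when a measured value is zero")
    return 100.0 * sum(abs((e - m) / m) for m, e in zip(measured, estimated)) / len(measured)


if __name__ == "__main__":
    # Illustrative low-concentration pairs (not study data): a 1 μg L-1 error
    # is small in absolute terms but large relative to the measured values.
    pc_measured = [0.5, 1.0]
    pc_estimated = [1.5, 2.0]
    print(rmse(pc_measured, pc_estimated), mape(pc_measured, pc_estimated))  # prints: 1.0 150.0
```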
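The annual and monthly mean phycocyanin concentrations of each lake can be computed by grouping per-scene retrievals by calendar year and month. A minimal sketch, with these assumptions (the abstract does not specify them):

- Each lake's retrievals are available as (date, concentration) pairs.
- Monthly means pool the same calendar month across all years (a climatology).
- The annual mean is taken over all scenes in a year rather than over monthly means.
- `lake_means` is a hypothetical helper.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def lake_means(records):
    """Group (date, pc) retrievals for one lake into monthly and annual means.

    Assumptions: monthly means pool each calendar month across all years, and
    annual means average every scene in the year with equal weight.
    """
    by_month = defaultdict(list)
    by_year = defaultdict(list)
    for day, pc in records:
        by_month[day.month].append(pc)
        by_year[day.year].append(pc)
    monthly = {m: mean(v) for m, v in sorted(by_month.items())}
    annual = {y: mean(v) for y, v in sorted(by_year.items())}
    return monthly, annual


if __name__ == "__main__":
    # Illustrative retrievals in μg L-1 (not study data).
    scenes = [(date(2021, 5, 1), 100.0), (date(2021, 5, 20), 200.0), (date(2022, 8, 3), 60.0)]
    print(lake_means(scenes))
```

Averaging monthly means instead would weight each month equally regardless of how many cloud-free scenes it contains. Which choice the study made is not stated.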