| The pine wood nematode (Bursaphelenchus xylophilus (Steiner & Buhrer) Nickle, PWN) is the pathogen of pine wilt disease (PWD), a widespread and deadly threat to Pinus species (Pinaceae). In China it mainly affects species such as P. massoniana, P. thunbergii, P. yunnanensis, P. tabulaeformis and P. armandii, causing continuous and serious ecological and economic damage to forestry. Remote sensing imagery has the potential to monitor and identify PWD because it can rapidly acquire forest information over large areas. However, the high lethality of PWD and the concealment of early disease signs make monitoring with ordinary optical remote sensing images difficult and leave infected wood undetected. Hyperspectral remote sensing can probe fine spectral information, which helps to diagnose PWD earlier. Nonetheless, the high dimensionality of the spectral signals and the noise caused by narrow wavelength channels make the processing and modeling of hyperspectral data complex. This paper aims to explore a preprocessing standard for hyperspectral monitoring of PWD, to find sensitive spectral bands for diagnosing PWD, and to propose highly accurate and interpretable spectral variables and models for early diagnosis. To this end, needle spectra of infected trees were observed continuously over 6 stages after artificial inoculation with PWN, a combined hyperspectral preprocessing procedure was implemented, a hyperspectral feature selection framework for PWD was established, and a prediction and diagnosis model was constructed. The main research results and conclusions are as follows: (1) Two matrix decomposition algorithms for feature extraction were improved by transforming them into feature selection algorithms for PWD hyperspectral data. The feature extraction algorithms principal component analysis (PCA) and non-negative matrix factorization (NMF) were improved into 6 feature selection algorithms based on a summation (SUM) strategy and a
probability (PROB) strategy, which included variance summation importance (VA), standard deviation summation importance (SD), variance probability importance (VP), standard deviation probability importance (SP) and joint probability importance (JP). The mapping relationship between the improved PCA indices and the original PCA method was proved. The improved methods were evaluated using the sequential feature selection (SFS) method, which showed that the high-weighted methods outperformed the low-weighted or unweighted methods (VP > SP > JP, VA > SD) and that the PROB-based methods outperformed the SUM-based methods. (2) A feature selection framework with high generalizability and excellent feature reduction performance was established to obtain a spectral feature set with high validation accuracy. The framework was built on the sequential feature selection (SFS) algorithm; the effects of spectral resolution and filtering order (Interp, SG rank) on feature selection performance were evaluated, and the 6 improved feature selection algorithms were compared with 3 classical algorithms. The accuracy of the different feature selection algorithms decreased to varying degrees when the spectral resolution was higher than 5 nm. Adjusted R² (R²adj) was optimized for the 4 better-performing feature selection algorithms, and the successive projections algorithm (SPA) was then used to further reduce the number of features while also improving validation accuracy. This yielded 4 subsets of 15, 17, 25 and 29 features, with corresponding accuracies of R²adj = 0.837, 0.832, 0.822 and 0.826, respectively. The wavelengths and spatial distribution of the features selected by the 4 algorithms varied, and the PCA-VP algorithm, which had the best overall performance, tended to select the peak and valley features of the differential spectrum. (3) The optimal combination of feature set and fitting model was found for predicting the number of days after infection. The training and prediction accuracy of the 4 statistical and
machine learning models was highest for partial least squares regression (PLSR), second highest for the single-layer perceptron (SLP), and lowest for support vector machine regression (SVR) and random forest regression (RFR). The change in accuracy from training sets to test sets followed similar patterns. The MIC feature subset with the most features (p = 29) produced the models with the highest test accuracy (R² = 0.786 for SVR and R² = 0.788 for RFR). The combination of PCA-VP and PLSR (test R² = 0.759, p = 15) was selected as the final combination, and the PLSR regression coefficients were then extracted. (4) Based on the combination of the optimal feature set and the prediction model, sensitive spectral bands were extracted and the diagnostic model was validated using statistical tests. Based on the absolute values of the PLSR regression coefficients, the "infrared" band from 1240 to 1340 nm was determined to be the primary band for diagnosis, the "red" band from 640 to 800 nm the secondary band, and the narrow band centered at 1780 nm a supplementary band. A two-sample t-test was performed on the estimated days-after-infection calculated from the 6 experimental samples with different days-after-infection and the corresponding 3 control samples; the results showed that the PLSR model was effective in diagnosing pine wilt disease after 22 days at the 0.05 level of significance. |
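
The abstract compares feature subsets of different sizes (p = 15, 17, 25, 29) by adjusted R², which penalizes a fit for the number of predictors it uses. As a minimal sketch, assuming the standard textbook definition R²adj = 1 - (1 - R²)(n - 1)/(n - p - 1), the penalty can be illustrated as follows. The function name `adjusted_r2`, the raw R² of 0.85 and the sample size of 100 are hypothetical and are not taken from the study.

```python
def adjusted_r2(r2: float, n_samples: int, n_features: int) -> float:
    """Standard adjusted R^2: 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    dof = n_samples - n_features - 1
    if dof <= 0:
        raise ValueError("n_samples must exceed n_features + 1")
    return 1.0 - (1.0 - r2) * (n_samples - 1) / dof


# Hypothetical inputs (not from the study): one raw R^2 and one sample size,
# evaluated at the four subset sizes mentioned in the abstract.
raw_r2 = 0.85
n = 100
scores = {p: adjusted_r2(raw_r2, n, p) for p in (15, 17, 25, 29)}

for p, score in scores.items():
    print(f"p={p:2d}  adjusted R^2={score:.3f}")
```

With the raw R² held fixed, the adjusted value falls as p grows, so a smaller subset is preferred unless the extra features improve the raw fit enough to offset the penalty.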