For much of the 20th century, near-infrared (NIR) spectroscopy stagnated as an analytical technique. Only with the emergence of chemometrics and its integration with NIR spectroscopy did qualitative and quantitative analysis of the components of measured materials become practical, which has greatly benefited research, daily life, and production. However, because NIR spectra contain few sharp, well-separated peaks, spectral information overlaps. Multiple data processing techniques therefore need to be combined to capture the essential features of an NIR spectrum. At present, NIR data denoising techniques and dimensionality reduction techniques such as principal component analysis (PCA) and linear discriminant analysis (LDA) are relatively mature, but many problems remain. In practical applications, the combined influence of multiple factors means that some important datasets carry only limited information, which degrades the final classification result. In response to these issues, this article proposes a new approach to denoising and dimensionality reduction of NIR spectra and evaluates it in a series of experiments.

1. Analysis of near-infrared spectral data denoising. Fifty samples each of pure naked oat flour and of naked oat flour mixed with 10%, 20%, 30%, 40%, and 50% corn starch (categories 1 to 6, in that order) served as the research object, and NIR spectra were collected for all 300 samples. The spectra were denoised with several combined denoising methods, and the classification accuracy of each was measured using a BP (back-propagation) neural network classifier. When the traditional denoising algorithms (first derivative, second derivative, and multiplicative scatter correction, MSC) were each combined with Savitzky-Golay (SG) smoothing, the accuracy of qualitative analysis improved slightly in each case. The combination of MSC and SG smoothing gave the highest classification accuracy.

2. After combined denoising, a k-means-improved k-fold cross-validation method was used to remove outliers from the data, and the final classification accuracy was compared. With outliers removed in this way, the accuracy of the final qualitative analysis improved further, and the qualitative analysis ability of the algorithm improved significantly.

3. The final classification accuracy was compared for spectral data reduced by different dimensionality reduction algorithms after processing steps 1 and 2. After denoising and outlier removal, PCA, LDA, and PCA+LDA were each used to reduce the dimensionality of the spectra. The classification accuracy after PCA was relatively satisfactory, and the accuracy after LDA was similar to that of PCA, while the improved PCA+LDA algorithm raised model accuracy substantially and gave the best classification result.

It can be concluded that the highest classification accuracy is obtained when the data are denoised with MSC and SG smoothing, outliers are removed with the k-means-improved k-fold cross-validation method, and the dimensionality is reduced with the improved PCA+LDA algorithm before the data are fed to the BP neural network. The results of this article open up new avenues for high-dimensional information processing in NIR spectroscopy analysis and have important theoretical significance and practical value.
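
The denoising and dimensionality reduction steps summarized above can be illustrated with a minimal sketch. It is not the authors' implementation. The function names (`msc`, `denoise`, `make_reducer`), the SG window length and polynomial order, the number of PCA components, and the synthetic spectra are all assumptions made for illustration. The sketch also assumes that MSC is applied before SG smoothing, and it chains standard PCA and LDA directly, since the abstract does not specify how the "improved" PCA+LDA variant differs. Outlier removal and the BP neural network classifier are omitted.

```python
import numpy as np
from scipy.signal import savgol_filter
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import make_pipeline


def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum."""
    spectra = np.asarray(spectra, dtype=float)
    reference = spectra.mean(axis=0)
    corrected = np.empty_like(spectra)
    for i, row in enumerate(spectra):
        # Least-squares fit: row ~ intercept + slope * reference.
        slope, intercept = np.polyfit(reference, row, deg=1)
        corrected[i] = (row - intercept) / slope
    return corrected


def denoise(spectra, window_length=11, polyorder=2):
    """MSC followed by Savitzky-Golay smoothing along the wavelength axis.

    The window length and polynomial order are illustrative defaults.
    """
    return savgol_filter(msc(spectra), window_length, polyorder, axis=1)


def make_reducer(n_pca=10):
    """PCA followed by LDA on the PCA scores (plain chaining, an assumption)."""
    return make_pipeline(PCA(n_components=n_pca), LinearDiscriminantAnalysis())


# Synthetic stand-in for the spectra: six classes whose band mixture shifts
# with the adulterant fraction, plus random scatter (gain, offset) and noise.
rng = np.random.default_rng(0)
wavelengths = np.linspace(0.0, 1.0, 200)
band_a = np.exp(-((wavelengths - 0.3) / 0.08) ** 2)
band_b = np.exp(-((wavelengths - 0.7) / 0.08) ** 2)
labels = np.repeat(np.arange(6), 20)          # 6 classes, 20 samples each
fraction = labels / 10.0                      # 0.0, 0.1, ..., 0.5
pure = (1 - fraction)[:, None] * band_a + fraction[:, None] * band_b
gain = rng.uniform(0.8, 1.2, size=(labels.size, 1))
offset = rng.uniform(-0.1, 0.1, size=(labels.size, 1))
X = gain * pure + offset + rng.normal(scale=0.01, size=pure.shape)

X_clean = denoise(X)                                      # MSC + SG smoothing
scores = make_reducer().fit_transform(X_clean, labels)    # PCA -> LDA scores
```

The resulting `scores` array holds at most five discriminant components per sample (one fewer than the number of classes) and is what would be passed on to a classifier in this sketch.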