| Wheat flour is an important food raw material in daily life, and its quality determines its specific use in food production. Farinogram properties are among the most important indicators of wheat flour quality; the four most important farinogram parameters are water absorption, development time, stability time and degree of softening. Although the traditional farinograph can measure the farinogram, it is costly in time and labor and is not suitable for monitoring wheat flour production. Near-infrared (NIR) spectroscopy is a fast and convenient detection method that can give real-time measurement results. To address the problem of how to measure wheat flour farinogram properties effectively by NIR spectroscopy, a variety of NIR regression models were established. The main work is as follows: Firstly, the background and significance of the study were summarized from the perspectives of NIR spectroscopy and wheat flour properties, and the principle of NIR spectroscopic analysis and the data acquisition process were briefly introduced. Based on the collected data, two linear regression models, principal component regression (PCR) and partial least squares regression (PLSR), were established. The main data characteristics were extracted by reducing the dimensionality of the NIR spectral data, and the low-dimensional data were then used to establish multiple linear regression models. Four pretreatment methods were studied and compared, and the optimal method was selected for the model of each of the four farinogram properties. At the same time, the optimal numbers of PCA and PLS components were selected for the input data using the learning curves of root mean square error (RMSE) and noise level. In the final results, the predictive ability of the PCR model was comparable to that of the PLSR model, but the explanatory power of the PLS components was stronger than that of the PCA 
components. Subsequently, building on the above two linear regression models, the study introduced unsupervised clustering into regression modeling. The K-means clustering algorithm was applied to the first few features extracted by PCA and PLS, which contain the main spectral information; the data set was divided into three clusters, and PCR and PLSR models were established on each cluster. Because PCA and PLS are both linear dimensionality reduction methods, t-distributed stochastic neighbor embedding (TSNE), a nonlinear dimensionality reduction method, was also introduced for cluster analysis and comparison. Three regression models were thus established: the K-means-PCR model, the K-means-PLSR model and the TSNE-K-means-PLSR model. In the final results, all three models showed better predictive ability than the original PCR and PLSR models, and the effect of TSNE in cluster analysis was more significant. Finally, the study introduced supervised classification into regression modeling and established a regression model based on it. The predicted values of the regression model were obtained in steps. Gaussian process regression (GPR) was first used for an initial regression of the farinogram properties. The GPR predictions served as the numerical criterion for classification: a threshold value was selected for each farinogram property, and the sample data were divided into two clusters, those above the threshold and those below it, on each of which a PLSR model was established. The concept of fuzzy classification was then introduced: the final prediction of the model is the combination of the predictions of the two PLSR models weighted by probability, and the resulting model is called the GPR-PLSR model. The GPR-PLSR model not only predicted better than the original PLSR model, but also better than the three 
regression methods with unsupervised clustering. |
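
The fuzzy combination step of the GPR-PLSR model can be illustrated with a minimal sketch. The abstract does not state how the probability weight is computed, so the sketch assumes that GPR yields a Gaussian predictive distribution (mean and standard deviation) for each sample and takes the weight to be the probability that the true value exceeds the threshold. The function names `prob_above` and `fuzzy_combine` are hypothetical and are not taken from the study.

```python
import math


def prob_above(mu, sigma, threshold):
    """Probability that y > threshold when y ~ N(mu, sigma^2).

    Assumption: mu and sigma are the GPR predictive mean and standard
    deviation for one sample; the study may define the weight differently.
    """
    if sigma <= 0:
        # Degenerate case: no predictive uncertainty, so the class is crisp.
        return 1.0 if mu > threshold else 0.0
    z = (threshold - mu) / sigma
    # 1 - Phi(z), with the standard normal CDF Phi written via erf.
    return 0.5 * (1.0 - math.erf(z / math.sqrt(2.0)))


def fuzzy_combine(pred_high, pred_low, p_high):
    """Probability-weighted mix of the two cluster-wise PLSR predictions.

    pred_high: prediction of the PLSR model fitted above the threshold
    pred_low:  prediction of the PLSR model fitted below the threshold
    p_high:    probability that the sample belongs above the threshold
    """
    return p_high * pred_high + (1.0 - p_high) * pred_low


# Illustrative numbers only, not data from the study.
p = prob_above(mu=60.0, sigma=2.0, threshold=60.0)
final_prediction = fuzzy_combine(pred_high=62.0, pred_low=58.0, p_high=p)
```

When the GPR mean sits exactly on the threshold, both PLSR models contribute equally. As the mean moves away from the threshold relative to the predictive standard deviation, the weight approaches 0 or 1 and the result approaches a hard two-class assignment.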