In many disciplines, such as biomedicine, food science, geoscience, and analytical chemistry, massive amounts of data are generated through the extensive use of modern instruments such as nuclear magnetic resonance apparatus and high-throughput spectrometers. A large amount of useful information is hidden in these data, and quantitative or qualitative analysis of them helps to uncover broad and in-depth scientific conclusions. In statistical learning, such analysis, for example classification or regression, is generally called supervised learning. These methods often need to perform two tasks: (a) choose the best combination of learning algorithms and tune their hyperparameters, also known as model selection; and (b) provide a performance estimate for the final reported model. Bootstrap, jackknife, and cross-validation methods are commonly used to select the optimal model (or hyperparameters) and to evaluate model performance. Each of these methods has its own advantages and disadvantages. For example, cross-validation gives an optimistically biased estimate of the true error; that is, it underestimates the true error. Many scholars have noticed this problem and proposed different correction methods, such as nested cross-validation, which provides a relatively good estimate but at a high computational cost. To reduce the computation, Tibshirani and Tibshirani proposed the TT method. Because the TT method is simple and intuitive, it has attracted considerable attention, but some scholars argue that it overestimates the true error. Therefore, this paper proposes two improved methods based on the TT method: one based on one-half of the TTBias and one based on the median. The two improved methods and the original TT method are applied in modeling experiments with partial least squares and kernel partial least squares models on high-dimensional data. Empirical results on synthetic and real data sets show that the two new methods not only correct the optimistic bias of the cross-validation estimate but also avoid the overcorrection of the TT method, indicating that the proposed methods are an improvement.
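
As a rough illustration of the corrections summarized above, the following sketch computes the minimum cross-validation error and three bias-corrected versions from a table of per-fold errors. The TT bias is taken in its commonly stated form: the mean, over folds, of the gap between each fold's error at the globally selected hyperparameter and that fold's own minimum error. The "one-half" and "median" variants shown here are only one plausible reading of the method names and may differ from the definitions used in the paper. The function name `tt_estimates`, the input layout, and the assumption of equally sized folds are hypothetical choices made for this example.

```python
import numpy as np


def tt_estimates(fold_errors):
    """Sketch of TT-style bias corrections (assumed definitions, not the paper's code).

    fold_errors: array-like of shape (K, M); entry [k, m] is the error on
    held-out fold k for hyperparameter candidate m. Folds are assumed to be
    of equal size, so the CV curve is the plain mean over folds.
    """
    fold_errors = np.asarray(fold_errors, dtype=float)

    # Overall CV error curve and the hyperparameter that minimizes it.
    cv_curve = fold_errors.mean(axis=0)
    best = int(np.argmin(cv_curve))
    cv_min = cv_curve[best]

    # Per-fold gap: error at the globally chosen candidate minus the
    # fold's own minimum error. Each gap is non-negative.
    gaps = fold_errors[:, best] - fold_errors.min(axis=1)

    tt_bias = gaps.mean()  # TT bias estimate (mean of the per-fold gaps)

    return {
        "cv": cv_min,                             # uncorrected minimum CV error
        "tt": cv_min + tt_bias,                   # TT-corrected estimate
        "half_tt": cv_min + 0.5 * tt_bias,        # assumed "one-half TTBias" variant
        "median_tt": cv_min + np.median(gaps),    # assumed "median" variant
    }
```

Under these assumed definitions, the one-half estimate always lies between the uncorrected cross-validation error and the TT estimate, and the median variant is less sensitive than the mean to a single fold with an unusually large gap.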