
The Research On Model Selection And Model Average

Posted on: 2015-09-03
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S L Zhu
GTID: 1220330485494827
Subject: Probability theory and mathematical statistics
Abstract/Summary:
In recent years, model selection has become increasingly important as high-dimensional problems are studied in greater depth, but the prediction risk of the "optimal model" cannot be ignored once model selection uncertainty is taken into account. Reducing model selection uncertainty has therefore become an active research topic. Model average is a classical method designed to decrease model selection uncertainty and prediction error, but three problems frequently arise when implementing it: 1, which individual models should be combined; 2, how to use the historical data effectively to find the model average estimate; 3, how to interpret the model average estimate.

To address these problems, four research topics on model selection and model average were studied in this work.

Firstly, we designed the ECS-LBM procedure for selecting individual models for model average under the normal linear assumption. The notions of exact confidence set (ECS) and lower boundary models (LBMs), both based on the F test, were introduced, and the ECS-LBM method and the ECS-LBM-ARM model were then built. We proved that the number of ECS-LBM models is limited and that the prediction risk of ECS-LBM-ARM is bounded in terms of some constants and the risk of the best estimator. For a fixed significance level α, the ECS-LBM models cover the true model with probability 1-α. After ECS-LBM model screening, ARM approaches the best estimator at a faster rate. In particular, the prediction error of ECS-LBM-ARM was clearly lower when the true model was the best model. Overall, ECS-LBM-ARM provided better prediction accuracy with less computation time than popular model selection methods such as LASSO, SCAD and MCP.

Secondly, we discussed the applications and prediction risk of ARM with varying weights for time series. To solve the individual model selection problem, we proposed the sequential selection and perturbation selection methods. Our theoretical results showed that the robust model average estimator has a lower penalty, log(Ks), than the original log(K), so the model average approaches the best estimator at a faster rate. We also proposed a data-based prediction error ratio for comparing the rolling ARM and the greedy ARM, which can be used for window width selection as well. The sequential selection and perturbation selection methods worked well in both simulations and real data analysis. The simulation results and real data showed that the rolling model average is better than the greedy one.

Thirdly, under the linear assumption, we proved that the model average estimator is a linear estimator. Its coefficients do not come from any fixed estimation method, but from the average over the individual models. Taking the robust ARM as the representative, we proved that the linear estimator based on model average has bounded prediction risk. The estimator based on model average therefore has the advantages of stability and higher prediction accuracy. Many simulations showed the effectiveness of the proposed estimator, with the following features: 1, the LSE showed larger bias than the model average estimator as the correlation or the error increased; 2, the robust model average worked well for estimation with non-normal errors; 3, whether or not the true model was in the individual model set, the model average performed best among all models.

Finally, we discussed model selection for high-dimensional regression. The simulation and real data results indicated large model selection uncertainty in high-dimensional models and the high effectiveness of model average. The real data results suggested that the model selection and prediction problems for high-dimensional regression cannot be solved unless the variable selection bias is identified. Additionally, we discussed the relationships between the "optimal model" and the model average, and between regression and model average, and proposed a relaxed hybrid model average based on the discussion of season and trend separation in time series forecasting.
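
As an illustration of the idea that a model average of linear fits is itself a linear estimator whose coefficients are averaged over the individual models, the following minimal Python sketch may help. It is not the dissertation's ECS-LBM-ARM or robust ARM procedure. It assumes a simplified, single-split ARM-style weighting (fit each candidate model on one half of the data, weight it by the Gaussian likelihood of the other half), and the function name `arm_style_average`, its parameters, and the choice of candidate subsets are all hypothetical.

```python
import numpy as np


def arm_style_average(X, y, subsets, seed=0):
    """Average least-squares fits over candidate column subsets.

    Assumption of this sketch: weights come from one random half/half
    split (a simplified ARM-like recipe), not from the thesis's method.
    Returns the weight vector and the averaged coefficient vector.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    perm = rng.permutation(n)
    train, test = perm[: n // 2], perm[n // 2:]

    log_w = np.empty(len(subsets))
    for k, cols in enumerate(subsets):
        # Fit candidate model k on the training half only.
        beta, *_ = np.linalg.lstsq(X[train][:, cols], y[train], rcond=None)
        resid_train = y[train] - X[train][:, cols] @ beta
        sigma2 = resid_train @ resid_train / max(len(train) - len(cols), 1)
        # Score it by the Gaussian log-likelihood of the held-out half.
        resid_test = y[test] - X[test][:, cols] @ beta
        log_w[k] = (-0.5 * len(test) * np.log(sigma2)
                    - resid_test @ resid_test / (2.0 * sigma2))

    # Normalise in a numerically stable way so the weights sum to one.
    w = np.exp(log_w - log_w.max())
    w /= w.sum()

    # Refit each model on all data and average the coefficients,
    # embedding each fit into the full p-dimensional coefficient vector.
    beta_avg = np.zeros(p)
    for wk, cols in zip(w, subsets):
        beta_full, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
        beta_avg[cols] += wk * beta_full
    return w, beta_avg
```

Calling `arm_style_average(X, y, [[0], [0, 1]])` on an `n × p` design matrix returns the two model weights and a length-`p` coefficient vector, so predictions are simply `X @ beta_avg`. Columns that appear in no candidate subset keep a coefficient of zero, which is one way to see why the choice of individual models matters for the averaged estimate.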
Keywords/Search Tags: Model selection uncertainty, model average, model screening, high-dimensional regression, time series forecasting, prediction