
The Balanced m×2 Cross-Validation Method

Posted on: 2014-01-06
Degree: Master
Type: Thesis
Country: China
Candidate: W J Du
GTID: 2250330401962302
Subject: Probability theory and mathematical statistics
Abstract/Summary:
Estimating the expected prediction error of a statistical model is one of the core tasks of statistical machine learning. The quality of this estimate has a significant impact on model selection, on significance tests for differences in prediction error between models, and on variable selection. To obtain good estimates, researchers have proposed many methods, such as MDL, cross-validation (CV), the bootstrap and the .632 bootstrap. Cross-validation is a widely used method for estimating generalization error.

This thesis analyzes and summarizes the advantages and disadvantages of existing CV methods on machine learning classification tasks and, on this basis, improves the CV method. Although Blocked 3×2 CV has a low fold number, a balanced partition of the dataset and a small number of experiments, 5×2 CV and 10×2 CV also perform well on classification. Furthermore, for 2-fold CV, the experimental results improve to some extent as the number of repetitions increases. How to resolve the trade-off between experimental performance and experimental cost, however, remains a difficult issue in machine learning. We extend and improve Blocked 3×2 CV and put forward Balanced 7×2 CV and Balanced 11×2 CV. Based on these results, we generalize to Balanced m×2 CV and propose a method for constructing it. We determine the number of experiments using Balanced m×2 CV, which improves experimental performance, and support this view with theoretical analysis and simulation experiments.

We apply the Balanced m×2 CV method to classification model selection tasks. Taking into account the various factors that influence model selection based on Balanced m×2 CV, we find that it performs better than 5-fold CV and 10-fold CV. We also prove that Balanced m×2 CV has the same selection consistency as standard CV. We therefore conclude that Balanced m×2 CV is more appropriate than Random m×2 CV for classification tasks.
Keywords/Search Tags: Cross-validation, Balanced m×2 cross-validation, Model selection
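
The abstract does not spell out how the balanced partitions are built. As an illustration only, the Python sketch below shows one possible way to obtain m two-fold partitions in which any two first halves share exactly a quarter of the samples: split the data into m + 1 blocks and assign blocks to halves using the rows of a Hadamard matrix. The Sylvester construction, the function names `sylvester_hadamard` and `balanced_mx2_splits`, and the reading of "balanced" as an overlap of n/4 are assumptions made here, not details taken from the thesis. The sketch only covers cases where m + 1 is a power of two (m = 3, 7, ...), so m = 11 would need a different matrix of order 12.

```python
import random


def sylvester_hadamard(order):
    """Return a +1/-1 Hadamard matrix of the given order (a power of two)."""
    if order < 1 or order & (order - 1):
        raise ValueError("order must be a power of two")
    h = [[1]]
    while len(h) < order:
        # Sylvester doubling step: [[H, H], [H, -H]]
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def balanced_mx2_splits(n, m, seed=0):
    """Return m (half_a, half_b) index partitions built from m + 1 blocks.

    Hypothetical helper: assumes m + 1 is a power of two and, for exact
    balance, that n is divisible by m + 1.
    """
    h = sylvester_hadamard(m + 1)
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into m + 1 blocks of (nearly) equal size.
    blocks = [indices[b::m + 1] for b in range(m + 1)]
    splits = []
    for row in h[1:]:  # skip the all-ones first row
        half_a = sorted(i for b, s in enumerate(row) if s == 1 for i in blocks[b])
        half_b = sorted(i for b, s in enumerate(row) if s == -1 for i in blocks[b])
        splits.append((half_a, half_b))
    return splits


if __name__ == "__main__":
    for half_a, half_b in balanced_mx2_splits(16, 3):
        print(half_a, half_b)
```

Under these assumptions, each half holds n/2 samples, and because distinct rows of a Hadamard matrix are orthogonal, the first halves of any two different partitions overlap in n/4 samples. In each partition, a model would be trained on one half and tested on the other, then the roles swapped, giving 2m error estimates to average.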