The PAC-Bayes risk bound, which integrates the Bayesian paradigm with structural risk minimization for stochastic classifiers, provides a framework for analyzing machine learning algorithms and has yielded some of the tightest known generalization bounds. The validity of PAC-Bayes theory follows from the Probably Approximately Correct (PAC) model and Bayesian decision theory. The PAC-Bayes bound is an important statistical quantity for measuring the generalization performance of machine learning algorithms, with a rigorous mathematical formulation and broad applicability.

This thesis applies the PAC-Bayes risk bound to assess the generalization performance of support vector machines (SVMs). First, open tests and closed tests are constructed on five UCI data sets, and the PAC-Bayes bound is computed together with related statistical measures, including sensitivity, specificity, and accuracy. Analysis of the covariance and correlation coefficients between the PAC-Bayes bound and these measures shows that the bound is strongly negatively correlated with accuracy and moderately negatively correlated with specificity and sensitivity. Second, as a method of assessing model performance, the PAC-Bayes bound is compared with N-fold cross-validation; their results are highly consistent, indicating that the PAC-Bayes bound faithfully reflects the generalization risk. Furthermore, the PAC-Bayes bound is applied to model selection for SVMs, allowing the best penalty parameter and kernel parameters to be selected rapidly. Finally, SVMs and the PAC-Bayes bound are applied to protein structure prediction.

A major issue in the practical use of the PAC-Bayes bound is the estimation of the unknown prior and posterior distributions over the concept space.
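As a concrete illustration of the quantity being assessed, the following sketch computes the standard Langford-Seeger form of the PAC-Bayes bound: with probability at least 1 - δ, kl(R_emp || R) ≤ (KL(Q||P) + ln((m+1)/δ)) / m, where R_emp is the empirical Gibbs risk, m the sample size, and kl the Bernoulli KL divergence. The bound on the true risk R is recovered by inverting kl via bisection. The specific numbers in the usage line are illustrative, not taken from the thesis experiments.

```python
import math

def kl_bernoulli(q, p):
    """KL divergence between Bernoulli(q) and Bernoulli(p)."""
    eps = 1e-12
    q = min(max(q, eps), 1 - eps)
    p = min(max(p, eps), 1 - eps)
    return q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))

def pac_bayes_bound(emp_risk, kl_qp, m, delta=0.05):
    """Smallest p >= emp_risk with kl(emp_risk || p) <= rhs,
    where rhs = (KL(Q||P) + ln((m+1)/delta)) / m (Langford-Seeger),
    found by bisection on p."""
    rhs = (kl_qp + math.log((m + 1) / delta)) / m
    lo, hi = emp_risk, 1.0
    for _ in range(100):                 # bisection to high precision
        mid = (lo + hi) / 2
        if kl_bernoulli(emp_risk, mid) > rhs:
            hi = mid
        else:
            lo = mid
    return lo

# Illustrative values: empirical Gibbs risk 0.10, KL(Q||P) = 5, m = 1000.
bound = pac_bayes_bound(0.10, 5.0, 1000)
print(round(bound, 3))
```

A lower bound value indicates a tighter guarantee on the true risk, which is why the thesis studies its (negative) correlation with test accuracy.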
In this thesis, by formulating the concept space as a Reproducing Kernel Hilbert Space (RKHS) via the kernel method, we propose a random sampling method and a Markov Chain Monte Carlo (MCMC) sampling method to simulate sampling from the posterior distribution over the concept space, and thereby compute the Kullback-Leibler divergence and the PAC-Bayes bound. Furthermore, we propose a variance minimization method to investigate the statistical significance of the support vectors and to optimize the support vectors and their weight vectors. Experimental results on two artificial data sets show that the simulation method is reasonable and effective in practice.

Building on the RKHS formulation of the concept space, we propose a refined MCMC sampling algorithm that incorporates feedback information from the model to simulate sampling from the posterior distribution. Furthermore, we use kernel density estimation to estimate the probability density of the posterior distribution in order to compute the Kullback-Leibler divergence between the posterior and prior distributions, thereby solving the computation problem of the PAC-Bayes bound. Finally, we compare the random sampling method, the MCMC sampling method, and the refined MCMC method; the experimental results show that the refined method improves the calculation of the PAC-Bayes bound.
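The sampling-plus-KDE pipeline described above can be sketched in a minimal one-dimensional toy setting: draw posterior samples with a plain Metropolis sampler, fit a kernel density estimate of the posterior, and form the Monte Carlo estimate KL(Q||P) ≈ (1/n) Σᵢ [log q̂(wᵢ) − log p(wᵢ)]. The Gaussian prior/posterior pair and the function `log_post` are illustrative assumptions standing in for the RKHS weight-vector setting of the thesis, not the thesis's actual distributions.

```python
import numpy as np
from scipy.stats import gaussian_kde, norm

rng = np.random.default_rng(0)

# Toy setup (assumption): prior P = N(0, 1); the "posterior" Q is treated
# as only reachable through MCMC via its unnormalized log-density.
def log_post(w):
    return norm.logpdf(w, loc=1.0, scale=0.5)

def metropolis(n, step=0.5):
    """Plain Metropolis sampler, a stand-in for the refined MCMC
    over RKHS weight vectors described in the text."""
    w, out = 0.0, []
    for _ in range(n):
        prop = w + step * rng.standard_normal()
        if np.log(rng.random()) < log_post(prop) - log_post(w):
            w = prop
        out.append(w)
    return np.array(out)

samples = metropolis(20000)[5000:]     # drop burn-in
q_hat = gaussian_kde(samples)          # KDE estimate of the posterior density

# Monte Carlo estimate of KL(Q||P) using the KDE for q and the known prior p.
kl_est = float(np.mean(np.log(q_hat(samples)) - norm.logpdf(samples, 0.0, 1.0)))
print(round(kl_est, 2))
```

In this toy case the true KL(N(1, 0.5²) || N(0, 1)) is about 0.82, so the estimate can be sanity-checked directly; in the thesis setting the KDE step is what makes the divergence, and hence the PAC-Bayes bound, computable when no closed form exists.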
