Machine learning aims to infer rules from data observed in natural phenomena or specially designed experiments. Statistical learning theory assumes that the data obey a specific but unknown distribution; without estimating that distribution, it studies the relationship between two objective functions: the training-data-based (empirical) risk and the distribution-based (expected) risk. Kernel methods solve nonlinear problems with linear techniques through a nonlinear map, while avoiding an explicit description of the mapping. Following the idea of controlling structural risk in statistical learning theory, the probably approximately correct (PAC) learning model gives reliable theoretical support for ensemble learning methods such as Boosting and Bagging. Other learning machines, such as principal component analysis, also benefit from kernel methods. Summarizing the features of statistical learning theory and support vector machines, it is easy to see the source of support vector machines' success: effectively controlling the structural risk and making full use of kernel methods. Guided by statistical learning theory and kernel methods, this thesis focuses on improving kernel learning machines such as the support vector machine, kernel principal component analysis, and the kernel principal angle, as well as ensemble methods.

The main work of this thesis can be summarized as follows:

1. After a brief introduction of the basic concepts of machine learning, statistical learning theory and support vector machines are reviewed. The probably approximately correct learning model is viewed as a development of statistical learning theory and as the theoretical basis of ensemble learning, so it is introduced separately. To make the thesis self-contained, the first chapter also presents the basics of kernel functions, principal component analysis, canonical correlation analysis, and related tools, and gives an architectural overview of the thesis.

2.
With the concepts of the irrelevant optimal vector and the promising support vector, and by applying optimality theorems on convex hulls, an improved geometric training algorithm for the support vector machine is rigorously derived. In the improved algorithm, the samples are divided into two classes: irrelevant optimal vectors and promising support vectors. In the linearly separable case, the irrelevant optimal vectors do not affect the training result of the support vector machine and can therefore be removed from the training set. In this way, the number of training samples is greatly reduced and the solution process of the support vector machine is simplified. In the linearly inseparable case, using the concept of the soft convex hull, the two linearly inseparable sets can be transformed into linearly separable ones, and the solution process is simplified in the same way.

3. A sparse least squares support vector machine is proposed for both regression and classification on large datasets. The samples are mapped into an infinite-dimensional Reproducing Kernel Hilbert Space (RKHS), where they span a subspace. An approximate basis of this subspace is then found, such that every sample can be represented linearly by the basis elements. By expressing the samples as linear combinations of the basis elements, a least squares support vector machine can be obtained by solving a small system of linear equations. This method reduces the dimension of the kernel matrix and yields a sparse solution.

4. To solve classification and regression problems on large datasets, two ensemble kernel methods are proposed. All samples are mapped into an infinite-dimensional RKHS and several groups of approximate basis of the space are found. Each subspace spanned by a group of basis elements serves as a solution space in which a least squares support vector machine is trained. Finally, all the support vector machines are integrated. The integrated learning machine is robust to the choice of parameters.

5.
Ensemble sparse kernel principal component analysis is proposed. Samples are mapped into an infinite-dimensional RKHS and several groups of approximate basis of the space are found. Each subspace spanned by a group of approximate basis elements serves as a solution space in which the eigenvalue problem is solved. Ensemble sparse kernel principal component analysis then averages the eigenvectors and eigenvalues obtained in the different subspaces. Applied to the Tennessee-Eastman benchmark problem, the experiments show that the proposed method performs considerably better than common kernel principal component analysis.

6. The ensemble sparse kernel principal angle is proposed for fault detection. The proposed method is one of the few learning machines that can recognize the fault patterns of the Tennessee-Eastman process.

Finally, a brief conclusion is given and some possible research directions are pointed out.
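As a concrete illustration of the geometric view behind contribution 2: the maximum-margin hyperplane of two linearly separable sets is the perpendicular bisector of the segment joining the closest points of their convex hulls. The sketch below finds that closest-point direction with a textbook Gilbert/Frank-Wolfe iteration on the Minkowski-difference hull. It is not the improved algorithm of the thesis, and all names are illustrative.

```python
import numpy as np

def gilbert_nearest(A, B, n_iter=2000):
    """Minimum-norm point of conv({a_i - b_j}), i.e. the difference of the
    closest points of conv(A) and conv(B), via a Frank-Wolfe (Gilbert) loop."""
    D = (A[:, None, :] - B[None, :, :]).reshape(-1, A.shape[1])
    w = D[0].copy()
    for _ in range(n_iter):
        v = D[np.argmin(D @ w)]              # hull vertex most opposed to w
        d = w - v
        dd = d @ d
        if dd < 1e-12:                       # w already equals the support vertex
            break
        t = np.clip((w @ d) / dd, 0.0, 1.0)  # exact line search on ||.||^2
        w = (1.0 - t) * w + t * v
    return w                                  # normal of the separating hyperplane
```

With normal `w`, a threshold can be taken as the midpoint `0.5 * ((A @ w).min() + (B @ w).max())`. Note that the linear subproblem `argmin D @ w` is always attained at a hull vertex, so samples strictly inside either convex hull never influence the iteration; this is the geometric intuition behind discarding the irrelevant optimal vectors.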
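The approximate-basis idea of contributions 3 and 4 can be sketched as follows, assuming a Gaussian RBF kernel. A sample is added to the basis only if the squared projection residual of its RKHS image on the current span exceeds a tolerance; the regression weights are then found from a small m-by-m system instead of the full n-by-n one. This is a minimal numpy-only sketch, not the thesis's exact procedure: the greedy criterion, all function names, and the omission of the bias term are simplifications for illustration.

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    # Gaussian RBF kernel matrix between the rows of A and B
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def approximate_basis(X, tol=1e-3, gamma=0.5):
    """Greedy approximate basis of span{phi(x_i)} in the RKHS: keep x_i only
    if its squared projection residual on the current span exceeds tol."""
    idx = [0]
    for i in range(1, len(X)):
        B = X[idx]
        Kbb = rbf_kernel(B, B, gamma) + 1e-10 * np.eye(len(idx))
        kbi = rbf_kernel(B, X[i:i + 1], gamma)               # shape (m, 1)
        # k(x_i, x_i) = 1 for the RBF kernel
        resid = 1.0 - (kbi.T @ np.linalg.solve(Kbb, kbi))[0, 0]
        if resid > tol:
            idx.append(i)
    return idx

def sparse_lssvm_fit(X, y, basis_idx, lam=1e-2, gamma=0.5):
    # The weight vector is represented by basis elements only, so the
    # linear system is m x m instead of n x n (bias omitted for brevity).
    B = X[basis_idx]
    Kxb = rbf_kernel(X, B, gamma)
    Kbb = rbf_kernel(B, B, gamma)
    beta = np.linalg.solve(Kxb.T @ Kxb + lam * Kbb, Kxb.T @ y)
    return B, beta

def sparse_lssvm_predict(Xnew, B, beta, gamma=0.5):
    return rbf_kernel(Xnew, B, gamma) @ beta
```

The solution is sparse in the sense that predictions only involve kernel evaluations against the basis elements, and the kernel matrix that must be factored shrinks from n-by-n to m-by-m.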
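One plausible reading of the ensemble construction in contribution 5, sketched for fault detection: each ensemble member restricts kernel PCA to the subspace spanned by a random group of basis samples (a Nystrom-style feature map) and scores test points by the squared reconstruction error (SPE). The thesis averages eigenvectors and eigenvalues across subspaces; here, more simply, the per-subspace monitoring scores are averaged. The random grouping, the SPE statistic, and all names are assumptions for illustration.

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.1):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def nystrom_features(X, B, gamma=0.1):
    """Coordinates of the projection of phi(x) onto span{phi(z_j)}: Kbb^(-1/2) k_B(x)."""
    Kbb = rbf_kernel(B, B, gamma)
    s, U = np.linalg.eigh(Kbb)
    W = U @ np.diag(1.0 / np.sqrt(np.maximum(s, 1e-10))) @ U.T
    return rbf_kernel(X, B, gamma) @ W

def spe_scores(Xtrain, Xtest, B, k=5, gamma=0.1):
    """KPCA reconstruction error inside the subspace spanned by B, plus the
    energy of phi(x) lying outside that subspace."""
    Z = nystrom_features(Xtrain, B, gamma)
    mu = Z.mean(axis=0)
    _, V = np.linalg.eigh(np.cov((Z - mu).T))
    P = V[:, -k:]                                   # top-k principal directions
    Zt = nystrom_features(Xtest, B, gamma)
    out = np.maximum(1.0 - (Zt ** 2).sum(1), 0.0)   # k(x, x) = 1 for the RBF kernel
    Zc = Zt - mu
    R = Zc - (Zc @ P) @ P.T                          # residual within the subspace
    return (R ** 2).sum(1) + out

def ensemble_spe(Xtrain, Xtest, n_members=5, m=40, k=5, gamma=0.1, seed=0):
    rng = np.random.default_rng(seed)
    total = np.zeros(len(Xtest))
    for _ in range(n_members):                       # one random basis group per member
        idx = rng.choice(len(Xtrain), size=m, replace=False)
        total += spe_scores(Xtrain, Xtest, Xtrain[idx], k, gamma)
    return total / n_members                         # average over the subspaces
```

In a monitoring setting, the averaged score would be compared against a control limit estimated from normal operating data; faulty samples reconstruct poorly in every subspace, so their averaged score stays high while the averaging damps the variance of any single random basis group.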
