
Machine Learning Methods And Applications Based On Statistics And Graph Model

Posted on: 2013-01-21
Degree: Master
Type: Thesis
Country: China
Candidate: J B Yin
Full Text: PDF
GTID: 2218330362459207
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
This thesis focuses on several typical problems in machine learning: we build mathematical models for these problems, propose novel solutions, and apply the resulting methods to real-world tasks. The main innovations are as follows; short illustrative sketches of the key ideas are given after the corresponding descriptions.

A novel approach to optimizing the kernel function for kernel-based methods. The Gaussian kernel function implicitly defines the feature space of an algorithm and plays an essential role in the application of kernel methods. Its parameter is a scalar that strongly influences the final results, yet it remains unclear how to choose an optimal value. We propose a data-driven method for optimizing the Gaussian kernel parameter that depends only on the distribution of the original data set and yields a simple solution to this complex problem. The proposed method is task-independent and can be used in any Gaussian-kernel-based approach, supervised or unsupervised. Simulation experiments demonstrate the efficacy of the obtained results, and a user-friendly online calculator is available at www.csbio.sjtu.edu.cn/bioinf/kernel/ for public use.

A robust ART-2 neural network learning framework. The ART-2 network is a typical neural network approach based on adaptive resonance theory (ART) and has been used successfully in many fields. Traditional ART-2 has two shortcomings, however: its final results depend heavily on a pre-defined vigilance threshold, which measures the similarity between samples and categories, and the number of categories grows without bound as inputs keep arriving. Addressing these points, we present an improved ART-2 algorithm. We first systematically analyze how the optimal vigilance threshold changes as inputs arrive in succession and propose an adaptive scheme that lets the network choose the optimal threshold automatically. We then introduce a constraint parameter that confines the scale of the ART-2 network by limiting the maximal number of categories. Simulation experiments on artificial and benchmark data sets demonstrate the effectiveness of the algorithm.
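To give a flavor of the data-driven kernel-width selection above, the sketch below picks the Gaussian width purely from the pairwise-distance distribution of the data. The median-distance rule used here is a common stand-in assumed for illustration only; the abstract does not spell out the thesis's actual selection rule.

    # Illustrative data-driven choice of the Gaussian kernel width (median heuristic).
    # NOTE: this is an assumed stand-in, not the selection rule derived in the thesis.
    import numpy as np

    def gaussian_kernel_matrix(X, sigma):
        """K[i, j] = exp(-||x_i - x_j||^2 / (2 * sigma^2))."""
        sq = np.sum(X**2, axis=1)
        sq_dists = sq[:, None] + sq[None, :] - 2 * X @ X.T
        np.maximum(sq_dists, 0.0, out=sq_dists)      # guard against tiny negatives
        return np.exp(-sq_dists / (2.0 * sigma**2))

    def median_heuristic_sigma(X):
        """Pick sigma as the median pairwise Euclidean distance (task-independent)."""
        sq = np.sum(X**2, axis=1)
        dists = np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0.0))
        return np.median(dists[np.triu_indices_from(dists, k=1)])

    X = np.random.RandomState(0).randn(100, 5)
    sigma = median_heuristic_sigma(X)
    K = gaussian_kernel_matrix(X, sigma)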
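For the ART-2 contribution, the following is a much-simplified ART-style online clustering loop with a vigilance test and a hard cap on the number of categories. The adaptive vigilance rule of the thesis is not specified in the abstract, so this sketch simply forces assignment to the closest prototype once the cap is reached; names and parameters are illustrative.

    # Simplified ART-style online clustering with a vigilance test and a hard cap on
    # the number of categories (not the thesis's exact adaptive-vigilance algorithm).
    import numpy as np

    def art_like_clustering(X, vigilance=0.9, max_categories=10, lr=0.5):
        prototypes = []                                  # one prototype vector per category
        labels = []
        for x in X:
            x = x / (np.linalg.norm(x) + 1e-12)          # ART-2 style input normalization
            if not prototypes:
                prototypes.append(x.copy()); labels.append(0); continue
            sims = [float(x @ (p / (np.linalg.norm(p) + 1e-12))) for p in prototypes]
            best = int(np.argmax(sims))
            if sims[best] >= vigilance or len(prototypes) >= max_categories:
                # resonance (or cap reached): adapt the winning prototype
                prototypes[best] = (1 - lr) * prototypes[best] + lr * x
                labels.append(best)
            else:
                prototypes.append(x.copy())              # mismatch: open a new category
                labels.append(len(prototypes) - 1)
        return np.array(labels), np.vstack(prototypes)

    labels, protos = art_like_clustering(np.random.RandomState(1).randn(200, 8))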
A novel approach to multiple kernel subclass discriminant analysis with normalized cuts. Many discriminant analysis (DA) algorithms have been proposed for studying high-dimensional data in machine learning tasks such as dimensionality reduction, classification, and regression. Subclass discriminant analysis (SDA) and its kernel variant (KSDA) handle different types of class distributions well and have been widely used because they outperform many other DA methods. A critical problem of traditional SDA/KSDA is that the way subclasses are formed is restrictive in many cases. We introduce the normalized cut (Ncut) criterion to detect the optimal partition of each class, and we incorporate multiple kernel learning over a nonlinear combination of kernel functions into the training of SDA/KSDA. The proposed method, termed MKSDA, makes two main contributions: first, a more effective way to divide each class into subclasses for SDA/KSDA; second, the combined kernels extract more useful features of the underlying data than a single kernel function can. Experimental results demonstrate the effectiveness of the method.

A novel predictor of conotoxin superfamilies using diffusion-maps dimensionality reduction and subspace classifiers. Conotoxins are disulfide-rich small peptides that target ion channels and neuronal receptors, and they show promise as potent pharmaceuticals for Alzheimer's disease, Parkinson's disease, and epilepsy. Accurate prediction of the conotoxin superfamily has many important applications for understanding their biological and pharmacological functions. We develop a method named dHKNN to predict conotoxin superfamilies. First, we extract sequential protein features composed of physicochemical properties, evolutionary information, predicted secondary structures, and amino acid composition. We then apply diffusion maps for dimensionality reduction, interpreting the eigenfunctions of Markov matrices as a coordinate system on the original data set to obtain an efficient representation of its geometry. On this basis, an improved K-local hyperplane distance nearest neighbor subspace classifier, dHKNN, predicts conotoxin superfamilies by taking the local density information in the diffusion space into account. An overall accuracy of 91.90% is obtained in the jackknife cross-validation test on a benchmark data set, indicating that dHKNN is very promising.

A novel dimensionality reduction method based on hypergraphs and Markov random walks. We present a probabilistic latent variable model named DrHM built on a suitable time-expanded directed hypergraph, taking into account both nearest-neighbor information and the global intrinsic geometry of a given data set. In hypergraph terms, the complex relationships among samples are represented by hyper-vertices and hyper-edges, and the model, based on a Markov random walk on the data, reflects the structure and inner connectivity of the data set. The method also provides a new definition of the similarity between two samples. Its performance is demonstrated on a series of image data sets for both supervised and unsupervised learning.
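For the MKSDA contribution, the sketch below shows only the subclass-partitioning step: each class is split into subclasses by a normalized-cut style spectral clustering on an RBF affinity graph, with scikit-learn's SpectralClustering assumed as a stand-in. The learned nonlinear multiple-kernel combination and the SDA training itself are not reproduced here.

    # Illustrative subclass partitioning for SDA/KSDA via Ncut-style spectral clustering.
    import numpy as np
    from sklearn.cluster import SpectralClustering

    def split_into_subclasses(X, y, subclasses_per_class=2, gamma=1.0, seed=0):
        """Return refined labels where each class is divided into spectral subclasses."""
        sub_labels = np.empty(len(y), dtype=int)
        offset = 0
        for c in np.unique(y):
            idx = np.where(y == c)[0]
            sc = SpectralClustering(n_clusters=subclasses_per_class, affinity="rbf",
                                    gamma=gamma, random_state=seed)
            sub_labels[idx] = offset + sc.fit_predict(X[idx])
            offset += subclasses_per_class
        return sub_labels

    rng = np.random.RandomState(2)
    X = np.vstack([rng.randn(60, 4), rng.randn(60, 4) + 3.0])
    y = np.array([0] * 60 + [1] * 60)
    sub_y = split_into_subclasses(X, y)   # subclass labels would feed the SDA scatter matrices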
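For the conotoxin contribution, the following is a minimal diffusion-maps embedding of the kind described above: a Gaussian affinity matrix is normalized into a Markov matrix, and its leading non-trivial eigenvectors, scaled by powers of the eigenvalues, serve as new coordinates. The sequence-feature extraction and the dHKNN classifier are not reproduced; parameter values are illustrative.

    # Minimal diffusion-maps embedding (not the full dHKNN pipeline).
    import numpy as np

    def diffusion_maps(X, n_components=2, sigma=1.0, t=1):
        sq = np.sum(X**2, axis=1)
        W = np.exp(-(sq[:, None] + sq[None, :] - 2 * X @ X.T) / (2 * sigma**2))
        d = W.sum(axis=1)
        A = W / np.sqrt(d[:, None] * d[None, :])      # symmetric conjugate of P = D^-1 W
        vals, vecs = np.linalg.eigh(A)
        order = np.argsort(vals)[::-1]                # largest eigenvalues first
        vals, vecs = vals[order], vecs[:, order]
        psi = vecs / np.sqrt(d)[:, None]              # right eigenvectors of the Markov matrix
        # drop the trivial constant eigenvector (eigenvalue 1), keep the next n_components
        return psi[:, 1:n_components + 1] * (vals[1:n_components + 1] ** t)

    Y = diffusion_maps(np.random.RandomState(3).randn(150, 10), n_components=3)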
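For the DrHM contribution, the sketch below assembles a generic hypergraph random walk: each sample's k-nearest-neighbor set is treated as a hyperedge, and the t-step transition probabilities of the induced Markov chain give a similarity between samples. This only illustrates the ingredients named in the abstract (hyperedges plus a Markov random walk) and is not the DrHM model itself.

    # Generic hypergraph random-walk similarity (illustrative, not the DrHM model).
    import numpy as np

    def hypergraph_walk_similarity(X, k=5, t=3):
        n = len(X)
        sq = np.sum(X**2, axis=1)
        D2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
        nn = np.argsort(D2, axis=1)[:, :k + 1]         # each row: a hyperedge (point + k NNs)
        H = np.zeros((n, n))                            # vertex-by-hyperedge incidence matrix
        for e, members in enumerate(nn):
            H[members, e] = 1.0
        d_v = H.sum(axis=1)                             # vertex degrees
        d_e = H.sum(axis=0)                             # hyperedge sizes
        P = (H / d_v[:, None]) @ (H.T / d_e[:, None])   # P = D_v^-1 H D_e^-1 H^T, rows sum to 1
        return np.linalg.matrix_power(P, t)             # t-step visiting probabilities

    S = hypergraph_walk_similarity(np.random.RandomState(4).randn(80, 6))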
Keywords/Search Tags: Statistics, Graph model, Machine learning, Gaussian kernel, ART network, MKSDA, Conotoxin superfamily, Markov random walk