
Research And Application Of Large-scale Machine Learning Theory

Posted on: 2013-07-11  Degree: Doctor  Type: Dissertation
Country: China  Candidate: L J Zhang  Full Text: PDF
GTID: 1228330395989253  Subject: Computer Science and Technology
Abstract/Summary:
With the continuous development of information technology, and especially the widespread application of the Internet, data of all kinds are growing rapidly. Discovering useful knowledge from this vast ocean of data has become a widely recognized challenge, attracting much attention from both academia and industry over the last decade. To handle large-scale data, we study four typical large-scale machine learning technologies: active learning, feature selection, co-clustering, and online learning. Specifically, we put forward a series of novel machine learning algorithms, which have been successfully applied in many important areas. The main contributions are stated below.

To reduce the labeling cost, we propose a novel active learning algorithm that selects the most representative points with respect to the intrinsic manifold structure of the data. We assume that each sample and its neighbors lie on or close to a locally linear patch of the manifold, so the manifold structure is characterized by the linear coefficients that reconstruct each sample from its neighbors. A transductive learning algorithm called Locally Linear Reconstruction (LLR) is proposed to reconstruct the whole data set from the local reconstruction coefficients of every sample and the coordinates of the selected samples. The most representative samples are therefore defined as those whose coordinates can best reconstruct the whole data set.
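The reconstruction step at the heart of this idea can be made concrete with a small sketch. The code below is our illustrative construction, not the dissertation's implementation: for each sample it computes the coefficients that best reconstruct it from its k nearest neighbors, via the constrained least-squares problem familiar from locally linear embedding. The neighborhood size `k`, the regularizer `reg`, and the sum-to-one constraint are all our assumptions.

```python
import numpy as np

def local_reconstruction_weights(X, k=5, reg=1e-3):
    """For each sample, compute coefficients that best reconstruct it
    from its k nearest neighbors (constrained least squares with the
    weights summing to one, as in locally linear embedding)."""
    n = X.shape[0]
    W = np.zeros((n, n))
    # Pairwise squared Euclidean distances.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    for i in range(n):
        nbrs = np.argsort(d2[i])[1:k + 1]    # skip the point itself
        Z = X[nbrs] - X[i]                   # center neighbors at x_i
        G = Z @ Z.T                          # local Gram matrix
        G += reg * np.eye(k)                 # regularize for stability
        w = np.linalg.solve(G, np.ones(k))   # solve G w = 1
        W[i, nbrs] = w / w.sum()             # enforce sum-to-one
    return W

X = np.random.RandomState(0).rand(200, 10)
W = local_reconstruction_weights(X, k=5)
print(np.abs(W @ X - X).mean())  # small average reconstruction error
```

Given such coefficients, the selection step would choose the subset of samples whose coordinates best reconstruct the remaining points; that transductive selection criterion is developed in the full text and is omitted here.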
For the sake of dimensionality reduction, we propose a novel unsupervised algorithm named Discriminative Feature Selection (DFS). Motivated by recent studies on discriminative clustering, the central idea of DFS is to select the features that best respect the cluster structure. DFS fits a linear function to model the relationship between the data matrix after feature selection and the cluster indicator matrix. The resulting fitting error can be written in closed form and depends on both the selected features and the indicator matrix; the most discriminative features are thus defined as those which lead to the minimal fitting error.

To explore the relation between different types of data, we propose a novel co-clustering algorithm named Locally Discriminative Co-Clustering (LDCC), which clusters samples and features simultaneously. LDCC constructs a bipartite graph between samples and features to model the sample-feature relationship, and requires the clustering result to be smooth with respect to this graph. Based on local linear regression, LDCC is able to discover the intrinsic discriminative structures of both the sample and feature spaces: the inter-sample and inter-feature relationships are captured by minimizing the fitting errors of all the local linear functions. In this way, LDCC groups strongly associated samples and features together while respecting the local structures of both spaces.

To reduce the computational cost of kernel learning, we apply online learning to optimize kernel logistic regression (KLR). A straightforward approach is to apply stochastic approximation to KLR; we refer to it as the non-conservative online learning algorithm because it updates the kernel classifier for every received training example, leading to a dense classifier. To improve the sparsity of the KLR classifier, we further propose two conservative online learning algorithms that update the classifier in a stochastic manner and generate sparse solutions. With appropriately designed updating strategies, the conservative algorithms tend to update the classifier only when the loss is large. Our analysis shows that the two conservative algorithms enjoy theoretical guarantees similar to those of the non-conservative algorithm.

To solve the associated optimization problems efficiently, we make use of many optimization techniques, such as greedy selection, spectral analysis, convex relaxation, and stochastic gradient descent, to reduce the computational cost and thus improve scalability. In the experiments, we apply our algorithms to the real-world problems of face recognition, image classification, image codeword selection, text and gene co-clustering, and large-scale online classification. Experimental results show that our proposed algorithms are superior to state-of-the-art methods.
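To illustrate the difference between the non-conservative and conservative updates described above, here is a minimal sketch. It is our own construction under assumptions the abstract does not state: a stochastic-gradient step on the logistic loss with an RBF kernel expansion, where the conservative variant performs the update with probability min(1, loss), so low-loss examples usually leave the classifier, and hence its support set, unchanged. The step size `eta`, kernel width `gamma`, and the loss-proportional skipping rule are all hypothetical choices, not the dissertation's actual updating strategies.

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf(x, z, gamma=1.0):
    return np.exp(-gamma * np.sum((x - z) ** 2))

def online_klr(X, y, eta=0.5, gamma=1.0, conservative=True):
    """One pass of online kernel logistic regression.

    Non-conservative: a gradient step on every example (dense model).
    Conservative (illustrative): update with probability min(1, loss),
    so confidently classified examples are usually skipped and the
    support set stays sparse.
    """
    support, alpha = [], []        # kernel expansion f(x) = sum_i a_i K(s_i, x)
    for x, label in zip(X, y):     # labels in {-1, +1}
        f = sum(a * rbf(s, x, gamma) for s, a in zip(support, alpha))
        loss = np.log1p(np.exp(-label * f))          # logistic loss
        if conservative and rng.random() >= min(1.0, loss):
            continue                                 # skip low-loss examples
        # Gradient of the logistic loss w.r.t. f; the step adds one
        # new support vector with coefficient -eta * gradient.
        g = -label / (1.0 + np.exp(label * f))
        support.append(x)
        alpha.append(-eta * g)
    return support, alpha

# Toy data: two Gaussian blobs with labels -1 and +1.
X = np.vstack([rng.normal(-1, 1, (200, 2)), rng.normal(+1, 1, (200, 2))])
y = np.array([-1] * 200 + [+1] * 200)
idx = rng.permutation(400)
S, A = online_klr(X[idx], y[idx], conservative=True)
print("support vectors kept:", len(S), "of", len(X))
```

The sparsity-accuracy trade-off is governed by how aggressively low-loss examples are skipped; the dissertation's specific update rules and their theoretical guarantees are developed in the full text.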
Keywords/Search Tags: Active Learning, Feature Selection, Co-clustering, Online Learning