Font Size: a A A

Some Applications Of Sparse Optimization In Machine Learning

Posted on:2014-03-12Degree:DoctorType:Dissertation
Country:ChinaCandidate:X J LiangFull Text:PDF
GTID:1268330425977300Subject:Operational Research and Cybernetics
Abstract/Summary:PDF Full Text Request
In recent years, exploring the sparsity of the solutions and other special structure has become a common issue in many computational and engineering areas. Sparsity is much broader than "having few nonzero elements". It roughly means "having a simple structure". In this thesis, we utilize the sparsity structures to construct models for various practical problems in machine learning and improve the corresponding classical algorithms to solve them when necessary. The main work can be summarized as follows.1. In Chapter2, we present the proposed sparse optimization models and algorithms for various practical problems in machine learning. All the models have a common abstract struc-ture, i.e., minimizing a loss functional over a hypothesis space with a certain simple or special structure. The proposed box-constrained Lasso model and Block PCA model possess this struc-ture. We present the Splitting algorithm for solving the Block PCA model and the Homotopy algorithm for solving the Lasso model with box constraints in this chapter.2. In Chapter3we analyze the convergence of the Homotopy algorithm for solving the box-constrained Lasso model and evaluate its numerical performance. It is shown that the con-vergence is not as apparent as it looks. Under a non-degeneration index assumption and some other weak conditions, it is proved that Homotopy terminates and finds an optimal solution after finite iterations. Moreover, the issue of degeneration and recursion is discussed. Although there exist many algorithms for solving this model, the numerical experiments have shown the special advantage of the proposed algorithm, and illustrated that the Homotopy algorithm is particularly useful when the underlying solution is quite sparse or the entire regularization path is required. This is the key technique for the calculation of measuring the prediction capability of data for collaborative filtering in Chapter4.3. Chapter4is devoted to the prediction capability of data for collaborative filtering. Most work of collaborative filtering has focused on improving the performance of the algorithms. In this chapter we point out that part of unknown ratings could not be accurately predicted, limited by the information of known ratings. We propose a metric,"relatedness", to measure the potential that a user’s preference on an item can be accurately predicted. The relatedness of a user-item pair is determined by a community which consists of users and items related to the pair. As an application of the relatedness metric, we develop the Data-Oriented Combination (DOC) method for recommender systems. 4. Chapter5is devoted to the problem of identifying gene regulatory network (GRN) from time course gene expression data. Due to the computational complexity, most approaches for GRN reconstruction can only identify a single network of low connectivity for a given set of genes. We propose the network and community identification (NCI) method for identifying multiple sub-networks from gene expression data by incorporating community structure infor-mation into GRN inference. The Block PCA model can search communities effeciently in GRNs by employing the Splitting algorithm proposed in Chapter2.5. Chapter6is devoted to the peptide identification problem which is the key procedure for protein identification. The sequence database searching has been the dominant method for peptide identification. However, lots of the peptide spectrum matches (PSMs) output by the searching engine are not correct. Based on supervised or semi-supervised learning, most of the current methods make use of the decoy PSMs fully, but the information of target PSM samples themselves has not been explored entirely. A novel scoring method named FC-Ranker is de-veloped by assigning a nonnegative weight to each target PSM based on the possibility of its correctness. Particularly, the weights of PSMs are updated by using a fuzzy SVM classification model and a fuzzy Silhouette index iteratively. In terms of ROC and the number of identified PSMs under common FDR level, FC-Ranker outperforms other popular post-database search algorithms.
Keywords/Search Tags:Nonsmooth Optimization, Sparse Optimization, Collaborative Filtering, Lasso, Homotopy Algorothm, Gene Regularization Network, Peptide Identification
PDF Full Text Request
Related items