Research On Nonconvex Classification-based Algorithm For Peptide Identification

Posted on:2019-03-30

Degree:Master

Type:Thesis

Country:China

Candidate:Y X Wang

GTID:2370330620964855

Subject:Mathematics

Abstract/Summary:

Peptide identification is the task of picking out the correct matches from the peptide-spectrum matches (PSMs) produced by a database search. It is a key step in protein identification. Although many approaches have been developed to improve its accuracy, designing efficient algorithms for peptide identification remains an important research topic, because many of the PSMs output by the search engine are incorrect. CRanker, a kernel-based classification method, has proven effective and efficient in terms of the number of identified PSMs. However, it has two weaknesses: overfitting and instability on small datasets. This thesis proposes a modified CRanker method and efficient algorithms for PSM classification to address these weaknesses.

Like a standard SVM classifier, CRanker applies a single loss function and the same weight parameters to all PSM samples. Because most target PSM labels are not correct, this uniform treatment is an important cause of overfitting on small datasets. Chapter 2 modifies CRanker by using different weight parameters for decoy and target PSMs, and analyzes the role of the model parameters.

The instability on small datasets stems from the nonconvex optimization formulation, for which existing solvers easily terminate at poor local optima. Chapter 3 proposes a new method, Self-Paced Learning CRanker (SPL-CRanker). It replaces the standard training in CRanker with self-paced learning (SPL), which starts training from comparatively confident PSMs and then iteratively adds more complex targets as the SPL parameter is increased. SPL-CRanker is more stable and outperforms other peptide identification algorithms in terms of the number of correct PSMs and ROC at a common false discovery rate (FDR) level.

Chapter 4 converts the modified CRanker model into a difference-of-convex (DC) program and solves it with the convex-concave procedure. Experimental studies show that the new method outperforms benchmark post-database-search algorithms in terms of the number of correct PSMs and generalizes well. The work in this chapter lays a theoretical foundation for designing peptide identification algorithms for large-scale data.
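
As an illustration of the self-paced training idea described for Chapter 3, the following Python sketch alternates between selecting low-loss target PSMs and refitting a classifier. It is a minimal sketch under stated assumptions, not the thesis's SPL-CRanker implementation: a linear model with logistic loss stands in for CRanker's kernel-based model, the hard 0/1 selection rule is the basic SPL variant, and all function names, the parameters lam and growth, and the use of column 0 as a standardized primary search score are hypothetical.

```python
import numpy as np


def logistic_losses(w, X, y):
    """Per-sample logistic loss for labels y in {-1, +1}."""
    margins = y * (X @ w)
    return np.logaddexp(0.0, -margins)


def select_samples(losses, is_decoy, lam):
    """Hard SPL weights: keep every decoy, keep a target only if its loss < lam."""
    return np.where(is_decoy | (losses < lam), 1.0, 0.0)


def fit_weighted(X, y, v, w0, lr=0.1, steps=200):
    """Gradient descent on the mean logistic loss over the selected samples."""
    w = w0.copy()
    n_active = max(v.sum(), 1.0)
    for _ in range(steps):
        margins = y * (X @ w)
        # sigmoid(-m) written with tanh for numerical stability
        coeff = -y * v * 0.5 * (1.0 - np.tanh(margins / 2.0))
        w -= lr * (X.T @ coeff) / n_active
    return w


def self_paced_fit(X, y, lam=0.5, growth=1.5, rounds=8):
    """Alternate between selecting easy targets and refitting the model.

    Assumption: column 0 of X is a standardized search score, so the
    initial model simply ranks PSMs by that score.
    """
    w = np.zeros(X.shape[1])
    w[0] = 1.0
    is_decoy = y < 0
    history = []  # number of targets used in each round
    for _ in range(rounds):
        losses = logistic_losses(w, X, y)
        v = select_samples(losses, is_decoy, lam)
        w = fit_weighted(X, y, v, w)
        history.append(int(v[~is_decoy].sum()))
        lam *= growth  # raise the SPL parameter to admit harder targets
    return w, history
```

Here lam plays the role of the SPL parameter: a small value admits only targets that the current model already scores confidently, and each multiplication by growth lets harder targets in. Keeping the decoys in every round reflects that decoy labels are reliable, whereas target labels are not.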

Keywords/Search Tags:

peptide spectrum matches, overfitting problem, nonconvex optimization, self-paced learning, difference of convex programming, convex-concave procedure algorithm

Related items

1	Research On Non-convex Optimization And Non-convex Variational Inequality Problems And Their Algorithms
2	Optimization Algorithm And Complexity Analysis For One-side Nonconvex Saddle Point Problem
3	An Algorithm For Solving Conic Model Nonconvex Trust-region Subproblem
4	The Research Of The Generalized E-convexity And Related Optimization Questions
5	SDP Approximate Algorithm Based On D.C. Decompositions For A Class Of Nonconvex Quadratically Constrained Quadratic Programming Problems
6	Research On Algorithms Based On Convex-concave Minimax Problem
7	A Local Optimization Method With Separable Structure Nonconvex Problem
8	The Study Of Properties And Algorithm For Several Classes Of No-linear Convex Programming
9	Two Kinds Of Nonlinear Programming Problems Of Global Optimization
10	The Algorithm Research Of Some Distributed Optimization Problems With Special Structure