
Research On Nonconvex Classification-based Algorithm For Peptide Identification

Posted on: 2019-03-30
Degree: Master
Type: Thesis
Country: China
Candidate: Y X Wang
Full Text: PDF
GTID: 2370330620964855
Subject: Mathematics
Abstract/Summary:
Peptide identification is the task of selecting the correct matches from the peptide spectrum matches (PSMs) produced by a database search. It is a key step in protein identification. Although many approaches have been developed to improve its accuracy, designing efficient algorithms for peptide identification remains an important research topic, because many of the PSMs output by the search engine are incorrect. CRanker, a kernel-based classification method, has proven effective and efficient in terms of the number of identified PSMs. However, it has two weaknesses: overfitting and instability on small datasets. This thesis proposes a modified CRanker method and an efficient algorithm for PSM classification to address these weaknesses.

Like a standard SVM classifier, CRanker applies a single loss function and a single weight parameter to all PSM samples. Because most target PSM labels are incorrect, this is an important cause of overfitting on small datasets. Chapter 2 modifies CRanker by using different weight parameters for decoy and target PSMs, and analyzes the role of the model parameters.

The instability on small datasets stems from the nonconvex optimization formulation, for which existing solvers easily terminate at poor local optima. Chapter 3 proposes a new method, Self-Paced Learning CRanker (SPL-CRanker). It replaces the standard training in CRanker with self-paced learning (SPL), which starts training from comparatively confident PSMs and then iteratively adds more complex targets during training by increasing the SPL parameter. SPL-CRanker is more stable and outperforms other peptide identification algorithms in terms of the number of correct PSMs and ROC at a common FDR level.

Chapter 4 converts the modified CRanker model into a difference-of-convex program and solves it with the convex-concave procedure. Experimental studies show that the new method outperforms benchmark post-database-search algorithms in terms of the number of correct PSMs and generalizes well. The work in this chapter lays a theoretical foundation for designing peptide identification algorithms for large-scale data.
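
The self-paced training idea described in the abstract can be illustrated with a minimal sketch. This is not the thesis's SPL-CRanker model: it assumes a plain linear classifier with a logistic loss instead of the kernel-based CRanker objective, hard 0/1 self-paced weights, a warm start on all samples, and hypothetical names and defaults (`self_paced_fit`, `lam`, `growth`, `rounds`). Decoy PSMs (label -1) are treated as reliable and always kept, while target PSMs (label +1) enter training only once their current loss falls below the SPL threshold, which is raised each round.

```python
import numpy as np


def fit_weighted_logistic(X, y, w, steps=300, lr=0.5):
    """Fit a linear model by gradient descent on a sample-weighted logistic loss.

    y holds labels in {-1, +1}; w holds nonnegative per-sample weights.
    """
    theta = np.zeros(X.shape[1])
    for _ in range(steps):
        margins = y * (X @ theta)
        # 1 / (1 + exp(margin)), written with tanh for numerical stability.
        slope = 0.5 * (1.0 - np.tanh(0.5 * margins))
        grad = -(X * (w * y * slope)[:, None]).sum(axis=0) / max(w.sum(), 1.0)
        theta -= lr * grad
    return theta


def self_paced_fit(X, y, lam=0.5, growth=1.5, rounds=5):
    """Alternate between selecting 'easy' targets and refitting the model.

    Assumption (illustrative only): decoys are always included; a target is
    included when its logistic loss under the current model is below lam.
    """
    is_target = y > 0
    v = np.ones(len(y))
    # Warm start on all samples (an assumption of this sketch).
    theta = fit_weighted_logistic(X, y, v)
    for _ in range(rounds):
        losses = np.logaddexp(0.0, -y * (X @ theta))
        # Hard self-paced weights: 1 for decoys and for low-loss targets.
        v = np.where(is_target, (losses < lam).astype(float), 1.0)
        theta = fit_weighted_logistic(X, y, v)
        # Raising the threshold lets harder targets in during later rounds.
        lam *= growth
    return theta, v
```

The function returns the fitted coefficients together with the final 0/1 selection vector, so targets that were never admitted (weight 0) can be inspected as likely incorrect matches. A soft weighting scheme or a different schedule for the threshold would be equally valid design choices.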
Keywords/Search Tags: peptide spectrum matches, overfitting problem, nonconvex optimization, self-paced learning, difference of convex programming, convex-concave procedure algorithm