Study On Kinase Identification Based On Supervised Learning

Posted on: 2016-10-25
Degree: Master
Type: Thesis
Country: China
Candidate: H Zhang
GTID: 2180330470457906
Subject: Biomedical engineering
Abstract/Summary:
Phosphorylation, catalyzed by protein kinases, is essential to the regulation of biological processes and has been described as the switch of cellular activities. Phosphorylation by certain kinases may cause disease. Identifying potential phosphorylation sites together with their corresponding protein kinases therefore helps in understanding the molecular mechanisms involved and in drug design.

Originally, research on phosphorylation relied mainly on experimental methods, such as 32P labeling and high-throughput mass spectrometry. These methods can identify large numbers of phosphorylation sites, but most of those sites lack kinase information, and the experiments are costly and time-consuming. Researchers have therefore tried to uncover the rules governing protein phosphorylation and to predict phosphorylation sites computationally. Computational methods rely on databases of phosphorylation sites, so experimentally obtained data form their foundation. Computational methods have become the most popular approach to the kinase identification problem.

Building on previous studies of protein phosphorylation, we propose SLapRLS, a kernel-based kinase identification algorithm that takes both structural risk and the spatial distribution of the data into account, and apply it to the kinase identification problem. We processed data retrieved from Phospho.ELM and used a traversal search to filter out repeated entries. BLAST and CD-HIT were then used to reduce redundancy and obtain a reliable dataset. Next, we studied kernel functions and kernel-based algorithms and proposed a new method for constructing the kernel matrix based on expert knowledge.
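The repeat-filtering step described above can be pictured with a small sketch. The abstract does not describe the Phospho.ELM export format or the exact traversal used, so the record fields below (`accession`, `position`, `residue`, `kinase`) and the function name `drop_repeats` are hypothetical. The redundancy reduction done with BLAST and CD-HIT relies on those external tools and is not reproduced here.

```python
from typing import Iterable, List, NamedTuple


class SiteRecord(NamedTuple):
    """One phosphorylation site annotation (field names are assumptions)."""
    accession: str  # protein accession
    position: int   # residue position in the protein sequence
    residue: str    # phosphorylated residue, e.g. "S", "T" or "Y"
    kinase: str     # annotated kinase name


def drop_repeats(records: Iterable[SiteRecord]) -> List[SiteRecord]:
    """Traverse the records once and keep the first copy of each exact repeat."""
    seen = set()
    unique = []
    for rec in records:
        if rec not in seen:  # tuples are hashable, so set lookup works
            seen.add(rec)
            unique.append(rec)
    return unique


# Toy input with one exact repeat (values are made up for illustration).
raw = [
    SiteRecord("P00001", 15, "S", "PKA"),
    SiteRecord("P00002", 88, "Y", "SRC"),
    SiteRecord("P00001", 15, "S", "PKA"),
]
cleaned = drop_repeats(raw)
```

Keeping the first occurrence preserves the original record order, which makes the filtered set easy to compare against the raw download.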
Finally, we introduced the inconsistency between labels and pairwise similarity to reflect the spatial distribution of the data, and derived SLapRLS by minimizing both this inconsistency and the structural risk. 10-fold cross-validation and an independent test were performed to evaluate SLapRLS, and the results showed that it handles kinase identification effectively.
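The abstract does not give the SLapRLS objective, so the sketch below is not the thesis's method. It shows the standard Laplacian regularized least squares (LapRLS) closed form in a fully labeled setting, which combines a norm penalty (structural risk) with a graph smoothness penalty (disagreement between predictions on similar samples). The RBF kernel, the use of the kernel matrix as graph weights, the parameter names `gamma_a` and `gamma_i`, and the toy data are all assumptions made for illustration; the thesis builds its kernel from expert knowledge instead.

```python
import numpy as np


def rbf_kernel(X, Z, gamma=1.0):
    """Gaussian kernel between rows of X and rows of Z (stand-in kernel)."""
    sq = (np.sum(X ** 2, axis=1)[:, None]
          + np.sum(Z ** 2, axis=1)[None, :]
          - 2.0 * X @ Z.T)
    return np.exp(-gamma * np.maximum(sq, 0.0))


def fit_laprls(K, y, gamma_a=0.01, gamma_i=0.1):
    """Solve (K + gamma_a*l*I + (gamma_i/l) * L K) alpha = y for alpha."""
    l = K.shape[0]
    W = K.copy()                    # assumption: similarity graph = kernel matrix
    np.fill_diagonal(W, 0.0)        # no self-loops
    L = np.diag(W.sum(axis=1)) - W  # unnormalized graph Laplacian
    M = K + gamma_a * l * np.eye(l) + (gamma_i / l) * (L @ K)
    return np.linalg.solve(M, y)


def predict(K_new_train, alpha):
    """Decision scores; the sign gives the predicted class."""
    return K_new_train @ alpha


# Toy data: two well-separated one-dimensional clusters with labels +1 / -1.
X = np.array([[0.0], [0.1], [0.2], [2.0], [2.1], [2.2]])
y = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])

alpha = fit_laprls(rbf_kernel(X, X), y)
train_scores = predict(rbf_kernel(X, X), alpha)
new_scores = predict(rbf_kernel(np.array([[0.05], [2.05]]), X), alpha)
```

In an evaluation like the 10-fold cross-validation mentioned above, `fit_laprls` would be called on nine folds and `predict` on the held-out fold, with `gamma_a` and `gamma_i` tuned on the training folds only.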
Keywords/Search Tags:phosphorylation, kinase identification, kernel function, LapRL