High-throughput Prediction Of Cyclin-dependent Proteins Based On Sequence-derived Information

Posted on: 2022-12-22
Degree: Master
Type: Thesis
Country: China
Candidate: X C Liang
GTID: 2480306749978399
Subject: Automation Technology

Abstract/Summary:
Cyclin-dependent proteins and the related cyclin-dependent kinases play important roles in regulating cell cycle progression. An accurate understanding of the intrinsic mechanisms of cyclin-dependent proteins offers informative clues for investigating the causes of uncontrolled cell proliferation and cancer development. Conventional identification of cyclin-dependent proteins relies on biophysical and biochemical methods. However, these methods are both time-consuming and labor-intensive, and high accuracy is particularly difficult to ensure. The need for efficient cyclin-dependent protein prediction is growing in the post-genomic era, which creates a demand for computational methods. According to the data used to construct the prediction model, current computational methods fall into two groups: structure-based approaches and sequence-based ones. The former require accurate tertiary structure data of proteins; the latter require only the primary sequence information of proteins. Given the limited availability of protein structure data, sequence-based prediction is relatively more favored by researchers. At present, however, sequence-based cyclin-dependent protein prediction methods suffer from low speed and low accuracy. To improve both computational speed and prediction accuracy, we propose a fast and accurate sequence-based method, named TYLER (predicT cYcLin dEpendent pRoteins), for the prediction of cyclin-dependent proteins.

(1) A novel two-layer schema is proposed. In the first layer, we use an information theory-based approach to compute the enriched sequence motifs of cyclin-dependent proteins, and use the information gain ratio to quantify the motifs and select the optimal set. In the second layer, we first construct various feature spaces describing the physicochemical properties, secondary structure and evolutionary conservation of cyclin-dependent proteins, and then build a machine learning model. The experimental results demonstrate that the proposed method achieves good prediction results on both the training and independent datasets, and significantly outperforms other current methods in the field.

(2) High-throughput proteome-level prediction. Proteome-level prediction places strict requirements on computation time. TYLER has been measured to be at least 12 times faster than current methods, making it suitable for large-scale computation. We apply this method to predict potential cyclin-dependent proteins across the human proteome. Gene Ontology analysis indicates that at least some of the newly identified proteins have a high probability of being cyclin-dependent proteins.

(3) Construction and deployment of a prediction platform. The method proposed in this study was developed into an online prediction tool, and the source code was shared. TYLER has provided computational services for researchers in more than ten countries or regions; the source code helps subsequent researchers to further explore cyclin-dependent proteins.
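
The first-layer motif selection described above can be illustrated with a minimal sketch of the information gain ratio. This is not the thesis's implementation: the abstract does not say how motifs are defined or matched, so the sketch assumes a motif is a plain substring and that each protein has a binary label (1 for cyclin-dependent, 0 otherwise). The function names `entropy`, `gain_ratio` and `rank_motifs`, and the toy sequences, are hypothetical and used only for illustration.

```python
import math
from typing import Hashable, List, Sequence, Tuple


def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    if n == 0:
        return 0.0
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def gain_ratio(sequences: Sequence[str], labels: Sequence[int], motif: str) -> float:
    """Information gain ratio of the binary feature 'motif occurs in sequence'.

    Assumption: a motif is matched as an exact substring.
    """
    present = [motif in seq for seq in sequences]
    n = len(labels)

    # Conditional entropy H(label | motif present / absent).
    h_cond = 0.0
    for flag in (True, False):
        subset = [y for p, y in zip(present, labels) if p == flag]
        h_cond += (len(subset) / n) * entropy(subset)

    info_gain = entropy(labels) - h_cond
    split_info = entropy(present)  # entropy of the feature itself

    # A motif found in all or none of the sequences carries no split information.
    return info_gain / split_info if split_info > 0 else 0.0


def rank_motifs(
    sequences: Sequence[str], labels: Sequence[int], motifs: Sequence[str]
) -> List[Tuple[str, float]]:
    """Return (motif, gain ratio) pairs sorted from most to least informative."""
    scored = [(m, gain_ratio(sequences, labels, m)) for m in motifs]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data (hypothetical, not from the thesis).
toy_sequences = ["AKRPL", "MKRPT", "GGSAV", "TTLLE"]
toy_labels = [1, 1, 0, 0]
ranking = rank_motifs(toy_sequences, toy_labels, ["L", "KRP", "ZZZ"])
```

In this toy set, "KRP" occurs only in the positive sequences and therefore ranks first, while "L" is evenly split between classes and "ZZZ" never occurs, so both score zero. A real pipeline would need a candidate-motif enumeration step and a cut-off for the selected set, neither of which the abstract specifies.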

Keywords: Cyclin-dependent Proteins, Sequence Motifs, Machine Learning, Gene Ontology Analysis