The Research On Protein Functional Class Prediction Based On Sequence Characteristics

Posted on: 2011-09-06
Degree: Master
Type: Thesis
Country: China
Candidate: X Li
GTID: 2120360308969508
Subject: Computer Science and Technology
Abstract/Summary:
With the implementation of the Human Genome Project (HGP), nucleic acid sequence, protein sequence, and structure data have grown exponentially, and the life sciences have entered the post-genomic era. Sequence information accumulates constantly, while the functions of a large number of proteins that participate in major life activities remain unknown. Because of this gap between the amount of protein sequence data and the available functional information, determining protein function at the proteomic scale has become one of the main tasks of biological research in the post-genomic era. With the dramatic increase in sequence information, more attention has been paid to developing methods for predicting protein function from sequence. In this paper, we focus on the prediction of protein functional classes based on sequence characteristics. The main work is as follows.
After summarizing in detail the mathematical methods for characterizing protein sequences and for pattern classification, the paper presents a global encoding (GE) method to characterize protein sequences and uses the nearest neighbor algorithm to predict protein functional classes. We verify the validity of the method by predicting the functional classes of 1818 yeast protein sequences. In particular, when protein-protein interaction data are limited and only protein sequence information is known, this method can still predict protein functional classes effectively by extracting functional information from the sequences.
Additionally, based on protein composition, physical and chemical properties, partial sequence information, and moment information of amino acids, we propose a new method for characterizing protein sequences: the Weighted Segmented Pseudo-amino acid composition Moment Vector (W-SPsAA-MV). The dimension of this vector is lower than that of the global encoding of protein sequences, and the prediction results obtained with nearest neighbor classification are better than those obtained with the global encoding.
Besides, because a protein may have one or more functions, we use the covariance discriminant classifier to predict protein functional classes. The experimental results show that this classifier is efficient and reliable for assigning functional classes to unknown proteins.
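The overall pipeline described above (encode each sequence as a fixed-length vector, then assign the class of the closest labeled sequence) can be illustrated with a minimal sketch. It is not the thesis's method: plain 20-dimensional amino acid composition stands in for the GE and W-SPsAA-MV encodings, Euclidean distance is an assumed metric, and the function names, toy sequences, and class labels are all hypothetical.

```python
from collections import Counter
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq):
    """Return the 20-dimensional amino acid composition of a sequence.

    Assumption: this simple encoding is only a stand-in for the GE and
    W-SPsAA-MV encodings named in the abstract.
    """
    counts = Counter(seq.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[a] / total for a in AMINO_ACIDS]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def nearest_neighbor(query, training):
    """Return the label of the training sequence closest to the query.

    training is a list of (sequence, label) pairs.
    """
    q = composition(query)
    _, label = min(training, key=lambda item: euclidean(q, composition(item[0])))
    return label


# Hypothetical toy data; real inputs would be labeled protein sequences.
training = [
    ("KKKRRKKRKR", "class_A"),
    ("LLLVVILLVA", "class_B"),
]
print(nearest_neighbor("KRKKRKKA", training))
```

A richer encoding would change only `composition`; the nearest neighbor step stays the same as long as every sequence maps to a vector of the same length.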
Keywords/Search Tags: Protein sequence, Prediction of protein functional classes, Global encoding method, Weighted Segmented Pseudo-amino acid composition Moment Vector, Nearest neighbor algorithm, Covariant discriminant classifier