With the implementation of the Human Genome Project (HGP), nucleic acid sequence, protein sequence, and structure data have grown exponentially, and the life sciences have entered the post-genomic era. Sequence information keeps accumulating, yet the functions of a large number of proteins that participate in major life activities remain unknown. Because of this gap between the amount of protein sequence data and the available functional information, determining protein function at the proteomic scale has become one of the main tasks of biological research in the post-genomic era. With the dramatic increase in sequence information, more attention has been paid to developing methods that predict protein function from sequence. This paper focuses on predicting protein functional classes from sequence characteristics. The main work is as follows.
First, after summarizing in detail the mathematical methods for characterizing protein sequences and for pattern classification, the paper presents a global encoding (GE) method to characterize protein sequences and uses the nearest neighbor algorithm to predict protein functional classes. The validity of the method is verified by predicting the functional classes of 1818 yeast protein sequences. In particular, when protein-protein interaction data are limited and only sequence information is known, the method can effectively predict protein functional classes by extracting functional information from the protein sequences.
Second, drawing on protein composition, physicochemical properties, partial sequence information, and moment information of amino acids, we propose a new characterization of protein sequences: the Weighted Segmented Pseudo-amino acid composition Moment Vector (W-SPsAA-MV). The dimension of this vector is lower than that of the global encoding of protein sequences, and the prediction results obtained with nearest neighbor classification are better than those obtained with the global encoding.
Finally, because a protein may have one or more functions, we use the covariance discriminant classifier to predict protein functional classes. The experimental results show that this classifier assigns functional classes to unknown proteins efficiently and reliably.
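To make the nearest neighbor step concrete, the following is a minimal sketch only. The abstract does not give the details of the GE or W-SPsAA-MV encodings, so plain 20-dimensional amino acid composition is swapped in as a stand-in feature vector. The function names, the toy sequences, and the labels `class_A`, `class_B`, and `class_C` are all hypothetical and are not taken from the paper.

```python
from collections import Counter
import math

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(seq):
    """Return the 20-dimensional amino acid composition of a sequence.

    Stand-in feature vector (assumption): NOT the paper's GE or W-SPsAA-MV.
    Non-standard residues are ignored.
    """
    counts = Counter(seq.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[a] / total for a in AMINO_ACIDS]

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def nearest_neighbor(query_seq, training):
    """Assign the label of the closest training sequence to the query."""
    q = composition(query_seq)
    best_label, best_dist = None, float("inf")
    for seq, label in training:
        d = euclidean(q, composition(seq))
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label

# Toy training set with hypothetical functional class labels.
TRAINING = [
    ("KKKRRKKRRK", "class_A"),  # rich in basic residues
    ("DDEEDDEEDE", "class_B"),  # rich in acidic residues
    ("LLVVIILLAV", "class_C"),  # rich in hydrophobic residues
]

predicted = nearest_neighbor("KRKRKKRRKD", TRAINING)
print(predicted)
```

In this toy setting the query is dominated by lysine and arginine, so its composition vector lies closest to the first training sequence. A real evaluation would replace `composition` with the chosen sequence encoding and use a labeled data set such as the yeast proteins mentioned above.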