
The Research On Sequence Encoding Of Protein Classification

Posted on: 2014-05-11
Degree: Master
Type: Thesis
Country: China
Candidate: T Peng
GTID: 2250330425483751
Subject: Information and Communication Engineering
Abstract/Summary:
As an important branch of proteomics research, protein classification has attracted increasing attention in recent years. Any new breakthrough in this area helps to further the understanding of protein structure and function, and the field plays an important role in molecular biology, cellular biochemistry, pharmacology, and medicine.

Research on protein classification can generally be divided into three steps: constructing a reasonable dataset, designing an effective sequence encoding scheme, and designing a high-performance classification algorithm. This thesis focuses on sequence encoding and on the design of classification algorithms. The main work includes:

1. For the prediction of protein structural class, a new sequence encoding method that combines sequence information with secondary structural features is presented. Combined with the support vector machine classification algorithm, it yields a new and practical classification model. The prediction model requires no additional information, and its computation is simple and fast. The support vector machine, evaluated with the jackknife test, is used to predict protein structural class; on four benchmark datasets the proposed method achieves high overall classification accuracy. A discussion of parameter selection for the classification model, together with comparative experiments, verifies that the prediction model has strong adaptability, generalization ability, and applicability.

2. For the prediction of protein subcellular localization, this thesis studies the protein sequence encoding problem and the prediction model. First, a protein sequence encoding scheme is proposed from a different perspective: it introduces the chaos game representation for sequence visualization together with a new method for extracting sequence statistical information. Second, the two parts of the feature vector are fused in parallel in complex space, and the fused vectors are used as the input to a unitary distance statistical classifier. Finally, the feasibility and effectiveness of the constructed prediction model are verified on two standard datasets, and the proposed method is compared with existing work. The experimental results show that each step of the prediction model, from sequence encoding to classification, is handled effectively, which reflects the rationality and effectiveness of the constructed model.
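The parallel fusion and unitary distance steps in item 2 can be illustrated with a small sketch. The abstract gives no formulas, so the code below assumes the common formulation of parallel feature fusion, in which two real feature vectors a and b are combined into one complex vector a + i·b (zero-padding the shorter one), and the unitary distance is the norm of the difference in complex space. The nearest-class-mean decision rule, the function names, and the toy feature values are all hypothetical; the thesis's actual classifier may differ.

```python
import math


def parallel_fuse(a, b):
    """Fuse two real feature vectors into one complex vector a + i*b.

    The shorter vector is zero-padded so both have the same length
    (an assumption; the abstract does not say how lengths are matched).
    """
    n = max(len(a), len(b))
    a = list(a) + [0.0] * (n - len(a))
    b = list(b) + [0.0] * (n - len(b))
    return [complex(x, y) for x, y in zip(a, b)]


def unitary_distance(z1, z2):
    """Distance in complex (unitary) space: sqrt(sum_k |z1_k - z2_k|^2)."""
    return math.sqrt(sum(abs(p - q) ** 2 for p, q in zip(z1, z2)))


def class_means(samples, labels):
    """Average the fused vectors of each class (hypothetical decision model)."""
    sums, counts = {}, {}
    for z, y in zip(samples, labels):
        if y not in sums:
            sums[y] = [0j] * len(z)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], z)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict(z, means):
    """Assign z to the class whose mean is nearest in unitary distance."""
    return min(means, key=lambda y: unitary_distance(z, means[y]))


# Toy data only: two made-up feature groups per sequence, two made-up classes.
seq_stats = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
cgr_feats = [[0.5], [0.6], [0.1], [0.2]]
labels = ["nucleus", "nucleus", "membrane", "membrane"]

fused = [parallel_fuse(a, b) for a, b in zip(seq_stats, cgr_feats)]
means = class_means(fused, labels)
query = parallel_fuse([0.85, 0.15], [0.55])
print(predict(query, means))
```

Compared with concatenating the two feature groups, this kind of fusion keeps the dimension at the length of the longer group, which is one reason it is sometimes preferred when the feature groups are large.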
Keywords/Search Tags: Protein classification, Structural classification, Subcellular localization, Sequence encoding, Support vector machine