Font Size: a A A

Predicting Large-scale Protein-protein Interactions Based On Sparse Representation Based Classifier Model

Posted on:2017-05-20Degree:MasterType:Thesis
Country:ChinaCandidate:Y A HuangFull Text:PDF
GTID:2310330503981937Subject:Software engineering
Abstract/Summary:PDF Full Text Request
Protein is an important big biological molecule, which participates in the constitution of all kind of organism. In fact, proteins always carry out the biological functions by pairs. Therefore, constructing protein-protein interaction(PPI) network has become one of the hottest topics in the field of bioinformatics. Detecting new PPIs and collecting the corresponding data contribute to the understanding on the mechanism of protein interaction and therefore boost the development of researches on disease mechanism as well as the new drug development. There is an urgent need to develop a computational model which can predict PPIs in a large scale only based on protein sequence.Feature extraction and sample classification are two main steps for computational model for predicting protein-protein interactions. As the first part of prediction model, original protein sequence can be transformed in to feature vectors of the same dimension.Most of existing machine learning classifiers, such as support vector machine and neural network, need some manual intervention to achieve the best performance, which leads to a lot of labor and time to adjust the corresponding parameters. Therefore, how to build a fast and accurate computational model which need little manual intervention and consider additional biological information is an urgent problem to deal with.Considering the aforementioned disadvantages of previously proposed computational model for predicting protein-protein interactions, the following parts are achieved in this thesis.First, this article proposed three different kinds of feature extraction methods to consider other kind of biological information. These feature extraction method include discrete cosine transformation(DCT), wavelet transform(WT) and global encoding(GE). The results indicate that the feature extraction methods for protein sequence proposed in the work have outstanding representation ability.Secondly, in this work, weighted sparse representation based classifier(WSRC) was chosen as the machine learning classification model. Since the feature extraction methods proposed in this work are usually applied in the field of image classification or adopt the similar concept, weighted sparse representation based classifier was finally chosen. In this work, we compare the performance of WSRC with state-of-the-art machine learning classifier, support vector machine. The comparison results indicate that the proposed feature extraction descriptors can deal with WSRC very well and that the whole computational model has outstanding prediction performance.Finally, I use two kinds of ensemble learning methods to combine these three feature extraction methods proposed in this work, where one is mainly based on voting and the other is based on predicting possibility. The proposed model was applied to three PPI datasets and the comparison results were further analyzed in this work. Through the comparison with other previously proposed computational models, we can come to a conclusion that the proposed prediction model has outstanding prediction performance and can be applied to predicting protein-protein interactions in a large scale.
Keywords/Search Tags:Protein-protein interaction prediction, Weighted sparse representation based classifier, protein sequences, ensemble learning model
PDF Full Text Request
Related items