
Research On Parallel Classification Algorithm Of Protein Sequence

Posted on: 2011-11-02
Degree: Doctor
Type: Dissertation
Country: China
Candidate: D Wang
GTID: 1100330338483198
Subject: Computer application technology
Abstract/Summary:
This work studies the protein sequence classification problem in bioinformatics. Starting from the premise that a protein's amino acid sequence determines its three-dimensional structure, we take a computational approach: we establish a mathematical model and construct an appropriate optimization algorithm to solve the classification problem. Computational classification can reduce the number of time-consuming and expensive experiments, and it supports the analysis of complex biological laws and the extraction of useful biological information.

Our work builds on remote homology detection with discriminative models, which are the most accurate of the current methods, and combines them with a generative model based on statistical profiles by designing a statistical profile kernel function. We use semi-supervised learning to improve the accuracy of remote homology detection. In protein sequence classification, positive samples are far fewer than negative samples, which leads to imbalanced training of the support vector machine (SVM). We improve the SVM algorithm by applying different penalty parameters to the positive and negative sample sets to restore the balance. On a given data set, the classification results show that our algorithm outperforms other remote homology detection algorithms.

A standard SVM can only make a binary decision for each protein structure class. To make SVM-based protein structure prediction more usable, we introduce a multi-class SVM algorithm that integrates the outputs of the standard binary SVM classifiers and assigns each protein sequence to a single protein structure class.
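The two ideas above, class-dependent penalties and integrating binary classifier outputs, can be illustrated with a minimal Python sketch. The abstract does not say how the penalty parameters are chosen or how the binary outputs are integrated, so both rules below are assumptions: penalties are scaled by the inverse class ratio, and the predicted class is the one with the largest decision value. The names `class_penalties` and `predict_class` are hypothetical.

```python
from collections import Counter


def class_penalties(labels, base_c=1.0):
    """Return (C_pos, C_neg) so that the rarer positive class is penalised more.

    Assumption: the positive penalty is scaled by n_neg / n_pos. The abstract
    only states that different penalties are used, not how they are set.
    """
    counts = Counter(labels)
    n_pos, n_neg = counts[+1], counts[-1]
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative sample")
    return base_c * n_neg / n_pos, base_c


def predict_class(binary_scores):
    """Pick the class whose binary classifier gives the largest decision value.

    `binary_scores` maps a class name to the output of that class's
    one-vs-rest classifier (an assumed integration rule).
    """
    if not binary_scores:
        raise ValueError("no classifier outputs given")
    return max(binary_scores, key=binary_scores.get)


# Toy data: one positive sample against four negatives.
c_pos, c_neg = class_penalties([+1, -1, -1, -1, -1])
print(c_pos, c_neg)
print(predict_class({"fold_a": -0.7, "fold_b": 0.3, "fold_c": -0.1}))
```

With a larger penalty on the positive set, misclassifying one of the few positive samples costs as much as misclassifying several negatives, which is one common way to counter the imbalance described above.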
Training a multi-class support vector machine is computationally expensive: the gain in accuracy is paid for with a large amount of computation. To reduce the running time, we introduce parallel computing and design a parallel protein classification algorithm based on the master-slave model. The algorithm is effective in parallel modes based on both shared memory and message passing.

All algorithms based on support vector machines need a certain number of training samples for modeling, so when samples are insufficient the support vector machine can cover only part of the protein structure classes. We therefore combine the high-accuracy support vector machine algorithm with a full-coverage pairwise sequence comparison algorithm to form a combined classifier for protein structure prediction. Experiments on benchmark data sets show that the combined classifier achieves full coverage of the data sets and performs better than either prediction algorithm on its own. To improve the efficiency of the combined classification, we design a parallel protein classification algorithm based on a two-level task pool model, which reduces communication waiting time and improves parallel performance.
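The combined classifier and its parallel execution can be sketched roughly as follows. This is an illustration under stated assumptions, not the dissertation's algorithm: the fallback rule (use pairwise comparison when no classifier score exceeds a threshold), the toy `identity` score standing in for a real pairwise sequence comparison, and the single-level thread pool standing in for the master-slave and two-level task pool models are all hypothetical choices.

```python
from concurrent.futures import ThreadPoolExecutor


def identity(a, b):
    """Toy stand-in for a pairwise comparison score: fraction of matching positions."""
    if not a or not b:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


def combined_predict(seq, svm_scores, labelled, threshold=0.0):
    """Use the classifier output when it is confident, else the nearest labelled sequence.

    `svm_scores` maps class names to decision values for the classes the
    classifier covers; `labelled` is a list of (sequence, class) pairs that
    covers every class (assumed fallback rule).
    """
    best = max(svm_scores, key=svm_scores.get) if svm_scores else None
    if best is not None and svm_scores[best] > threshold:
        return best
    # Fallback: pairwise comparison against every labelled sequence.
    nearest = max(labelled, key=lambda item: identity(seq, item[0]))
    return nearest[1]


def classify_all(tasks, labelled, workers=4):
    """Workers take (sequence, svm_scores) tasks from one shared pool.

    Results come back in the same order as `tasks`.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda t: combined_predict(t[0], t[1], labelled), tasks))


labelled = [("ACDE", "fold_a"), ("WWYY", "fold_z")]
tasks = [("ACDF", {"fold_a": 0.8}), ("WWYA", {"fold_a": -0.4}), ("WWYY", {})]
print(classify_all(tasks, labelled))
```

Because every sequence is classified independently, the tasks need no communication with each other; the pool only has to hand out work and collect results, which is the property a task pool design exploits.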
Keywords/Search Tags: Protein Sequence Classification, Support Vector Machine, Classifier Combination, Parallel Computing