Predicting Subcellular Localization Of Mycobacterial Proteins

Posted on: 2017-03-18
Degree: Master
Type: Thesis
Country: China
Candidate: P P Zhu
GTID: 2180330485486496
Subject: Biophysics
Abstract/Summary:
Mycobacterium is a genus of slender, slightly curved bacilli that is named for its branching growth and belongs to the actinomycetes. Its cells resist decolorization by acid and are not easily stained; thus, mycobacteria are also called acid-fast bacilli. Mycobacterium tuberculosis and Mycobacterium bovis are both pathogenic to mammals. The high incidence of mycobacterial infection makes tuberculosis, leprosy and other infectious diseases serious global public health problems that threaten human health. Mycobacteria have a unique cell wall and specific virulence factors. In particular, secretory proteins play a key role in signal transduction between cells. Predicting the subcellular location of mycobacterial proteins and identifying secretory proteins may provide vital clues for the study of protein function as well as for drug target discovery. Therefore, it is very important to establish a highly accurate and robust model to predict mycobacterial subcellular location and secretory proteins.

We initially focused on predicting mycobacterial subcellular location. First, we built a reliable, stringent benchmark dataset containing 272 mycobacterial protein sequences with sequence identities below 25%. Second, tripeptides were used to represent the mycobacterial protein samples. Third, to avoid the curse of dimensionality caused by the high-dimensional features, the binomial distribution was used to find an optimal feature subset; a total of 219 tripeptides were selected, which produced the maximum accuracy. Finally, a support vector machine was applied to construct a prediction model with strong robustness, high stability and high accuracy.

Moreover, we identified mycobacterial secretory proteins using a computational method. First, we established an appropriate benchmark dataset that includes 35 mycobacterial secretory proteins and 266 mycobacterial non-secretory proteins. Second, pseudo amino acid composition was used to formulate the protein samples.
Third, analysis of variance was used to calculate the F value of each feature, and the features were then ranked and screened on the basis of their F values. As a result, the optimal feature subset consists of 374 features. Finally, a support vector machine was used to predict mycobacterial secretory proteins.

Jackknife cross-validation showed that the method proposed in this thesis achieved a maximum overall accuracy of 89.71% and an average accuracy of 81.12% in predicting mycobacterial subcellular location. An online web server called Mycosub was established to predict mycobacterial subcellular location; it is freely accessible at http://lin.uestc.edu.cn/server/MycoSub. In identifying mycobacterial secretory proteins, we obtained an overall accuracy of 81.73% with an AUC of 0.93 in jackknife cross-validation. Comparison with previous methods demonstrated that our method is robust, effective, stable and accurate. We anticipate that the two models will be useful for studying the functions of mycobacterial proteins and developing anti-mycobacterial drugs.
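
The two feature steps named in the abstract can be illustrated with a minimal sketch. It assumes the standard definitions: tripeptide composition as normalized counts over the 8000 possible tripeptides, and the one-way analysis-of-variance F value as between-class variance divided by within-class variance. The abstract does not give the exact formulas used in the thesis, so the function names `tripeptide_composition` and `anova_f_value`, and the choice to skip windows containing non-standard residues, are illustrative assumptions rather than the author's implementation.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
# All 20^3 = 8000 possible tripeptides, in a fixed order.
TRIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=3)]
INDEX = {t: i for i, t in enumerate(TRIPEPTIDES)}


def tripeptide_composition(sequence):
    """Return the 8000-dimensional tripeptide frequency vector of a sequence.

    Assumption: windows containing a non-standard residue are skipped.
    """
    counts = [0] * len(TRIPEPTIDES)
    total = 0
    seq = sequence.upper()
    for i in range(len(seq) - 2):
        idx = INDEX.get(seq[i:i + 3])
        if idx is not None:
            counts[idx] += 1
            total += 1
    if total == 0:
        return [0.0] * len(TRIPEPTIDES)
    return [c / total for c in counts]


def anova_f_value(groups):
    """One-way ANOVA F value of a single feature.

    `groups` holds one list of feature values per class. Requires at least
    two classes and more samples than classes.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within if ms_within > 0 else float("inf")
```

In a pipeline like the one described, features would be ranked by such a score and the top-ranked subset passed to a support vector machine. The binomial-distribution ranking used for the tripeptides is not reproduced here, since the abstract does not specify it.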
Keywords/Search Tags: mycobacterial subcellular location, optimal tripeptides, binomial distribution, variance analysis, support vector machine