
Support Vector Machines And Its Applications In Study Of Bio-Materials Function

Posted on: 2004-06-26
Degree: Doctor
Type: Dissertation
Country: China
Candidate: C Z Cai
GTID: 1101360122470363
Subject: Materials science
Abstract/Summary:
The functions of many proteins are being explored on the basis of their three-dimensional (3D) structures. The classical experimental methods for determining protein structure, X-ray crystallography and multidimensional nuclear magnetic resonance (NMR), are expensive and time-consuming. For some proteins, experimental limitations make it impossible to obtain a 3D structure at all, which in turn hinders understanding of their function. Protein sequencing, by contrast, is relatively fast, simple, and inexpensive. As a result, the gap between the number of known protein sequences and the number of known 3D structures keeps widening. Starting from sequence information, researchers have studied protein structure prediction extensively with a variety of computational methods; after more than 30 years of development, however, prediction accuracy still lies between 65% and 85%. Now that the post-genome era has arrived, a large number of hypothetical proteins urgently need to be characterized. Past work has mainly focused on predicting the function of a single given protein or of a few specialized protein classes, which cannot keep pace with the rapid development of the life sciences. If one first predicts a protein's structure from its sequence and then infers its function from that structure, the overall accuracy will inevitably drop because of the limited accuracy of structure prediction. Taking a different approach, a machine learning method, the support vector machine (SVM), was proposed to predict protein function solely from information in the primary sequence, including the physical and chemical properties of its constituent amino acids. A general-purpose two-class classifier, SVM★, was implemented using a stochastic gradient ascent algorithm.
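The abstract states only that SVM★ trains a two-class SVM by stochastic gradient ascent on features derived from the primary sequence. As a minimal sketch of that idea, and not SVM★ itself, the Python below assumes a hypothetical 21-dimensional encoding (amino-acid composition plus mean Kyte-Doolittle hydropathy), an RBF kernel, and a kernel-adatron-style update in which the bias is absorbed into the kernel by adding a constant. The names `encode`, `kernel`, `train_sga`, `predict`, their parameters, and the toy sequences are all illustrative assumptions.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy scale; an assumed example of a
# physicochemical property, not necessarily one used by SVM★.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode(sequence):
    """Hypothetical 21-D encoding: residue composition + scaled mean hydropathy."""
    residues = [aa for aa in sequence.upper() if aa in KYTE_DOOLITTLE]
    n = len(residues) or 1  # avoid division by zero for empty input
    composition = [residues.count(aa) / n for aa in AMINO_ACIDS]
    hydropathy = sum(KYTE_DOOLITTLE[aa] for aa in residues) / n
    return composition + [hydropathy / 4.5]  # 4.5 is the scale maximum


def kernel(x, z, gamma=1.0, bias_sq=1.0):
    """RBF kernel plus a constant; the constant plays the role of the bias."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist) + bias_sq


def train_sga(X, y, C=10.0, eta=1.0, epochs=200, seed=0):
    """Stochastic gradient ascent on the SVM dual objective
    W(alpha) = sum(alpha) - 1/2 sum_ij alpha_i alpha_j y_i y_j K_ij,
    with each alpha_i clipped to the box [0, C]."""
    n = len(X)
    K = [[kernel(X[i], X[j]) for j in range(n)] for i in range(n)]
    alpha = [0.0] * n
    order = list(range(n))
    rng = random.Random(seed)
    for _ in range(epochs):
        rng.shuffle(order)  # visit examples in random order each epoch
        for i in order:
            margin = y[i] * sum(alpha[j] * y[j] * K[i][j] for j in range(n))
            grad = 1.0 - margin  # dW / d alpha_i
            alpha[i] = min(C, max(0.0, alpha[i] + eta * grad / K[i][i]))
    return alpha


def predict(x, X, y, alpha):
    """Sign of the decision function sum_i alpha_i y_i K(x_i, x)."""
    score = sum(a * yi * kernel(xi, x) for a, yi, xi in zip(alpha, y, X))
    return 1 if score >= 0 else -1


# Toy usage with made-up sequences: hydrophobic (+1) vs. charged/polar (-1).
train_seqs = ["LLIVVLLAIFLV", "IVLLFAVLILMA", "VLLIAFLVLIVG",
              "KDEERKSNQDEK", "DEKRKNQSEDRK", "RKDEQNSKEDER"]
y = [1, 1, 1, -1, -1, -1]
X = [encode(s) for s in train_seqs]
alpha = train_sga(X, y)
print(predict(encode("FLLIVALLVIAL"), X, y, alpha))
print(predict(encode("EKDRKEDNQSKE"), X, y, alpha))
```

Scaling each step by `1 / K[i][i]` means that with `eta = 1.0` every update exactly maximizes the dual objective along a single `alpha_i` before clipping. Because the bias is folded into the kernel, there is no equality constraint on the multipliers, so clipping to `[0, C]` is the only projection needed.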
The performance of SVM★ was compared with that of two other programs, SVMlight (based on the Sequential Minimal Optimization algorithm) and SVM-QP (based on a Quadratic Programming algorithm). The comparison shows that SVM★ classifies better than either SVMlight or SVM-QP. Two simple, convenient, and practical web-based workstations were established: SVM★, a general-purpose classifier, and SVMProt, an online tool for protein function prediction. This is the first systematic study of classifying a large number of functional protein families with the SVM approach. Datasets covering 69 functional protein families, including enzyme families, were collected and used to train classifiers from sequence information. Tests on independent evaluation datasets and statistical analysis of the reliability index show that SVMProt is a powerful approach for identifying protein functional families, with recognition accuracies ranging from 80.5% to 99.7%. Further experiments suggest that SVMProt overcomes a key bottleneck of sequence-alignment-based protein classification, because it can classify distantly related proteins as well as homologous proteins that have different functions. SVMProt was also used to predict the functions of three SARS coronavirus proteins: the E protein, the N protein, and ORF13. The E protein is predicted to be membrane-bound and the N protein to be RNA-binding; both predictions agree with existing knowledge. ORF13 is predicted to be a nuclear protein and a structural protein with probable DNA-binding properties. Because the function of ORF13 is still unknown, this prediction offers a theoretical lead for researchers developing anti-SARS drugs. SVM★ was also applied to recognizing Traditional Chinese Medicine (TCM) formulas based on the Xin Wei Gui Jing (nature, flavor, and meridian tropism) of their herbs.
In addition, this study provides TCM doctors with some valuable formulas, namely the false-positive samples, to be fur...
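The abstract reports recognition accuracies on independent evaluation datasets without defining them. Assuming the usual definitions for a two-class predictor (sensitivity as the recognition rate on positive samples, specificity on negative samples, plus overall accuracy and the Matthews correlation coefficient), a small helper might look like the sketch below. The function name `recognition_rates` is hypothetical, and this is not the dissertation's reliability index.

```python
import math


def recognition_rates(y_true, y_pred):
    """Standard two-class metrics for labels in {+1, -1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)    # positives recognized
    tn = sum(1 for t, p in pairs if t == -1 and p == -1)  # negatives rejected
    fp = sum(1 for t, p in pairs if t == -1 and p == 1)   # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == -1)   # missed positives
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    accuracy = (tp + tn) / len(pairs) if pairs else float("nan")
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "mcc": mcc}
```

For example, `recognition_rates([1, 1, 1, 1, -1, -1, -1, -1], [1, 1, 1, -1, -1, -1, -1, 1])` gives a sensitivity, specificity, and accuracy of 0.75 and an MCC of 0.5. Reporting sensitivity and specificity separately matters when the negative set is much larger than the positive set, as it typically is when screening one protein family against all others.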
Keywords/Search Tags:Support Vector Machine, Classification, Protein, Protein Function, Function Prediction, SARS Coronavirus, Modernization of Traditional Chinese Medicine