
Protein Function Based On Protein Sequences And Biomedical Literature Mining

Posted on: 2007-04-22
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X J Yu
GTID: 1110360185456846
Subject: Bioinformatics
Abstract/Summary:
In the post-genome era, predicting protein function by computational methods is one of the most demanding tasks in bioinformatics. In this thesis, we applied machine learning and natural language processing approaches to the analysis of protein sequences and the biomedical literature, the two essential carriers of protein functional information.

In the protein sequence analysis, we described protein sequences using three representations: amino acid composition, sequence-associated physicochemical properties, and functional domain composition. We investigated three problems, protein quaternary structure classification, DNA/RNA-binding protein prediction, and protein functional classification, applying the nearest neighbor algorithm, support vector machines, and maximum likelihood estimation, respectively. Satisfactory success rates were achieved in these studies. Moreover, the results demonstrate that functional domain composition is a very effective descriptor of protein sequences.

Because protein functional domains are well known to be closely related to protein function, we initially applied natural language processing methods to mining MEDLINE abstracts for domain interaction information. Together with information from other labs, we obtained a total of 175 domain-domain interactions and 355 domain-molecule interactions. We collected this information into the Database of Domain Interactions and Bindings (DDIB). To provide comprehensive knowledge of domains, DDIB also includes relevant information from the Pfam, Swiss-Prot, InterPro, GO, DIP, and KEGG databases. DDIB is freely accessible at http://www.ddib.org.
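
To make the simplest of the sequence descriptors above concrete, the following is a minimal sketch of amino acid composition combined with a nearest neighbor classifier. It is an illustration only, not the implementation used in the thesis: the function names, the Euclidean distance metric, the toy sequences, and the labels `class_a` and `class_b` are all assumptions introduced here, since the abstract does not specify them.

```python
import math
from collections import Counter

# The 20 standard amino acids, in a fixed order that defines the vector layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(seq):
    """Return the 20-dimensional amino acid composition of a sequence.

    Each entry is the fraction of standard residues that are that amino acid.
    Non-standard characters are ignored (an assumption of this sketch).
    """
    counts = Counter(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[a] / total for a in AMINO_ACIDS]


def nearest_neighbor(query, training):
    """Return the label of the training sequence closest to the query.

    `training` is a list of (sequence, label) pairs. Closeness is Euclidean
    distance between composition vectors (an assumed metric, not one stated
    in the abstract).
    """
    q = aa_composition(query)
    best_label, best_dist = None, math.inf
    for seq, label in training:
        d = math.dist(q, aa_composition(seq))
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label


# Toy, made-up sequences and labels, used only to exercise the functions.
training = [
    ("KKKKRRRRKKRR", "class_a"),
    ("DDDDEEEEDDEE", "class_b"),
]
predicted = nearest_neighbor("KRKRKKRA", training)
```

Composition discards residue order entirely, which is one reason richer descriptors such as functional domain composition can be more informative for the classification problems described above.
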
Keywords/Search Tags:protein function prediction, protein functional domain, natural language processing, support vector machines (SVMs), maximum likelihood estimation (MLE), the nearest neighbor algorithm