Predicting Protein-Protein Interaction Of Drosophila Melanogaster Using Naive Bayes Classifier

Posted on: 2008-03-16
Degree: Master
Type: Thesis
Country: China
Candidate: C H Li
GTID: 2120360218450163
Subject: Computer application technology
Abstract/Summary:
Bioinformatics uses computer technology to collect, integrate, and analyze data produced in molecular biology research. Data mining (sometimes called Knowledge Discovery in Databases) is the process of analyzing data from different perspectives and summarizing it into useful information, which makes it a necessary tool for bioinformatics research. Protein-protein interaction plays an important role in life activity. In this work, we apply a Naive Bayes classifier to predict the odds of interaction between two randomly selected proteins of Drosophila melanogaster.

Several methods have been used to predict interactions between proteins. Most studies rely on a single method, but each method carries its own bias. In this thesis, after collecting a large amount of original data, we choose Ortholog, Co-Expression, Shared Biological Process, and Enriched Domain Pair as the attributes for the Naive Bayes classifier. Each attribute is estimated with its own algorithm, and we have implemented all of them. We then calculate the class-conditional odds and the prior odds for protein-protein interaction in Drosophila melanogaster using GSP (Gold Standard Positive) interactions and GSN (Gold Standard Negative) interactions. After that, we use the Naive Bayes classifier to calculate the interaction odds given new predictive evidence. Finally, we analyze the resulting data.

Java was used to process the data, and the results were stored in a MySQL database. The data we produced can guide future biological experiments and help annotate the functions of uncharacterized proteins of Drosophila melanogaster. The methodology can also help predict protein interactions in other species.
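
The odds calculation described in the abstract can be illustrated with a minimal Java sketch. It is not the thesis implementation: the class name InteractionOdds, the method names, the add-one smoothing, and all counts are hypothetical choices made for this example. It also assumes that the "class-conditional odds" of an attribute can be read as the likelihood ratio P(evidence | GSP) / P(evidence | GSN), and that the posterior odds are the prior odds multiplied by the likelihood ratios of all attributes, which is the usual Naive Bayes independence assumption.

```java
// Illustrative sketch only: names, smoothing, and numbers are assumptions,
// not taken from the thesis.
final class InteractionOdds {

    /**
     * Likelihood ratio P(evidence | GSP) / P(evidence | GSN), estimated from
     * counts of gold-standard pairs that show the evidence. Add-one smoothing
     * (an assumption of this sketch) avoids division by zero.
     */
    static double likelihoodRatio(long gspWithEvidence, long gspTotal,
                                  long gsnWithEvidence, long gsnTotal) {
        double pGivenPositive = (gspWithEvidence + 1.0) / (gspTotal + 2.0);
        double pGivenNegative = (gsnWithEvidence + 1.0) / (gsnTotal + 2.0);
        return pGivenPositive / pGivenNegative;
    }

    /**
     * Posterior odds = prior odds * product of per-attribute likelihood
     * ratios. The product is accumulated in log space so that many small
     * ratios do not underflow.
     */
    static double posteriorOdds(double priorOdds, double... likelihoodRatios) {
        if (priorOdds <= 0.0) {
            throw new IllegalArgumentException("priorOdds must be positive");
        }
        double logOdds = Math.log(priorOdds);
        for (double ratio : likelihoodRatios) {
            logOdds += Math.log(ratio);
        }
        return Math.exp(logOdds);
    }

    /** Converts odds to a probability: p = odds / (1 + odds). */
    static double oddsToProbability(double odds) {
        return odds / (1.0 + odds);
    }

    public static void main(String[] args) {
        // Made-up counts for two attributes of one candidate protein pair.
        double orthologRatio = likelihoodRatio(49, 98, 9, 998);
        double coExpressionRatio = likelihoodRatio(30, 98, 100, 998);
        double priorOdds = 0.01; // made-up prior odds of interaction

        double odds = posteriorOdds(priorOdds, orthologRatio, coExpressionRatio);
        System.out.println("Posterior odds: " + odds);
        System.out.println("Posterior probability: " + oddsToProbability(odds));
    }
}
```

Working in odds rather than probabilities keeps the update a simple product, so evidence from an additional attribute can be folded in by multiplying by one more ratio.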
Keywords: Bioinformatics, Data mining, Naive Bayes classifier, Protein-protein interaction