
Protein-Protein Interaction Prediction with a Naive Bayes Classifier and Protein Function Annotation in Arabidopsis thaliana

Posted on: 2009-07-01    Degree: Master    Type: Thesis
Country: China    Candidate: G Li
GTID: 2120360275466754    Subject: Developmental Biology
Abstract/Summary:
Many essential cellular processes, such as metabolism, transport and most regulatory mechanisms, rely on physical interactions between proteins. Now that the genome sequence is complete, the central task has shifted to understanding genome function, and studying protein-protein interactions is an important route to elucidating protein function in Arabidopsis thaliana. In the post-genomic era, the wealth of available sequence data provides a rich resource for bioinformatics. In recent years, with the rapid development of experimental techniques, comparative genomics and bioinformatics, many methods for predicting protein-protein interactions have emerged, including co-expression, gene context, phylogenetic profiles, gene fusion, gene neighbors, orthology, shared biological process and enriched domain pairs. However, each method has its own limitations and biases, so integrating these methods has become a major focus.
We collected genomic and proteomic data for Arabidopsis thaliana comprising 14,987 protein interaction pairs from four model organisms, 3,020 interacting functional-domain pairs, 117,090 functional composition records, five microarray experiments covering 445 chips, 1,960 protein annotations and genome sequence data from 261 other species. These data were used to predict protein-protein interactions with the methods listed above.
Here we applied a Naive Bayes model to integrate evidence from protein domains, genome-wide gene expression, functional annotation and gene context (phylogenetic profiles, gene fusion and gene neighbors). A gold-standard positive set of 4,666 interacting protein pairs and a gold-standard negative set of 196,855 pairs were assembled. The integrated model predicts about 22,622 protein-protein interactions in Arabidopsis thaliana.
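The abstract does not spell out the model, so the following is only a minimal sketch, assuming the common likelihood-ratio formulation of Naive Bayes evidence integration: each evidence source is discretized into bins, a ratio P(bin | interacting) / P(bin | non-interacting) is estimated from the gold-standard positive and negative sets, and the per-source ratios are multiplied under the assumption that the sources are conditionally independent. The function names, the evidence labels (`coexpr`, `domain`), the pseudocount and the prior odds are all hypothetical illustrations, not details taken from the thesis.

```python
from collections import Counter
from math import prod

# Hypothetical representation: each protein pair is a dict mapping an evidence
# source (e.g. "coexpr", "domain") to a discretized bin (e.g. "high", "yes").


def likelihood_ratios(pos_bins, neg_bins, pseudocount=1.0):
    """Estimate L(bin) = P(bin | interacting) / P(bin | non-interacting)
    for one evidence source from gold-standard positive and negative pairs.
    The pseudocount keeps bins seen in only one set from giving 0 or inf."""
    pos_counts, neg_counts = Counter(pos_bins), Counter(neg_bins)
    bins = set(pos_counts) | set(neg_counts)
    pos_total = len(pos_bins) + pseudocount * len(bins)
    neg_total = len(neg_bins) + pseudocount * len(bins)
    return {
        b: ((pos_counts[b] + pseudocount) / pos_total)
        / ((neg_counts[b] + pseudocount) / neg_total)
        for b in bins
    }


def train(gold_pos, gold_neg, pseudocount=1.0):
    """Fit one likelihood-ratio table per evidence source."""
    sources = {s for pair in gold_pos + gold_neg for s in pair}
    return {
        s: likelihood_ratios(
            [p[s] for p in gold_pos if s in p],
            [n[s] for n in gold_neg if s in n],
            pseudocount,
        )
        for s in sources
    }


def combined_lr(model, pair):
    """Naive Bayes step: multiply per-source ratios, assuming the sources are
    conditionally independent given the class. Missing or unseen evidence
    contributes a neutral factor of 1."""
    return prod(model[s].get(b, 1.0) for s, b in pair.items() if s in model)


def posterior_probability(lr, prior_odds):
    """Posterior odds = prior odds * combined LR, returned as a probability."""
    odds = prior_odds * lr
    return odds / (1.0 + odds)
```

Trained on gold-standard sets, `train` yields one table per evidence source, and a pair would be reported as a predicted interaction when its combined ratio exceeds some cutoff. Both the cutoff and the prior odds (the expected fraction of interacting pairs among all candidate pairs) are free parameters that the abstract does not give.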
The interaction data and related information are hosted in AtPID, a database available to researchers online. The predictions were further used to annotate salt stress-related proteins in Arabidopsis thaliana and, combined with subcellular localization information in interaction networks, to analyze the functions of chloroplast proteins.
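The abstract does not describe how the predicted interactions were turned into functional annotations. One widely used option is neighbor counting ("guilt by association"): a protein is assigned the functions most common among its interaction partners. The sketch below assumes that approach, with an optional filter that keeps only partners localized to a given compartment, loosely mirroring the combination with subcellular localization mentioned above. The function name, data layout and example labels are hypothetical, not the thesis's method.

```python
from collections import Counter


def annotate_by_neighbors(protein, interactions, annotations,
                          localization=None, compartment=None):
    """Rank candidate functions for `protein` by counting how many of its
    interaction partners carry each annotation (neighbor counting).
    If `localization` and `compartment` are both given, only partners
    annotated to that compartment (e.g. "chloroplast") are counted."""
    partners = set()
    for a, b in interactions:
        if a == protein:
            partners.add(b)
        elif b == protein:
            partners.add(a)
    if localization is not None and compartment is not None:
        partners = {p for p in partners if compartment in localization.get(p, ())}
    votes = Counter()
    # Sort partners so tied annotations are reported in a deterministic order.
    for p in sorted(partners):
        votes.update(annotations.get(p, ()))
    return votes.most_common()
```

Neighbor counting is simple and transparent but ignores edge confidence; a variant could weight each vote by the pair's predicted interaction probability instead of counting every partner equally.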
Keywords/Search Tags: protein-protein interaction, Arabidopsis thaliana, function annotation, Naive Bayes