
Study On Discovering Sequence Motifs From Protein-RNA Interface Regions

Posted on: 2013-01-28
Degree: Master
Type: Thesis
Country: China
Candidate: X M Liu
GTID: 2210330362960704
Subject: Computer Science and Technology

Abstract/Summary:
RNA-binding proteins (RBPs) play essential roles in many biological processes. Effective prediction of the RNA-binding sites in RBPs provides valuable insight for biologists. Many methods have been developed to solve this problem; however, all of them achieve a low PPV (positive predictive value, i.e. the fraction of residues predicted to interact with RNA that actually do). To address this, we propose a method that finds sequence motifs in interface-rich regions, and we test the motifs' significance in several ways.

First, we design Sim-EISMD (Simple yet Efficient Interface Sequence Motif Discovery). By limiting the number of non-effective occurrences of each motif and requiring a minimum number of interface residues within each motif, we guarantee that the motifs are enriched in interface residues. Motif conservation is ensured by requiring each motif to occur more frequently than the average level. In total, we find 12 motifs in RBPs91, a dataset previously used for interface prediction. Nine of these motifs have length 3 and consist of 3 interface residues, while the other three have length 4 and contain at least 3 interface residues.

To test the significance of the motifs, we ran Sim-EISMD on 20 random sets with the same amino-acid and interface-residue distributions as RBPs91. Most patterns found in these sets appear no more than twice: on average, only 0.45 patterns per random set appear more than twice, compared with 9 in RBPs91.

We also analysed the secondary structure of the motifs and found that 65% of the segments matching the motifs have a secondary structure. Of these, 78% are helices and 32% are sheets.

In this thesis, we propose a new method for finding the motifs that lie in the interfaces of RNA-protein interactions. This is the first time a motif-finding method has been applied to protein-RNA interfaces. The results show that some patterns appear only in interface-rich sequence segments. By comparing the motifs found in RBPs91 with those found in the random sets, we showed that the motifs found by Sim-EISMD are not the product of a random process.
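The abstract does not state Sim-EISMD's data format or exact thresholds, so the following is only a minimal Python sketch of the motif filter and the random-set comparison described above, under stated assumptions: each protein is given as a sequence plus a '1'/'0' interface-label string; an occurrence counts as "effective" when its window contains at least `min_interface` interface residues; and random sets are built by shuffling (residue, label) pairs within each protein, which preserves both amino-acid composition and interface fraction. The function names and all default parameter values are hypothetical, not taken from the thesis.

```python
import random
from collections import defaultdict


def find_interface_motifs(proteins, k=3, min_interface=3,
                          max_noneffective=0, min_effective=3):
    """Hypothetical Sim-EISMD-style filter (thresholds are assumptions).

    proteins: list of (sequence, labels) pairs, where labels is a string of
    '1' (interface residue) / '0' (non-interface) aligned with sequence.
    Returns {motif: number of effective occurrences}.
    """
    effective = defaultdict(int)     # windows with enough interface residues
    noneffective = defaultdict(int)  # all other occurrences of the same word
    for seq, labels in proteins:
        for i in range(len(seq) - k + 1):
            word = seq[i:i + k]
            if labels[i:i + k].count("1") >= min_interface:
                effective[word] += 1
            else:
                noneffective[word] += 1
    words = set(effective) | set(noneffective)
    if not words:
        return {}
    # Conservation: keep only words occurring more often than the average k-mer.
    total = {w: effective[w] + noneffective[w] for w in words}
    average = sum(total.values()) / len(words)
    return {
        w: effective[w]
        for w in words
        if effective[w] >= min_effective          # interface enrichment
        and noneffective[w] <= max_noneffective   # few non-effective hits
        and total[w] > average                    # above-average frequency
    }


def shuffle_protein(seq, labels, rng):
    """Assumed null model: permute (residue, label) pairs within one protein."""
    pairs = list(zip(seq, labels))
    rng.shuffle(pairs)
    return "".join(a for a, _ in pairs), "".join(b for _, b in pairs)


def mean_random_motifs(proteins, n_sets=20, seed=0, **kwargs):
    """Average number of motifs found across n_sets shuffled copies."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_sets):
        shuffled = [shuffle_protein(s, l, rng) for s, l in proteins]
        counts.append(len(find_interface_motifs(shuffled, **kwargs)))
    return sum(counts) / n_sets


if __name__ == "__main__":
    toy = [("AGKRAA", "011100"), ("LGKRL", "01110"), ("GKRT", "1110")]
    print(find_interface_motifs(toy))  # -> {'GKR': 3}
    print(mean_random_motifs(toy, n_sets=20))
```

With `min_effective=3`, a motif needs more than two effective occurrences, which loosely mirrors the "appears more than twice" criterion the abstract uses when comparing RBPs91 with the random sets.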

By analysing the structure of these motifs, we also found that secondary structures usually form at these motifs, and that these structures show a preference for alpha helices over beta sheets. We also split RBPs91 into an interface-rich set and an interface-poor set so that it could be used with MEME and the Gibbs Motif Sampler, but these methods could not find effective interface sequence motifs.
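The abstract does not say how RBPs91 was divided for MEME and the Gibbs Motif Sampler. One plausible way, sketched below purely as an assumption, is to cut each protein into fixed-length, non-overlapping windows, call a window interface-rich when at least half of its residues are interface residues, and write each set as FASTA (a format MEME accepts). The window length, the threshold, the record names, and both function names are hypothetical.

```python
def split_by_interface(proteins, window=10, rich_fraction=0.5):
    """Hypothetical split into interface-rich and interface-poor segments.

    proteins: list of (sequence, labels) pairs with '1'/'0' interface labels.
    Trailing residues that do not fill a whole window are dropped.
    """
    rich, poor = [], []
    for idx, (seq, labels) in enumerate(proteins):
        for start in range(0, len(seq) - window + 1, window):
            segment = seq[start:start + window]
            fraction = labels[start:start + window].count("1") / window
            target = rich if fraction >= rich_fraction else poor
            target.append((f"p{idx}_{start}", segment))
    return rich, poor


def to_fasta(records):
    """Render (name, sequence) records as FASTA text."""
    return "".join(f">{name}\n{segment}\n" for name, segment in records)


if __name__ == "__main__":
    rich, poor = split_by_interface([("AGKRAALLLL", "0111000000")], window=5)
    print(to_fasta(rich), end="")  # -> >p0_0 / AGKRA
    print(to_fasta(poor), end="")  # -> >p0_5 / ALLLL
```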
Keywords: protein-RNA interaction interface, motif discovery, significance test, protein secondary structure