Genome-wide Study of the Protein-RNA Interactions in Escherichia coli

Posted on: 2015-02-03  Degree: Master  Type: Thesis
Country: China  Candidate: S Xu  Full Text: PDF
GTID: 2180330431473865  Subject: Bioinformatics
Abstract/Summary:
With the completion of the Human Genome Project, it was discovered that nearly 95% of the genome sequence gives rise to thousands of non-coding RNAs (ncRNAs), a class of RNAs that are transcribed and have specific functions but do not encode proteins. Most current work focuses on ncRNAs such as microRNA (miRNA), siRNA, piRNA and lncRNA (>200 nt). These ncRNAs often exert their functions by interacting with proteins. For example, binding to RISC enables miRNAs to repress expression of their target genes. In the course of miRNA research, RNA interference was found to be a process in which siRNAs silence target gene expression. siRNAs are typically produced in the cell by Dicer digestion of exogenous double-stranded RNA, and they can lead to degradation of target mRNAs by interacting with AGO (Argonaute protein). In addition, there are reports that siRNA can guide enzymes to modify chromosomal DNA. piRNAs, which were isolated from mammalian germ cells and are about 30 nt long, also need to interact with PIWI family members to play their regulatory roles. By binding to specific proteins, lncRNAs can control many aspects of gene regulation, including chromosome modification and regulation at the transcriptional and post-transcriptional levels. Prokaryotes also have a class of small regulatory RNAs with important roles, including the regulation of carbon uptake, cell motility, biofilm formation, quorum sensing, bacterial pathogenicity and other physiological processes. In other organisms such as RNA viruses, whose particles can be assembled simply from proteins and RNA, the entire replication process does not require DNA.

Although protein-RNA interactions play an important role and are prevalent across organisms, the lack of suitable biotechnology and methods meant that genome-wide studies of protein-RNA interactions could not be carried out a decade ago.
Now, with the development of technologies such as gene chips and high-throughput sequencing, genome-wide study of protein-RNA interactions has become possible. Current methods for studying protein-RNA interactions can be divided into two types according to their aim. One is the RNA-centric approach, which focuses on the RNA-binding proteins obtained by RNA pull-down. The other is the protein-centric approach, which relies on immunoprecipitation to extract the RNAs bound by a given protein. To find the exact position on the RNA where a protein binds, ultraviolet cross-linking and immunoprecipitation (CLIP) was developed. In the CLIP protocol, cells are irradiated with 254 nm ultraviolet light, and the lysate is then treated with RNase to partially digest RNA that is not bound to protein. After co-immunoprecipitation, the protein-RNA complexes are treated with protease to release the RNA fragments. Initially, RNAs obtained by CLIP were sequenced only with first-generation sequencing, which is not suited to transcriptome-scale studies or large numbers of samples. More recently, CLIP combined with high-throughput sequencing has made it possible to determine millions of sequences and to gain insight into protein-RNA interactions at the transcriptome level. Because UV cross-linking is irreversible, reverse transcriptase may fail to read through the amino acid residues that remain covalently attached to the RNA at the cross-link site, generating truncated cDNAs that lose part of the RNA information. To overcome this potential problem, single-nucleotide-resolution CLIP (iCLIP) was introduced, along with PAR-CLIP (Photoactivatable-Ribonucleoside-Enhanced CLIP), an improved approach that facilitates cross-linking. However, each of these methods captures only the transcripts bound by one particular protein at a time. Since bacteria have more than 4,000 proteins, applying the CLIP methodology protein by protein would require a great deal of labor and funding.
For these reasons, this study takes the model organism E. coli as an example to explore genome-wide protein-RNA interactions in vivo using high-throughput sequencing. First, the bacteria were irradiated with UV to cross-link RNA to protein. In parallel, a control group without UV irradiation was prepared; apart from the irradiation step, the two groups were processed identically. The bacterial lysate was then treated with a high concentration of RNase to remove RNA not bound to protein. After the RNase was inactivated, the sample was digested with protease, the RNA fragments were extracted with water-saturated phenol, and the fragments were fractionated on a denaturing gel. RNA of the appropriate size was recovered to prepare a cDNA library. After polymerase chain reaction, the library was sequenced on an Illumina HiSeq 2500 platform, generating 50 bp single-end reads. Finally, by integrating two available approaches (dCLIP and Piranha) with an in-house method (MCA) to analyze the data, preliminary information on the transcripts interacting with proteins in E. coli was acquired.

The MCA analysis yielded 2421 transcripts: 1787 mRNAs, 29 sRNAs, 33 tRNAs, 11 rRNAs and 561 intergenic regions (IGRs). Using dCLIP, we identified 2455 transcripts: 1763 mRNAs, 43 sRNAs, 10 rRNAs, 35 tRNAs and 604 IGRs. Using Piranha, we obtained 244 transcripts: 149 mRNAs, 16 sRNAs, 9 rRNAs, 24 tRNAs and 46 IGRs. After careful consideration, we took the combined MCA and dCLIP results as the final set of protein-bound transcripts, for a total of 3193 transcripts: 2234 mRNAs, 47 sRNAs, 11 rRNAs, 39 tRNAs and 862 IGRs. Among the 47 sRNAs, 9 have not previously been reported to interact with protein, and three of these (RyfD, RyjB, SymR) were found in the MCA, dCLIP and Piranha results alike. In addition, by comparing the IGRs with a database of predicted E. coli sRNAs, we obtained 178 new sRNAs that interact with proteins.

In summary, we describe a novel method for exploring genome-wide protein-RNA interactions in prokaryotes, which could also be used to study protein-RNA interactions in other species. Through systematic analysis of the high-throughput sequencing data with two published bioinformatics tools and MCA, developed in this study, preliminary information on the transcripts interacting with proteins in E. coli was acquired. Comparison with known protein-sRNA interaction data showed that some of these sRNAs have not been reported to interact with protein. From the predictions on IGRs, we obtained candidate sRNAs that may interact with proteins. In conclusion, our method and results will provide support for future studies of protein-RNA interactions.
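As a reading aid, the reported counts can be cross-checked with a few lines of Python. This is only a sketch under an assumption the abstract does not state: that the final set is the plain union of the MCA and dCLIP results, with each transcript assigned to the same category by both methods. Under that assumption, the number of transcripts shared by the two methods follows from inclusion-exclusion. The overlap figures are derived here and are not reported in the abstract; the names `MCA`, `DCLIP`, `FINAL` and `implied_overlap` are illustrative only.

```python
# Transcript counts per category, copied from the abstract.
MCA = {"mRNA": 1787, "sRNA": 29, "tRNA": 33, "rRNA": 11, "IGR": 561}
DCLIP = {"mRNA": 1763, "sRNA": 43, "tRNA": 35, "rRNA": 10, "IGR": 604}
FINAL = {"mRNA": 2234, "sRNA": 47, "tRNA": 39, "rRNA": 11, "IGR": 862}


def implied_overlap(a, b, union):
    """Inclusion-exclusion per category: |A and B| = |A| + |B| - |A or B|.

    Assumes (hypothetically) that `union` really is the set union of the
    two result sets and that categories are assigned consistently.
    """
    return {category: a[category] + b[category] - union[category]
            for category in union}


# The per-category counts add up to the totals stated in the abstract.
assert sum(MCA.values()) == 2421
assert sum(DCLIP.values()) == 2455
assert sum(FINAL.values()) == 3193

overlap = implied_overlap(MCA, DCLIP, FINAL)
for category, shared in overlap.items():
    print(f"{category}: {shared} transcripts implied to be shared")
print("total shared:", sum(overlap.values()))
```

If the assumption holds, every implied overlap is non-negative and no larger than the smaller of the two inputs, which is the case for all five categories here; a negative value would have signalled that the final set is not a simple union.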
Keywords/Search Tags: protein-RNA interactions, PRI, sRNA, E. coli, MCA