
Analysis On The Interaction Between Post-Spliced Introns And Their Corresponding mRNA

Posted on: 2014-01-13
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X Q Zhao
GTID: 1220330398996413
Subject: Theoretical Physics
Abstract/Summary:
After the Human Genome Project, the Encyclopedia of DNA Elements (ENCODE) project represents a major milestone in the characterization of the human genome. The project reveals a striking picture of complex molecular activity in the human genome. Overlapping interactions among genes, regulatory elements and non-coding DNA control human physiological activity. The human genome is a complex network system rather than a simple collection of isolated genes and a large amount of "junk DNA". As the ENCODE project has progressed, complex scattered regulatory sequences, a large number of non-coding RNA genes and conserved elements in non-coding regions have been found. Gerstein proposed that a gene is a union of genomic sequences encoding a coherent set of potentially overlapping functional products. The human genome itself is a complex network, and very little of its DNA is actually "junk". Protein coding genes are just one of numerous kinds of DNA elements with specific functions. The ENCODE project also found that 93% of the DNA in the human genome can be transcribed into RNA, and that most transcripts are non-coding RNAs that can interact with each other.
In recent years it has become increasingly clear that introns are important carriers of biological function. Intron gain and loss can influence many stages of mRNA metabolism, including initial transcription of a gene, editing of pre-mRNA, and nuclear export, translation and decay of the mRNA. Mutations at intron splice sites can cause many diseases, and some base mutations in the middle regions of introns that do not affect splicing can also cause disease. A complete match between an siRNA and its target gene can silence the target, and a high but incomplete match between an miRNA and its target gene can suppress gene expression. It is therefore important to explore the functions of weak interactions, or weak matches, between introns and mRNA.
This dissertation focuses on the rules governing the interaction between introns and their mRNAs or protein coding sequences (CDS), and discusses the underlying mechanisms. The main contributions are summarized as follows:
1. The distributions of optimal matched frequencies along introns, obtained by matching introns against their protein coding sequences, are analyzed in yeast, Caenorhabditis elegans and Drosophila melanogaster. The central regions of introns have higher matched frequencies and the two end regions have lower ones. Generally, short introns contain one optimal matched region, while long introns contain more than one. The distributions of optimal matched regions differ among the first, last and middle intron groups. In long introns, relative to the CDS, the base correlation (D2 value) is highest in the sequence of the former optimal matched region and lowest in the sequence of the latter optimal matched region. In short introns, the base correlations of the optimal matched region are similar to those of the CDS. These results indicate that the central non-conserved region of introns is an organized sequence that contains the optimal matched regions, which are closely correlated with the CDS. They also offer a possible clue to the co-evolution of CDS and intron sequences.
2. The distributions of optimal matched frequencies along protein coding sequences, obtained by matching introns against their protein coding sequences, are determined in yeast, Caenorhabditis elegans and Drosophila melanogaster. The CDS contains many optimal matched regions as well as poorly matched regions, which are called forbidden regions. Compared with the average RF values and the CC-random RF values, the characteristics of the optimal matched regions and forbidden regions are clear. Two forbidden regions, located at about 10% and 80% of the CDS length, are highly conserved across all kinds of aligned introns.
We propose that the forbidden regions are specific binding sites for some protein factors; further work is required to test this.
3. For the protein coding genes of nine model organism genomes, the distributions of optimal matched frequencies along mRNA, matched against post-spliced introns, are obtained. These distributions are highly consistent, or universal, across the genomes. There are peaks of optimal matched frequency in the UTRs, especially in the 3' UTR, while the matched frequencies are relatively low in the CDS regions of mRNA. This shows that the interactions between post-spliced introns and UTRs are positively biased, especially in the 3' UTR. The distributions of the optimal matched frequencies around functional sites are also examined. They change sharply at translation initiation sites and translation termination sites, and the matched frequencies around exon junction sites are relatively low. The GC content distributions are analyzed for CDS, 3' UTR, 5' UTR and intron matched segments. The centers of the GC content distributions differ among these sequence types. The GC content distribution of intron matched segments is distinctive: its center is lower than those of the other sequences, and it is broad, nearly covering the other distributions. These facts show that most base pairs in the interactions between introns and mRNA are weak (A-T) pairs, although matches with high GC content also occur.
4. The sequence characteristics of the optimal matched segments are analyzed for all introns. The match rate distributions are highly consistent and lie mainly between 60% and 80%. The most probable length of the optimal matched segments is about 20 bp for lower eukaryotes and about 30 bp for higher eukaryotes. These results are consistent with those for ribosomal protein genes. Some peaks in the match rate distributions are conserved.
This suggests that the composition of intron optimal matched segments is governed by internal mechanisms.
In short, all of the analyses indicate that interactions between post-spliced introns and their mRNAs are real. The distributions of optimal matched segments along mRNA and the match rate distributions of intron optimal matched segments are highly consistent, or universal. These results show that introns contain abundant functional units, and that these functional units are structurally correlated with all kinds of mRNA sequences.
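The abstract names local alignment, optimal matched segments, match rate and GC content but does not specify how they are computed. As a minimal sketch only, the following Python code finds one best-scoring local alignment between an intron and an mRNA with a textbook Smith-Waterman recurrence, then reports the match rate and GC content of the matched segment. The function names, the scoring parameters (match, mismatch, gap), the definition of match rate as identical positions divided by aligned length, and the toy sequences are all assumptions made for illustration, not the dissertation's actual procedure. The source also does not say whether the intron is aligned directly or as its reverse complement (as a base-pairing model would require); this sketch aligns it directly.

```python
def local_align(intron, mrna, match=1, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment with a linear gap penalty.

    Scoring values are illustrative assumptions.
    Returns (aligned_intron, aligned_mrna, mrna_start, mrna_end), where
    mrna[mrna_start:mrna_end] is the matched mRNA segment (0-based, end exclusive).
    """
    n, m = len(intron), len(mrna)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if intron[i - 1] == mrna[j - 1] else mismatch
            score[i][j] = max(0,
                              score[i - 1][j - 1] + sub,  # align two bases
                              score[i - 1][j] + gap,      # gap in mRNA
                              score[i][j - 1] + gap)      # gap in intron
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until the score drops to zero.
    i, j = best_pos
    a, b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if intron[i - 1] == mrna[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            a.append(intron[i - 1]); b.append(mrna[j - 1]); i -= 1; j -= 1
        elif score[i][j] == score[i - 1][j] + gap:
            a.append(intron[i - 1]); b.append("-"); i -= 1
        else:
            a.append("-"); b.append(mrna[j - 1]); j -= 1
    return "".join(reversed(a)), "".join(reversed(b)), j, best_pos[1]


def match_rate(aligned_a, aligned_b):
    """Fraction of aligned columns with identical bases (assumed definition)."""
    if not aligned_a:
        return 0.0
    same = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return same / len(aligned_a)


def gc_content(segment):
    """Fraction of G and C among the non-gap bases of a segment."""
    bases = [c for c in segment.upper() if c != "-"]
    if not bases:
        return 0.0
    return sum(1 for c in bases if c in "GC") / len(bases)


# Toy sequences, invented for illustration only.
intron = "TTTTACGTACGTTTTT"
mrna = "GGGGACGTACGGGGG"
seg_i, seg_m, start, end = local_align(intron, mrna)
print(seg_i, seg_m, start / len(mrna), match_rate(seg_i, seg_m), gc_content(seg_i))
```

Dividing the start position by the mRNA length gives a relative position, the kind of quantity behind statements such as "about 10% and 80% of the CDS length". Accumulating such positions over many intron and mRNA pairs would give a frequency distribution along the mRNA, although the binning and normalization used in the dissertation are not given here.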
Keywords/Search Tags: Post-spliced intron, Protein coding genes, Local alignment, Optimal matched segment, Match rate, Interaction