Matching Characteristics Between Introns and mRNA Sequences in Genes with Different Expression Levels

Posted on: 2023-12-28 | Degree: Doctor | Type: Dissertation | Country: China | Candidate: Y J Cao | GTID: 1520306851488134 | Subject: Biophysics

Abstract/Summary:

High-throughput sequencing of the complete genomes of many organisms has produced a large amount of genomic data. Analysis of these data has revealed that non-coding regions make up a major portion of the genome. Non-coding sequences play an important role in the complexity and regulation of gene expression in eukaryotes. Introns, whose transcripts are a type of non-coding RNA (ncRNA), are initially transcribed together with the coding sequences (CDS); during splicing of the pre-mRNA transcript, the introns are removed to produce the mature mRNA. Introns have complex and diverse biological functions, and specific interactions between introns and their corresponding mRNA sequences have been observed and shown to be vital for gene expression. In this context, the thesis mainly used an improved Smith-Waterman local alignment approach to study the distribution patterns of intron matches along mRNA sequences and around functional site regions in highly and lowly expressed genes, and further investigated the mechanisms of the interacting segments. The main contributions of the thesis are summarized below.

(1) Introns are ubiquitous in pre-mRNA but are often overlooked, although they are known to play an important role in regulating gene expression. We compared the optimal matching regions between introns and mRNA sequences in Caenorhabditis elegans (C. elegans) genes with high and low expression levels. The relative matching frequency distribution of all genes lies between those of the highly and lowly expressed genes, indicating that introns in highly and lowly expressed genes perform different biological functions. Highly expressed genes showed stronger matching between introns and mRNA sequences than lowly expressed genes, and the strongly matched regions appeared in the untranslated regions (UTRs), particularly in 3′ UTRs with low-GC segments. The coding sequences of genes at both expression levels contained many optimal matched regions as well as many weakly matched regions, especially in highly expressed genes. Around the functional regions of the translation initiation and termination sites, the optimal matching frequency distributions differed markedly between highly and lowly expressed genes. mRNA sequences containing CpG islands tended to have higher relative matching frequencies, especially in highly expressed genes. Additionally, the sequence characteristics of the optimal matched segments were consistent with those of miRNAs, which are considered a class of functional RNA segments. Introns in both highly and lowly expressed genes contribute to the recognition of translation initiation and termination sites.

(2) In eukaryotes, the exon junction complex (EJC) is assembled after splicing at specific positions upstream of the exon-exon junctions (EEJs) in the mRNA. We compared the optimal matched regions between exon-exon sequences and their corresponding introns in C. elegans genes with high and low expression levels. Notably, long introns in lowly expressed genes are more likely to bind EJC proteins when the optimal matched segments (i) have a high GC content, (ii) are rich in CG dinucleotides, and (iii) have a high CpG content. Meanwhile, high-intensity introns in lowly expressed genes are more likely to bind EJC proteins only when the optimal matched segments have a high CpG content. Long introns in highly expressed genes, however, are more likely to bind other proteins when the optimal matched segments contain only one CG dinucleotide and have a medium CpG content. These results indicate that intron sequences, binding proteins, and coding sequences form interaction networks in genes with high and low expression levels. Identifying the interactions between introns and exon-exon sequences in such genes may therefore be an effective way to predict EJC binding regions and the binding regions of other proteins.

(3) We examined the optimal matched regions between introns and mRNA sequences of highly and lowly expressed genes in the genomes of eight model organisms. The relative matching frequency distributions along the mRNA sequences were highly consistent, and the matching strength in the UTRs was much higher than that in the coding region. Introns of highly expressed genes also showed a higher matching strength in the UTRs than introns of lowly expressed genes. In terms of relative matching frequency, highly and lowly expressed genes differed significantly in the 3′ UTR in C. elegans, whereas in Arabidopsis thaliana they differed significantly in the 5′ UTR. With respect to GC content, the relative matching frequency was much higher for low-GC segments in the UTRs and much lower for low-GC segments in the CDS. For vertebrate and plant genes with high-GC segments, introns of highly expressed genes showed a higher relative matching frequency in the 5′ UTR than introns of lowly expressed genes. Because highly expressed genes exhibited a stronger interaction between the UTRs and introns than lowly expressed genes, we speculate that introns and UTR sequences converge through this interaction, and that the convergence is more pronounced in highly expressed genes. Moreover, the potential matching relationships between introns and mRNA sequences differed significantly between highly and lowly expressed genes, indicating that matching strength correlates with the ability of introns to enhance gene expression.

(4) We analyzed the sequence characteristics of the optimal matched segments in highly and lowly expressed genes. Their matching rates followed a general pattern: they were mainly distributed between 55% and 90%, reflecting the conservation of matching rates. The most probable length of the optimal matched segments was longer in highly expressed genes than in lowly expressed genes, and longer in vertebrates than in other species. The entropy results for highly expressed genes imply that the adjacent bases of the optimal matched segments have a highly ordered structure. In most species, the peak GC content of the optimal matched segments was higher in highly expressed genes than in lowly expressed genes. The GC content of the optimal matched segments spanned a wider range in vertebrates than in other species. Because the sequence characteristics of the optimal matched segments are similar to those of functional RNA segments, these segments can be considered to behave like functional RNA sequences. We therefore speculate that the interactions between introns and mRNA sequences in highly and lowly expressed genes are functional RNA-RNA interactions, and that the optimal matched segments in highly and lowly expressed genes perform different biological functions.

Keywords/Search Tags: Intron, mRNA, Highly expressed genes, Lowly expressed genes, Optimal matched segment, Exon junction complex
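The central comparison in this work is a local alignment between each intron and its mRNA. The abstract does not say what the "improved" Smith-Waterman variant changes, so the following is only a minimal sketch of the standard Smith-Waterman algorithm with a linear gap penalty, under assumed scores (match +1, mismatch -1, gap -2). The hypothetical helper `smith_waterman` returns the location of the highest-scoring (optimal matched) segment on the mRNA together with a matching rate, which is assumed here to be identical positions divided by alignment length (gaps included); the thesis's own definitions may differ.

```python
from dataclasses import dataclass


@dataclass
class LocalHit:
    """Optimal local alignment of an intron against an mRNA (hypothetical helper type)."""
    score: int
    mrna_start: int  # 0-based, inclusive
    mrna_end: int    # 0-based, exclusive
    matches: int     # identical aligned positions
    aln_len: int     # aligned columns, gaps included

    @property
    def matching_rate(self) -> float:
        # Assumed definition: identities / alignment length.
        return self.matches / self.aln_len if self.aln_len else 0.0


def smith_waterman(intron: str, mrna: str, match: int = 1,
                   mismatch: int = -1, gap: int = -2) -> LocalHit:
    """Standard Smith-Waterman with a linear gap penalty (not the thesis's improved variant)."""
    n, m = len(intron), len(mrna)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, bi, bj = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if intron[i - 1] == mrna[j - 1] else mismatch
            # Local alignment: cell scores never drop below zero.
            H[i][j] = max(0, H[i - 1][j - 1] + s,
                          H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, bi, bj = H[i][j], i, j

    # Trace back from the best cell until a zero cell ends the local alignment.
    i, j, matches, aln_len = bi, bj, 0, 0
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if intron[i - 1] == mrna[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            matches += int(intron[i - 1] == mrna[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1  # intron base aligned to a gap in the mRNA
        else:
            j -= 1  # mRNA base aligned to a gap in the intron
        aln_len += 1
    return LocalHit(best, j, bj, matches, aln_len)


if __name__ == "__main__":
    mrna = "GGGACGTACGGGG"  # toy sequences, not data from the thesis
    hit = smith_waterman("TTTTACGTACGTTTTT", mrna)
    print(hit.score, mrna[hit.mrna_start:hit.mrna_end], hit.matching_rate)
    # → 7 ACGTACG 1.0
```

The full score table keeps the sketch short but uses memory proportional to intron length times mRNA length; long introns would call for a linear-space or banded implementation. One possible way to build a positional distribution is to bin `hit.mrna_start / len(mrna)` across many intron-mRNA pairs, but that binning is an illustration, not the thesis's stated procedure.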
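Several properties of the optimal matched segments discussed in the abstract are simple sequence statistics. The sketch below computes GC content, the number of CG dinucleotides, the observed/expected CpG ratio (the Gardiner-Garden and Frommer formula often used in CpG island definitions), and a Shannon entropy over adjacent-base (dinucleotide) frequencies. The abstract does not give its exact CpG-content or entropy definitions, so these formulas and all function names are assumptions; under this choice, a lower dinucleotide entropy corresponds to a more ordered arrangement of adjacent bases.

```python
import math
from collections import Counter


def gc_content(seq: str) -> float:
    """Fraction of G and C bases (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def cg_count(seq: str) -> int:
    """Number of CG dinucleotides; 'CG' cannot overlap itself, so str.count is exact."""
    return seq.upper().count("CG")


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: CG count * length / (C count * G count)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return cg_count(seq) * len(seq) / (c * g)


def dinucleotide_entropy(seq: str) -> float:
    """Shannon entropy (bits) of adjacent-base pair frequencies; at most 4 bits for A/C/G/T."""
    seq = seq.upper()
    pairs = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = sum(pairs.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in pairs.values())


if __name__ == "__main__":
    segment = "ACGCGT"  # a made-up segment, not data from the thesis
    print(gc_content(segment), cg_count(segment),
          cpg_obs_exp(segment), dinucleotide_entropy(segment))
```

Applied to each optimal matched segment, these values could be binned to compare distributions between highly and lowly expressed genes; that use is illustrative rather than the thesis's documented method.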