
Research On Identification Algorithm Of Genetic Regulatory Elements Based On High-throughput Sequencing Data

Posted on: 2015-11-09
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S J Zhang
GTID: 1220330422492404
Subject: Computer application technology
Abstract/Summary:
With the rapid development of high-throughput sequencing technology, genomic sequencing data are growing rapidly and at an unprecedented scale. Faced with this flood of data, biologists are often limited to preliminary and partial analyses by constraints on storage and processing capacity, which leads to the loss of novel and valuable information. Bioinformatics analysis of high-throughput sequencing data therefore depends increasingly on computational methods and mathematical models. Gene expression is a complex biological process that plays important roles in development, differentiation and susceptibility to malignant disease. Dissecting the mechanisms of transcriptional regulation, including genetic, epigenetic and environmental factors, helps in understanding the sources of phenotypic differences and disease phenotypes. Using computational methods and mathematical models, this study focused on high-throughput RNA sequencing data and, by integrating genetic variants and epigenetic modifications, identified transcriptional regulatory elements, including genetic loci and genetic-epigenetic regulatory patterns, that affect coding genes, bidirectional promoters and non-coding genes.

Firstly, we predicted genes with inter- and intra-allele-specific expression on a human whole-genome scale using a maximum likelihood model, and quantified the degree of allele-specific expression. The identification of allele-specific gene expression is essential for mapping genetic variants that affect gene regulation and disease risk. Although RNA sequencing offers the opportunity to measure expression at the allele level, few powerful statistical methods are available for mapping allele-specific expression in single or multiple individuals. Considering the allelic expression heterogeneity of multiple single nucleotide polymorphisms (SNPs) within a gene, we proposed a maximum likelihood model based on the beta-binomial distribution to predict allele-specific expression genes.
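To make the idea of a beta-binomial maximum likelihood test for allele-specific expression concrete, here is a minimal sketch. It is not the dissertation's model: the parameterisation (mean reference-allele fraction `p`, overdispersion `rho`), the grid search, the likelihood-ratio test against `p = 0.5`, and all function names are assumptions made for illustration. It also assumes that the reference-allele counts at all SNPs of a gene come from the same haplotype, which a real analysis would need phasing to justify.

```python
import math


def log_betabinom(k, n, a, b):
    """Log probability of k successes out of n under Beta-Binomial(n, a, b)."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    log_num = math.lgamma(k + a) + math.lgamma(n - k + b) - math.lgamma(n + a + b)
    log_den = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return log_choose + log_num - log_den


def gene_loglik(counts, p, rho):
    """Log-likelihood of one gene.

    counts: list of (reference_reads, total_reads), one pair per heterozygous SNP.
    p:      mean reference-allele fraction shared by the SNPs (assumption).
    rho:    overdispersion in (0, 1); rho = 1 / (a + b + 1).
    """
    s = (1.0 - rho) / rho          # s = a + b
    a, b = p * s, (1.0 - p) * s
    return sum(log_betabinom(k, n, a, b) for k, n in counts)


def fit_gene(counts, p_grid=None, rho_grid=None):
    """Grid-search MLE plus a likelihood-ratio test of p = 0.5.

    Returns (p_hat, rho_hat, lrt_statistic, p_value).
    """
    p_grid = p_grid or [i / 100 for i in range(1, 100)]
    rho_grid = rho_grid or [0.001, 0.005, 0.01, 0.02, 0.05, 0.1, 0.2]

    # Null model: balanced expression (p = 0.5), overdispersion still free.
    null_ll = max(gene_loglik(counts, 0.5, r) for r in rho_grid)

    # Alternative model: both p and rho free.
    best_ll, p_hat, rho_hat = max(
        (gene_loglik(counts, p, r), p, r) for p in p_grid for r in rho_grid
    )

    lrt = max(0.0, 2.0 * (best_ll - null_ll))
    # Survival function of a chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(lrt / 2.0))
    return p_hat, rho_hat, lrt, p_value


if __name__ == "__main__":
    skewed_gene = [(80, 100), (75, 100), (85, 100)]
    print(fit_gene(skewed_gene))
```

The estimated `p` doubles as a simple measure of the degree of allelic imbalance (its distance from 0.5). A grid search keeps the sketch dependency-free; a numerical optimiser would be the more usual choice in practice.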
Simulated data covering different coverage levels, allelic expression levels, SNP fractions and random noise showed that the prediction model is accurate and robust. Results on real data from a single human individual were similar: approximately 17% of genes displayed an allele-specific effect on gene expression in a single individual. In addition, more genes showed allele-specific expression across multiple individuals because of inter-individual differences.

Secondly, we identified genetic regulatory loci of bidirectional gene pairs by genome-wide association analysis based on high-throughput RNA sequencing data, and proposed that bidirectional promoters have two regulatory mechanisms. A large number of gene pairs have transcription start sites less than 1000 bp apart and are transcribed in opposite directions; these are termed bidirectional gene pairs. Generally, the two genes of a bidirectional pair may have similar expression patterns because they share the same promoter region, called a bidirectional promoter. Although the expression correlations of bidirectional gene pairs are higher than those of random gene pairs, they are significantly lower than those of gene pairs that share common regulatory loci. Expression correlation therefore differs among bidirectional gene pairs, and we analysed the cause of these differences. The genome-wide association analysis showed that genetic loci located within bidirectional promoters are associated not only with the expression of the bidirectional gene pair but also with the correlation between the expression of the two genes. We therefore proposed two regulatory mechanisms for bidirectional promoters.

Thirdly, we quantified the primary transcripts of miRNAs using human population high-throughput RNA sequencing data, and identified novel genetic loci affecting miRNA (pri-miQTLs) through genome-wide association analysis.
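The definition of a bidirectional gene pair given in the second part (transcription start sites less than 1000 bp apart, transcribed in opposite directions) can be read as a simple filter over gene annotations. The sketch below is one such reading. The `Gene` record, the function name and the `require_divergent` flag are hypothetical; the flag adds the common head-to-head convention (minus-strand gene upstream of the plus-strand gene), which the abstract does not state explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    strand: str  # "+" or "-"
    tss: int     # transcription start site coordinate


def bidirectional_pairs(genes, max_distance=1000, require_divergent=True):
    """Return (left_gene, right_gene) name pairs on opposite strands whose
    TSSs are less than max_distance apart.

    require_divergent=True additionally demands a head-to-head layout:
    the left gene on "-" and the right gene on "+" (an assumption).
    """
    by_chrom = {}
    for gene in genes:
        by_chrom.setdefault(gene.chrom, []).append(gene)

    pairs = []
    for chrom_genes in by_chrom.values():
        chrom_genes.sort(key=lambda g: g.tss)
        for i, left in enumerate(chrom_genes):
            for right in chrom_genes[i + 1:]:
                if right.tss - left.tss >= max_distance:
                    break  # sorted by TSS, so later genes are even farther away
                if left.strand == right.strand:
                    continue
                if require_divergent and not (left.strand == "-" and right.strand == "+"):
                    continue
                pairs.append((left.name, right.name))
    return pairs


if __name__ == "__main__":
    annotation = [
        Gene("A", "chr1", "-", 5000),
        Gene("B", "chr1", "+", 5400),
        Gene("C", "chr1", "+", 9000),
    ]
    print(bidirectional_pairs(annotation))
```

Sorting by TSS within each chromosome lets the inner loop stop as soon as the distance threshold is exceeded, so the scan stays close to linear for realistic annotations.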
Because miRNA genes and coding genes have a similar structure, we reasoned that RNA sequencing data provide information not only on coding genes but also on miRNAs, which helps in understanding the transcriptional regulatory mechanisms of non-coding RNA. The reliability of miRNA expression estimated from RNA-seq was evaluated using the expression of random sequences and pre-miRNAs, as well as the distribution across genomic regions. We found multiple genetic loci through genome-wide association analysis with multiple-testing correction. Moreover, although the pri-miQTLs were located within transcriptional regulatory regions shared with the host gene, they showed no correlation with the host gene. This suggests that the pri-miQTLs we found are novel and unique regulators of miRNA.

Fourthly, we focused on the relationship between SNPs, DNA methylation and gene expression. We proposed four patterns of gene expression regulation and predicted them genome-wide in humans by maximum likelihood estimation. Multiple complex SNP-DNA methylation regulation patterns were found. On simulated data with different correlation coefficients between any two traits, prediction of the regulation pattern by the maximum likelihood model showed satisfactory performance and relative stability. The distributions of regulation patterns in European and African populations were similar. SNPs and DNA methylation had approximately the same effect on expression regulation for about half of the genes, which was termed the cooperative/antagonistic regulation pattern. Fewer than one third of genes were controlled by only one of the two factors. Multiple novel genetic loci associated with gene expression were identified through the cooperative/antagonistic regulation pattern; they were specifically enriched in E-box enhancers and in transcriptional regulatory functions such as RNA elongation.
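Association between genetic loci and expression, followed by a correction for multiple tests, recurs across the parts summarised above. As a rough illustration only, the sketch below regresses an expression value on genotype dosage (0, 1 or 2 copies of an allele) for one locus, obtains a permutation p-value, and applies a Bonferroni threshold across loci. The abstract does not say which test or correction the dissertation used, so the permutation test, the Bonferroni rule and every function name here are assumptions.

```python
import random


def slope_and_r(genotypes, expression):
    """Least-squares slope of expression on genotype dosage and Pearson r."""
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_e = sum(expression) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    syy = sum((e - mean_e) ** 2 for e in expression)
    sxy = sum((g - mean_g) * (e - mean_e) for g, e in zip(genotypes, expression))
    if sxx == 0 or syy == 0:
        return 0.0, 0.0  # no variation, so no detectable association
    return sxy / sxx, sxy / (sxx * syy) ** 0.5


def permutation_pvalue(genotypes, expression, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the genotype-expression correlation."""
    rng = random.Random(seed)
    _, r_obs = slope_and_r(genotypes, expression)
    shuffled = list(expression)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        _, r_perm = slope_and_r(genotypes, shuffled)
        if abs(r_perm) >= abs(r_obs) - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


def bonferroni(pvalues, alpha=0.05):
    """Flag the tests that remain significant after Bonferroni correction."""
    threshold = alpha / len(pvalues)
    return [p <= threshold for p in pvalues]


if __name__ == "__main__":
    dosage = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
    expr = [1.0, 1.2, 0.9, 1.1, 2.0, 2.1, 1.9, 2.2, 3.0, 3.1, 2.9, 3.2]
    print(slope_and_r(dosage, expr))
    print(permutation_pvalue(dosage, expr))
```

Bonferroni is the most conservative common choice; genome-wide studies often use false discovery rate control instead, and real analyses usually adjust for covariates such as population structure, which this sketch omits.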
Keywords/Search Tags: RNA sequencing data, Regulatory element, DNA methylation, Single nucleotide polymorphisms, Maximum likelihood model