
Differential Expression Analysis And Exon Skipping Event Identification Based On RNA-seq Data

Posted on: 2019-08-05
Degree: Master
Type: Thesis
Country: China
Candidate: Y Z Zhang
GTID: 2428330563458779
Subject: Control engineering

Abstract/Summary:
With the rapid development of high-throughput sequencing technology, RNA-seq has gradually become the main method for transcriptome data analysis. It provides researchers with more comprehensive transcriptome information and thus allows the transcriptome to be studied in detail. In this study, RNA-seq data from cancer patients are studied in the following aspects:

(1) To preprocess the raw data, we provide a pipeline that can batch-process sequencing data. It consists of quality control and sequence alignment. In quality control, we use FastQC and Trimmomatic to evaluate and filter the raw data so as to obtain high-quality data for subsequent analysis. In sequence alignment, we use STAR to align the reads to the reference genome, recovering the positional information that is lost during sequencing.

(2) The clean data obtained from preprocessing are used for differential expression analysis. First, we use RSEM to compute the raw read counts for each sample, then merge the per-sample counts into an expression matrix and remove the rows that are 0 in every sample. Second, we use DESeq2 to identify genes that are differentially expressed between cancerous and normal tissues. Finally, we perform enrichment analysis with Fisher's exact test to obtain functional information about these genes, and use this information to relate the differentially expressed genes to cancer phenotypes.

(3) We use a PSI (percent spliced in)-based method to identify exon skipping events. The key step of this method is predicting the PSI value. To this end, we propose a prediction algorithm based on ensemble learning: a multilayer feedforward neural network serves as the base learner, and the AdaBoost.R2 regression algorithm is used to predict PSI values directly from RNA-seq data.

In this study, RNA-seq data from 25 cancer patients are divided into 5 comparison groups for differential expression analysis, yielding 229, 211, 153, 132, and 170 differentially expressed genes, respectively. Enrichment analysis then identifies the GO terms and pathways that are significantly enriched among these genes, which provides a basis for subsequent study of cancer driver genes. The differential expression pipeline proposed here also has some advantages in computational efficiency, so it may serve as a reference for studies of other genetic diseases. The exon skipping identification model proposed in this study is evaluated on a public dataset generated from RNA-seq data of mouse tissues; the results show that its prediction accuracy is significantly better than that of the previous model.
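Step (2) builds the expression matrix by merging per-sample counts and removing genes whose count is 0 in every sample. A minimal Python sketch of that merge-and-filter step follows. It assumes each sample's counts have already been parsed into a `{gene_id: count}` dictionary (for example from the `expected_count` column of an RSEM `*.genes.results` file); the function name and the sample and gene names are hypothetical.

```python
def build_expression_matrix(sample_counts):
    """Merge per-sample count dicts into a gene x sample matrix and drop all-zero rows.

    sample_counts: dict mapping sample name -> {gene_id: count}
    Returns (genes, samples, matrix) where matrix[i][j] is the count of
    genes[i] in samples[j]. A gene missing from a sample is counted as 0.
    """
    samples = sorted(sample_counts)
    all_genes = sorted(set().union(*(sample_counts[s] for s in samples)))
    genes, matrix = [], []
    for gene in all_genes:
        row = [sample_counts[s].get(gene, 0) for s in samples]
        if any(c != 0 for c in row):  # keep genes with at least one nonzero count
            genes.append(gene)
            matrix.append(row)
    return genes, samples, matrix


# Hypothetical example: GENE_B is 0 in both samples and is removed.
counts = {
    "tumor_1": {"GENE_A": 10, "GENE_B": 0, "GENE_C": 5},
    "normal_1": {"GENE_A": 3, "GENE_B": 0},
}
genes, samples, matrix = build_expression_matrix(counts)
print(genes, samples, matrix)  # ['GENE_A', 'GENE_C'] ['normal_1', 'tumor_1'] [[3, 10], [0, 5]]
```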
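Before testing for differential expression, DESeq2 corrects for differences in sequencing depth with median-of-ratios size factors: each sample's factor is the median, over genes with nonzero counts in every sample, of the ratio between that sample's count and the gene's geometric mean across samples. The plain-Python sketch below reproduces only this normalization step to show what happens to the merged matrix; it is not a substitute for DESeq2's dispersion estimation and testing, and the function name is hypothetical.

```python
import math
from statistics import median


def size_factors(matrix):
    """Median-of-ratios size factors, computed in log space as DESeq2 does.

    matrix: list of gene rows of raw counts, one column per sample.
    Genes with a zero count in any sample are skipped (their geometric mean is 0).
    """
    log_rows = [[math.log(c) for c in row] for row in matrix if all(c > 0 for c in row)]
    if not log_rows:
        raise ValueError("no gene has nonzero counts in every sample")
    n_samples = len(log_rows[0])
    log_geo_means = [sum(r) / n_samples for r in log_rows]
    # For each sample: exp(median over genes of log(count) - log(geometric mean))
    return [
        math.exp(median(r[j] - g for r, g in zip(log_rows, log_geo_means)))
        for j in range(n_samples)
    ]


# Sample 2 has exactly twice the depth of sample 1, so the factors are 1/sqrt(2) and sqrt(2).
sf = size_factors([[10, 20], [5, 10], [100, 200]])
normalized = [[c / s for c, s in zip(row, sf)] for row in [[10, 20]]]
```

Dividing each column by its size factor makes counts comparable across samples; DESeq2 uses these factors inside its model rather than testing the divided counts directly.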
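The enrichment analysis in step (2) uses Fisher's exact test. For over-representation of one GO term or pathway, the one-sided test reduces to a hypergeometric tail probability: with N background genes, K of them annotated to the term, n differentially expressed genes, and k of those annotated, the p-value is P(X >= k). The stdlib sketch below assumes this one-sided form and a hypothetical function name; `scipy.stats.fisher_exact([[k, n - k], [K - k, N - K - n + k]], alternative="greater")` should give the same p-value.

```python
from math import comb


def enrichment_pvalue(k, n, K, N):
    """One-sided Fisher's exact test (over-representation) for a single term.

    N: background genes; K: background genes annotated to the term;
    n: differentially expressed genes; k: DE genes annotated to the term.
    Returns P(X >= k) for X ~ Hypergeometric(N, K, n).
    """
    if not (0 <= k <= min(n, K) and n <= N and K <= N and n - k <= N - K):
        raise ValueError("inconsistent 2x2 table counts")
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Hypothetical counts: all 4 DE genes fall in a term covering 5 of 20 genes.
p = enrichment_pvalue(k=4, n=4, K=5, N=20)
print(p)  # 5 / 4845, about 0.00103
```

In practice one such p-value is computed per term, so the results are usually corrected for multiple testing before terms are called significant.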
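PSI (percent spliced in) is the fraction of transcripts that include a given exon. For a cassette exon, a common read-count estimate averages the reads on the two inclusion junctions and divides by that average plus the reads on the skipping junction. The thesis predicts PSI with a learned model rather than computing it this way, so the sketch below only illustrates the quantity being predicted; the function name and the 10-read coverage cutoff are hypothetical choices.

```python
def junction_psi(inc_upstream, inc_downstream, skip, min_reads=10):
    """Estimate PSI for a cassette exon from splice-junction read counts.

    inc_upstream:   reads on the upstream-exon -> cassette-exon junction
    inc_downstream: reads on the cassette-exon -> downstream-exon junction
    skip:           reads on the upstream-exon -> downstream-exon junction
    Inclusion reads are averaged over the two inclusion junctions so both
    isoforms are compared per junction. Returns None when total junction
    coverage is below min_reads (hypothetical cutoff; PSI would be too noisy).
    """
    if inc_upstream + inc_downstream + skip < min_reads:
        return None
    inclusion = (inc_upstream + inc_downstream) / 2
    return inclusion / (inclusion + skip)


print(junction_psi(30, 30, 10))  # 0.75: mostly included
print(junction_psi(0, 0, 12))    # 0.0: exon always skipped
print(junction_psi(2, 2, 1))     # None: too few reads
```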
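Step (3) combines base learners with AdaBoost.R2. The sketch below shows the boosting loop with the linear loss: each round fits a learner under the current sample weights, scales absolute errors by the largest error, stops if the weighted average loss reaches 0.5, sets beta = loss / (1 - loss), shrinks the weights of well-predicted samples by beta^(1 - loss_i), and finally predicts with the weighted median of all learners, each weighted by log(1/beta). Two deliberate substitutions, not the thesis's model: the multilayer feedforward network is replaced by a weighted regression stump to keep the code short and dependency-free, and the weights are passed to a weighted fit instead of drawing a weighted resample as in Drucker's original formulation. The data, function names, and `n_rounds=20` are hypothetical.

```python
import math


def fit_stump(X, y, w):
    """Stand-in base learner: one-split regression stump minimising weighted
    squared error; each side predicts the weighted mean of its targets."""
    idx = range(len(y))

    def wmean(rows):
        tw = sum(w[i] for i in rows)
        return sum(w[i] * y[i] for i in rows) / tw

    mean_all = wmean(idx)
    best = (math.inf, 0, math.inf, mean_all, mean_all)  # fallback: constant model
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            left = [i for i in idx if X[i][j] <= thr]
            right = [i for i in idx if X[i][j] > thr]
            if not right:
                continue
            lm, rm = wmean(left), wmean(right)
            sse = (sum(w[i] * (y[i] - lm) ** 2 for i in left)
                   + sum(w[i] * (y[i] - rm) ** 2 for i in right))
            if sse < best[0]:
                best = (sse, j, thr, lm, rm)
    _, bj, bthr, blm, brm = best
    return lambda x: blm if x[bj] <= bthr else brm


def adaboost_r2(X, y, n_rounds=20, base_fit=fit_stump):
    """AdaBoost.R2 with linear loss; returns a list of (learner, log(1/beta))."""
    n = len(y)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(n_rounds):
        f = base_fit(X, y, w)
        err = [abs(y[i] - f(X[i])) for i in range(n)]
        d = max(err)
        if d == 0:  # perfect fit on the training data: use this learner alone
            return [(f, 1.0)]
        loss = [e / d for e in err]  # linear loss, scaled to [0, 1]
        avg = sum(wi * li for wi, li in zip(w, loss))
        if avg >= 0.5:  # learner too weak: stop boosting
            if not ensemble:
                ensemble.append((f, 1.0))
            break
        beta = avg / (1.0 - avg)
        ensemble.append((f, math.log(1.0 / beta)))
        # Well-predicted samples (small loss) are down-weighted the most.
        w = [wi * beta ** (1.0 - li) for wi, li in zip(w, loss)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def adaboost_r2_predict(ensemble, x):
    """Weighted median of the learners' predictions."""
    preds = sorted((f(x), a) for f, a in ensemble)
    half = 0.5 * sum(a for _, a in preds)
    cum = 0.0
    for p, a in preds:
        cum += a
        if cum >= half:
            return p
    return preds[-1][0]


X = [[float(i)] for i in range(6)]
y = [0, 0, 0, 1, 1, 1]
model = adaboost_r2(X, y)
print(adaboost_r2_predict(model, [1.0]), adaboost_r2_predict(model, [4.0]))  # 0.0 1.0
```

With scikit-learn available, `AdaBoostRegressor` implements AdaBoost.R2 (linear loss by default) and can wrap an `MLPRegressor` base learner, which is closer to the setup the abstract describes.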
Keywords/Search Tags: RNA-seq, Differential Expression, Exon Skipping, Ensemble Learning