Recursive Splicing Event Recognition Based on Transcriptomic Sequencing Data and Its Application

Posted on: 2020-03-07
Degree: Master
Type: Thesis
Country: China
Candidate: J C Wei
GTID: 2370330590472312
Subject: Biomedical engineering
Abstract/Summary:
Recursive splicing is the biological process in which long introns are removed in multiple steps during pre-mRNA splicing. Unlike long introns (> 10 kbp), most introns in higher eukaryotic genomes are removed in a single step during transcription. Previous studies have shown that recursive splicing events play important roles in many biological processes, including the pathogenesis and development of diseases. In recent years, a growing number of researchers have studied recursive splicing events and found that recursive splicing occurs in Drosophila and in many vertebrates. Multiple recursive splicing sites have been predicted by different bioinformatics methods and verified experimentally. Current research focuses on the process of recursive splicing, the recognition of recursive splicing sites, and the influence of recursive splicing on biological processes. However, there is no mature software for identifying recursive splicing sites. This thesis studies methods for identifying recursive splicing events from transcriptome data and develops a software package, RSfinder, to identify recursive splicing sites. RSfinder is then used to identify and analyze the recursive splicing sites in ovarian cancer tissue and paracancerous tissue. The main work of this thesis is as follows.

First, the characteristics of recursive splicing sites were analyzed. Eight recursive splicing sites in 7 genes from human brain tissue and 24 recursive splicing sites in 14 genes from Drosophila melanogaster were studied, and the sequences upstream and downstream of these sites were analyzed by multiple sequence alignment. Three characteristics of recursive splicing sites were found: conservation of the splice-site sequence, intron length, and a serrated pattern in the intron expression of recursive splicing events.

Second, the RSfinder pipeline was developed by analyzing the transcriptome sequencing data for the verified recursive splicing sites above. The steps are as follows:

1. A gene annotation file was used as the reference, and FastQC was used for quality control.
2. The transcriptome data were aligned to the reference genome with TopHat, producing a SAM file of all aligned reads and a junction file.
3. The characteristic information of recursive splicing sites was analyzed and used as a filter, yielding the potential recursive splicing sites that meet the characteristics above.
4. An affinity matrix for splicing was constructed from sequence information and used to screen the candidate recursive splicing sites further.
5. A visualization tool, RS-fig, was designed to distinguish serrated structures from non-serrated structures visually.
6. The PCC-AdaBoost algorithm was trained on the samples, yielding a classifier with an accuracy of more than 95%. This completed the workflow for recursive splicing site recognition (RSfinder).

RSfinder was then applied to the transcriptome data for the 8 human recursive splicing sites and the 24 Drosophila melanogaster sites, and the results were compared with the verified sites to assess the accuracy of RSfinder. RSfinder detected 7 (87.5%) of the recursive splicing sites in human brain tissue and 23 (95.8%) of the recursive splicing sites in Drosophila, which is in line with the expectations of this thesis.

Finally, ovarian cancer data were analyzed. The data were divided into two groups, ovarian cancer tissue and paracancerous tissue, with three biological replicates in each group. The transcriptome data were analyzed with RSfinder to predict the recursive splicing sites and to analyze the differences between the two groups in the genes that contain recursive splicing sites. The results showed 31 recursive splicing sites in ovarian cancer tissue, located in 25 introns of 25 genes, and 43 RS sites, located in 31 introns of 31 genes, that were found only in normal ovarian tissue. In addition, the expression of these two groups of genes was analyzed. The work in this thesis also lays a theoretical foundation for the diagnosis and treatment of ovarian cancer.
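
The sequence-based screening in steps 3 and 4 of the pipeline can be illustrated with a minimal Python sketch. It is not RSfinder's implementation: the abstract does not say how the affinity matrix is built, so a log-odds position weight matrix is swapped in here as one common way to score splice-site sequences. The AGGT filter assumes the juxtaposed 3' and 5' splice-site motif commonly reported for recursive splicing sites. The training windows, the intron sequence, the score threshold, and all function names are hypothetical.

```python
import math

BASES = "ACGT"

def build_pwm(sites, pseudocount=1.0):
    """Build a log2-odds position weight matrix against a uniform background.

    `sites` is a list of equal-length windows around known splice sites.
    """
    pwm = []
    for pos in range(len(sites[0])):
        counts = {base: pseudocount for base in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2((counts[b] / total) / 0.25) for b in BASES})
    return pwm

def score(window, pwm):
    """Sum the per-position log-odds of a window."""
    return sum(column[base] for base, column in zip(window, pwm))

def find_candidates(intron, pwm, min_score=0.0):
    """Return (junction_index, window, score) for windows that start with AGGT.

    junction_index is the 0-based position of the first base after the
    assumed AG|GT junction.
    """
    width = len(pwm)
    hits = []
    for i in range(len(intron) - width + 1):
        window = intron[i:i + width]
        if window[:4] != "AGGT":  # assumed juxtaposed 3'/5' splice-site motif
            continue
        s = score(window, pwm)
        if s >= min_score:
            hits.append((i + 2, window, s))
    return hits

# Hypothetical training windows and intron sequence, for illustration only.
known_sites = ["AGGTAAGT", "AGGTGAGT", "AGGTAAGA", "AGGTAAGT"]
pwm = build_pwm(known_sites)
intron = "TTTTCCTTTCAGGTAAGTACGCAGGTCCCCTT"
hits = find_candidates(intron, pwm, min_score=5.0)
for junction, window, s in hits:
    print(junction, window, round(s, 2))
```

The threshold trades sensitivity for specificity: the second AGGT in the example intron matches the motif but scores below 5.0, so it is dropped. In a real pipeline the surviving candidates would still need supporting junction reads and the serrated expression pattern before being passed to a classifier.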
Keywords/Search Tags: Recursive Splicing, Lasso Structure, RS Site, Gene Splicing, Stepwise Splicing, Ovarian Cancer