A Fast Alignment Algorithm for High-Throughput Transcriptome Sequencing Fragments

Posted on: 2016-10-24
Degree: Master
Type: Thesis
Country: China
Candidate: S Lv
GTID: 2180330479491062
Subject: Computer technology
Abstract/Summary:
Sequence alignment is a core part of gene sequence data analysis. Understanding the genetic characteristics of the human organism, analyzing diseases thoroughly, and preventing and combating infectious diseases all build on sequence alignment techniques. Next-generation sequencing technology produces data faster and in larger volumes, which has led to the rapid development of many new RNA sequence alignment tools. However, accurate alignment of high-throughput RNA-seq data remains a challenging problem, and current RNA sequence alignment tools have problems with speed and accuracy. Developing an RNA sequence alignment tool with high efficiency and accuracy is therefore meaningful work. Here we present RNA-fat, a new RNA sequence alignment tool with high throughput and efficiency. RNA-fat identifies splicing points and structural variations from the positions of seeds on the read and on the reference genome. We construct a graph from the seeds and find the optimal path that covers the read. Because the optimal path of the seed graph can cover most positions of a read, the workload of mapping the uncovered segments to the reference genome is greatly reduced. RNA-fat constructs a De Bruijn graph for the reference genome, extracts and sorts all the unique paths in the De Bruijn graph, and generates the RNA-fat index. The index structure of RNA-fat comprises three subindexes and supports very efficient queries.
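The idea of finding an optimal path through a seed graph can be illustrated with a generic collinear chaining sketch. This is not RNA-fat's actual implementation: the abstract does not specify how edges are defined or scored, so the `(read_pos, ref_pos, length)` seed representation, the `max_gap` limit, the no-overlap rule, and both function names are assumptions made for illustration only.

```python
from typing import List, Tuple

# Hypothetical seed representation: an exact match of `length` bases
# starting at `read_pos` on the read and `ref_pos` on the reference.
Seed = Tuple[int, int, int]


def best_seed_chain(seeds: List[Seed], max_gap: int = 500_000) -> List[Seed]:
    """Return the collinear, non-overlapping chain covering the most read bases.

    An edge j -> i exists when seed j ends before seed i starts on the read
    and on the reference, and the reference jump (e.g. an intron) is at most
    `max_gap`. Simple O(n^2) dynamic programming over seeds sorted by read
    position.
    """
    if not seeds:
        return []
    seeds = sorted(seeds)
    n = len(seeds)
    score = [length for _, _, length in seeds]  # best coverage ending at seed i
    prev = [-1] * n                             # predecessor on the best chain
    for i, (ri, gi, li) in enumerate(seeds):
        for j in range(i):
            rj, gj, lj = seeds[j]
            ref_gap = gi - (gj + lj)
            if rj + lj <= ri and 0 <= ref_gap <= max_gap:
                if score[j] + li > score[i]:
                    score[i] = score[j] + li
                    prev[i] = j
    end = max(range(n), key=score.__getitem__)
    chain = []
    while end != -1:
        chain.append(seeds[end])
        end = prev[end]
    return chain[::-1]


def uncovered_intervals(chain: List[Seed], read_len: int) -> List[Tuple[int, int]]:
    """Half-open read intervals that the chain leaves uncovered."""
    gaps, cursor = [], 0
    for read_pos, _, length in chain:
        if read_pos > cursor:
            gaps.append((cursor, read_pos))
        cursor = read_pos + length
    if cursor < read_len:
        gaps.append((cursor, read_len))
    return gaps


demo_seeds = [(0, 100, 20), (20, 5000, 30), (10, 90, 5), (50, 5030, 25)]
demo_chain = best_seed_chain(demo_seeds)
demo_gaps = uncovered_intervals(demo_chain, read_len=100)
```

In this toy input the chain jumps from reference position 120 to 5000 between the first two seeds, which is the kind of long reference jump a splice junction would produce. Only the read interval left over after chaining would then need a separate alignment step.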
Under the influence of splicing and variation, the alignment borders of an uncovered area may overlap on the read or span a long jump on the reference genome. To guarantee alignment accuracy, RNA-fat analyzes each situation, computes the alignment borders of the uncovered area, and applies a different alignment method to each case. Through experiments, we verify the accuracy of the optimal paths, which is the foundation for accurately searching the borders of uncovered areas. By comparing the efficiency of the optimized and unoptimized dynamic programming algorithms, we verify that the two-dimensional segment tree plays a significant role in improving the performance of RNA-fat. Finally, we compare the performance of RNA-fat with the current mainstream RNA mapping tools in the same experimental environment and find that RNA-fat achieves higher throughput and speed, while its accuracy is close to that of those tools.
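To give a sense of how a two-dimensional segment tree can speed up a chaining-style dynamic program, here is a minimal sketch. The abstract does not describe RNA-fat's data structure in detail, so everything here is an assumption: the class `SegTree2D`, the function `best_chain_score`, and the choice to index the tree directly by seed end coordinates (a real genome would need coordinate compression, since this dense layout uses memory proportional to read length times reference length).

```python
class SegTree2D:
    """Max segment tree over an n x m grid: point update, rectangle max query."""

    def __init__(self, n: int, m: int) -> None:
        self.n, self.m = n, m
        self.t = [[0] * (2 * m) for _ in range(2 * n)]

    def update(self, x: int, y: int, value: int) -> None:
        """Raise cell (x, y) to at least `value` in every node covering it."""
        i = x + self.n
        while i >= 1:
            j = y + self.m
            while j >= 1:
                if self.t[i][j] < value:
                    self.t[i][j] = value
                j //= 2
            i //= 2

    def _query_row(self, row, y1: int, y2: int) -> int:
        # Max over columns [y1, y2) inside one outer node.
        best = 0
        lo, hi = y1 + self.m, y2 + self.m
        while lo < hi:
            if lo & 1:
                best = max(best, row[lo])
                lo += 1
            if hi & 1:
                hi -= 1
                best = max(best, row[hi])
            lo //= 2
            hi //= 2
        return best

    def query(self, x1: int, x2: int, y1: int, y2: int) -> int:
        """Max over the half-open rectangle [x1, x2) x [y1, y2)."""
        best = 0
        lo, hi = x1 + self.n, x2 + self.n
        while lo < hi:
            if lo & 1:
                best = max(best, self._query_row(self.t[lo], y1, y2))
                lo += 1
            if hi & 1:
                hi -= 1
                best = max(best, self._query_row(self.t[hi], y1, y2))
            lo //= 2
            hi //= 2
        return best


def best_chain_score(seeds, read_len: int, ref_len: int) -> int:
    """Best total seed length over collinear, non-overlapping seed chains.

    Seeds are hypothetical (read_pos, ref_pos, length) tuples. Each seed asks
    the tree for the best chain ending at or before its start on both axes,
    replacing the inner loop of a quadratic dynamic program.
    """
    tree = SegTree2D(read_len + 1, ref_len + 1)
    best = 0
    for read_pos, ref_pos, length in sorted(seeds):
        score = length + tree.query(0, read_pos + 1, 0, ref_pos + 1)
        tree.update(read_pos + length, ref_pos + length, score)
        best = max(best, score)
    return best


demo_score = best_chain_score(
    [(0, 10, 20), (20, 50, 30), (10, 5, 5), (50, 80, 25)],
    read_len=100,
    ref_len=120,
)
```

Each seed costs one rectangle query and one point update, both logarithmic in each dimension, instead of a scan over all earlier seeds. A maximum jump on the reference could be enforced by raising the lower bound of the second query range rather than starting it at 0.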
Keywords/Search Tags:De Bruijn graph, two-dimensional segment tree, sequence alignment technology, dynamic programming algorithm