The BWT Index Building Method For A Gene Sequence Alignment Research On Hadoop

Posted on: 2017-05-16
Degree: Master
Type: Thesis
Country: China
Candidate: N Li
Full Text: PDF
GTID: 2308330488474768
Subject: Software engineering
Abstract/Summary:
With the development of science and technology, the volume of genetic data has been growing explosively. The most basic and important step in analyzing genetic data is gene sequence alignment, which is used to identify homology and variation between species. Current sequence alignment algorithms fall roughly into two categories: exact alignment algorithms and approximate (fuzzy) matching algorithms. Most gene sequence alignment algorithms in use today are heuristic, and most of them consist of two steps: building an index and aligning sequences. Both exact and approximate alignment algorithms therefore depend on an index structure, which makes index construction an important part of any gene sequence alignment algorithm. Common index construction algorithms are also roughly divided into two categories: those based on hash tables, and those based on suffix trees and suffix arrays. The BWT (Burrows-Wheeler Transform) index is an important index structure based on the suffix array. At present, building a BWT index for a large genomic sequence (such as the human genome) takes several hours of serial computation. In this paper, a parallel computation method based on Hadoop is proposed to build the suffix array and the BWT index. The algorithm uses the data processing functions of MapReduce to cut the suffix array into multiple pieces and process them separately, and finally obtains the totally ordered suffix array and the BWT index, reducing the time needed to build the index. Experimental results show that the proposed method saves a substantial amount of running time, achieving the expected goal and verifying the effectiveness of the algorithm.
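The idea of cutting the suffix array into pieces, sorting each piece separately, and joining the results can be illustrated with a small single-machine sketch. The abstract does not say how the pieces are formed, so the scheme below is an assumption made for illustration only: suffixes are grouped by their first `k` characters (a stand-in for the map and shuffle phases), each group is sorted on its own (a stand-in for the reduce phase), and the groups are concatenated in key order. The function names, the parameter `k`, and the `$` end marker are hypothetical and are not taken from the thesis.

```python
from collections import defaultdict


def map_suffixes(text, k):
    """Stand-in for the map phase: emit (prefix key, suffix start) pairs."""
    for i in range(len(text)):
        yield text[i:i + k], i


def build_suffix_array(text, k=2):
    """Build a totally ordered suffix array from independently sorted pieces.

    Assumes text ends with a unique sentinel '$' that sorts before every
    other character, so no prefix key is a proper prefix of another key.
    """
    if not text.endswith("$") or text.count("$") != 1:
        raise ValueError("text must end with a single '$' sentinel")

    # Stand-in for the shuffle phase: group suffix positions by prefix key.
    buckets = defaultdict(list)
    for key, pos in map_suffixes(text, k):
        buckets[key].append(pos)

    # Stand-in for the reduce phase: sort each piece on its own, then
    # concatenate the pieces in key order to obtain the total order.
    suffix_array = []
    for key in sorted(buckets):
        suffix_array.extend(sorted(buckets[key], key=lambda i: text[i:]))
    return suffix_array


def bwt_from_suffix_array(text, suffix_array):
    """BWT: the character preceding each suffix, taken in sorted order."""
    # For position 0, text[-1] wraps around to the sentinel.
    return "".join(text[i - 1] for i in suffix_array)


if __name__ == "__main__":
    genome = "GATTACA$"
    sa = build_suffix_array(genome, k=2)
    print(sa)
    print(bwt_from_suffix_array(genome, sa))
```

Because every piece holds suffixes that share one prefix key, and the keys themselves are ordered, no merge step across pieces is needed in this sketch. A real Hadoop job would also have to balance piece sizes and avoid materializing full suffixes, which this toy version ignores.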
Keywords/Search Tags:Sequence alignment, BWT, suffix array, Hadoop, MapReduce