
Sequence Alignment Algorithm For DNA/RNA Big Data

Posted on: 2020-03-07
Degree: Master
Type: Thesis
Country: China
Candidate: J L Gao
Full Text: PDF
GTID: 2438330575969078
Subject: Computer technology
Abstract/Summary:
With the development of next-generation sequencing technology, DNA/RNA sequence data has become the most abundant data in genetics, bioinformatics, cell biology and systems biology, and DNA/RNA sequencing has become a fundamental technology in the life sciences. How to make effective use of this sequencing data, however, remains a major open question. This thesis studies sequence alignment algorithms for big DNA/RNA data. In the field of sequence alignment, computer scientists have developed well-known algorithms such as the BLAST algorithm, which searches for a sequence in a sequence database, and the Smith-Waterman algorithm, which performs pairwise alignment. These standard algorithms may not be suitable for the alignment tasks that arise with recent sequencing data. For instance, the Smith-Waterman algorithm cannot be used to align very long sequences because of memory limitations; it also cannot reveal the relationship among several aligned regions of a pair of sequences. This thesis therefore studies the following two problems.

(1) Sequence alignment for very long sequences. The PAAVLS algorithm is proposed to solve the memory shortage that often occurs in the Smith-Waterman algorithm. PAAVLS introduces the concept of a matrix skeleton, which reduces the information that needs to be saved and therefore the memory demand, so that the algorithm can be applied to the alignment of very long sequences.

(2) Sequence alignment that contains very long gaps. The SLGAA algorithm is proposed to study the complex relationships between a pair of sequences. The traditional Smith-Waterman algorithm rejects very long gaps and, as a result, identifies only the most similar parts of a pair of sequences. The SLGAA algorithm tolerates long gaps and presents several pairs of similar segments in one alignment result, thus revealing the complex relationship between a pair of sequences.

Systematic experiments demonstrate that the proposed PAAVLS and SLGAA algorithms achieve their design goals. The PAAVLS algorithm can complete the alignment of two nucleic acid sequences with more than 1,000,000 nucleotides, while the SLGAA algorithm can identify 3 or more pairs of similar fragments in one alignment. The PAAVLS and SLGAA algorithms should have a positive impact on research on DNA/RNA sequence data.
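To make the memory limitation concrete: a full Smith-Waterman scoring matrix for two sequences of 1,000,000 nucleotides each has about 10^12 cells. The following Python sketch is an illustration only and is not the PAAVLS algorithm or its matrix skeleton; it shows the standard observation that the best local alignment score can be computed while keeping just two rows of the matrix. The function name and the scoring parameters (`match`, `mismatch`, `gap`, with a linear gap penalty) are assumptions chosen for the example, not values taken from the thesis.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score of sequences a and b.

    Only two rows of the dynamic-programming matrix are kept, so memory
    grows with len(b) rather than with len(a) * len(b).
    Scoring values are illustrative assumptions (linear gap penalty).
    """
    prev = [0] * (len(b) + 1)  # row i - 1 of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)  # row i of the scoring matrix
        for j in range(1, len(b) + 1):
            # Score for aligning a[i-1] with b[j-1].
            step = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                   # start a new local alignment here
                prev[j - 1] + step,  # extend diagonally (match/mismatch)
                prev[j] + gap,       # gap in b
                curr[j - 1] + gap,   # gap in a
            )
            best = max(best, curr[j])
        prev = curr  # discard the older row
    return best
```

This two-row version yields only the score. Recovering the aligned segments themselves normally requires traceback information from the full matrix, and reducing that stored information is the problem the abstract says PAAVLS addresses.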
Keywords/Search Tags:DNA/RNA sequences, pairwise alignment, very long sequence, very long gap, memory reduction