DNA Word Sequences Alignment And Its Application

Posted on: 2014-07-19
Degree: Master
Type: Thesis
Country: China
Candidate: X H Cao
GTID: 2250330422451943
Subject: Computer technology
Abstract/Summary:
DNA sequence alignment is one of the basic research methods in bioinformatics. Researchers use alignment to study the function, structure, and evolutionary information carried by DNA sequences. This thesis studies word-based DNA sequence alignment and proposes three algorithms, based respectively on vector space model (VSM) alignment, fuzzy matching, and posets (partially ordered sets). It then constructs a phylogenetic tree from the resulting alignments.

The segmentation of DNA sequences into words relies on machine learning methods. We take the information entropy of word boundaries as a feature, perform cross-learning on English text, and then apply the resulting model to segment DNA into words. The purpose of this thesis is to improve the efficiency and precision of pairwise DNA sequence alignment by aligning sequences at the word level.

We propose three word-based sequence alignment algorithms. The first is based on vector space model alignment. Following an analysis of the characteristics of DNA sequences, it selects word frequency and word length as features. The resulting alignments can reflect insertions, substitutions, and deletions in DNA sequences. The algorithm is run in alignment experiments on 11 species, and the resulting alignments are analyzed. The second algorithm is based on fuzzy matching. Targeting single-base variation in DNA sequences, it uses context-based similarity calculation to eliminate DNA "synonyms" and thereby improves the VSM-based alignment algorithm. The third algorithm is based on posets and is designed to capture the effect of word-order consistency on the function of DNA sequences. We collect statistics on the relative positions of words across the whole genome and construct word-pair posets from the order in which words appear within the same region. Comparing the posets of two DNA sequences yields a similarity score based on word order. Linear regression is used to integrate the three alignment algorithms above.
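
The vector space model step can be illustrated with a minimal Python sketch. It assumes the DNA has already been segmented into words, and it assumes a feature weight of word frequency multiplied by word length; the abstract names these two features but does not say how they are combined, so this weighting, the function names `word_vector` and `cosine_similarity`, and the toy word lists are all illustrative rather than the thesis's actual method.

```python
from collections import Counter
from math import sqrt


def word_vector(words):
    """Map each distinct word to a weight.

    Assumed weighting: frequency * word length. The abstract only says
    that frequency and length are the selected features.
    """
    counts = Counter(words)
    return {word: count * len(word) for word, count in counts.items()}


def cosine_similarity(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v[word] for word, weight in u.items() if word in v)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty sequence has no direction to compare
    return dot / (norm_u * norm_v)


# Hypothetical segmented sequences (not data from the thesis).
seq_a = ["ATG", "GCCA", "TTAG", "ATG"]
seq_b = ["ATG", "GCCA", "CGT"]

score = cosine_similarity(word_vector(seq_a), word_vector(seq_b))
print(round(score, 4))  # 0.7071
```

A score of 1 means the two sequences have proportional word profiles, and 0 means they share no words. A pure bag-of-words score like this ignores word order, which is the gap the poset-based algorithm is described as addressing.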
Experimental results show that the proposed algorithms perform better than traditional sequence alignment and can handle whole-genome sequence alignment.

Finally, the word-based DNA sequence alignments are applied to phylogenetic tree construction. Using the word-based alignments above, we obtain a distance matrix for the 11 species, construct a phylogenetic tree with the neighbor-joining method, and compare it with the tree established in molecular evolution.
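
The tree-building step can be sketched as a compact Python version of the standard neighbor-joining procedure. The 5-taxon distance matrix is a toy example, not the thesis's 11-species data, and the abstract does not say how alignment scores are converted into distances, so that conversion is left out. The function name `neighbor_joining` and the internal node names `node0`, `node1`, ... are hypothetical.

```python
def neighbor_joining(labels, dist):
    """Build an unrooted tree from a symmetric distance matrix.

    labels: list of taxon names; dist: square list of lists.
    Returns a list of (node, neighbor, branch_length) edges.
    """
    # Working copy of the distances, keyed by node name.
    d = {a: {b: dist[i][j] for j, b in enumerate(labels)}
         for i, a in enumerate(labels)}
    nodes = list(labels)
    edges = []
    next_id = 0
    while len(nodes) > 2:
        n = len(nodes)
        # Total distance from each node to all the others.
        r = {a: sum(d[a][b] for b in nodes) for a in nodes}
        # Pick the pair minimizing Q(a, b) = (n - 2) * d(a, b) - r(a) - r(b).
        best = None
        for x in range(n):
            for y in range(x + 1, n):
                a, b = nodes[x], nodes[y]
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        u = f"node{next_id}"
        next_id += 1
        # Branch lengths from the joined pair to the new internal node.
        len_a = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        len_b = d[a][b] - len_a
        edges.append((u, a, len_a))
        edges.append((u, b, len_b))
        # Distances from the new node to every remaining node.
        d[u] = {u: 0.0}
        for k in nodes:
            if k not in (a, b):
                d[u][k] = d[k][u] = 0.5 * (d[a][k] + d[b][k] - d[a][b])
        nodes = [k for k in nodes if k not in (a, b)] + [u]
    # Connect the last two nodes directly.
    a, b = nodes
    edges.append((a, b, d[a][b]))
    return edges


# Toy additive distance matrix (illustrative only).
labels = ["a", "b", "c", "d", "e"]
matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
edges = neighbor_joining(labels, matrix)
for parent, child, length in edges:
    print(parent, child, length)
```

For n taxa the result is an unrooted tree with 2n - 3 edges. Comparing it against a reference tree, as the abstract describes, would be a separate step that is not shown here.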
Keywords/Search Tags: DNA alignment, vector space model, fuzzy matching, phylogenetic tree