
Alignment-free Methods For DNA Sequences Comparison And Their Applications

Posted on: 2019-05-26
Degree: Doctor
Type: Dissertation
Country: China
Candidate: K Qia
Full Text: PDF
GTID: 1310330542999546
Subject: Probability theory and mathematical statistics
Abstract/Summary:
The rapid development of science has enabled humans to begin exploring the secrets of life, including those of human beings. In recent decades, with the implementation and completion of the Human Genome Project, the volume of biological information has increased dramatically. Faced with such massive data, managing it, interpreting it, and extracting useful information is both meaningful work and a challenge for experts in fields such as biology, mathematics, and computer science. Against this background, bioinformatics emerged as an interdisciplinary subject. Its research objects include nucleotide sequences, protein sequences, and various biological databases. Sequence comparison is one of the most basic and central tasks in bioinformatics. Sequence comparison methods are generally divided into two categories: alignment methods and alignment-free methods. Because of the limitations of traditional alignment methods, alignment-free methods are increasingly popular among scholars. In this dissertation, DNA sequences are taken as the research object, and several alignment-free methods for sequence comparison are studied. The main contents are as follows.

In Chapter 2, we briefly introduce several nonparametric tests: the Spearman statistic, the Wilcoxon test, and the Friedman test. We use the Spearman statistic to find, by simulation, the optimal k of D2S and D2* for different sequence lengths. In addition, the Wilcoxon test and the Friedman test are commonly used as tools to evaluate the quality of a method. Using real data, we show how nonparametric tests can be used to evaluate the performance of several alignment methods.

In Chapter 3, we propose new weighted D2-type measures. The traditional D2-type measures are based on k-word counts. However, they treat all k-words equally and do not account for potential differences in importance among k-words. We therefore use the idea of maximizing deviation to assign a proper weight to each k-word, which yields the new weighted D2-type measures. Furthermore, we apply the newly proposed measures to similarity search and evaluation on functionally related regulatory sequences. The results show that our method performs better.

Chapter 4 considers the fractional Fourier transform, a generalization of the traditional Fourier transform that has been widely used in many fields, for phylogenetic analysis. First, DNA sequences are converted to numeric sequences, and the discrete fractional Fourier transform is applied to these numeric sequences to compute power spectra. By extracting a new jth moment feature, a distance matrix is constructed and a phylogenetic tree is built. Because different orders correspond to different fractional Fourier transforms, giving a guideline for choosing an appropriate order is an important problem. We use simulation and the Friedman test to address it. To test the performance of our method, we apply it to three real data sets. The resulting phylogenetic trees demonstrate that our method is more accurate.

In Chapter 5, we propose a new half-measure distance called the weighted exponent Euclidean distance. Following an idea similar to maximizing deviation, we give an optimization model for solving for the weights. To solve this optimization model, we propose a gravitational search algorithm based on fuzzy logic. The new distance is applied to similarity search and evaluation on functionally related regulatory sequences, and the results show that the proposed method is reasonable and efficient.
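To make the k-word idea behind the D2-type measures concrete, here is a minimal Python sketch of the basic D2 statistic: the sum, over all k-words, of the product of that word's counts in the two sequences. The function names `kword_counts` and `d2` and the optional `weights` argument are assumptions made for this illustration. The abstract does not give the exact form of the weighted D2-type measures or say how the maximizing-deviation weights are computed, so the per-word multiplicative weights below are only one plausible reading, not the dissertation's method.

```python
from collections import Counter


def kword_counts(seq, k):
    """Count overlapping k-words (k-mers) in a DNA sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(seq_x, seq_y, k, weights=None):
    """Basic D2 statistic: sum over k-words of count_x(w) * count_y(w).

    `weights` is a hypothetical dict mapping a k-word to a weight;
    words not listed get weight 1.0. With weights=None this is plain D2.
    """
    counts_x = kword_counts(seq_x, k)
    counts_y = kword_counts(seq_y, k)
    total = 0.0
    # Only k-words present in both sequences contribute a nonzero product.
    for word in counts_x.keys() & counts_y.keys():
        w = 1.0 if weights is None else weights.get(word, 1.0)
        total += w * counts_x[word] * counts_y[word]
    return total


if __name__ == "__main__":
    print(d2("ACGTACGT", "ACGTTTGA", k=2))
```

A larger D2 value indicates that the two sequences share more k-words. D2S and D2* additionally center and normalize the counts under a background model, which this sketch does not attempt.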
Keywords/Search Tags:DNA sequence comparison, alignment-free methods, k-word, similarity analysis, phylogenetic analysis, maximizing deviation, fractional Fourier transform