Research On Alignment-free Methods For Bioinformation Sequence Analysis

Posted on: 2019-12-14
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W B Hou
GTID: 1360330545469084
Subject: Applied Mathematics
Abstract/Summary:
Bioinformatics has been an active research topic since the launch of the Human Genome Project in the 1990s. It focuses on managing and analyzing biological data efficiently using mathematics, computer science, and network technology. With advances in sequencing techniques, bioinformatics databases covering DNA, RNA, and proteins have grown rapidly, which in turn has driven the development of the field. It has become increasingly important to develop efficient ways to extract the information hidden in genetic data. Similarity analysis of sequences is an important task that aims to infer the evolutionary relationships between different species. In recent decades, methods for analyzing sequence similarity have fallen into two main types: alignment-based methods and alignment-free methods. In this dissertation, we focus on proposing new alignment-free methods for the similarity analysis of biological sequences, namely DNA sequences and protein sequences.

In Chapter 2, we propose two signal-coding schemes for DNA sequences. In the first model, the DNA sequence is mapped to a square wave with a signal amplitude of 2. Four different signal durations represent the four kinds of bases (A, G, T, C), and a change of base is represented by a change in amplitude. In the second model, we use coded mark inversion (CMI) coding to translate the original DNA sequence into a signal sequence. Similarity analysis is then carried out on the signal sequences. We use the PHYLIP software to construct the phylogenetic tree of different species. Comparison with an existing model indicates that the method proposed in this chapter is effective for analyzing DNA sequence similarity.

In Chapter 3, we propose a novel method for representing protein sequences. We take the physicochemical properties of amino acids as descriptors and build a 3-D model to describe the characteristics of protein sequences. By calculating the eigenvalues of the tensor of inertia, a vector is constructed to represent each protein sequence. The Euclidean distance between vectors is used to measure the similarity of proteins. Compared with other methods, our model uses fewer parameters to describe similarity. The scheme proves effective and reasonable when compared with other methods and software. The classification results in our tests are also consistent with evolutionary theory and virological classification.

In Chapter 4, we outline a method based on the Discrete Fourier Transform (DFT) and Dynamic Time Warping (DTW) to calculate the similarity of proteins. The original symbol sequences are converted to numerical sequences according to their physicochemical properties, and the similarities are calculated based on DFT and DTW. We test our scheme on different datasets and compare our results with those of existing software. The results of our tests agree well with known evolutionary relationships. Our method can also correct some classification mistakes made by other software or methods.
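The DFT and DTW pipeline described for Chapter 4 can be illustrated with a minimal sketch. The abstract does not state which physicochemical property is used or how the DFT and DTW steps are combined, so everything specific below is an assumption made for illustration: the Kyte-Doolittle hydropathy index as the numerical encoding, the DFT power spectrum as the intermediate representation, and the hypothetical function names `encode`, `power_spectrum`, `dtw_distance`, and `protein_distance`. It should not be read as the dissertation's actual method.

```python
import cmath
import math

# Kyte-Doolittle hydropathy index. Assumption: used here only as one example
# of a physicochemical property; the abstract does not name the property.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode(protein):
    """Convert a protein symbol sequence into a numerical sequence."""
    return [HYDROPATHY[aa] for aa in protein.upper()]


def power_spectrum(signal):
    """Power spectrum |X(k)|^2 from a naive O(n^2) DFT, skipping k = 0."""
    n = len(signal)
    spectrum = []
    for k in range(1, n):
        x_k = sum(
            signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        spectrum.append(abs(x_k) ** 2)
    return spectrum


def dtw_distance(a, b):
    """Classic DTW alignment cost with |x - y| as the local distance."""
    inf = float("inf")
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor cells.
            cost[i][j] = local + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[len(a)][len(b)]


def protein_distance(p, q):
    """Hypothetical pipeline: encode, take the power spectrum, compare by DTW."""
    return dtw_distance(power_spectrum(encode(p)), power_spectrum(encode(q)))
```

Because DTW aligns sequences of unequal length, `protein_distance` can compare proteins with different numbers of residues without padding or truncating their spectra. A matrix of pairwise distances computed this way could then be passed to tree-building software such as PHYLIP, which the abstract mentions for Chapter 2.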
Keywords/Search Tags: DNA sequences, Protein sequences, Alignment-free method, Similarity analysis, Phylogenetic tree