Relative Character Analysis And Burrows-Wheeler Methods For The Biological Sequence

Posted on: 2012-11-16 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: L P Yang | Full Text: PDF
GTID: 1220330368985921 | Subject: Applied Mathematics
Abstract/Summary:
With the arrival of the post-genome era, we face a vast number of complete genomes and many kinds of open questions. Inexpensive sequence analysis tools are expected to analyze and predict the structure and function of biological sequences faster and more accurately, reducing the high cost in time and money of experimental methods. This dissertation focuses on biological sequence analysis and proposes several models for it.

Traditionally, sequence analysis tools are divided into two kinds: alignment models and alignment-free models. We point out, however, that the models can instead be divided into two categories according to the topological structure of their basic comparison frames: character analysis and relative character analysis. Models based on alignment and models based on text compression are all relative character analysis models. We find that the core of the relative character models is the similarity hypothesis, and that the main merits and demerits of each model follow from this hypothesis.

This dissertation discusses two kinds of relative character comparison models, based on common strings and on the Burrows-Wheeler method respectively. The common string model is designed by investigating the relationship between the longest common strings and the shortest absent words. Its advantages are as follows: the time complexity is linear, which makes it well suited to analyzing very large genomes; the local distance measure derived from the model can be used to search for similar regions between genomes, even when those regions carry some gene recombination information; and the local distance readily yields an integral local distance, which can be used to analyze overall similarity efficiently. The validity of the model is confirmed by classifying the subtypes of HIV-1 complete genomes and of their segments.

Burrows-Wheeler methods are another kind of relative character method. Their foundation is the invertible Burrows-Wheeler transformation, which has important applications in lossless compression. The extended Burrows-Wheeler transformation is the key generalization for the comparison frame, since it can detect the common factors shared between biological sequences. We propose a concept called the Burrows-Wheeler similarity distribution to represent the similarity of sequences. In addition, numerical characteristics of this distribution, namely its expectation and entropy, are computed to compare various kinds of biological sequences, with different strategies chosen according to the features of gene, protein, or structural sequences.
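As an illustration of why the Burrows-Wheeler transformation is called invertible, the following is a minimal Python sketch of the classical single-sequence transform and its inverse. It is not the extended transformation or the similarity distribution used in the dissertation. The function names `bwt` and `inverse_bwt`, the `$` end marker, and the naive rotation-sorting approach are assumptions chosen for readability; practical tools build the transform from suffix arrays instead.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Naive Burrows-Wheeler transform: sort all rotations, keep the last column.

    Assumes `sentinel` does not occur in `text` and sorts before every
    character of `text` (true for '$' with upper-case DNA or protein letters).
    """
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input")
    s = text + sentinel
    # Every cyclic rotation of the sentinel-terminated string, in sorted order.
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last_column: str, sentinel: str = "$") -> str:
    """Invert the transform by repeatedly prepending the last column and sorting."""
    table = [""] * len(last_column)
    for _ in range(len(last_column)):
        # Prepend each last-column character to its row, then re-sort the rows.
        table = sorted(c + row for c, row in zip(last_column, table))
    # The original string is the unique row that ends with the sentinel.
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]


if __name__ == "__main__":
    sequence = "ACGTACGTTGCA"  # hypothetical DNA fragment, for illustration only
    transformed = bwt(sequence)
    print(transformed)
    print(inverse_bwt(transformed) == sequence)
```

The transform tends to group identical characters that share a following context, which is the property that compression and comparison methods built on it exploit. This sketch is quadratic in memory and is only meant for short strings.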
Keywords/Search Tags: Sequence analysis, Computational biology, Relative character, Common string, Local distance, Burrows-Wheeler method