Font Size: a A A

Information Divergence And Alignment Space

Posted on:2010-10-05Degree:DoctorType:Dissertation
Country:ChinaCandidate:G X LuFull Text:PDF
GTID:1118360302457473Subject:Bioinformatics
Abstract/Summary:PDF Full Text Request
With the development of information science, people have recognized that the information theory has been not only used in mathematical theory of communication and secrecy systems, but also used in other fields, such as intelligent computing, biology, financial modeling. In all the fields above, random distributions is a very significant concept. People need " information " to identify the distinction between two different distributions. Therefore, information theory can play an important role in these areas. Besides, the concept of error has promoted. At previous time, the error only refers to the substitutions of symbols. Recently, the deletions and insertions of symbols are also considered as errors yet. We call all these errors generalized errors. Facing those problems mentioned, we mainly study three aspects below in this paper.1. The information divergence is studied. Information divergence is a way to measure the difference between two random distributions. For example, relative entropy is the most well-known information divergence. In chapter 2, we introduce some famous information divergences and discuss the relationship of them and metrics in probability distribution space. We improve the chief conclusion in the paper " A New Metric for Probability Distributions " and obtain a class of new metrics for probability distributions. The new metrics based on the important concept of relative entropy and Jensen - Shannon divergence. The sufficient and necessary conditions for the new metrics to hold are provided next. The maxima and minima of the new metrics are given, too. At last, the convexity of the Jensen - Shannon divergence is discussed.2. The Alignment space over F_q is studied. Alignment space is a nonlinear space defined by generalized errors. In chapter 3-5, the detailed introductions of the related concepts of Alignment space over F_q are presented. Firstly, we give the definitions of the Alignment space over F_q and the Alignment distance in it. We illustrate that the Alignment distance can be computed by classical dynamic programming algorithm. The properties and relations between Levenshtein distance by expansion sequences and Alignment distance are discussed. In order to research the Alignment space, we introduce the modulus structure theory of sequences and virtual symbol operation theory. That there must be a virtual symbol operation between two alignment sequences is proved strictly in this part. The sufficient and necessary conditions for isometric operator and micro-adapted operator are also proved. At last, we study the structure of the Alignment space, give the number of the sequence pairs whose Alignment distance is n in the n - dimensional Alignment subspace is 2n and whose Alignment distance is 2 in the n - dimensional Alignment subspace is (2n~2—7n + 11)—6, obtained the sufficient and necessary conditions for sequence pairs whose Alignment distance is n in the n - dimensional Alignment subspace and the result that the maximum score alignment sequences are the minimum penalty alignment sequences in this case.3. The Alignment space generated by general metric space is studied. The Alignment space mentioned in part 2 can be extended to general condition. Firstly, the definitions of the Alignment space generated by general metric space and the Alignment distance in it are described. And then, we prove that the Alignment distance is equivalent to a class of Levenshtein distances. Using modulus structure theory, the Alignment distance is proved to be a metric in metric space. At last, we give an example on the protein structure sequence alignment in 3-dimensional space as the application of Alignment space.
Keywords/Search Tags:Information divergence, metric, Alignment space, Modulus structure theory and virtual symbol operation, Enumeration about minimum penalty alignment sequences
PDF Full Text Request
Related items