
Bacterial DNA Sequence Analysis Based On Information Entropy

Posted on: 2014-06-26
Degree: Master
Type: Thesis
Country: China
Candidate: D P Wen
Full Text: PDF
GTID: 2260330425954134
Subject: Theoretical Physics
Abstract/Summary:
From the perspective of inheritance and variation, a cell constitutes a basic, independent unit of information processing, which stores and transfers genetic information through DNA replication, transcription, and RNA translation. According to the traditional view of genetics, all genetic information is stored in the genetic material of organisms. Biological information-processing systems control the development, growth, and inheritance of organisms. To reveal the specific mechanisms by which life works, it is indispensable to use the approaches of information science to study the storage, transmission, and expression of genetic information.

Information theory is the discipline that studies the measurement, transfer, exchange, and storage of information. Genetic information is one kind of information, so its storage and transmission are bound to follow the general rules of information storage and transmission. We can therefore use the methods of information theory to analyze genetic information.

This thesis first introduces the background of the topic, the state of research, and the purpose and significance of the study. It then introduces some basic concepts of information entropy in information theory, such as joint entropy and conditional entropy, and proposes a method of analyzing DNA sequences based on information entropy theory. In the post-genomic era, one of the active research fields of bioinformatics is how to identify the coding and non-coding regions of DNA sequences quickly and accurately. A variety of methods have been proposed for distinguishing between coding and non-coding regions, but most of them need specific DNA data sets and therefore lack universality; the information entropy method makes up for this deficiency.

First, the information entropy of the coding and non-coding regions of the genomes of 1947 kinds of bacteria was calculated, and the information entropy curves of both regions were found to oscillate.
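The entropy quantities named above (information entropy, joint entropy, conditional entropy) can be illustrated with a minimal Python sketch. This is not the thesis's own procedure: the abstract does not say how sequences were split into symbols, so the use of overlapping words of length k, and the helper names `shannon_entropy`, `kmers`, and `conditional_entropy`, are assumptions made only for illustration.

```python
from collections import Counter
from math import log2


def shannon_entropy(symbols):
    """Shannon entropy, in bits, of the empirical distribution of `symbols`."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())


def kmers(seq, k):
    """Overlapping words of length k (an assumed way of forming 'words')."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def conditional_entropy(seq):
    """H(next base | current base) = H(pair) - H(current base).

    H(pair) is the joint entropy of adjacent bases; the first base of each
    pair is used for the marginal so both terms cover the same positions.
    """
    pairs = kmers(seq, 2)
    firsts = [p[0] for p in pairs]
    return shannon_entropy(pairs) - shannon_entropy(firsts)
```

Under these assumptions, a coding region and a non-coding region could each be passed to `shannon_entropy(kmers(region, k))` and the two values compared; a uniform distribution over the four bases gives the maximum of 2 bits per base.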
The information entropy of the coding region is slightly larger than that of the non-coding region.

Next, using the recently proposed method of super information entropy, we analyzed and compared the super information entropy of the coding and non-coding regions of the bacteria. In eukaryotic species, the super information entropy of the non-coding region is larger than that of the coding region, and the two are well separated. In bacteria, however, which are prokaryotes, the super information entropy can hardly distinguish the coding region from the non-coding region. We also compiled statistics on the difference between the super information entropy of the coding and non-coding regions. The results show that the frequency distribution of this difference is Gaussian and that the super information entropy of the coding region is slightly larger than that of the non-coding region, which is the opposite of the result for eukaryotes. Furthermore, the difference between the two is small, so super information entropy discriminates poorly between them.

Finally, we selected 6 representative kinds of bacterial DNA sequences and analyzed their linguistic features. In theory, if all the words in a text are ranked from highest to lowest frequency, the plot of frequency against rank on double logarithmic axes has a slope equal to -1. For eukaryotes, research applying Zipf's law to DNA sequences shows that the non-coding region is more similar to natural language than the coding region. For the bacteria, however, when we analyzed the coding and non-coding regions with Zipf's law, we found that this method can hardly distinguish between the two.
The curves of the coding and non-coding regions almost overlap, and linear fitting shows that the slopes of both are far less than -1. This indicates that the linguistic approach is not applicable to bacterial DNA sequences; in other words, the method is not universal. From another perspective, prokaryotes such as bacteria do not have linguistic features as strong as those of eukaryotes, which further shows that the non-coding region is not "junk DNA" in the true sense.

The results in this thesis show that information entropy can characterize a number of biological features well, and that its application in bioinformatics needs more in-depth study.
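The Zipf's law test described in the abstract, ranking words by frequency and fitting a line on double logarithmic axes, can be sketched as follows. This is a hedged illustration, not the thesis's implementation: how DNA "words" were defined is not stated, so the input is simply any list of words, and the function name `zipf_slope` is hypothetical. The fit is an ordinary least-squares line through the points (log rank, log frequency).

```python
from collections import Counter
from math import log


def zipf_slope(words):
    """Least-squares slope of log(frequency) against log(rank).

    Words are ranked from most to least frequent; a text that follows
    Zipf's law exactly gives a slope of -1.
    """
    freqs = sorted(Counter(words).values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct words to fit a line")
    xs = [log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [log(f) for f in freqs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

Applied separately to the words of a coding region and of a non-coding region, the two slopes could then be compared with each other and with the theoretical value of -1.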
Keywords/Search Tags: DNA sequences, information entropy, coding region and non-coding region, super information entropy