
Research On Genome Characteristics Based On Information Entropy Theory

Posted on: 2012-05-16
Degree: Master
Type: Thesis
Country: China
Candidate: Q Zhan
GTID: 2210330362950463
Subject: Computer Science and Technology
Abstract/Summary:
From the perspective of genetic information processing, a cell is a basic, independent information-processing unit that stores and transfers genetic information through DNA replication, transcription, and RNA translation. According to the traditional view of genetics, all genetic information is stored in an organism's genetic material, and the biological information-processing system controls the organism's development, growth, and inheritance. To reveal the specific mechanisms by which life works, it is therefore indispensable to study the storage, transmission, and expression of genetic information with the methods of information science.

Information theory is the discipline that studies the measurement, transfer, exchange, and storage of information. Genetic information is a kind of information, so its storage and transmission must follow the general rules that govern the storage and transmission of information. Methods from information theory can therefore be used to analyze genetic information.

One contribution of this thesis is a model based on redundancy entropy for analyzing the distribution of bases in the translation initiation and termination regions of a genome, including properties such as conservation and periodicity. The base distribution of the DNA sequence near the start and stop codons was calculated, the entropy and redundancy of each site were computed, and the redundancy curve was drawn. The conservation of each site was analyzed, and the coding and non-coding regions were compared to identify their differences.

The information entropy analysis of the regions near the start and stop codons showed that, in prokaryotes, the coding region has a very strong period-3 feature; the more closely related the sequences are, the closer together their information redundancy curves lie; and the sites in the SD (Shine-Dalgarno) region have relatively large information redundancy.
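The per-site entropy and redundancy calculation described above can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the abstract does not give the exact formula, so the sketch assumes the standard Shannon definitions, with entropy H = -Σ p log2 p over the bases observed at a site and redundancy R = 1 - H / H_max, where H_max = log2(4) = 2 bits for the four-letter DNA alphabet. The function names and the toy sequences are hypothetical.

```python
import math
from collections import Counter

def site_entropy(column):
    """Shannon entropy (in bits) of the bases observed at one aligned site."""
    counts = Counter(column)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def site_redundancy(column, alphabet_size=4):
    """Redundancy R = 1 - H / H_max, assuming H_max = log2(alphabet_size)."""
    return 1.0 - site_entropy(column) / math.log2(alphabet_size)

def redundancy_profile(sequences):
    """Redundancy at each position of equal-length windows aligned on a codon."""
    return [site_redundancy(column) for column in zip(*sequences)]

# Hypothetical toy windows that each begin at a start codon (not thesis data).
windows = ["ATGA", "ATGC", "ATGG", "ATGT"]
profile = redundancy_profile(windows)
print(profile)  # -> [1.0, 1.0, 1.0, 0.0]
```

Under these assumptions, a fully conserved site (such as each base of the start codon in the toy data) has redundancy 1, and a site where all four bases are equally frequent has redundancy 0. Plotting the profile against position gives a redundancy curve of the kind the abstract refers to.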
The results of the analysis in eukaryotes are relatively poor and need further research.

Another contribution of this thesis is a similarity metric model based on information entropy for analyzing genome sequence similarity. The average mutual information between two aligned sequences, divided by their joint entropy, is used as the measure of their similarity. This measure was used to construct a similarity matrix for 11 species and to analyze their similarity; the results agree with biological taxonomy to a certain extent. The phylogenetic tree constructed from the distance matrix also reflects the evolutionary relationships among the species, which indicates that the model design is reasonable.

The experimental results show that information entropy can characterize a number of biological features well, and that its application in bioinformatics deserves more in-depth study.
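The similarity measure described above, mutual information divided by joint entropy, can be sketched as follows. This is a minimal illustration under stated assumptions rather than the thesis's implementation: probabilities are estimated from the symbol frequencies of two already-aligned, equal-length sequences; any gap character is treated as just another symbol; and the degenerate case of zero joint entropy is mapped to 0.0 by arbitrary convention. The function names are hypothetical.

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy (in bits) of the distribution given by a Counter."""
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def mi_similarity(seq_a, seq_b):
    """Similarity I(A;B) / H(A,B) for two aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    h_a = entropy(Counter(seq_a))
    h_b = entropy(Counter(seq_b))
    h_ab = entropy(Counter(zip(seq_a, seq_b)))  # joint entropy over aligned pairs
    if h_ab == 0:
        # Both sequences are constant, so the ratio is undefined.
        # Returning 0.0 is an arbitrary convention chosen for this sketch.
        return 0.0
    mutual_information = h_a + h_b - h_ab
    return mutual_information / h_ab

print(mi_similarity("ACGT", "ACGT"))  # -> 1.0
print(mi_similarity("AACC", "ACAC"))  # -> 0.0
```

The ratio lies between 0 (statistically independent columns) and 1 (each sequence fully determines the other). Note that it measures statistical dependence rather than identity: a sequence paired with a consistent one-to-one substitution of itself, such as "ACGT" and "TGCA", also scores 1.0. How the thesis converts the similarity matrix into the distance matrix used for the phylogenetic tree is not stated in the abstract.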
Keywords/Search Tags: DNA, information entropy, start codon, evolutionary tree