
Statistical Analysis Of Several DNA Sequences In Human Genome

Posted on: 2009-10-31  Degree: Master  Type: Thesis
Country: China  Candidate: N Z Jin  Full Text: PDF
GTID: 2120360245481426  Subject: Physical chemistry
Abstract/Summary:
With the completion of the Human Genome Project, life science is entering the post-genome era. How to mine valuable information from massive data and unravel the mysteries of life presents theoretical biologists with both opportunities and challenges. Statistical analysis of DNA sequences is important for understanding the structure and function of genomes, and several statistical methods have been proposed to study the genetic information stored in DNA sequences. In this study, taking biological characteristics into account, we use information theory to analyze the base correlations in human Y chromosome palindromes, and we use Zipf's approach from linguistics to analyze the statistical features of the frequency and correlation of the 16 nearest neighboring nucleotides in 12 human chromosomes. The main contents are as follows:

1. We introduce the basic knowledge of molecular biology and the Human Genome Project.
2. We introduce regression analysis and the basic knowledge of information theory.
3. In chapter 3, on the basis of information theory and statistical methods, we use mutual information, n-tuple entropy, and conditional entropy to analyze the base correlations in human Y chromosome palindromes. We find both long-range and short-range correlations in them, and we find that these signals originate from the presence of interspersed repeat sequences.
4. In chapter 4, Zipf's approach from linguistics is used to analyze the statistical features of the frequency and correlation of the 16 nearest neighboring nucleotides (AA, AC, AG, ..., TT) in 12 human chromosomes (Y, 22, 21, 20, 19, 18, 17, 16, 15, 14, 13, and 12). We find that, for nearest neighboring nucleotides in the human genome, (i) the frequency distribution follows a linear function and (ii) the correlation distribution follows an inverse function. The coefficients of the linear function and the inverse function depend on GC content.

This work proposes the correlation distribution of nearest neighboring nucleotides for the first time and extends the set of descriptors for nearest neighboring nucleotides.
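As a rough illustration of the two kinds of statistics named in the abstract, the sketch below computes (a) the mutual information between bases separated by a distance k and (b) the rank-ordered frequencies of the 16 nearest neighboring nucleotides, which is the starting point of a Zipf-style analysis. The abstract does not give the thesis's exact definitions, so the function names, the use of overlapping pairs, the log base 2, and the skipping of non-ACGT symbols are all assumptions made here for illustration only. The input is assumed to be an uppercase string.

```python
from collections import Counter
from math import log2

BASES = "ACGT"


def mutual_information(seq, k):
    """Mutual information (in bits) between bases k positions apart (k >= 1).

    Assumed definition: I(k) = sum_ab p_ab(k) * log2(p_ab(k) / (p_a * p_b)),
    with the marginals p_a and p_b estimated from the same set of pairs.
    """
    # Keep only pairs in which both symbols are unambiguous bases.
    pairs = [(a, b) for a, b in zip(seq, seq[k:]) if a in BASES and b in BASES]
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), c in joint.items():
        # p_ab / (p_a * p_b) simplifies to c * n / (left[a] * right[b]).
        mi += (c / n) * log2(c * n / (left[a] * right[b]))
    return mi


def ranked_dinucleotide_frequencies(seq):
    """Return (rank, dinucleotide, frequency) tuples, most frequent first.

    Dinucleotides are counted with overlap; ties are broken alphabetically.
    """
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    counts = {d: c for d, c in counts.items() if all(ch in BASES for ch in d)}
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, d, c / total) for rank, (d, c) in enumerate(ranked, start=1)]
```

Plotting `mutual_information(seq, k)` against k would show how quickly base correlations decay with distance, and fitting the frequencies from `ranked_dinucleotide_frequencies` against rank is one way to test a linear rank-frequency relation of the kind the abstract reports. The correlation measure used for nearest neighboring nucleotides in chapter 4 is not specified in the abstract and is not reproduced here.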
Keywords/Search Tags: information theory, mutual information, palindromes, base correlation, Zipf's law, nearest neighboring nucleotide, frequency distribution, correlation distribution