Analysis On 8-mer Spectra Of Five Human DNA Sequences And Structural Units Of CpG Island Sequences

Posted on: 2021-01-14
Degree: Master
Type: Thesis
Country: China
Candidate: Y Wu
Full Text: PDF
GTID: 2370330620976579
Subject: Physics
Abstract/Summary:
Studying the internal regularities of the k-mer spectra of DNA sequences can reveal the rules that govern sequence composition and sequence evolution, and the topic is attracting growing interest from researchers in China and abroad. In this thesis, we analyzed the 8-mer spectrum characteristics of the human genome sequence, intergenic sequences, intron sequences, protein-coding sequences and CpG island sequences. All 8-mers were classified by the XY dinucleotide classification method, and the distribution of the 8-mer spectra in the XY subsets was studied. The analysis of the 8-mer subset spectra of the five DNA sequences further verifies the independent selection law. The intensity of independent selection is characterized quantitatively by the separation and conservation of the 8-mer spectra of the three CG subsets. The degree of separation is positively correlated with the degree of conservation, which gives the fourth and fifth properties of the independent selection law, namely selection correlation and selection homoplasy, and thereby refines the law.

Comparing the independent selection intensity of four DNA sequences, we found that it was highest in intergenic sequences, followed by intron sequences. The intensity in protein-coding sequences was significantly lower than in intron sequences, and it was lowest in CpG island sequences. From the evolutionary history of intergenic, intron and protein-coding sequences, we inferred that the independent selection intensity of a DNA sequence reflects its degree of evolution.

Based on the properties of the independent selection law, we studied the composition and distribution of substructure units in CpG island sequences and give the distribution and composition of the 8-mer motifs of the CG1 and CG2 subsets in human CpG island sequences. We found that CpG island sequences contain substructure units. There are five structural units with a scale of 16-20 bp, each representing one basic structural pattern, and we deduced the sequence structure of these five units. We also analyzed the distribution of distances between structural units and found that the most probable distance between adjacent units is 20 bp, with most distances falling between 18 and 40 bp. The results show that the structural units are unevenly distributed and tend to cluster. They also show that the scale of the structural units and the distance between adjacent units are independent of the G+C content of the CpG island sequence. It can therefore be inferred that changes in G+C content are reflected in the selection of functional 8-mer motifs within structural units and in the selection of bases in the connecting sequence between adjacent units. These two kinds of selectivity reflect the functional diversity of CpG island sequences.

We believe that the differences between types of DNA sequence are reflected not only in the separation and conservation of the 8-mer spectra of the three CG subsets, but also in the usage frequency of 8-mers within each CG subset. Based on this idea, we analyzed the difference distribution, preference degree and dispersion variation of 8-mer relative frequencies between the whole genome sequence and, in turn, intergenic, intron, protein-coding and CpG island sequences. The results show that the frequencies of 8-mers in the CG0, CG1 and CG2 subsets differ between DNA sequence types. Compared with the whole genome sequence, CpG island sequences differ the most, followed by protein-coding sequences. Intron and intergenic sequences differ the least, but clear differences remain between them. Using the indexes of difference distribution, preference degree and dispersion variation, the compositional differences between sequences can be distinguished more precisely. This analysis method provides a new approach to the evolutionary characterization of genome sequences.
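
The abstract refers to 8-mer spectra split into CG0, CG1 and CG2 subsets but does not spell out the procedure. The Python sketch below illustrates one plausible reading and is not the thesis's own implementation. It assumes that an 8-mer is assigned to CG0, CG1 or CG2 when it contains zero, one, or two or more CG dinucleotides, and that a "spectrum" is the number of distinct 8-mers observed at each occurrence count. The function names `count_kmers`, `cg_subset` and `subset_spectra` are hypothetical.

```python
from collections import Counter

K = 8  # k-mer length used in the abstract


def count_kmers(seq, k=K):
    """Count overlapping k-mers, skipping windows that contain non-ACGT symbols."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def cg_subset(kmer):
    """Assumed rule: classify by the number of CG dinucleotides (0, 1, or 2+)."""
    n = kmer.count("CG")  # "CG" cannot overlap itself, so this counts every CG
    if n == 0:
        return "CG0"
    return "CG1" if n == 1 else "CG2"


def subset_spectra(counts):
    """Per subset, map occurrence count -> number of distinct k-mers with that count.

    Only k-mers that occur at least once are included; unobserved k-mers
    would have to be added separately as a zero-count bin.
    """
    spectra = {"CG0": Counter(), "CG1": Counter(), "CG2": Counter()}
    for kmer, c in counts.items():
        spectra[cg_subset(kmer)][c] += 1
    return spectra


if __name__ == "__main__":
    toy = "ACGTACGTACGT"  # toy sequence for illustration only, not real data
    for name, spectrum in subset_spectra(count_kmers(toy)).items():
        print(name, dict(spectrum))
```

On real data the input would be a full genome or a set of annotated regions, and the three per-subset histograms are the objects whose separation and conservation the abstract discusses.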
Keywords/Search Tags:five human DNA sequences, 8-mer spectrum, independent selection law, difference analysis, CpG island sequence, structural unit