Composition And Evolution Analysis Of Protein-coding Genes In Different Types Of Prokaryotic Genomes

Posted on: 2016-08-12
Degree: Master
Type: Thesis
Country: China
Candidate: S N Bi
Full Text: PDF
GTID: 2180330470950478
Subject: Microbiology
Abstract/Summary:
Prokaryotes are a class of lower organisms that consist of a single cell or multiple cells without a true nucleus. Compared with eukaryotic genomes, prokaryotic genomes are much smaller, and in general the genome consists of a single DNA molecule. Because protein-coding sequences make up a high proportion of these genomes, protein-coding genes are important material for studying prokaryotic genomes. With the development of sequencing technology, genomic data have grown exponentially. Some studies have found that genomes with similar GC content share more features, whereas genomes with quite different GC content show different characteristics in their protein-coding genes. At the same time, many prokaryotic genomes have been found to have multiple chromosomes, and some contain more than one plasmid. However, there are at present few genome-level studies of the evolutionary relationships among the large chromosomes, small chromosomes and plasmids of prokaryotes, and their results are inconsistent. This thesis analyzes, at the genome level, the composition characteristics of protein-coding genes in the large chromosomes, small chromosomes and plasmids of prokaryotic genomes with different GC content. It finds that the large and small chromosomes are more similar to each other in composition than they are to the plasmids, which provides a new idea for future research. The dissertation work includes:
I. Based on the RefSeq database, a dataset of 54 prokaryotic genomes with different GC content was constructed. Every genome in the dataset contains at least two chromosomes and a plasmid.
Statistics on the length distribution of protein-coding genes in the large chromosomes, small chromosomes and plasmids show that, in the large and small chromosomes, the most common gene length is 500-999 bp, followed by 1-499 bp and 1000-1499 bp. Further analysis indicates that in small chromosomes with low GC content the lengths of some genes are concentrated in the 1-499 bp range. In contrast, the length distributions of protein-coding genes in plasmids differ considerably. Thus, the large and small chromosomes are more similar to each other in length distribution than they are to the plasmids. Statistics on the GC content distribution of protein-coding genes in the 54 genomes show that, in most genomes, the genes of the large and small chromosomes are more similar to each other in GC content distribution than they are to those of the plasmids; in genomes with multiple plasmids, the GC content distributions of some plasmids differ while others are similar. Comparing the GC content of the genes in each component with the GC content of the whole genome shows that the GC content of genes in the large and small chromosomes differs little from the genomic GC content, whereas in plasmids the gene GC content is similar in some cases and quite different in others; even plasmids of the same species show different genetic characteristics.
II. To analyze the evolutionary characteristics of protein-coding genes in the large chromosomes, small chromosomes and plasmids, a codon bias analysis was carried out on the protein-coding genes of these components in the 54 genomes. The relative synonymous codon usage (RSCU) values showed that the large and small chromosomes have more codon bias in common.
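The length and GC content statistics described in part I can be sketched as follows. This is a minimal illustration, not the tooling used in the thesis (which the abstract does not name). It assumes each gene is available as a plain nucleotide string; the function names are hypothetical, and the 500 bp bin width is chosen only to match the length ranges quoted above.

```python
from collections import Counter


def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def length_bin(length: int, width: int = 500) -> str:
    """Return the label of the length bin, e.g. 1-499, 500-999, 1000-1499."""
    low = (length // width) * width
    high = low + width - 1
    # The first bin is labelled from 1 because a gene cannot have length 0.
    return f"{max(low, 1)}-{high}"


def length_distribution(genes):
    """Return the share of genes that falls into each length bin."""
    counts = Counter(length_bin(len(gene)) for gene in genes)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}
```

Running `length_distribution` separately on the genes of a large chromosome, a small chromosome and a plasmid gives per-component distributions that can then be compared, which is the kind of comparison the abstract reports.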
Further analysis of initiation and termination codon usage in each component found that the large chromosomes, small chromosomes and plasmids all show a significant preference for AUG as the initiation codon, and that the usage of the termination codons UAA and UGA changes significantly with genomic GC content. The differences were that the frequency of the termination codon UGA in plasmids was slightly lower than in the large and small chromosomes, and the frequency of the termination codon UAG was slightly higher than in the large and small chromosomes in genomes with high GC content. Correlation analysis between the codon adaptation index (CAI) and axis 1 for protein-coding genes showed that the proportions of large chromosomes, small chromosomes and plasmids with a significant correlation in the 54 genomes were 68.52%, 73.44% and 61.06%, respectively; correlation analysis between GC3s and axis 1 gave proportions of 83.33%, 79.69% and 91.15%, respectively. These results indicate that the main factors shaping the codon usage pattern are gene expression level and GC3s. The proportions of large chromosomes, small chromosomes and plasmids in which GC3s and CAI were significantly correlated were 18.52%, 34.38% and 36.28%, respectively, which indicates that gene expression level and base composition are linked in a certain percentage of genomes and that both affect the codon usage pattern. Overall, the codon usage analysis shows that codon usage patterns are shaped by a major factor together with many other factors, and that in most species the impact of these factors differs among the components. Base composition and gene expression level are the main factors affecting codon usage, but in plasmids base composition has the stronger influence.
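Two of the measures used in part II, RSCU and GC3s, can be sketched as follows. This is an illustration under stated assumptions, not the thesis's own implementation: it uses the standard definition of RSCU (observed codon count divided by the count expected if all synonymous codons of that amino acid were used equally), takes GC3s as the GC fraction at the third position of codons that have synonyms (Met, Trp and stop codons are excluded), and uses the standard codon-to-amino-acid assignments. The function names are hypothetical. CAI and the axis 1 correlation are not shown.

```python
from collections import Counter, defaultdict

# Standard codon-to-amino-acid assignments, built from the usual TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def split_codons(gene: str):
    """Split a coding sequence into codons, ignoring a trailing partial codon."""
    gene = gene.upper().replace("U", "T")
    usable = len(gene) - len(gene) % 3
    return [gene[i:i + 3] for i in range(0, usable, 3)]


def rscu(genes):
    """Return RSCU per codon, pooled over all genes of one component."""
    counts = Counter()
    for gene in genes:
        counts.update(c for c in split_codons(gene) if c in CODON_TABLE)
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        families[aa].append(codon)
    values = {}
    for aa, family in families.items():
        if aa == "*" or len(family) == 1:
            continue  # stop codons and single-codon amino acids are skipped
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid not observed, RSCU undefined
        for codon in family:
            values[codon] = counts[codon] * len(family) / total
    return values


def gc3s(gene: str) -> float:
    """Return the GC fraction at third positions of synonymous codons."""
    gc = total = 0
    for codon in split_codons(gene):
        aa = CODON_TABLE.get(codon)
        if aa is None or aa in "MW*":
            continue
        total += 1
        gc += codon[2] in "GC"
    return gc / total if total else 0.0
```

An RSCU value above 1 means the codon is used more often than expected under equal usage of its synonyms, and a value below 1 means it is used less often, so comparing the RSCU values of two components shows how much codon bias they share.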
Keywords/Search Tags:Prokaryotic genome, Large chromosome, Small chromosome, Plasmid, Protein-coding gene