
Analysis Of Homogeneous Region In Swine Genome And Its Novel Gene Prediction

Posted on: 2011-10-31
Degree: Master
Type: Thesis
Country: China
Candidate: W Q Zhang
Full Text: PDF
GTID: 2120360305474368
Subject: Bioinformatics
Abstract/Summary:
With the development of high-throughput sequencing technology, more and more genomes have been successfully sequenced. Because the pig (Sus scrofa) is an important economic animal and medical research model, the swine genome sequencing project has received much attention from researchers all over the world. Currently, the latest swine genome assembly is available in the Ensembl database. As a result, interpreting the biological information in swine genome sequences has become one of the most urgent tasks for biologists.

Previous investigations have demonstrated that eukaryotic genomes contain long segments with relatively constant GC content. These segments, called isochores, reveal not only important structural features of the genome but also some of its biological features. In this study, compositional features of the swine genome are characterized by analyzing its structure of non-overlapping LHGRs (Long Homogeneous Genome Regions). The distributions of some important functional regions in the swine genome are examined through the distributional correlations between LHGRs and biological elements such as genes, repeats and CpG islands, as well as through gene prediction. The major conclusions are as follows:

1. Compositional segmentation of the swine genome yields a total of 2491 LHGRs. After strictly evaluating three aspects of every LHGR, namely GC homogeneity (h value), the GC difference (ΔGC) between adjacent LHGRs, and length, 23 isochores are identified, most of which are located on chromosome 16; the remaining 2468 LHGRs are isochore-like regions.

2. By evaluating the relative amount, length and average GC content of each LHGR family in the swine and human genomes, we conclude that the two species have similar LHGR patterns, which are close to the general isochore patterns of warm-blooded mammals.

3. By comparing the GC content of each LHGR with the average GC content of the chromosome on which it is located, we found that the primary LHGR type in the swine genome is the AT type. Relative to the GC content of the whole genome, 53.19% of all LHGRs in the swine genome are GC-rich and the remainder are GC-poor. This is not the case in the human genome, where 54.27% of the 2568 LHGRs are GC-poor.

4. GC content homogeneity differs among the 19 swine chromosomes. Cumulative GC profiles make this homogeneity directly visible, as each chromosome is described by a curve with its own pattern of fluctuations. Long, approximately straight stretches of these curves correspond to isochores, that is, long regions with homogeneous GC content. Moreover, the number of LHGRs on each chromosome also reflects its compositional homogeneity.

5. Like isochores, LHGRs are correlated with LINEs, genes and CpG islands: LINE density decreases as the GC content of the LHGR increases; both gene density and CpG island density increase in GC-type LHGRs; and gene density in LHGRs with a GC content of 50-51% or 54-55% is markedly higher than in other LHGRs. These distributional correlations could be used to improve the efficiency of predicting these elements.

6. The efficiency of gene prediction can be improved by combining ab initio prediction with EST information. The novel gene candidates need to be further verified by experiments.
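The quantities the abstract relies on (the GC content of a region, its GC-rich or GC-poor label relative to a reference average, and a cumulative GC profile whose straight stretches indicate homogeneous composition) can be illustrated with a small sketch. This is not the segmentation procedure used in the thesis, which the abstract does not specify; the function names and the particular profile definition (cumulative deviation from the sequence-wide mean GC fraction) are assumptions chosen for illustration only.

```python
def gc_content(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of seq."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    # Sequences with no unambiguous bases (e.g. all N) are reported as 0.0.
    return gc / total if total else 0.0


def classify_regions(regions, reference_gc):
    """Label each region relative to a reference GC fraction.

    reference_gc could be a chromosome average or a whole-genome average;
    the abstract uses both kinds of comparison.
    """
    return ["GC-rich" if gc_content(r) > reference_gc else "GC-poor"
            for r in regions]


def cumulative_gc_profile(seq):
    """Cumulative deviation of G/C occurrence from the sequence-wide mean.

    Illustrative definition (an assumption, not taken from the thesis):
    each G or C adds (1 - mean), each A or T subtracts mean, and ambiguous
    bases leave the curve unchanged. A stretch with constant local GC
    content then appears as an approximately straight line whose slope is
    (local GC - mean GC).
    """
    seq = seq.upper()
    mean = gc_content(seq)
    profile = []
    running = 0.0
    for base in seq:
        if base in "GC":
            running += 1.0 - mean
        elif base in "AT":
            running -= mean
        profile.append(running)
    return profile
```

With this definition the profile rises through stretches that are richer in GC than the sequence as a whole and falls through poorer ones, so a change of slope marks a candidate boundary between compositionally distinct regions.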
Keywords/Search Tags: isochore, LHGR, gene prediction, repeats prediction, CpG island prediction