Initial Analysis Of Single Nucleotide Polymorphisms In Whole Genome Of Chinese Population

Posted on: 2004-10-28
Degree: Master
Type: Thesis
Country: China
Candidate: Y T Du
GTID: 2120360092999641
Subject: Epidemiology and Health Statistics
Abstract/Summary:
Objective: Single nucleotide polymorphisms (SNPs) are single base-pair positions in genomic DNA at which different sequence alternatives (alleles) exist in normal individuals in some population(s), wherein the least frequent allele has an abundance of 1% or greater. Several properties make SNPs powerful markers for studying complex diseases, drug susceptibility, and even the evolutionary history of human populations: they are widely distributed throughout the genome, they are comparatively stable genetically, and those located in coding regions (cSNPs) may influence gene expression and protein structure. Despite great improvements in SNP discovery methods and the rapidly growing volume of data deposited in the public database dbSNP, the available SNP information still does not adequately describe genetic polymorphism in the Chinese people, since China is a large country with great diversity in both population and geography. Here, using Chinese genomic DNA as the experimental material and the sequence analysis method of genomic alignment, we construct an initial whole-genome SNP map for the Chinese people, which helps us learn more about the distribution and classification of SNPs in the Chinese genome and their relationship with genes.

Methods: Genomic DNA from one Han individual and an equal-amount mixture of genomic DNA from 24 individuals of different regions and nationalities were each used to construct shotgun libraries. Recombinant clones were picked at random for single-direction sequencing. Because SNPs are single-base substitutions in the genome, sequence accuracy is one of the most important factors in SNP discovery, so a series of strict criteria is necessary. Raw data generated by the sequencing machines were processed to convert the image files into sequence files for data analysis. After vector sequences were removed, RepeatMasker was used to mask repetitive sequences in the reads.
High-quality reads are defined as those in which the run of continuous bases of Q20 or greater is longer than 100 bp and the non-repetitive sequence is longer than 30 bp. These reads were then aligned with the public finished human genome sequence. Wherever the alignment showed a sequence difference, the neighborhood quality standard (NQS) criteria were applied: the base quality at the variant site must be greater than Q20, the quality of the 5 flanking bases on each side must be greater than Q15, and at least 9 of the 10 flanking bases must match perfectly. In addition, when the total number of candidate SNPs in any read is greater than 6, the whole read is discarded. Aligning the reads with the public human genome database both discovers candidate SNPs and locates them on the chromosomes. The sequence neighboring each SNP was extracted and compared with the NCBI gene annotation database to determine the distribution of SNPs among coding sequences and their effects on amino acid coding. Comparing the SNPs discovered in Chinese populations with known SNPs in the public database (dbSNP) gives some indication of which SNPs are shared among ethnically diverse sample panels, both globally (all major races) and locally (Chinese).

Results: 1. A total of 19,109 SNPs and 1,214 indels were discovered by analyzing 118,285 randomly sequenced reads; 18,001 of the SNPs were located on chromosomes. The autosomes have quite similar SNP densities, except for chromosomes 17 and 12, whose low SNP densities may result from statistical fluctuation, methodological issues, or biological function and require further investigation. The sex chromosomes have lower diversity than the autosomes. 2. In total, 16,679 SNPs were found in annotated genome regions, of which 9,589 are intragenic. 274 SNPs are located in exons, among which 185 cSNPs change the encoded amino acid sequence. 3. Comparing the SNPs we discovered with dbSNP, 7,170 (37.19%) SNPs are shared between our data and dbSNP.
2,544 SNPs (13.31%) are found only in the Chinese population and not in dbSNP, which suggests that they are Chinese-specific. These Chin...
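
The read-level and site-level quality criteria given in the Methods can be illustrated with a minimal Python sketch. It is not the thesis's pipeline. It assumes an ungapped alignment in which the read and the reference are strings of equal length, that repeat-masked bases appear as 'N', that "non-repetitive sequence" means the total count of unmasked bases, and that the limit of 6 candidates per read is applied after the NQS check. All function names are hypothetical.

```python
from typing import List, Sequence

FLANK = 5  # bases examined on each side of a variant site


def longest_run_at_least(quals: Sequence[int], threshold: int = 20) -> int:
    """Length of the longest run of consecutive bases with quality >= threshold."""
    best = run = 0
    for q in quals:
        run = run + 1 if q >= threshold else 0
        best = max(best, run)
    return best


def is_high_quality_read(masked_seq: str, quals: Sequence[int]) -> bool:
    """Read filter: Q20 run longer than 100 bp and more than 30 bp unmasked.

    Assumption: repeat-masked bases are written as 'N' in masked_seq.
    """
    unmasked = sum(1 for base in masked_seq.upper() if base != "N")
    return longest_run_at_least(quals, 20) > 100 and unmasked > 30


def passes_nqs(read: str, ref: str, quals: Sequence[int], pos: int) -> bool:
    """Neighborhood check for a mismatch at index pos (ungapped alignment)."""
    if pos < FLANK or pos + FLANK >= len(read):
        return False  # not enough flanking sequence to evaluate
    if quals[pos] <= 20:
        return False  # variant base must be better than Q20
    flank_idx = [i for i in range(pos - FLANK, pos + FLANK + 1) if i != pos]
    if any(quals[i] <= 15 for i in flank_idx):
        return False  # every flanking base must be better than Q15
    matches = sum(1 for i in flank_idx if read[i] == ref[i])
    return matches >= 9  # at least 9 of the 10 flanking bases match


def candidate_snps(read: str, ref: str, quals: Sequence[int],
                   max_per_read: int = 6) -> List[int]:
    """Indices of mismatches that pass NQS; empty if the read has too many."""
    mismatches = [i for i in range(len(read))
                  if read[i] != ref[i] and read[i] != "N"]
    passed = [i for i in mismatches if passes_nqs(read, ref, quals, i)]
    return [] if len(passed) > max_per_read else passed


if __name__ == "__main__":
    ref = "ACGTACGTACGTACGTACGT"
    read = "ACGTACGTACTTACGTACGT"  # differs from ref at index 10 (G -> T)
    print(candidate_snps(read, ref, [30] * 20))  # [10]
```

A real pipeline would take base qualities from the base caller and coordinates from the aligner, and would need to handle gapped alignments, which this sketch does not.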
Keywords/Search Tags:high density SNP map, whole genome of Chinese, random sequencing, genomic alignment