
Development Of High-throughput Genotyping Methods Based On DNA Microarray And New-generation Sequencing Technologies

Posted on: 2011-02-20
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W B Xie
GTID: 1100360308985931
Subject: Genomics
Abstract/Summary:
As the staple food of more than one third of the world's population, rice is one of the most important crops worldwide. With the smallest genome size among crops, rice is also an important model species for the Gramineae (Poaceae) from an academic perspective. These features strongly facilitate gene cloning and the study of gene function in rice. However, traditional molecular markers, although widely used in genotyping assays of populations, are laborious and time-consuming, and their limited throughput makes it difficult to generate high-density genetic maps. These limitations have hampered progress in gene cloning. Based on two high-throughput technologies, i.e. microarray technology and new-generation sequencing technology, we developed high-throughput genotyping methods for constructing ultrahigh-density genetic maps. Although the studies were carried out in rice, these methods are generally applicable to genetic map construction in other species. The main results are as follows:

1. A method based on median polish was adopted to detect sequence polymorphisms between genotypes using microarray data that were originally generated for expression profiling with oligonucleotide microarrays. Using this method, we identified 6,655 sequence polymorphisms, referred to as single feature polymorphisms (SFPs), between two rice varieties. We showed that the median polish method avoids fitting complex linear models; it can therefore be used to analyze complex transcriptome datasets and is suitable for discovering markers and genotyping populations simultaneously in eQTL projects.
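The median polish idea in result 1 can be illustrated with a minimal sketch. It is not the dissertation's implementation: it assumes a single probe set laid out as a probes-by-samples matrix of log intensities, and the scoring rule `group_shift` (the difference in mean residual between two genotype groups) is a hypothetical stand-in for whatever criterion the author actually used to call SFPs.

```python
from statistics import median


def median_polish(x, max_iter=10, tol=1e-9):
    """Tukey's median polish: x[i][j] ~ overall + row[i] + col[j] + resid[i][j].

    Rows are taken to be probes and columns samples (an assumption of this sketch).
    """
    resid = [list(map(float, r)) for r in x]
    n_rows, n_cols = len(resid), len(resid[0])
    overall = 0.0
    row = [0.0] * n_rows
    col = [0.0] * n_cols
    for _ in range(max_iter):
        # Sweep row (probe) medians out of the residuals.
        r = [median(resid[i]) for i in range(n_rows)]
        for i in range(n_rows):
            resid[i] = [v - r[i] for v in resid[i]]
            row[i] += r[i]
        # Sweep column (sample) medians out of the residuals.
        c = [median(resid[i][j] for i in range(n_rows)) for j in range(n_cols)]
        for i in range(n_rows):
            resid[i] = [v - c[j] for j, v in enumerate(resid[i])]
        col = [cj + dj for cj, dj in zip(col, c)]
        # Re-centre the effects so the overall term carries their medians.
        for eff in (row, col):
            d = median(eff)
            overall += d
            eff[:] = [e - d for e in eff]
        # Stop once a full sweep no longer changes the residuals.
        if max(map(abs, r)) < tol and max(map(abs, c)) < tol:
            break
    return overall, row, col, resid


def group_shift(resid, group_a, group_b):
    """Hypothetical SFP score: per-probe difference in mean residual
    between the samples of genotype A and those of genotype B."""
    scores = []
    for r in resid:
        mean_a = sum(r[j] for j in group_a) / len(group_a)
        mean_b = sum(r[j] for j in group_b) / len(group_b)
        scores.append(mean_a - mean_b)
    return scores
```

Because the probe and sample effects are removed with medians rather than a fitted linear model, a probe whose signal drops in only one genotype leaves a genotype-dependent pattern in its residuals, and a score such as `group_shift` makes that pattern visible.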
The method is also superior in sensitivity, accuracy and computing time to two previously used methods. A comparison with data from a resequencing project indicated that 75.6% of the SFPs were supported by SNPs in the probe regions. Further comparison revealed that SNPs in the sequences immediately flanking the probes also contribute to the detection of SFPs in cases where the probes and the targets have perfectly matched sequences. Differences in minimum free energy caused by flanking SNPs, which may change the stability of RNA secondary structure, may partly explain these SFPs. Furthermore, we found that the ability of SFPs to measure genotypes from mRNA could be employed to detect allele-specific expression (ASE) in hybrids in a high-throughput manner. A total of 955 ASE events were detected based on 718 SFPs, indicating that ASE is a dynamic regulatory process.

2. A parent-independent method was developed to construct ultrahigh-density linkage maps based on low-coverage sequencing of recombinant inbred lines, providing the capacity to genotype a mapping population in a single Solexa sequencing run. First, all potential SNPs were identified and draft parental genotypes were obtained using maximum parsimonious inference of recombination (MPR), making maximum use of the SNP information found in the entire population. Second, high-quality SNPs were identified by filtering out low-quality ones through permutations involving resampling of windows of SNPs followed by Bayesian inference, which removed 94.1% of the low-quality SNPs while retaining 98.7% of the high-quality SNPs. Parental genotypes of the high-quality SNPs were inferred with 100% accuracy.
Third, a hidden Markov model was constructed to transform the low-quality SNP data into high-quality genotype bins. The model treats adjacent SNP sites as points in a Markov chain and assigns a probability to an event with reference to the neighboring sites, thus taking into account the sequencing error rate, the genotypes of adjacent SNPs and the physical distances between SNPs. With 0.05x genome sequence per line, an ultrahigh-density linkage map composed of bins of high-quality SNPs was constructed using 238 recombinant inbred lines derived from a cross between two rice varieties. Using this map, a QTL for grain width (GW5) was localized to its presumed genomic region in a 200-kb bin, confirming the accuracy and quality of the map. The high-density, high-quality SNPs identified can facilitate further fine mapping of target genes. We also carried out a large number of Monte Carlo simulations to evaluate the factors influencing the accuracy of the MPR algorithm. These evaluations have significant implications for the potential applicability of the method, including applications in other species and in populations other than RILs, including F2 populations. An R package was constructed and released, which should be useful for constructing ultrahigh-density linkage maps using new-generation sequencing technologies when high-quality genotype data for the parental lines are not available.

Furthermore, we constructed a database called CREP (Collection of Rice Expression Profiles, http://crep.ncpgr.cn), which stores the large volume of microarray data generated by our laboratory in recent years and handles data queries from end users.
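The hidden Markov model step of result 2 can be sketched roughly as follows. This is an illustration under stated assumptions, not the released R package: it models only the two homozygous parental states "A" and "B", uses a single per-call error rate, and derives the switch probability from physical distance with a Haldane-style formula. The function names, `error_rate` and `rec_per_bp` are all hypothetical, and the default values are arbitrary.

```python
import math


def viterbi_genotypes(positions, calls, error_rate=0.03, rec_per_bp=1e-6):
    """Most likely parental genotype ("A" or "B") at each SNP of one line.

    positions: increasing SNP coordinates in bp; calls: observed "A"/"B" alleles.
    error_rate and rec_per_bp are illustrative values, not taken from the source.
    """
    states = ("A", "B")

    def log_emit(state, call):
        # A call disagrees with the true genotype with probability error_rate.
        return math.log(1.0 - error_rate if call == state else error_rate)

    # Uniform prior over the two parental genotypes at the first SNP.
    score = {s: math.log(0.5) + log_emit(s, calls[0]) for s in states}
    backptrs = []
    for k in range(1, len(calls)):
        dist = positions[k] - positions[k - 1]
        # Switch probability grows with physical distance and is capped at 0.5.
        p_switch = 0.5 * (1.0 - math.exp(-2.0 * rec_per_bp * dist))
        p_switch = min(max(p_switch, 1e-12), 0.5)
        new_score, ptr = {}, {}
        for s in states:
            cands = {
                p: score[p] + math.log(p_switch if p != s else 1.0 - p_switch)
                for p in states
            }
            ptr[s] = max(cands, key=cands.get)
            new_score[s] = cands[ptr[s]] + log_emit(s, calls[k])
        score = new_score
        backptrs.append(ptr)
    # Trace the best path back from the last SNP.
    state = max(score, key=score.get)
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


def to_bins(positions, path):
    """Merge runs of identical genotypes into (start, end, genotype) bins."""
    bins = []
    start = 0
    for k in range(1, len(path) + 1):
        if k == len(path) or path[k] != path[start]:
            bins.append((positions[start], positions[k - 1], path[start]))
            start = k
    return bins
```

In this sketch an isolated discordant call inside a long run is cheaper to explain as a sequencing error than as two closely spaced recombination events, so it is overridden by its neighbors; a sustained change of allele is instead kept as a bin boundary.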
Keywords/Search Tags: Oryza sativa, genome, genetic map, genotyping, expression profiles, DNA microarray, new-generation sequencing technology, database