
An Accurate Method For Population SNP Detection And Genotyping Based On Resequencing Data

Posted on: 2014-04-13  Degree: Master  Type: Thesis
Country: China  Candidate: W M He  Full Text: PDF
GTID: 2250330401959139  Subject: Biochemistry and Molecular Biology
Abstract/Summary:
With continuing innovation in sequencing technology, next-generation sequencing (NGS) has reduced the cost of sequencing and improved throughput, making it possible to sequence the DNA of hundreds of samples. Because the genomes of most model crops and economically important species have now been sequenced, more and more researchers are turning to population re-sequencing studies. NGS-based re-sequencing, which is feasible at large scale, has already been applied to detect variations, reconstruct evolutionary history, and identify phenotype-related genotypes. However, the huge amount of data, in the absence of advanced methods, is computationally difficult to process and makes it hard to detect population SNPs and assign genotypes. How to use and analyze this data effectively in NGS-based re-sequencing studies remains a thorny task for individual researchers. Here we introduce a full-featured toolkit for NGS-based (Illumina sequencing) re-sequencing analysis, which can be used to process the raw data, interpret the mapping results, and identify and annotate variations. The main results are as follows:

(1) We implemented two models for population SNP detection, a maximum likelihood model and a Bayesian hybrid model, and developed the corresponding detection software, GLFmuti and PopSNP, on the Linux platform. We compared them with current software and tested the results; we found our methods to be more effective, and we expect them to be widely applied.

(2) Considering the huge amount of data produced in population re-sequencing studies, the toolkit is designed to use compressed data files as input and output to save storage space. It enables large-scale re-sequencing studies with time and computational efficiency in a user-friendly manner. It offers abundant practical functions and generates useful statistics during the analysis pipeline, which significantly simplifies re-sequencing analysis.

(3) The toolkit provides abundant functions for routine re-sequencing analysis. These functions are provided in separate modules, so a customized pipeline can easily be set up. The integrated software and the abundant sub-functions in the toolkit lay a solid foundation for special demands in re-sequencing projects. Users can also construct their own pipelines for other uses by combining these functions.
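The abstract names a maximum likelihood model for population SNP detection but does not give its details. As a rough illustration of the general idea only, and not the algorithm used in GLFmuti or PopSNP, the sketch below estimates the alternate-allele frequency at one site from per-sample read counts with an EM procedure and then applies a likelihood ratio test. The function names, the fixed sequencing error rate, the Hardy-Weinberg genotype prior, and the 3.84 threshold (the 5% critical value of a chi-square distribution with one degree of freedom) are all assumptions made for this example.

```python
import math

def genotype_likelihoods(ref_count, alt_count, error_rate=0.01):
    """Return P(reads | g) for g = 0, 1, 2 copies of the alternate allele.

    Assumes independent reads and a single, fixed per-base error rate.
    """
    likelihoods = []
    for g in (0, 1, 2):
        # Probability that a single read shows the alternate allele.
        p_alt = (g / 2) * (1 - error_rate) + (1 - g / 2) * error_rate
        likelihoods.append(p_alt ** alt_count * (1 - p_alt) ** ref_count)
    return likelihoods

def hwe_prior(freq):
    """Genotype prior under Hardy-Weinberg equilibrium (an assumption)."""
    return [(1 - freq) ** 2, 2 * freq * (1 - freq), freq ** 2]

def log_likelihood(site_likelihoods, freq):
    """Log-likelihood of all samples at one site for allele frequency freq."""
    prior = hwe_prior(freq)
    return sum(
        math.log(sum(l * p for l, p in zip(lik, prior)))
        for lik in site_likelihoods
    )

def estimate_allele_frequency(site_likelihoods, iterations=50, start=0.2):
    """EM estimate of the alternate-allele frequency at one site."""
    freq = start
    for _ in range(iterations):
        prior = hwe_prior(freq)
        expected_alt = 0.0
        for lik in site_likelihoods:
            # E-step: posterior weight of each genotype for this sample.
            weights = [l * p for l, p in zip(lik, prior)]
            total = sum(weights)
            expected_alt += (weights[1] + 2 * weights[2]) / total
        # M-step: expected alternate alleles over total alleles (2 per sample).
        freq = expected_alt / (2 * len(site_likelihoods))
    return freq

def call_site(read_counts, lrt_threshold=3.84):
    """Call one site from a list of (ref_count, alt_count) pairs, one per sample."""
    liks = [genotype_likelihoods(ref, alt) for ref, alt in read_counts]
    freq = estimate_allele_frequency(liks)
    # Likelihood ratio test against the "no variant" hypothesis (freq = 0).
    lrt = 2 * (log_likelihood(liks, freq) - log_likelihood(liks, 0.0))
    prior = hwe_prior(freq)
    genotypes = []
    for lik in liks:
        posterior = [l * p for l, p in zip(lik, prior)]
        genotypes.append(posterior.index(max(posterior)))
    return {"freq": freq, "lrt": lrt, "is_snp": lrt > lrt_threshold,
            "genotypes": genotypes}

if __name__ == "__main__":
    # Four hypothetical samples: (reference reads, alternate reads).
    print(call_site([(10, 0), (5, 5), (0, 10), (10, 0)]))
```

Pooling evidence across samples in this way is what distinguishes population-level calling from calling each individual separately: a variant supported by only a few reads in each of several samples can still produce a large likelihood ratio.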
Keywords/Search Tags: Next generation sequencing, Re-sequencing, Population SNP, Variation detection, Genotype