Research On Identifying Methods For Plant Genome Structural Variation

Posted on: 2016-12-03 | Degree: Master | Type: Thesis
Country: China | Candidate: J L Chen | Full Text: PDF
GTID: 2180330479490115 | Subject: Computer Science and Technology
Abstract/Summary:
Genome structural variation is a class of variation that lies in scale between single nucleotide polymorphisms (SNPs) and chromosomal mutations. Unlike SNPs, which change a single base, structural variants are larger: they usually span more than several hundred bp, and the largest can reach millions of bp. Structural variation has a significant effect on individual phenotype and disease, and it is widespread in individual genomes. Because genomes are structurally complex and contain many repeat regions, and because current sequencing technology has limitations, detecting structural variation is far harder than detecting SNPs. Most plants are polyploid, which further increases the difficulty of structural variation detection.

With the development of high-throughput sequencing technologies, many species have been sequenced and a large amount of data is available. How to use high-throughput sequencing to develop an economical, rapid, and accurate structural variation detection algorithm remains a major challenge. Starting from high-throughput sequencing technology, this thesis studies the three most popular high-throughput sequencing platforms and the formats of their sequencing results. It then reviews existing methods for detecting genome structural variation with next-generation sequencing (NGS) data and analyzes their problems.

For plant genome structural variation detection, we present an algorithm based on paired-end mapping (PEM). First, the algorithm preprocesses the data, mainly by mapping the NGS reads to the reference genome. It then estimates parameters such as the insert length. Next, it runs the ambiguous-region finding algorithm to locate regions that contain many abnormal reads. Finally, it runs the Build-connection algorithm to output the structural variations. We ran the PEM algorithm on simulated data and soybean data to verify it, compared it with BreakDancer and Pindel on both soybean and human genome data, and analyzed the results. The results show that our algorithm is fast and accurate in detecting genome structural variations.
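The abstract does not give implementation details, so the following Python sketch only illustrates the general idea behind the parameter-estimation and ambiguous-region steps. It is not the thesis's algorithm. All names (`estimate_insert_size`, `is_abnormal`, `find_ambiguous_regions`), the median/MAD estimator, and the thresholds `k`, `window`, and `min_support` are assumptions chosen for illustration. Each read pair is assumed to be already mapped and represented as a tuple `(chrom, start, end, proper_orientation)`. The Build-connection step is not sketched because the abstract does not describe it.

```python
from statistics import median


def estimate_insert_size(spans):
    """Robust insert-size estimate: median and a MAD-based spread (assumed estimator)."""
    spans = list(spans)
    med = median(spans)
    mad = median([abs(s - med) for s in spans])
    # 1.4826 scales MAD to a standard deviation for normal data; floor avoids a zero spread.
    return med, max(1.4826 * mad, 1.0)


def is_abnormal(pair, med, spread, k=3.0):
    """A pair is abnormal if its orientation is unexpected or its span deviates by more than k spreads."""
    _chrom, start, end, proper_orientation = pair
    return (not proper_orientation) or abs((end - start) - med) > k * spread


def _flush(cluster, regions, min_support):
    """Record a cluster as a region (chrom, start, end, support) if enough pairs support it."""
    if len(cluster) >= min_support:
        chrom, start = cluster[0][0], cluster[0][1]
        end = max(p[2] for p in cluster)
        regions.append((chrom, start, end, len(cluster)))


def find_ambiguous_regions(pairs, med, spread, window=1000, min_support=3, k=3.0):
    """Group abnormal pairs whose start positions lie within `window` bp of the previous one."""
    abnormal = sorted(p for p in pairs if is_abnormal(p, med, spread, k))
    regions, cluster = [], []
    for pair in abnormal:
        new_chrom = cluster and pair[0] != cluster[-1][0]
        too_far = cluster and pair[1] - cluster[-1][1] > window
        if new_chrom or too_far:
            _flush(cluster, regions, min_support)
            cluster = []
        cluster.append(pair)
    _flush(cluster, regions, min_support)
    return regions
```

In this sketch, a pair spanning far more than the estimated insert length hints at a deletion relative to the reference, and requiring several supporting pairs per region filters out isolated mapping errors. A real pipeline would read these values from the aligner's output rather than from hand-built tuples.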
Keywords/Search Tags: Genome Structural Variations, High-throughput sequencing, Paired-end reads, PEM