
Research On Detecting Methods Of Indels In Next-generation Sequencing Data Of Human Genome And Establishment Of Detecting Platform For Indels In Genome

Posted on: 2016-07-07
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Ren
GTID: 2180330470470778
Subject: Genetics
Abstract/Summary:
High-throughput sequencing technology is increasingly applied in life science research, but analyzing the sequencing data remains very challenging. Because high-throughput sequencing data consists of a large number of short sequences of 100-300 bp, a series of steps is needed to extract useful information from them. Moreover, each step can be carried out with many computational methods, which can lead to large differences in the analysis results. For most laboratories, it is costly to purchase, operate and maintain second-generation sequencing equipment, so most of them have to seek help from professional companies. It is nevertheless necessary to investigate the reliability of the results these companies generate, especially when insertions and deletions exist in the DNA sequences, since these further increase the difficulty of the analysis and cause many false positive findings. Taking these factors into account, this research used two kinds of sequencing data, computer-simulated data and real tumor sequencing data, to investigate the performance of some widely used methods for indel analysis in human genome research, such as GATK UnifiedGenotyper, GATK HaplotypeCaller, Samtools and Varscan. We further compared the performance and the analysis results of two well-known professional companies. This research thereby gives researchers in the field an objective evaluation of the results of current next-generation sequencing data analysis.

This study showed that the results generated by different methods differ greatly. With the simulated sequencing data, we found that the detection sensitivity for 1-bp insertions and deletions (indels) in the human genome increases in the order Samtools, GATK UnifiedGenotyper, GATK HaplotypeCaller, Varscan.
We also found that Varscan gave the highest sensitivity for detecting low-frequency indel variations, which was further confirmed with real tumor sequencing data.

Analyzing next-generation sequencing data is a complicated process that requires a variety of computing knowledge, such as using the Linux operating system and developing computer programs. However, most life science researchers are not proficient in these skills. It is also an extremely complicated and difficult task to configure the running environment of each method and to convert the input files into the formats that each method needs in order to run smoothly. To solve these problems, we built an indel analysis platform named Benefit the Mankind (Beneman). To ensure its efficiency and modifiability, the platform still runs on the Linux operating system. Users only need a little knowledge of Linux and have to configure the path of the data file to be analyzed by Beneman; a result report is then generated automatically. Beneman is also highly modifiable, so it can be optimized continuously, and it offers a data analysis framework to those with stronger computer knowledge who want to analyze next-generation sequencing data.

To verify the reliability of Beneman, we analyzed the same set of raw real tumor data with Beneman, with GATK HaplotypeCaller, and through a commercial company. We then used Sanger sequencing to resequence four of the indel mutations that were found only by Beneman. The results showed that Beneman is more reliable than the other methods. In future studies we will use Sanger sequencing to verify more of the indel variation sites found only by Beneman and will further improve the platform.
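The sensitivity comparison on simulated data described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's actual pipeline: the function names, the tuple representation of a variant, the caller labels `caller_a` and `caller_b`, and all example variants are hypothetical. It assumes the simulated (true) indels and each caller's output have already been reduced to (chromosome, position, reference allele, alternate allele) tuples.

```python
from typing import Dict, Iterable, Tuple

# A variant is (chromosome, 1-based position, reference allele, alternate allele).
# This representation is an assumption made for the sketch.
Variant = Tuple[str, int, str, str]


def is_indel(variant: Variant) -> bool:
    """An insertion or deletion changes the allele length."""
    _, _, ref, alt = variant
    return len(ref) != len(alt)


def sensitivity(truth: Iterable[Variant], called: Iterable[Variant]) -> float:
    """Fraction of true (simulated) indels that the caller recovered.

    Matching is exact on all four fields. Real comparisons usually
    normalize indel representation first, which this sketch omits.
    """
    truth_indels = {v for v in truth if is_indel(v)}
    if not truth_indels:
        raise ValueError("truth set contains no indels")
    recovered = truth_indels & set(called)
    return len(recovered) / len(truth_indels)


# Hypothetical simulated truth set: four 1-bp indels.
truth = [
    ("chr1", 1000, "A", "AT"),   # 1-bp insertion
    ("chr1", 2500, "GC", "G"),   # 1-bp deletion
    ("chr2", 300, "T", "TA"),    # 1-bp insertion
    ("chr2", 9100, "CA", "C"),   # 1-bp deletion
]

# Hypothetical call sets from two unnamed callers.
calls: Dict[str, list] = {
    "caller_a": [truth[0], truth[1]],
    "caller_b": [truth[0], truth[1], truth[3], ("chr3", 50, "G", "GT")],
}

for name, called in calls.items():
    print(name, sensitivity(truth, called))
```

The extra call on `chr3` does not change the sensitivity of `caller_b`; it would count as a false positive, which a fuller evaluation would track separately.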
Keywords/Search Tags: High-throughput sequencing, Sanger sequencing, Insertion and deletion (Indel), Data analysis, Simulating DNA sequence