
The Integrated NGS-based Strategy For The Detection Of Genomic Deletions

Posted on: 2016-10-09
Degree: Master
Type: Thesis
Country: China
Candidate: R Guan
Full Text: PDF
GTID: 2180330473962640
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, next-generation sequencing (NGS) technologies have made great progress, and sequencing-based methods for discovering structural variation keep emerging. Nevertheless, because of low or even absent sequence coverage, high sequencing error rates, the short reads that NGS produces, and similar constraints, no single one of these methods makes reliable calls when used alone. This dissertation focuses on one of the most important types of structural variation, the deletion. It studies detection methods that use multiple sets of NGS data and analyzes the results obtained. The main content includes:

(1) To evaluate the proposed integrated detection strategy effectively, the deletion-detection setting is studied and the construction of a simulation platform is described in detail. The platform generates reads as close to real data as possible, and covers generation of the deletion standard set, simulation of a diploid individual genome, and simulation of paired-end reads.

(2) An integrated strategy for discovering deletions is proposed that combines three mainstream detection approaches. The strategy is divided into two stages. The first stage, which aims to maximize detection sensitivity, performs an initial mapping of the paired-end reads followed by split-read alignment to identify a comprehensive set of 1 bp-resolution deletion candidates. The second stage, which aims to minimize the false discovery rate, extracts deletion-related features from both alignment results according to the principles of read-depth, split-read, and read-pair methods, and then uses a discriminative model with high generalization ability to distinguish true from false deletion candidates based on those features. The experimental results show that, compared with traditional split-read approaches, the integrated strategy not only detects deletions down to the single-base-pair level but also effectively reduces the number of false positives with negligible loss of sensitivity.

(3) The applicability of support vector machines and random forests to the proposed strategy is studied, and the contribution of each feature to a deletion prediction is further analyzed. The experimental results show that using a machine learning model with good generalization performance as the discriminative model of the proposed strategy effectively reduces false positive calls with negligible loss of sensitivity. Moreover, the feature-importance study finds that the influence of each feature on the model's prediction accuracy changes with read depth of coverage.
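The second stage described in (2) can be illustrated with a small sketch. The abstract does not list the exact features or data structures used, so everything below is an assumption made for illustration: the names `Candidate`, `ReadPair`, `depth_ratio`, `split_read_support`, `discordant_pair_support`, and `extract_features` are hypothetical, as are the flank size and the three-standard-deviation insert-size cutoff. The sketch computes one simple feature per signal type (read depth, split reads, read pairs) for a single deletion candidate.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Candidate:
    """Hypothetical deletion candidate: half-open interval [start, end) on the reference."""
    start: int
    end: int


@dataclass(frozen=True)
class ReadPair:
    """Hypothetical mapped read pair: leftmost base and one past the rightmost base."""
    left_start: int
    right_end: int


def depth_ratio(depth, cand, flank=50):
    """Read-depth signal: mean depth inside the candidate over mean depth in the flanks."""
    inside = depth[cand.start:cand.end]
    outside = depth[max(0, cand.start - flank):cand.start] + depth[cand.end:cand.end + flank]
    if not inside or not outside or mean(outside) == 0:
        return 1.0  # assumption: treat missing evidence as "no depth change"
    return mean(inside) / mean(outside)


def split_read_support(split_breakpoints, cand, tol=0):
    """Split-read signal: number of split alignments whose breakpoints match the candidate."""
    return sum(
        1
        for start, end in split_breakpoints
        if abs(start - cand.start) <= tol and abs(end - cand.end) <= tol
    )


def discordant_pair_support(pairs, cand, insert_mean, insert_sd, k=3.0):
    """Read-pair signal: pairs that span the candidate and map farther apart than expected."""
    cutoff = insert_mean + k * insert_sd
    return sum(
        1
        for p in pairs
        if p.left_start < cand.start
        and p.right_end > cand.end
        and (p.right_end - p.left_start) > cutoff
    )


def extract_features(cand, depth, split_breakpoints, pairs, insert_mean, insert_sd):
    """Collect the three illustrative features into one record for a classifier."""
    return {
        "depth_ratio": depth_ratio(depth, cand),
        "split_reads": split_read_support(split_breakpoints, cand),
        "discordant_pairs": discordant_pair_support(pairs, cand, insert_mean, insert_sd),
    }
```

A feature record of this kind, one per candidate, is the sort of input a discriminative model such as a support vector machine or random forest would be trained on. The choice and number of features in the thesis itself may differ.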
Keywords/Search Tags: next-generation sequencing (NGS), deletion, feature extraction, machine learning algorithms