Distribution Density Analysis Of High-throughput DNA Sequencing Data And Its Application

Posted on: 2014-11-27  Degree: Master  Type: Thesis
Country: China  Candidate: H Y Bai  Full Text: PDF
GTID: 2250330422951509  Subject: Computer Science and Technology
Abstract/Summary:
Next-generation sequencing technology can rapidly sequence a whole genome at high coverage. When the whole genome of an individual, or all the mRNA of a specific organ, is sequenced, the relative abundance of the sequencing data along the reference genome can reflect chromosomal aneuploidy and abnormal gene expression in the sample. This sampling method is simple and safe, so it is significant for the development of individual health care. However, it places high demands on accurate analysis of the distribution density of the sequencing data and on its appropriate application.

The major work of this thesis is, first, to propose a reference template generation algorithm that produces a reference template better suited to distribution density analysis. A series of methods is then designed to analyze the distribution density of the sequencing data on the reference template, with the aim of quantifying the distribution density bias and better understanding the regularities in the data. Based on this analysis, an improved prenatal diagnosis system is designed to diagnose fetal chromosomal aneuploidy.

First, the thesis designs an algorithm to generate a genome reference template that is better suited to analyzing the relative abundance of the sequencing data, because it both ensures that enough reads align to the reference template and makes the alignment results more accurate.

Then, a series of methods is designed to analyze the distribution density of the sequencing data. On the one hand, different GC content bias models are used to analyze the deviation of the distribution density, yielding a bias model with optimal parameters that reflects the deviation accurately. A GC content bias correction model can then be designed to eliminate the bias in the distribution density data. On the other hand, the thesis analyzes the chromosome representation of different samples. It first analyzes the different deviations of the sequencing data distribution in the chromosome representation, then shows the effect of fetal sex and gestational age on the chromosome representation.

Finally, an improved diagnosis system is designed to diagnose fetal chromosomal aneuploidy. By aligning the sequencing data of maternal plasma DNA to the reference template, the system uses the distribution density data to calculate the chromosome representation of the sequencing data and to diagnose the number of fetal chromosomes. Before the chromosome representation is calculated, the GC content bias correction model is applied to eliminate GC content bias in the distribution density of the sequencing data. This GC content bias correction module improves the accuracy of the chromosome representation computed from the sample sequencing data.
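
As a rough illustration of the last two steps (GC content bias correction followed by chromosome representation), here is a minimal Python sketch. It is not the method of the thesis, which does not specify its bias model in this abstract. The sketch assumes a simple median-ratio correction per GC stratum, assumes the read counts have already been summarized per fixed-size bin, and uses hypothetical function names (`gc_correct`, `chromosome_representation`) and made-up toy counts.

```python
from collections import defaultdict
from statistics import median


def gc_correct(bins):
    """Reweight per-bin read counts so every GC stratum has the same median.

    bins: list of (chrom, gc_fraction, read_count) tuples (assumed input format).
    Returns a list of (chrom, corrected_count) tuples.
    """
    # Group read counts by GC content rounded to the nearest percent.
    by_gc = defaultdict(list)
    for _chrom, gc, count in bins:
        by_gc[round(gc * 100)].append(count)

    overall = median(count for _chrom, _gc, count in bins)
    stratum_median = {k: median(v) for k, v in by_gc.items()}

    corrected = []
    for chrom, gc, count in bins:
        m = stratum_median[round(gc * 100)]
        # Weight = overall median / stratum median; empty strata get weight 0.
        weight = overall / m if m > 0 else 0.0
        corrected.append((chrom, count * weight))
    return corrected


def chromosome_representation(corrected):
    """Fraction of all (corrected) reads that fall on each chromosome."""
    totals = defaultdict(float)
    for chrom, count in corrected:
        totals[chrom] += count
    grand_total = sum(totals.values())
    return {chrom: total / grand_total for chrom, total in totals.items()}


# Toy data: low-GC bins are over-represented relative to high-GC bins.
toy_bins = [
    ("chr1", 0.40, 100), ("chr1", 0.60, 50),
    ("chr2", 0.40, 100), ("chr2", 0.60, 50),
]
representation = chromosome_representation(gc_correct(toy_bins))
```

In a diagnostic setting, the per-chromosome fractions from a test sample would be compared against those from reference samples; how that comparison is done is outside what the abstract states.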
Keywords/Search Tags: high-throughput sequencing, sequencing data’s distribution density, GC content bias, chromosomes’ representation