Method Study On Sequencing Data Analysis Of Next-generation Semiconductor Sequencer

Posted on: 2017-03-02  Degree: Master  Type: Thesis
Country: China  Candidate: D K Xue
GTID: 2310330518487912  Subject: Control Science and Engineering
Abstract/Summary:
In recent years, advances in second-generation sequencing technology have dramatically increased sequencing throughput and speed, and a variety of sequencing platforms have emerged. As a new generation of sequencing technology, the semiconductor sequencer performs sequencing on a semiconductor chip, which removes the limits that the optical imaging devices of traditional sequencing technology place on signal detection. Semiconductor sequencers have greatly improved sequencing speed and reduced sequencing cost, which makes clinical application of sequencing technology feasible. Among second-generation sequencers, the ion semiconductor sequencers include the Ion PGM™ sequencer and the Ion Proton™ sequencer. The Ion PGM™ sequencer is designed for small-scale gene sequencing, while the Ion Proton™ sequencer is designed for large-scale gene sequencing. The basic sequencing principle of the two sequencers is the same.
Although the semiconductor sequencer has greatly improved sequencing speed, its sequencing accuracy is not high, at about 98%. During sequencing, the semiconductor device first measures a series of voltage signals, and the base length is then estimated from the measured voltage values. This estimation step introduces errors, and it is the main cause of the semiconductor sequencer's low sequencing accuracy.
This thesis studies the detection error that arises when the semiconductor sequencer interprets base length from the measured voltage values. First, the effective voltage signals are extracted from the sequencer's raw sequencing data and grouped according to the characteristics of the sequencing errors. Then, the voltage distribution is analyzed by computing statistics on the voltage values in each group, and Bayesian discrimination is used to determine the base length from the measured voltage.
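The abstract does not state which distribution is fitted to each voltage group, so the following is only a minimal sketch of Bayesian discrimination under stated assumptions: "base length" is taken to mean the number of identical bases read from one voltage measurement, each group's voltages are modeled as a Gaussian, and the priors are uniform unless given. The function names (`fit_groups`, `classify`) and the training voltages are hypothetical, not taken from the thesis.

```python
import math
from statistics import mean, stdev

def fit_groups(samples):
    """Estimate (mean, standard deviation) of the voltage for each base length.

    samples: dict mapping base length -> list of measured voltages.
    """
    return {length: (mean(v), stdev(v)) for length, v in samples.items()}

def log_gaussian(x, mu, sigma):
    """Log of the Gaussian density, used to avoid numeric underflow."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))

def classify(voltage, params, priors=None):
    """Return the base length with the highest posterior probability."""
    if priors is None:
        # Assumption: no prior knowledge, so every length is equally likely.
        priors = {length: 1.0 / len(params) for length in params}
    scores = {
        length: math.log(priors[length]) + log_gaussian(voltage, mu, sigma)
        for length, (mu, sigma) in params.items()
    }
    return max(scores, key=scores.get)

# Hypothetical, made-up voltages grouped by known base length.
training = {
    0: [0.02, -0.05, 0.04, 0.01],
    1: [0.98, 1.05, 0.95, 1.02],
    2: [1.90, 2.08, 2.01, 1.97],
}
params = fit_groups(training)
```

Calling `classify(1.1, params)` picks the group whose fitted distribution, weighted by its prior, best explains a voltage of 1.1. Working in log space keeps the comparison stable when a voltage lies far from a group's mean.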
Because the bases of different individuals of the same species differ by only about 1%, this thesis proposes a new method that combines Bayesian discrimination with reference genome information to determine the base length corresponding to a voltage signal. Subsequent experiments show that the error rate is below 0.85% when this method is used to determine base length from the measured voltage, about 80% lower than the error rate of the current semiconductor sequencer algorithm. The experimental results indicate that the proposed discrimination method is feasible.
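The abstract does not say how the reference genome enters the Bayesian decision. One plausible reading, sketched here purely as an assumption, is that the reference supplies the prior: the base length implied by the reference receives most of the prior mass (0.99 here, mirroring the roughly 1% difference between individuals), and the remainder is spread evenly over the other lengths. The Gaussian voltage model, the names `reference_prior` and `classify_with_reference`, and all numeric parameters are hypothetical.

```python
import math

def log_gaussian(x, mu, sigma):
    """Log of the Gaussian density for a measured voltage x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))

def reference_prior(lengths, ref_length, ref_weight=0.99):
    """Prior that favors the base length implied by the reference genome."""
    others = [k for k in lengths if k != ref_length]
    if not others:
        return {ref_length: 1.0}
    rest = (1.0 - ref_weight) / len(others)
    return {k: (ref_weight if k == ref_length else rest) for k in lengths}

def classify_with_reference(voltage, params, ref_length, ref_weight=0.99):
    """Pick the base length maximizing prior (from reference) times likelihood."""
    prior = reference_prior(list(params), ref_length, ref_weight)
    scores = {
        k: math.log(prior[k]) + log_gaussian(voltage, mu, sigma)
        for k, (mu, sigma) in params.items()
    }
    return max(scores, key=scores.get)

# Hypothetical per-length voltage models: base length -> (mean, std dev).
params = {1: (1.0, 0.15), 2: (2.0, 0.15), 3: (3.0, 0.15)}
```

With these made-up parameters, an ambiguous voltage near the midpoint between two lengths is resolved in favor of the reference, while a voltage close to another length's mean still overrides it, so true differences from the reference remain detectable.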
Keywords/Search Tags: semiconductor sequencer, voltage signal, base length, Bayes, reference genome