
Analysis Method Of Homopolymer Sequencing On Ion Torrent

Posted on: 2016-03-23
Degree: Master
Type: Thesis
Country: China
Candidate: F F Song
Full Text: PDF
GTID: 2310330542973995
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
The emergence of high-throughput sequencing has revolutionized traditional sequencing technology. Compared with traditional methods, high-throughput sequencing can read hundreds of thousands or millions of DNA molecules in a single run, which allows the genome and transcriptome of a species to be analyzed in detail. Some articles also call it next-generation sequencing, a name that reflects its epoch-making significance. As the technology has developed, researchers have built many sequencing platforms, including Complete Genomics, HiSeq, the ABI SOLiD System, the GS FLX Platform, Ion Torrent, and Proton. With these platforms, researchers can work at the nucleic acid level in many fields, such as animal and plant research, microbiology, and drug development.

Among these platforms, Ion Torrent is the latest generation of sequencing technology. One of its main advantages is that consecutive identical bases on the template strand (a homopolymer, such as TTT) all take part in the reaction. Ideally, the value read by Ion Torrent equals the number of reactions, so TTT undergoes three chemical reactions and produces a response signal of 3. The technology also has a disadvantage. Ion Torrent identifies bases from the signal of a chemical reaction, and that reaction is easily affected by the environment and the reactants. The resulting signal is inaccurate, which leads to wrong calls at the base-recognition stage and, during sequence alignment, to a large number of indels (insertions and deletions) and mismatches (aligned bases that differ). How to solve this problem effectively has become an active research topic in high-throughput sequencing. This thesis takes the raw response signal of Ion Torrent as its object of study and focuses on re-interpreting these signals and on constructing a model that reduces mismatches. The research covers the following aspects.

First, we analyze the distribution of the voltage signal for each homopolymer case. Because the homopolymer problem does not arise on the Illumina sequencing platform, this thesis proposes a new analysis method that combines Illumina and Ion Torrent data to analyze the voltage signal distribution. The homopolymer length is first calculated from the Illumina sequencing data. Taking the characteristics of the chemical reagents into account, the length is then combined with the position and base type of the homopolymer to classify the homopolymers. The distribution of the Ion Torrent voltage signal is analyzed for each class, and the regularity of that distribution is studied. The results show that the homopolymer signals of each class approximately follow a normal distribution.

Second, we propose a homopolymer analysis model based on Bayes' theorem, which uses the Bayesian method to calculate the posterior probability of each signal. Using these posterior probabilities, we design a penalty-factor calculation model for an improved dynamic programming algorithm, with the goal of minimizing the differences from the reference genome. The dynamic sequence-matching process is used to tune the parameters of the penalty-factor calculation model.

The experimental results show that, compared with the traditional sequence analysis method, the proposed method improves the accuracy of homopolymer identification by 22%, which demonstrates its feasibility.
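The Bayesian step described above can be illustrated with a minimal Python sketch. It assumes each homopolymer class is summarized by a prior probability and a normal distribution over the signal. The function names, the class table, and every numeric value below are hypothetical placeholders chosen for illustration; they are not parameters or results from the thesis.

```python
import math


def normal_pdf(x, mean, std):
    """Density of a normal distribution N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def length_posterior(signal, classes):
    """Posterior P(length | signal) via Bayes' theorem.

    classes maps a homopolymer length to (prior, mean, std), where the
    signal for that length is assumed to be normally distributed.
    """
    # Numerator of Bayes' theorem: prior * likelihood for each length.
    joint = {
        length: prior * normal_pdf(signal, mean, std)
        for length, (prior, mean, std) in classes.items()
    }
    evidence = sum(joint.values())
    if evidence == 0.0:
        raise ValueError("signal is too far from every class to normalize")
    return {length: p / evidence for length, p in joint.items()}


# Hypothetical class parameters (illustrative only, not from the thesis):
# length -> (prior, mean signal, standard deviation)
CLASSES = {
    1: (0.70, 1.00, 0.10),
    2: (0.20, 1.95, 0.18),
    3: (0.10, 2.85, 0.25),
}

posterior = length_posterior(2.45, CLASSES)
best_length = max(posterior, key=posterior.get)
print(best_length, posterior)
```

In this sketch an ambiguous signal such as 2.45 is not simply rounded to the nearest integer; each candidate length receives a probability. The abstract does not state how the posterior is converted into a penalty factor, so that step is left out here.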
Keywords/Search Tags:homopolymer, dynamic programming, Bayesian theory, normal distribution, penalty factor