Research And Implementation Of SNP Imputation Algorithm In Genome-wide Association Study

Posted on: 2016-04-14
Degree: Master
Type: Thesis
Country: China
Candidate: Y F Zhang
GTID: 2180330479491063
Subject: Computer technology
Abstract/Summary:
The first results of a genome-wide association study (GWAS) were published in 2005. Over the following 10 years, with the rapid development of SNP sequencing technology, GWAS has become one of the most important methods for analyzing important economic traits, plant breeding, genetic modification, and complex human diseases. Many methods are now available for detecting SNP data, but because of technical constraints, whichever sequencing method is used, the results always contain missing values. Re-sequencing consumes a large amount of time and money, and ignoring the missing values affects the subsequent GWAS step. Hidden Markov models (HMMs) are applied more and more widely in bioinformatics. Compared with other models, an HMM is flexible to apply, and its parameters often have practical significance. The model used by the algorithm in this thesis is an improved HMM: a non-homogeneous HMM in which the state transition probability depends not only on the previous state but also on the specific time step. The corresponding algorithms, such as the forward-backward algorithm and the Viterbi algorithm, therefore had to be adjusted to fit the improved model. Based on this study of HMMs and an analysis of existing imputation algorithms, this thesis presents an efficient and fast HMM-based imputation algorithm.
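As a rough illustration of the non-homogeneous HMM described above, the following sketch shows how Viterbi decoding changes when each step between adjacent positions has its own transition matrix. It is not the thesis's implementation. The function name `viterbi_nonhomogeneous`, the use of alleles as hidden states, and the convention that a missing observation (`None`) contributes an emission factor of 1 are all assumptions made for this example.

```python
import math


def viterbi_nonhomogeneous(init, trans, emit, obs):
    """Most likely state path for an HMM with time-dependent transitions.

    init[s]        : P(state s at position 0)
    trans[t][r][s] : P(state s at position t+1 | state r at position t);
                     one matrix per step, so len(trans) == len(obs) - 1
    emit[s][o]     : P(observing o | state s)
    obs            : observations; None marks a missing value (assumption:
                     a missing value places no constraint on the state)
    """
    n = len(init)

    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    def emit_log(s, o):
        return 0.0 if o is None else log(emit[s][o])

    # Initialisation at position 0.
    score = [log(init[s]) + emit_log(s, obs[0]) for s in range(n)]
    back = []

    # Recursion: the only change from the homogeneous case is that the
    # transition matrix is looked up per step (trans[t - 1]).
    for t in range(1, len(obs)):
        step = trans[t - 1]
        new_score, ptr = [], []
        for s in range(n):
            best_r = max(range(n), key=lambda r: score[r] + log(step[r][s]))
            new_score.append(
                score[best_r] + log(step[best_r][s]) + emit_log(s, obs[t])
            )
            ptr.append(best_r)
        score = new_score
        back.append(ptr)

    # Backtrace from the best final state.
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path
```

In this sketch, the decoded state at a position whose observation is `None` serves as the imputed allele. How the thesis actually derives the per-position transition probabilities from linkage disequilibrium is not specified in the abstract, so they are simply passed in as inputs here.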
The algorithm has the following characteristics: it needs little biological information, its running time is short, and it is better suited to imputing SNP haplotype data from animals and plants. First, the algorithm maps the SNP imputation problem onto the HMM. It then uses the linkage disequilibrium (LD) between SNP loci to infer the relationships between loci and to calculate reasonable HMM parameters quickly. The SNP imputation problem is thereby transformed into the HMM decoding problem, which can be solved faster. Finally, several different methods are used to evaluate the SNP imputation algorithm.

This thesis also proposes an SNP imputation algorithm that requires no reference template. It makes rational use of the limited data available and of the LD between SNP loci: a sliding window is used to estimate the frequencies of the haplotypes that contain the missing value, and the most probable haplotype is chosen to impute it. The reference-free algorithm is intended for SNP data for which no reference template has yet been built, and its accuracy is related to the LD pattern in the SNP data.
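The reference-free, sliding-window idea described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's algorithm. The function name `impute_sliding_window`, the `half_width` parameter, the 0/1 allele coding, the rule that a missing allele in another haplotype counts as compatible, and the fallback to site-wide allele counts are all hypothetical choices made for this example.

```python
from collections import Counter


def impute_sliding_window(haplotypes, half_width=2):
    """Fill missing alleles (None) using haplotype frequencies in a window.

    haplotypes : list of equal-length lists with alleles (e.g. 0/1) and
                 None for missing values
    half_width : number of flanking sites on each side of the missing site
    Returns a new list of lists; the input is not modified.
    """
    result = [list(h) for h in haplotypes]
    n_sites = len(haplotypes[0])

    for i, hap in enumerate(haplotypes):
        for j, allele in enumerate(hap):
            if allele is not None:
                continue
            lo = max(0, j - half_width)
            hi = min(n_sites, j + half_width + 1)

            # Count the alleles carried at site j by the other haplotypes
            # that agree with this one at every observed site in the window.
            counts = Counter()
            for k, other in enumerate(haplotypes):
                if k == i or other[j] is None:
                    continue
                compatible = all(
                    hap[p] is None or other[p] is None or hap[p] == other[p]
                    for p in range(lo, hi)
                    if p != j
                )
                if compatible:
                    counts[other[j]] += 1

            # Assumed fallback: with no compatible haplotype, use the
            # allele counts at site j across the whole sample.
            if not counts:
                counts = Counter(o[j] for o in haplotypes if o[j] is not None)

            if counts:
                result[i][j] = counts.most_common(1)[0][0]
    return result
```

Because the estimate relies on flanking sites predicting the missing one, a sketch like this only does better than a per-site majority vote when the loci in the window are in strong LD, which is consistent with the statement that accuracy depends on the LD pattern of the data.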
Keywords: GWAS, SNP imputation, HMM