Research On The Method For Tag Single Nucleotide Polymorphism Selection

Posted on: 2016-03-22 | Degree: Master | Type: Thesis
Country: China | Candidate: X J Wang | Full Text: PDF
GTID: 2370330473464915 | Subject: Computer Science and Technology
Abstract/Summary:
Across the genome, a sequence variation caused by a change in a single nucleotide is called a single nucleotide polymorphism (SNP). Millions of SNPs are present in the human genome, and this sheer number is a challenge in studying complex diseases. Over many generations, non-random combinations arise between genetic markers such as SNPs, a phenomenon known as linkage disequilibrium (LD). Researchers can therefore select a small subset of SNPs that contains most of the genetic variation information; these are called tag SNPs. Many current studies use LD between SNPs for tag SNP selection. However, owing to the huge number of SNPs, these methods have many defects: most can only be applied to haplotype data, do not consider disease state information, and have low predictive accuracy for untagged SNPs. To address these problems, this thesis studies tag SNP selection as follows:
(1) Existing methods are based on pairwise LD. Recent literature has shown that multiple-marker LD also contains useful information that can further reduce the number of selected tag SNPs. Given technological limitations, most sequencing techniques provide genotype rather than haplotype information. To satisfy the requirements of these LD measures for tag SNP selection, the haplotype phase must be inferred from the genotype data, but this process is complicated and costly. To overcome these difficulties, this thesis proposes a new LD measure based on information theory, called the average information gain ratio (AIGR), and builds a novel tag SNP selection method on it. The method clusters SNP loci with a hierarchical clustering model, using AIGR as the similarity measure. Finally, the selected tag SNP loci are used with a support vector machine for prediction and evaluation.
(2) SNP loci associated with the disease state are useful. At present, most tag SNP selection methods consider only the LD between SNP loci and ignore disease status information, which is a significant limitation. This thesis therefore further takes the disease status into account, combining the LD between SNP loci with disease status information for tag SNP selection. The main idea is, first, to use sparse representation to compute the correlation between SNP loci and the disease state as well as the correlation among SNP loci, and second, to cluster the SNP loci with a method designed on the basis of graph theory. The method excludes SNP loci that are unrelated to the disease state and also eliminates SNP loci made redundant by LD between loci. The SNP loci in the final tag SNP set are thus the most relevant to the disease state while having minimal redundancy, which ensures the effectiveness of the selected tag SNP set.
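The clustering idea in (1) can be sketched on unphased genotype data as follows. This is only an illustration under stated assumptions: the abstract does not give the AIGR formula, so the function `avg_gain_ratio` below is a stand-in that averages the two gain ratios I(X;Y)/H(X) and I(X;Y)/H(Y). The SNP names, the toy genotypes (coded 0/1/2), the 0.5 merge threshold, the average-linkage rule, and the rule for picking one tag per cluster are all hypothetical choices, not the thesis's method.

```python
import math
from collections import Counter
from itertools import combinations

def entropy(values):
    """Shannon entropy (bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two equal-length genotype vectors."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))

def avg_gain_ratio(x, y):
    """Assumed similarity (not the thesis's exact AIGR): mean of I/H(X) and I/H(Y)."""
    hx, hy = entropy(x), entropy(y)
    if hx == 0 or hy == 0:  # a monomorphic SNP carries no information
        return 0.0
    mi = mutual_information(x, y)
    return 0.5 * (mi / hx + mi / hy)

def cluster_snps(snps, threshold):
    """Average-linkage agglomerative clustering; merge while similarity >= threshold."""
    names = list(snps)
    sim = {frozenset(p): avg_gain_ratio(snps[p[0]], snps[p[1]])
           for p in combinations(names, 2)}
    clusters = [[name] for name in names]

    def linkage(a, b):
        # Mean pairwise similarity between the members of two clusters.
        return sum(sim[frozenset((s, t))] for s in a for t in b) / (len(a) * len(b))

    while len(clusters) > 1:
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        if linkage(clusters[i], clusters[j]) < threshold:
            break
        merged = clusters.pop(j)  # j > i, so index i stays valid
        clusters[i].extend(merged)
    return clusters, sim

def pick_tags(clusters, sim):
    """Pick, per cluster, the SNP with the highest total similarity to the others."""
    tags = []
    for members in clusters:
        if len(members) == 1:
            tags.append(members[0])
        else:
            tags.append(max(members, key=lambda s: sum(
                sim[frozenset((s, t))] for t in members if t != s)))
    return tags

# Hypothetical genotypes for four SNPs across eight individuals.
toy = {
    "rs1": [0, 1, 2, 0, 1, 2, 0, 1],
    "rs2": [0, 1, 2, 0, 1, 2, 0, 1],  # identical to rs1
    "rs3": [2, 1, 0, 2, 1, 0, 2, 1],  # rs1 with alleles swapped
    "rs4": [0, 0, 0, 1, 1, 1, 2, 2],  # nearly independent of rs1
}
clusters, sim = cluster_snps(toy, threshold=0.5)
tags = pick_tags(clusters, sim)
print(clusters, tags)
```

Because the measure works directly on genotype counts, no haplotype phasing step is needed in this sketch. The selected tags could then be passed to a classifier such as a support vector machine to assess how well they predict the untagged loci.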
Keywords/Search Tags: Single Nucleotide Polymorphism, linkage disequilibrium, tag SNP, clustering, sparse representation