
Research on Hybrid Intelligent Algorithms for Tag SNP Selection

Posted on: 2015-02-28
Degree: Master
Type: Thesis
Country: China
Candidate: L X Wang
Full Text: PDF
GTID: 2370330491455940
Subject: Biological Information Science and Technology
Abstract/Summary:
Single nucleotide polymorphisms (SNPs) provide a new means of studying differences in disease risk and drug response among human individuals, as well as polygenic diseases. In theory, all SNPs would have to be genotyped to locate regions of variation, but traditional methods for doing so are inefficient and expensive. Studies show that tag SNPs carry most of the genetic information in a SNP data set, which makes the search for tag SNPs significant. However, identifying tag SNPs in a SNP data set requires a huge amount of computation, and machine learning methods offer effective ways to address the problem.

Searching for tag SNPs is intrinsically a combinatorial optimization problem, and some studies have solved it successfully with the set covering method on small data sets. Unfortunately, optimal solutions become difficult to obtain for complicated set cover problems. Because ant colony algorithms (ACO) work well at finding near-optimal solutions, this thesis proposes two new algorithms for tag SNP selection, both of which combine set covering with ACO: one based on a penalty function (PCACO) and the other based on random perturbation (RCACO). Because standard ACO easily falls into local optima and takes a long time in global search, the thesis makes the following improvements:

(1) The proposed PCACO algorithm improves global search ability by limiting the range of the pheromone concentration, which prevents the pheromone from becoming either excessively concentrated or excessively dispersed.

(2) A penalty function is introduced whose penalty factors follow a Gaussian curve over the course of the iterations. This keeps the algorithm from being trapped in local optima and improves the convergence rate of the tag SNP search.

(3) An ACO algorithm based on random perturbation (RCACO) is proposed for tag SNP selection. Its main improvements lie in two aspects: first, a random selection strategy and a perturbation strategy are designed specifically for the tag SNP search problem; second, a perturbation factor that follows an inverted exponential curve is proposed. Compared with PCACO, RCACO further improves the precision of the tag SNP search.

(4) Given the linkage disequilibrium among SNP loci, the search space is reduced by applying k-means clustering to the high-dimensional, small-sample data set, which improves the efficiency of correlation analysis on massive SNP data.

In summary, this work studies ACO-based tag SNP selection in depth, proposes two different strategies for applying ACO to the set cover method, and implements clustering for high-dimensional, small-sample data sets. Experimental results on simulated data sets show that the proposed algorithms achieve higher accuracy with less time consumption than the PSO and GA algorithms proposed in recent years.
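The abstract does not give formulas, so the following is only a minimal sketch of the general idea of combining set covering with ACO under a bounded pheromone range. It is not the thesis's PCACO or RCACO. It assumes that a SNP "covers" another when their pairwise r² is at least a threshold (0.8 here, an assumed value). The function names and every parameter (`n_ants`, `rho`, `tau_min`, `tau_max`) are hypothetical. The Gaussian penalty factor and the perturbation factor are not modelled.

```python
import random


def coverage_sets(r2, threshold=0.8):
    """For each SNP i, return the set of SNPs it covers.

    Assumption: SNP i covers SNP j when their pairwise r^2 >= threshold.
    Every SNP covers itself.
    """
    n = len(r2)
    return [{j for j in range(n) if i == j or r2[i][j] >= threshold}
            for i in range(n)]


def clamp(value, low, high):
    """Keep a pheromone value inside the range [low, high]."""
    return max(low, min(high, value))


def aco_tag_snps(cover, n_ants=10, n_iters=30, rho=0.1,
                 tau_min=0.1, tau_max=5.0, seed=0):
    """Search for a small set of tag SNPs that covers every SNP."""
    rng = random.Random(seed)
    n = len(cover)
    tau = [tau_max] * n        # one pheromone value per candidate SNP
    best = list(range(n))      # trivial cover: every SNP tags itself

    for _ in range(n_iters):
        for _ in range(n_ants):
            uncovered = set(range(n))
            chosen = []
            while uncovered:
                # Only SNPs that still cover something new are candidates.
                candidates = [i for i in range(n) if cover[i] & uncovered]
                # Weight = pheromone * number of newly covered SNPs.
                weights = [tau[i] * len(cover[i] & uncovered)
                           for i in candidates]
                pick = rng.choices(candidates, weights=weights, k=1)[0]
                chosen.append(pick)
                uncovered -= cover[pick]
            if len(chosen) < len(best):
                best = chosen

        # Evaporate, reinforce the best cover so far, then clamp the range.
        tau = [t * (1.0 - rho) for t in tau]
        for i in best:
            tau[i] += 1.0 / len(best)
        tau = [clamp(t, tau_min, tau_max) for t in tau]

    return sorted(best)


# Toy r^2 matrix with two blocks of strongly linked SNPs: {0, 1, 2} and {3, 4}.
toy_r2 = [
    [1.0, 0.9, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.9, 0.1, 0.1],
    [0.9, 0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 0.1, 1.0, 0.9],
    [0.1, 0.1, 0.1, 0.9, 1.0],
]
toy_tags = aco_tag_snps(coverage_sets(toy_r2))
print(toy_tags)
```

Clamping after each update is one simple way to keep any single SNP from dominating the selection probabilities or from dropping out of consideration entirely, which is the effect improvement (1) describes.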
Keywords/Search Tags:tag SNPs, haplotype, linkage disequilibrium, set cover, ant colony algorithm, clustering