
Research on an Informative SNP Selection Method Based on a Greedy Algorithm

Posted on: 2015-02-21
Degree: Master
Type: Thesis
Country: China
Candidate: M L Zhong
Full Text: PDF
GTID: 2428330491952458
Subject: Software engineering
Abstract/Summary:
A genome-wide association study (GWAS) identifies single nucleotide polymorphisms (SNPs) associated with a given phenotype at the genome-wide level. Because of linkage disequilibrium (LD) between SNPs, SNP loci are strongly correlated with one another. To reduce the redundancy between loci and the cost of genotyping, selecting the most representative (informative) SNPs from all SNP loci is a current research focus. Many machine learning methods and combinatorial optimization algorithms already exist for informative SNP selection, but they still have problems such as high computational complexity and insufficient information.

To address these drawbacks, this paper proposes a framework for informative SNP selection based on a multiple-loci linkage disequilibrium measure. The framework has two phases: filter and refine. The filter phase reduces the redundancy between loci at low cost; its main aim is to save computation in the refine phase by deleting redundant loci. In the refine phase, a greedy algorithm selects the informative SNPs from the remaining candidates.

The main work is as follows. To overcome the high computational complexity of conventional methods, the first stage introduces a new multiple-loci LD measure. The measure is then taken as an optimization objective and solved with an ant colony algorithm, so that redundant loci are excluded. Compared with traditional methods, the proposed measure describes the relationship among multiple loci more accurately and therefore removes more redundant information, and optimizing the measure is significantly cheaper than optimizing prediction accuracy directly. In the refine phase, an artificial neural network serves as the learning model that reconstructs the genotypes of the non-informative loci, and a greedy algorithm then optimizes the prediction accuracy. The greedy algorithm removes noise loci from the candidate set, which improves accuracy and reduces the number of informative SNPs.

In the simulation experiments, the method is compared with two other approaches on HapMap datasets. The methods are evaluated by three measures: prediction accuracy, the number of informative SNPs, and running time.
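The refine phase described above can be illustrated with a small sketch. It is not the thesis's implementation: the abstract uses an artificial neural network as the predictor, whereas this example swaps in a simple majority-vote lookup table so that it runs with the standard library alone. The function names `prediction_accuracy` and `greedy_refine`, the genotype coding 0/1/2, and the stopping rule (keep dropping a locus as long as accuracy does not decrease) are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def prediction_accuracy(train, valid, selected):
    """Fraction of non-selected genotypes reconstructed correctly.

    Stand-in predictor (assumption, not the thesis's neural network):
    for each genotype pattern at the selected loci, predict the most
    common training genotype at every non-selected locus.
    """
    targets = [j for j in range(len(train[0])) if j not in selected]
    if not targets:
        return 1.0
    table = defaultdict(lambda: [Counter() for _ in targets])
    fallback = [Counter() for _ in targets]  # used for unseen patterns
    for row in train:
        key = tuple(row[j] for j in selected)
        for k, j in enumerate(targets):
            table[key][k][row[j]] += 1
            fallback[k][row[j]] += 1
    correct = 0
    for row in valid:
        key = tuple(row[j] for j in selected)
        counts = table.get(key, fallback)
        for k, j in enumerate(targets):
            correct += counts[k].most_common(1)[0][0] == row[j]
    return correct / (len(valid) * len(targets))


def greedy_refine(train, valid, candidates):
    """Greedy backward elimination over the candidate loci.

    At each step, drop the locus whose removal gives the highest
    accuracy, and stop once every possible removal lowers accuracy.
    """
    selected = list(candidates)
    best = prediction_accuracy(train, valid, selected)
    improved = True
    while improved and len(selected) > 1:
        improved = False
        trials = [
            (prediction_accuracy(train, valid,
                                 [s for s in selected if s != j]), j)
            for j in selected
        ]
        acc, drop = max(trials)
        if acc >= best:
            best = acc
            selected.remove(drop)
            improved = True
    return selected, best


# Toy data: loci 2 and 3 are exact copies of loci 0 and 1 (perfect LD).
# The same rows serve as training and validation sets for brevity.
rows = [(a, b, a, b) for a in range(3) for b in range(3)]
selected, acc = greedy_refine(rows, rows, candidates=[0, 1, 2])
print(selected, acc)  # [0, 1] 1.0
```

In this toy case, locus 2 is redundant with locus 0, so the greedy step removes it without losing accuracy, while removing either of the two remaining loci would make some genotypes unpredictable. A real run would replace the lookup table with a trained model and use held-out validation samples.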
Keywords/Search Tags: Single nucleotide polymorphism, Genome-wide association study, Ant colony algorithm, Artificial neural network