
Feature Selection Algorithm And Its Application To SNP Association Analysis

Posted on: 2015-12-13 | Degree: Master | Type: Thesis
Country: China | Candidate: Y X Gao | Full Text: PDF
GTID: 2180330464968784 | Subject: Computer application technology
Abstract/Summary:
A single nucleotide polymorphism (SNP) is a variation in the DNA sequence caused by a change in a single nucleotide. SNPs therefore contribute to the diversity of a species' chromosomal genome. Association analysis, which is often used in recommendation algorithms, aims to analyze the correlation between random variables, that is, the intrinsic relationship between the variations of two random variables. SNP association analysis, in turn, aims to find, from a person's whole-genome feature set, the feature subset most strongly associated with a specified disease.

This paper mainly studies the search algorithms used in SNP association analysis and compares the features selected by these algorithms with the real features in disease prediction. We also propose a mixed feature selection algorithm named MFS. Most common feature selection algorithms, in China and abroad, consist of four modules, with emphasis on the search strategy and evaluation criterion modules. According to their search strategy, these algorithms can be divided into exhaustive, sequential, and random algorithms. According to their evaluation criterion, they can be divided into filter and wrapper algorithms.

In this paper, we first implement the SFFS-based and the DOS-based feature selection algorithms, which differ in search strategy. We describe their principles, design ideas, and characteristics in detail and run experiments on the data sets Model1 to Model3. By analyzing the experimental results and comparing the characteristics and ranges of application of the two algorithms, we conclude that, within a tolerable average computing time, the DOS-based algorithm is more suitable than SFFS for the SNP association analysis problem. We therefore adopt the DOS search strategy in the MFS algorithm. Second, we explain why a hybrid feature selection algorithm is needed and how to realize it.
We divide the MFS algorithm into two main stages according to the different criteria and usage scenarios of Filter and Wrapper: a dimension-reduction stage that applies a Filter criterion to the sample data, and a core search stage that applies a Wrapper criterion. MFS uses both criteria to balance computing time and performance. Finally, because the DOS-based algorithm tends to become trapped in local minima, we use the Halton sequence in the MFS algorithm to address this problem. The Halton sequence has low discrepancy and is very uniformly distributed; these two properties keep the search uniform and prevent the algorithm from concentrating its search near a single point. The MFS algorithm thus avoids becoming trapped in local minima.

We perform experiments on both simulated and real data sets. The results show that the MFS algorithm performs better in SNP association analysis than other common feature selection algorithms: it consumes less time and obtains more accurate results, and the feature set it obtains yields higher classification accuracy.
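
As a minimal illustration of the low-discrepancy property described above, the Halton sequence can be generated with the standard radical-inverse construction. The sketch below is not taken from the thesis: the function names `radical_inverse` and `halton`, the choice of bases 2 and 3, and the hypothetical helper `to_feature_index` (which maps a point in [0, 1) to a feature index that could serve as a search start) are assumptions made for illustration. The abstract does not state how MFS actually consumes the sequence.

```python
from typing import List, Sequence, Tuple


def radical_inverse(index: int, base: int) -> float:
    """Mirror the base-`base` digits of `index` about the radix point."""
    result = 0.0
    fraction = 1.0 / base
    while index > 0:
        result += (index % base) * fraction
        index //= base
        fraction /= base
    return result


def halton(n_points: int, bases: Sequence[int] = (2, 3)) -> List[Tuple[float, ...]]:
    """Return the first `n_points` Halton points, one coordinate per base.

    The bases should be pairwise coprime (usually the first primes).
    Indexing starts at 1 so that the all-zero point is skipped.
    """
    return [
        tuple(radical_inverse(i, b) for b in bases)
        for i in range(1, n_points + 1)
    ]


def to_feature_index(u: float, n_features: int) -> int:
    """Hypothetical helper: map u in [0, 1) to an index in [0, n_features)."""
    return min(int(u * n_features), n_features - 1)


if __name__ == "__main__":
    # Assumed usage: spread eight search starts over 1000 candidate features.
    points = halton(8)
    starts = [to_feature_index(p[0], 1000) for p in points]
    print(starts)
```

Successive points fill the largest remaining gaps in the unit interval instead of clustering, which is the property the abstract relies on to keep the search from concentrating near a single point.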
Keywords/Search Tags: SNP, Correlation Analysis, Filter, Wrapper, Mixed Feature Selection