Swarm intelligence optimization methods have made great progress and have been successfully applied in scientific computing and engineering practice. However, with the rapid development of science and the emergence of big data, a large number of complex, high-dimensional, multimodal optimization problems have emerged, and many state-of-the-art optimization methods are unable to solve them. Both Harmony Search (HS) and Differential Evolution (DE) have shown the ability to solve some complex optimization problems and have attracted extensive attention. HS is powerful at exploring for the global optimal solution and is not easily trapped in local optima. In addition, HS not only solves continuous optimization problems well but is also well suited to discrete combinatorial optimization problems. DE has a rich and mature research base in real-valued optimization and has shown excellent performance on high-dimensional optimization problems. However, the performance of many existing HS and DE variants tends to degrade severely on complex optimization problems with high dimensionality (≥500).

In the life sciences, high-throughput sequencing technology has produced a large amount of omics data (e.g., genomics, proteomics, metabolomics, transcriptomics, lipidomics, immunomics, glycomics) in recent years. It is very difficult to detect disease-causing loci in high-dimensional omics data owing to the huge computational burden imposed by the large number of DNA sites. For example, in genome-wide association studies (GWAS), researchers attempt to discover pathogenic single nucleotide polymorphism (SNP) sites from whole-genome sequences, and some single SNP sites associated with disease status have been found successfully. However, it has been widely acknowledged that high-order SNP combinations may be important contributors to pathogenic factors that synergistically affect disease status, and there is not yet a very
effective method for detecting high-order disease-causing models at genome-wide scale, because of the enormous number of SNP combinations. Therefore, detecting high-order disease-causing models at genome-wide scale has become one of the most significant research topics in bioinformatics and the life sciences.

This thesis is concerned with swarm intelligence optimization algorithms and their application to the detection of high-order disease-causing SNP models. In order to quickly find global optimal solutions of complex optimization problems, this thesis focuses on HS and DE. We investigate the key factors that influence the performance of HS and DE and identify a common cause: the success rate of the update operators of both methods is very low in the late stage of the search. In addition, the mutation operator of DE performs much redundant computation on high-dimensional optimization problems, which severely reduces the search speed of DE. To quickly find high-order disease-causing SNP combinations in high-dimensional data, HS is adapted to improve the search speed and enhance the ability to identify disease-causing models, and two HS-based algorithms (FHSA-SED and NHSA-DHSC) are proposed. The main contributions of the thesis are as follows.

(1) To improve solution accuracy and convergence performance on high-dimensional (≥200) multimodal optimization problems, a new harmony search (HS) algorithm called DIHS is proposed. It is based on a Dynamic Dimensionality-Reduction-Adjustment-Strategy (DDRAS) and a Dynamic Fret-width Strategy (DFS). The DDRAS is used to avoid generating invalid solutions, and the DFS is used to balance global exploration and local exploitation. In this work, we compare two strategies (Take-all and Take-one) and find that the traditional HS has a very low success rate in generating a good solution when the dimensionality of the optimization problem is very
high. A theoretical analysis of the DDRAS is given, and the influence of related parameters on solution accuracy is investigated. Experimental results indicate that DIHS significantly improves solution accuracy with less CPU time on high-dimensional multimodal optimization problems, and the more dimensions the problem has, the greater the benefit. Compared with the standard HS, when the dimension of the optimization problem is 1000, DIHS improves the average accuracy of the optimal solution by 90.33% and reduces the running time by 23.8%.

(2) To improve the performance of the differential evolution (DE) algorithm with a small population size and low computational cost, a new DE algorithm is proposed. It introduces a dynamic crossover rate (CR) and a local variable adjustment strategy. The mutation operator, the crossover operator, and the local variable adjustment strategy are integrated and controlled by the parameter CR, so that population diversity is achieved in the early stage of the search and fine-tuning is intensified in the later local exploitation stage. To evaluate the proposed algorithm, it is compared with standard DE and three state-of-the-art evolutionary algorithms (SaDE, CoDE, and CMAES) on sixteen complicated benchmark functions. The Wilcoxon signed-rank test is employed to further compare our algorithm with five typical DE algorithms and three evolutionary algorithms. Experimental results indicate that the proposed DE algorithm achieves better solution quality with less CPU time than standard DE. On 16 complex test problems with 1000 dimensions, it reduces the run time by 13.79% and improves the accuracy of the global optimal solution by 81.96%.

(3) The two-locus model is a typical and significant disease model to be identified in genome-wide association studies (GWAS). Due to the intensive computational burden and the diversity of disease models, existing methods suffer from low
detection power, high computational cost, and a preference for some types of disease models. A fast harmony search algorithm (FHSA-SED) is presented to detect 2-way SNP disease models. Two scoring functions (the Bayesian-network-based K2-score and the Gini-score) are used to characterize a pair of SNP loci as a candidate model; the two criteria are adopted simultaneously to improve identification power and to address the preference for particular disease models. The harmony search algorithm (HSA) is improved to quickly find the most likely candidate models among all two-locus models, and a local search algorithm with a two-dimensional tabu table is introduced to avoid repeatedly evaluating disease models that have strong marginal effects. Finally, the G-test statistic is used to further test the candidate models. The results of simulation experiments indicate that FHSA-SED is promising for detecting 2-way SNP disease models.

(4) Genome-wide association studies are especially challenging when detecting high-order disease-causing models, owing to model diversity, the possibly low or even absent marginal effects of the models, and the extraordinary search and computation required. In this work, we propose a niche harmony search algorithm in which joint entropy is used as a heuristic factor to guide the search for models with low or no marginal effect, and two computationally lightweight scores are selected to evaluate the association between SNP combinations and disease status. To obtain all possible suspected pathogenic models, the niche technique is merged with HS, with niches serving as taboo regions that keep HS from being trapped in local optima. From the resulting set of candidate SNP combinations, we use the G-test statistic to test for true positives. Experiments were performed on twenty typical simulation datasets, comprising twelve models with marginal effects and eight with no marginal effect. Our results indicate that the proposed algorithm has very high detection power for finding suspected disease models in the first stage and that it is superior to
some typical existing approaches in both detection power and CPU runtime on all of these datasets. An application to age-related macular degeneration (AMD) demonstrates that our method is promising for detecting high-order disease-causing models.
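
As an illustration of the G-test verification step mentioned in (3) and (4), the following is a minimal Python sketch, not the thesis implementation. It assumes genotypes coded as small integers and a binary case/control label. The function name `g_test_statistic` and the choice to count degrees of freedom from the genotype combinations actually observed in the sample are assumptions made for this example.

```python
import math
from collections import Counter


def g_test_statistic(genotypes, status):
    """Return (G, degrees_of_freedom) for a SNP combination versus disease status.

    genotypes: list of tuples, one tuple of genotype codes per sample
               (e.g. (0, 2) for a two-SNP combination). Hypothetical encoding.
    status:    list of labels per sample (e.g. 0 = control, 1 = case).

    G = 2 * sum(O * ln(O / E)) over the contingency table whose rows are
    the observed genotype combinations and whose columns are the labels.
    """
    if len(genotypes) != len(status):
        raise ValueError("genotypes and status must have the same length")
    n = len(status)
    if n == 0:
        raise ValueError("at least one sample is required")

    cell_counts = Counter(zip(genotypes, status))  # observed O per cell
    row_totals = Counter(genotypes)                # per genotype combination
    col_totals = Counter(status)                   # per disease label

    g_sum = 0.0
    # Cells with zero observations never appear in the Counter and
    # contribute nothing, which matches the usual 0 * ln(0) = 0 convention.
    for (combo, label), observed in cell_counts.items():
        expected = row_totals[combo] * col_totals[label] / n
        g_sum += observed * math.log(observed / expected)

    # Assumption: degrees of freedom are based on observed rows and columns.
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return 2.0 * g_sum, dof


if __name__ == "__main__":
    # Toy data: the genotype combination fully determines the label.
    combos = [(0, 0), (0, 0), (1, 1), (1, 1)]
    labels = [0, 0, 1, 1]
    print(g_test_statistic(combos, labels))
```

Under the null hypothesis of no association, G is approximately chi-square distributed with the returned degrees of freedom, so a p-value can be obtained from that distribution. How the thesis handles sparse cells or multiple-testing correction is not stated in this abstract.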