Font Size: a A A

Application Of Biclustering Integrated MOPSO And Greedy Randomized Construction To Gene Expression Data

Posted on:2013-01-16Degree:MasterType:Thesis
Country:ChinaCandidate:L N LaiFull Text:PDF
GTID:2218330371982759Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
Due to the advent of High-throughput microarray technology, it generated large amountsof gene expression datasets. Microarray technology allows simultaneously to measure theexpression level of thousands of genes under different experimental conditions. Themicroarray datasets are generally presented as a n mdata matrix, where rows correspondto genes and columns correspond to experimental conditions. Mining local patterns whichshow similar expression patterns under subsets of conditions from those microarray datasets isan important research problem in bioinformatics and medical domain.Clustering technology has been widely used in gene expression profiles anaysis, and itcan identify sets of genes which has similar expression patterns under all expressionconditions. But in the cellular processes, it is knowledge that subsets of genes wereco-regulated and co-expression only under subsets of condtions. But clustering can notidentify such subsets of gene. Biclustering algorithms which were originally described byHartigan can perform simultaneous clustering of both rows and columns of the geneexpression matrix. Afterwards, biclustering was introduced to gene expression analysis byCheng and Church. It is a very useful technology to identify local patterns in which sets ofgenes shares similar expression under subsets of conditions but all of conditions. So far, manybiclustering algorithms have been developed for gene expression data analysis, such asbiclustering, SAMBA, Gibbs sampling biclustering, spectral biclustering, and so on.Most of these methods use local search to generate optimal biclusters. In recent years, manyevolution algorithms (EA) have been proposed to indentify optimal bicluster in microarraydata in order to escape from local minima. And, through improving the evolution algorithms,EA is able to solve multi-objective optimization problem. So, more and more researchers areinterested in evolution algorithms based heutistics optimization methods.When searching optimal biclusters from gene expression data, we need to optimizesimultaneously several objectives, which are in conflict with each other. These severalobjectives are the size, mean squared score and row variance of the bicluster. In this situation,we can appy multi-objective evolutionary algorithms (MOEAs) to discover efficiently optimalbiclusters. MOEAs mainly proposed three relaxed forms of Pareto dominance, such as ε-dominance, α-dominanceb b-dominance. Applying these relaxed forms of Paretodomincace can regulate convergence of MOEA, and improve diversity of all particles in theexternal archive. Amonge these forms, is widely applied in the MOEAs, dueto its effectiveness and its theoretical foundation. Particle swarm optimization (PSO) firstlyintruduced by Kebnnedy and Eberhart is inspired by the behavior of a bird flock finding food.PSO has been found to be quite efficient in solving the single objective optimization problem.Because of rapid convergence and efficiency, researchers are interested in applying PSO tohandle multi-objective optimization problem. This methods is named as multi-objectiveparticle swarm optimization (MOPSO).In this paper, we propose a new inporoved algorithm, GR-MOPSO algorithm. Thismethod incorporate, crowding distance and time varying parametersapproach into MOPSO biclustering framework. Firstly, we apply greedy randomizedconstruction to generate bicluster seeds, then use these bicluster seeds to initialize all particlesin the particle swarm. Secondly, PSO has three vital parameters which are inertia w andacceleration coefficientsc1,c2. In order to improve the convergence of PSO, we introducetime varying parameters into MOPSO. These parameters are changing with the iterations,which can enhance global exploration at early stage and raise local exploitation at later stage.Lastly, with the purpose of controlling the size of external archive and maintaining the gooddiversity of particles in the archive, we appy the crowding distance to truncate archivepopulation. We respectively implement this new improved aglorithm GR-MOPSO on theyeast and human B-cells datasets, and compare the results with MOEA, MOPSOB. Theresults show: the quality of biclusters GR-MOPSO found are better than the results of MOEAand MOPSOB. And GR-MOPSO have well population diversity and rapid convergence.The innovation of this paper is mainly the following two aspects:①Through analyzingthe mining bicluster, we found the features of bicluster conflicting with each other.So,we canapply multi-objective optimization theory into the biclustering. Incorporated,crowding distance and time varying parameters approach into MOPSO framework,GR-MOPSO can search more optimal biclusters when application of the improved MOPSO tothe bicluster seeds.And,it can maintain diversity of particles in the particle swarm andenhance rapid convergence.②In the initial stage of multi-objective particle swarmoptimization, we use bicluster seeds to initialize the position and velocity of all particles in theswarm, not use random value. The bicluster seeds are the results of Greedy RandomizedConstruction. So, it can enable all particles to have the character of bicluster and enhance efficiency to search optimal solutions.The future work: Firstly, in the biclustering of largely gene expression data,thresholddetermines the number and quatity of found biclusters. So, we need to study how to selectsuitable threshold in order to improve quatity of biclusters and run efficiently.Secondly, due toPSO without the filting process such as crossover and mutation, we need to improve theparticle swarm for enhancing the diversity of the particles in the swarm and increasing rate ofconvergence. Lastly, we further rearch the biological significance of the biclustering results.
Keywords/Search Tags:Gene expression data, Biclustering, Multi-objective particle swarm optimization, Greedy Randomized Construction
PDF Full Text Request
Related items