Evolutionary Computation Based Maximum Similarity Biclustering And Application

Posted on: 2014-03-05
Degree: Master
Type: Thesis
Country: China
Candidate: X J Peng
GTID: 2268330425483703
Subject: Computer Science and Technology
Abstract/Summary:
Gene expression data produced by gene chip experiments are large in scale, typically containing thousands of genes and hundreds of samples. Such data are therefore high-dimensional and voluminous. At the same time, because individual organisms are complex, gene expression levels may differ greatly or be highly similar, and these patterns are scattered through the data without apparent order. A great deal of information is hidden in these data, so gene expression data must be mined to uncover it. Biclustering is a good analysis tool for gene expression data. Compared with traditional clustering, biclustering can extract more similarity information and more biologically meaningful information. This thesis therefore presents work on biclustering of gene expression data. The main contributions are as follows:

Firstly, this thesis studies the types and structures of biclusters and the search strategies of biclustering algorithms, analyzes the characteristics of mainstream biclustering algorithms, explores an evolutionary computation based biclustering algorithm model, and puts forward some proposals for improvement.

Secondly, the main work of this thesis is to propose an evolutionary computation based maximum similarity biclustering algorithm for gene expression data. The algorithm first uses a feature selection algorithm to select some columns of the gene expression data as reference conditions, then converts the data matrix based on the reference conditions, and next obtains the similarity matrix according to the reference genes. Finally, it runs the evolutionary algorithm, initializing the population according to binary encoding rules and iterating until the evolution finishes and a best individual is obtained. Best individuals that meet certain conditions are decoded into biclusters and saved in the results. The final output of the algorithm is a set of biclusters.

Finally, comparison experiments on several expression data sets were carried out to test the performance of the algorithm. The first kind of data is synthetic data sets, the second is two yeast gene expression data sets, and the third is cancer gene expression data. The biclusters found in these data are scored according to a set of rules, and the results are compared. The comparison shows that the proposed algorithm outperforms some other algorithms. In addition, the experimental results on the third data set show that the algorithm performs well on cancer classification.
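The pipeline described in the abstract (transform the matrix by a reference condition, build a similarity matrix against a reference gene, then evolve binary-encoded individuals) can be illustrated with a minimal sketch. This is not the thesis's implementation. The abstract does not give the feature selection step, the similarity formula, the fitness function or the genetic operators, so everything here is an assumption made for illustration: the caller supplies `ref_col` and `ref_row` directly, similarity is a linear closeness score controlled by a hypothetical `alpha`, fitness sums similarity above a hypothetical `threshold`, and the search uses tournament selection, uniform crossover, bit-flip mutation and elitism.

```python
import random


def similarity_matrix(data, ref_col, ref_row, alpha=1.0):
    """Shift each row by its value in the reference column, then score every
    cell by how close it is to the reference gene's shifted row (1.0 = equal).
    The linear score and `alpha` are illustrative assumptions."""
    shifted = [[v - row[ref_col] for v in row] for row in data]
    ref = shifted[ref_row]
    return [[max(0.0, 1.0 - abs(v - r) / alpha) for v, r in zip(row, ref)]
            for row in shifted]


def decode(bits, n_rows):
    """Split a binary individual into selected row and column indices.
    The first n_rows bits mark genes, the remaining bits mark conditions."""
    rows = [i for i in range(n_rows) if bits[i]]
    cols = [j for j in range(len(bits) - n_rows) if bits[n_rows + j]]
    return rows, cols


def fitness(bits, sim, threshold=0.5):
    """Sum of (similarity - threshold) over the selected submatrix, so that
    large, highly similar submatrices score best (assumed scoring rule)."""
    rows, cols = decode(bits, len(sim))
    return sum(sim[i][j] - threshold for i in rows for j in cols)


def evolve(sim, pop_size=40, generations=60, p_mut=0.02, seed=0):
    """Simple genetic algorithm over binary individuals; returns the best one."""
    rng = random.Random(seed)
    n_bits = len(sim) + len(sim[0])
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]

    def score(bits):
        return fitness(bits, sim)

    for _ in range(generations):
        best = max(pop, key=score)
        next_pop = [best[:]]  # elitism: carry the best individual forward
        while len(next_pop) < pop_size:
            # Tournament selection of two parents (tournament size 3).
            p1 = max(rng.sample(pop, 3), key=score)
            p2 = max(rng.sample(pop, 3), key=score)
            # Uniform crossover followed by bit-flip mutation.
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]
            child = [1 - g if rng.random() < p_mut else g for g in child]
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=score)


if __name__ == "__main__":
    # Toy matrix: rows 0 and 1 follow the same additive pattern, row 2 does not.
    data = [[1, 2, 3], [2, 3, 4], [5, 0, 9]]
    sim = similarity_matrix(data, ref_col=0, ref_row=0)
    best = evolve(sim)
    print(decode(best, len(sim)), fitness(best, sim))
```

To obtain a set of biclusters rather than a single one, a caller could run `evolve` with different reference genes or seeds and keep the decoded individuals whose fitness passes a chosen cutoff. That cutoff is another assumption, standing in for the "certain conditions" mentioned in the abstract.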
Keywords/Search Tags: Gene expression data, Evolutionary computation, Biclustering, Maximum similarity bicluster, Similarity matrix