
Multi-Objective-Based Bi-Clustering For Gene Expression Data Analysis

Posted on: 2017-03-30
Degree: Master
Type: Thesis
Country: China
Candidate: Y Liang
Full Text: PDF
GTID: 2308330488959205
Subject: Computer software and theory
Abstract/Summary:
Gene expression data analysis is an important research field in bioinformatics. Gene expression data not only carry rich information about gene activity but also reflect the current physiological state of cells. Finding correlations in gene expression data can reveal common functions, interactions, and coordinated regulation of genes.

Many bi-clustering algorithms have been proposed to mine associations in gene expression data, but there is still much room for improvement. First, few of them give much weight to negative associations, although negative associations carry important hidden information about genes. Two positively related genes share a common function under certain conditions, whereas for two negatively related genes one shows a positive effect under certain conditions while the other shows a negative effect under the same conditions. Both positive and negative associations have significant implications for bioinformatics research. Second, there is still no bi-clustering algorithm that can constrain the quality of its solutions and mine bi-clusters that combine a required size, negative associations, and high relevance. Third, with the advent of the big-data era in bioinformatics, the demands on the computational complexity of bi-clustering algorithms keep rising, yet current algorithms are not sufficiently optimized. To address these problems, this thesis presents a multi-objective-based bi-clustering algorithm (MOBA) for mining gene expression data and optimizes MOBA for multi-threading. The main research work is as follows:

(1) This thesis proposes MOBA. Its design is as follows:
Step 1: The data are preprocessed to eliminate data deviation. Specifically, the data are converted to qualitative values of three types: up-regulated, unchanged, and down-regulated.
Step 2: After the nearest neighbor of each gene is calculated, initial seeds are built (solutions are called seeds) by merging each gene with its nearest neighbor according to the designed seed structure. When an initial seed is built, the two genes are judged to be positively or negatively correlated based on the counts of matching qualitative values in the two genes. These initial seeds constitute the initial solution set.
Step 3: The nearest neighbor of each seed is computed, and each seed is merged with its nearest neighbor to enlarge the bi-cluster (seed expansion). Step 3 is repeated until no more seeds need to be expanded.
Step 4: The final solutions are selected by computing the value of a multi-objective evaluation function for each seed. The function has three sub-objectives. The first maximizes the size of the bi-clusters. The second is the peak-valley difference of the mean squared residue, which is used to incorporate negative associations. The third is the Pearson correlation coefficient, which strengthens relevance.

(2) The basic framework of MOBA is the size expansion of each solution, called seed expansion. This framework is well suited to multi-threading: MOBA distributes the seeds across threads, which then run independently. This reduces the running time.

Experiments on the yeast cell cycle data set show that MOBA runs more stably and performs better at condition clustering. MOBA can find highly significant associations between genes that carry both positive and negative information.
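Steps 1 and 2 of the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the abstract does not say how the qualitative values are derived or how the nearest neighbor is measured. The function names (`discretize`, `association`, `initial_seeds`), the z-score `threshold`, the rule that unchanged values are ignored when counting matches, and the seed layout are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def discretize(row, threshold=0.5):
    """Map one gene's expression levels to +1 (up), 0 (unchanged), -1 (down).

    Assumption: a value counts as changed when it deviates from the gene's
    mean by more than `threshold` population standard deviations.
    """
    mu, sigma = mean(row), pstdev(row)
    if sigma == 0:
        return [0] * len(row)
    out = []
    for x in row:
        z = (x - mu) / sigma
        out.append(1 if z > threshold else -1 if z < -threshold else 0)
    return out


def association(a, b):
    """Compare two discretized genes condition by condition.

    Returns (sign, conditions). sign is +1 when same-direction matches
    dominate, -1 when opposite-direction matches dominate, and 0 on a tie.
    conditions lists the column indices supporting that sign.
    Unchanged (0) entries are ignored (an assumption).
    """
    pairs = list(enumerate(zip(a, b)))
    same = [j for j, (x, y) in pairs if x != 0 and x == y]
    opposite = [j for j, (x, y) in pairs if x != 0 and x == -y]
    if len(same) > len(opposite):
        return 1, same
    if len(opposite) > len(same):
        return -1, opposite
    return 0, []


def initial_seeds(data, threshold=0.5):
    """Pair every gene with the gene that matches it on the most conditions."""
    q = [discretize(row, threshold) for row in data]
    seeds = []
    for i in range(len(q)):
        best = None
        for k in range(len(q)):
            if k == i:
                continue
            sign, conds = association(q[i], q[k])
            if sign != 0 and (best is None or len(conds) > len(best[2])):
                best = (k, sign, conds)
        if best is not None:
            k, sign, conds = best
            # Each gene is stored with its orientation relative to gene i.
            seeds.append({"genes": [(i, 1), (k, sign)], "conditions": conds})
    return seeds


if __name__ == "__main__":
    toy = [
        [1.0, 5.0, 1.0, 5.0],  # gene 0
        [2.0, 6.0, 2.0, 6.0],  # gene 1: same pattern as gene 0
        [5.0, 1.0, 5.0, 1.0],  # gene 2: mirror image of gene 0
    ]
    for seed in initial_seeds(toy):
        print(seed)
```

Storing a sign next to each gene is one way to let a seed hold both positively and negatively correlated genes. Because each seed is built from the data alone, seeds could also be handed to separate worker threads, in the spirit of point (2).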
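The quantities named in Step 4 can be sketched in the same hedged way. The code below computes the size of a bi-cluster, the standard mean squared residue (in the Cheng and Church sense), and an average absolute pairwise Pearson correlation. The abstract does not define the "peak-valley difference" or how the three sub-objectives are combined, so neither is attempted here. The function names and the use of the absolute correlation are assumptions.

```python
from itertools import combinations
from math import sqrt


def mean_squared_residue(sub):
    """Standard mean squared residue of a bi-cluster given as a list of rows."""
    n_rows, n_cols = len(sub), len(sub[0])
    row_means = [sum(r) / n_cols for r in sub]
    col_means = [sum(r[j] for r in sub) / n_rows for j in range(n_cols)]
    total_mean = sum(row_means) / n_rows
    residue = sum(
        (sub[i][j] - row_means[i] - col_means[j] + total_mean) ** 2
        for i in range(n_rows)
        for j in range(n_cols)
    )
    return residue / (n_rows * n_cols)


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 for a constant vector."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def mean_abs_correlation(sub):
    """Average |Pearson r| over all gene pairs (assumed relevance measure).

    The absolute value lets negatively correlated genes count as relevant.
    """
    pairs = list(combinations(sub, 2))
    if not pairs:
        return 0.0
    return sum(abs(pearson(x, y)) for x, y in pairs) / len(pairs)


def evaluate(sub):
    """Collect the three raw quantities without combining them."""
    return {
        "size": len(sub) * len(sub[0]),
        "msr": mean_squared_residue(sub),
        "relevance": mean_abs_correlation(sub),
    }


if __name__ == "__main__":
    print(evaluate([[1, 2, 3], [2, 3, 4]]))  # shifted copies of one pattern
    print(evaluate([[1, 2, 3], [3, 2, 1]]))  # mirrored patterns
```

In this sketch, two genes that are shifted copies of each other have a mean squared residue of zero, while a perfectly anti-correlated pair has a nonzero one. That suggests why the plain residue would need adapting before it can reward negative associations.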
Keywords/Search Tags: Bi-clustering, Multi-objective, Association information