A Comparison And Evaluation Of Five Biclustering Algorithms For Gene Expression Data

Posted on: 2013-04-28  Degree: Master  Type: Thesis
Country: China  Candidate: L Li  Full Text: PDF
GTID: 2230330374467811  Subject: Applied Mathematics
Abstract/Summary:
In recent years, with the development of high-throughput technologies, the volume of gene microarray data has increased exponentially. Advanced analysis tools are required to extract information from this huge amount of data. Traditional clustering, which groups genes according to their expression profiles, is an important technique for extracting knowledge from microarray data. However, it has some shortcomings: 1) traditional clustering techniques work well for small data sets but perform poorly when the number of experimental conditions is large, since these methods cluster the genes based on their expression under all conditions; 2) clusters generated by these algorithms cannot overlap, i.e. a gene belongs to at most one cluster, whereas in fact a gene may participate in different activation patterns under different conditions. To move beyond these limits, a modified clustering concept called biclustering has been suggested in several studies.

Several biclustering algorithms have been proposed to identify biclusters, in which genes share similar expression patterns across a number of conditions. However, different algorithms yield different biclusters and can lead to distinct conclusions. Testing and comparison of these algorithms are therefore strongly needed.

In this study, five biclustering algorithms (BIMAX, FABIA, ISA, QUBIC and SAMBA) were compared on two Arabidopsis thaliana (A. thaliana) expression datasets of different dimensions (GDS1620 and pathway).

GO (gene ontology) annotation and the PPI (protein-protein interaction) network were used to verify the biological significance of the biclusters produced by the five algorithms. To compare the algorithms' performance and evaluate the quality of the identified biclusters, two scoring methods, namely weighted enrichment (WE) scoring and PPI scoring, were proposed in our study. For each dataset, combining the scores of all biclusters into one unified ranking allowed us to evaluate the performance and behavior of the five biclustering algorithms more effectively.

Both the WE and PPI scoring methods proved effective for validating the biological significance of the biclusters, and a significant positive correlation between the two sets of scores demonstrated the consistency of the two methods.

The comparative study of the five algorithms revealed that:
(1) ISA is the most effective of the five algorithms on the GDS1620 dataset, and BIMAX outperforms the other algorithms on the pathway dataset.
(2) Both ISA and BIMAX are data-dependent. The former does not work well on datasets with few genes, while the latter works well on datasets with more conditions.
(3) FABIA and QUBIC perform poorly in this study; they may be better suited to large datasets with more genes and more conditions.
(4) SAMBA, in contrast, is data-independent, as it performs well on both given datasets.

The comparison results provide useful information for researchers choosing a suitable algorithm for a given dataset.
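The unified ranking and the consistency check between the two scoring methods can be illustrated with a minimal Python sketch. It assumes each bicluster already has one WE score and one PPI score. The abstract does not define how either score is computed, and the choice of Spearman rank correlation, the bicluster labels and the numbers below are all hypothetical placeholders, not results from the thesis.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks in ascending order; tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical placeholder scores: (bicluster label, WE score, PPI score).
scored = [
    ("b1", 0.82, 0.40),
    ("b2", 0.75, 0.35),
    ("b3", 0.60, 0.38),
    ("b4", 0.20, 0.10),
    ("b5", 0.15, 0.12),
]

we_scores = [we for _, we, _ in scored]
ppi_scores = [ppi for _, _, ppi in scored]

# One unified ranking of all biclusters, best WE score first.
unified_ranking = [label for label, we, _ in sorted(scored, key=lambda t: -t[1])]

# Agreement between the two scoring methods.
rho = spearman(we_scores, ppi_scores)
```

A value of `rho` close to 1 would indicate that the two scoring methods order the biclusters similarly. A significance test would still be needed to support a claim of significant correlation.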
Keywords/Search Tags: gene expression data analysis, biclustering, GO annotation, protein-protein interaction network