
Biclustering Analysis For Gene Expression Data

Posted on: 2018-03-24    Degree: Doctor    Type: Dissertation
Country: China    Candidate: L Yin    Full Text: PDF
GTID: 1318330512988213    Subject: Computer software and theory
Abstract/Summary:
Quickly mining relevant gene information from large-scale gene expression data, and analyzing high-throughput gene expression data accurately, has become a key problem in gene expression data analysis. Biclustering analysis of gene expression data can effectively compensate for the weakness of traditional clustering analysis in searching for and identifying local expression patterns. Taking biclustering analysis of gene expression data as its starting point, the dissertation aims to improve the quality criteria of a bicluster, such as volume, coverage rate and mean square residue (MSR), and to determine the biological significance of a bicluster. It approaches these goals through single-objective optimization, multi-objective optimization and ensemble learning based on Cuckoo Search, in order to address poor scores on the quality evaluation criteria, insufficient diversity and unclear biological meaning of biclusters. The main contributions of the dissertation are as follows.

First, Cuckoo Search Biclustering (CSB) is proposed, based on the Cuckoo Search (CS) algorithm. To address the drawbacks of existing biclustering algorithms, such as low total coverage and high mean square residue, CSB introduces a strategy for optimizing the selection of the initial biclusters, which improves the diversity of solutions, and adopts a random search that follows a Lévy distribution to counter premature convergence. CSB improves the scope and speed of the bicluster search and reliably escapes local optima, finding varied biclusters that contain different genes and so avoiding the problem that different biclusters share most of their genes. Comparison with other biclustering algorithms such as CC, FLOC, ISA, BIC-aiNet, SEBI, SAB and SSB shows that the biclusters found by CSB are of better quality and biological significance.

Second, the Genetic Algorithm and Cuckoo Search hybrid Biclustering (GACSB) algorithm is proposed, based on GA and CS. By applying GA strategies such as tournament selection and elite preservation, GACSB extends the scope and depth of the search without increasing the computing cost, thereby improving the diversity of the bicluster solutions. Comparison with commonly used biclustering algorithms such as CC, FLOC, ISA, SEBI, SSB and CSB shows that GACSB significantly improves the diversity and biological significance of the biclusters. In addition, comparing the results of GACSB under different bicluster quality evaluation indexes, such as ACV, MSR and VE, shows that the algorithm can find different types of biclusters, which indicates strong extensibility.

Third, the Multi-Objective Cuckoo Search Biclustering (MOCSB) algorithm is proposed, based on Multi-Objective CS (MOCS). By transforming biclustering analysis of gene expression data into a multi-objective problem, MOCSB introduces MOCS into biclustering analysis to optimize quality evaluation indexes such as the mean square residue and the volume of a bicluster. By combining the search for Pareto solutions with the cuckoo nest search and the host's abandonment of nests, MOCSB can flexibly combine the various quality evaluation indexes of a bicluster according to practical needs. Comparison with the main biclustering algorithms such as CC, SEBI, SMOB and CSB shows that MOCSB improves the quality and biological significance of a bicluster.

Fourth, the Spectral Ensemble Biclustering (SEB) algorithm is proposed. To overcome the problems of ensemble biclustering, such as the poor quality and poor diversity of the candidate biclusters, the high computational complexity of the consensus function and the unclear biological significance of the final biclusters, SEB uses different bicluster quality evaluation indexes to obtain varied candidate biclusters and proposes a consensus function based on spectral clustering to obtain the consensus bicluster. Comparison with common ensemble biclustering methods such as VC, BGPC, MMMC and COAC shows that SEB is superior in computing efficiency, quality evaluation indexes and biological significance.
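
The abstract names volume and mean square residue (MSR) among the bicluster quality criteria. As an illustration only, the sketch below computes both for a bicluster given as lists of row and column indices, using the standard Cheng and Church definition of MSR (the mean of the squared residues a_ij - a_iJ - a_Ij + a_IJ over the submatrix). The dissertation's own implementation is not available here; the function names and the toy matrix are assumptions made for this example.

```python
from typing import List, Sequence


def bicluster_volume(rows: Sequence[int], cols: Sequence[int]) -> int:
    """Volume of a bicluster: number of genes times number of conditions."""
    return len(rows) * len(cols)


def mean_squared_residue(
    data: Sequence[Sequence[float]],
    rows: Sequence[int],
    cols: Sequence[int],
) -> float:
    """Cheng-Church mean square residue of the submatrix data[rows][cols].

    Lower values mean the selected genes behave more coherently
    across the selected conditions; 0 is a perfect additive pattern.
    """
    if not rows or not cols:
        raise ValueError("a bicluster needs at least one row and one column")

    # Extract the submatrix selected by the bicluster.
    sub: List[List[float]] = [[data[i][j] for j in cols] for i in rows]
    n_rows, n_cols = len(rows), len(cols)

    # Row means a_iJ, column means a_Ij, and overall mean a_IJ.
    row_means = [sum(r) / n_cols for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    overall = sum(row_means) / n_rows

    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            residue = sub[i][j] - row_means[i] - col_means[j] + overall
            total += residue * residue
    return total / (n_rows * n_cols)


# Toy expression matrix (hypothetical values): the first three rows follow
# an additive pattern, the last row does not.
toy = [
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0],
    [4.0, 5.0, 6.0],
    [9.0, 0.0, 5.0],
]

coherent = mean_squared_residue(toy, [0, 1, 2], [0, 1, 2])
with_outlier = mean_squared_residue(toy, [0, 1, 2, 3], [0, 1, 2])
print(coherent, with_outlier, bicluster_volume([0, 1, 2], [0, 1, 2]))
```

In the toy matrix the three coherent rows give an MSR of zero up to floating-point rounding, while adding the fourth row raises it. A search procedure of the kind described in the abstract would typically try to keep MSR low while making the volume as large as possible, which is why the two criteria pull against each other.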
Keywords/Search Tags: biclustering, gene expression data, cuckoo search, multi-objective optimization, ensemble learning