A Research On Biclustering Algorithm And Its Application In The Gene Expression Data

Posted on: 2022-05-09
Degree: Master
Type: Thesis
Country: China
Candidate: X Wang
GTID: 2480306314455564
Subject: Information and Communication Engineering
Abstract/Summary:
With the development of gene sampling technology, researchers can obtain a huge amount of gene information from various organisms at limited cost; such data are called gene expression data. The sampled gene expression data are generally stored as a matrix, called the gene expression matrix. Because these matrices have high dimensionality but only a small number of samples, traditional clustering algorithms cannot handle them well. Biclustering algorithms, a newer approach for efficiently analyzing gene expression data, emerged against this background. By considering the relationships among rows and among columns simultaneously, biclustering can extract more complex information from the matrix. Researchers have proposed various biclustering algorithms based on different assumptions about the data structure implicit in the gene expression matrix. However, three problems are pervasive in current algorithms: (1) high computational complexity; (2) sensitivity to noise; and (3) the inability to explicitly exploit the clustering information from the previous iteration. This dissertation addresses these three problems.

To address problems 1 and 2, this dissertation proposes SVD preprocessing, which exploits the ability of the singular value decomposition (SVD) to separate row and column information and its noise-suppression property. First, SVD preprocessing separates the row and column clustering information, reducing the biclustering problem to a one-dimensional clustering problem; this avoids repeated computation and thus lowers the computational complexity. Second, low-rank SVD reconstruction suppresses the noise in the matrix and improves clustering performance. Simulation results show that this preprocessing method is broadly compatible and consistently improves clustering accuracy across various biclustering algorithms and noise conditions.

To solve problem 3, this dissertation proposes multiple sample clustering and then designs an iterative biclustering algorithm based on spectral clustering. This algorithm explicitly uses the result of the previous iteration to improve clustering performance. Moreover, because the multiple sample data structure is widespread, multiple sample clustering can be applied beyond biclustering, for example to recommendation systems, production management, server cluster construction, and weather prediction.

Finally, we apply the proposed algorithms to a synthetic dataset and a lung cancer dataset. The simulation results show that, compared with the traditional algorithm, the improved algorithm substantially increases clustering accuracy while reducing computational complexity, which helps identify the information contained in gene expression data efficiently.
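The SVD preprocessing idea (separate row and column information, then denoise by low-rank reconstruction) can be illustrated with a small sketch. This is not the dissertation's implementation: the function names `svd_preprocess` and `two_means_1d`, the chosen rank, the use of the leading singular vectors scaled by the singular values as row and column features, and the 2-means split are all hypothetical choices made here for illustration. "One-dimensional clustering" is read as ordinary one-way clustering of the rows and of the columns separately, which is an assumption.

```python
import numpy as np


def svd_preprocess(X, rank):
    """Hypothetical SVD preprocessing (illustrative, not the thesis code).

    Returns the rank-`rank` reconstruction of X, which discards the smaller
    singular components (assumed to be mostly noise), plus row and column
    embeddings that can be clustered separately with one-way clustering.
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    U_k, s_k, Vt_k = U[:, :rank], s[:rank], Vt[:rank, :]
    X_denoised = (U_k * s_k) @ Vt_k   # low-rank reconstruction
    row_embedding = U_k * s_k         # one feature vector per gene (row)
    col_embedding = Vt_k.T * s_k      # one feature vector per sample (column)
    return X_denoised, row_embedding, col_embedding


def two_means_1d(values, iters=50):
    """Split 1-D values into two groups with Lloyd's algorithm (k = 2).

    Returns labels where 1 marks the group with the larger center.
    """
    values = np.asarray(values, dtype=float)
    if values.max() == values.min():
        return np.zeros(values.shape, dtype=int)
    centers = np.array([values.min(), values.max()])
    for _ in range(iters):
        # Assign each value to the nearer of the two centers.
        labels = (np.abs(values - centers[1]) < np.abs(values - centers[0])).astype(int)
        new_centers = np.array([values[labels == k].mean() for k in (0, 1)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


# Synthetic example: one constant bicluster (rows 0-19, columns 0-9) plus noise.
rng = np.random.default_rng(0)
clean = np.zeros((100, 40))
clean[:20, :10] = 3.0
X = clean + 0.3 * rng.standard_normal(clean.shape)

X_denoised, row_emb, col_emb = svd_preprocess(X, rank=1)
# The sign of a singular vector is arbitrary, so cluster on magnitudes.
row_labels = two_means_1d(np.abs(row_emb[:, 0]))
col_labels = two_means_1d(np.abs(col_emb[:, 0]))

print("bicluster rows:", np.flatnonzero(row_labels).tolist())
print("bicluster cols:", np.flatnonzero(col_labels).tolist())
print("error before/after:", np.linalg.norm(X - clean), np.linalg.norm(X_denoised - clean))
```

Clustering rows and columns independently on the leading singular vectors, rather than searching jointly over row and column subsets, is what turns each step into a one-way clustering problem in this sketch. On real expression data the rank and the clustering method (for example k-means on several singular vectors) would need to be chosen per dataset.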
Keywords/Search Tags: Gene Expression Data, Biclustering, SVD Preprocessing, Lung Cancer Dataset