A Research On Biclustering Algorithm And Its Application In The Gene Expression Data

Posted on:2022-05-09

Degree:Master

Type:Thesis

Country:China

Candidate:X Wang

GTID:2480306314455564

Subject:Information and Communication Engineering

Abstract/Summary:

With the development of gene sampling technology, researchers can obtain large amounts of genetic information from many organisms at limited cost; such data are called gene expression data. When stored in matrix form, the sampled data are called a gene expression matrix. Because these matrices have high dimensionality but few samples, traditional clustering algorithms handle them poorly. Biclustering, a method for analyzing gene expression data efficiently, arose against this background. By considering the relationships among rows and columns simultaneously, it can extract more complex information from the matrix. Researchers have proposed many biclustering algorithms, each based on an assumption about the implicit data structure inside the gene expression matrix. However, three problems are pervasive in current algorithms: (1) high computational complexity; (2) sensitivity to noise; and (3) no explicit use of the clustering information from the previous iteration. This dissertation addresses these three problems.

To address problems 1 and 2, this dissertation proposes SVD preprocessing, which exploits the ability of the singular value decomposition (SVD) to separate row and column information and to suppress noise. On the one hand, SVD preprocessing separates the row and column clustering information, reducing the biclustering problem to a one-dimensional clustering problem, which avoids repeated computation and thus lowers the computational complexity. On the other hand, SVD low-rank reconstruction suppresses the noise in the matrix and improves clustering performance. Simulation results show that the preprocessing method is broadly compatible and consistently improves clustering accuracy across various biclustering algorithms and noise conditions.

To address problem 3, this dissertation proposes multiple sample clustering and, building on it, designs an iterative biclustering algorithm based on spectral clustering. The algorithm explicitly uses the result of the previous iteration to improve clustering performance. Moreover, because the multiple sample data structure is widespread, multiple sample clustering can be applied beyond biclustering, for example to recommendation systems, production management, server cluster construction, and weather prediction. Finally, the proposed algorithms are applied to a synthetic dataset and a lung cancer dataset. The simulation results indicate that, compared with the traditional algorithm, the modified algorithm greatly improves clustering accuracy while reducing computational complexity, which helps identify the information in gene expression data efficiently.
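
The SVD preprocessing idea described in the abstract can be illustrated with a minimal sketch. This is not the dissertation's implementation: the function names `svd_preprocess` and `kmeans`, the choice of rank 2, the small k-means routine with farthest-point initialisation, and the synthetic checkerboard matrix are all assumptions made for illustration. The sketch shows the two properties the abstract names: a low-rank reconstruction that suppresses noise, and separate row and column embeddings that can each be clustered on their own.

```python
import numpy as np

def svd_preprocess(X, rank):
    """Truncated SVD: return the low-rank reconstruction and row/column embeddings."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    U_r, s_r, Vt_r = U[:, :rank], s[:rank], Vt[:rank, :]
    X_low = (U_r * s_r) @ Vt_r   # rank-`rank` reconstruction (noise suppression)
    row_embed = U_r * s_r        # one low-dimensional point per row (gene)
    col_embed = Vt_r.T * s_r     # one low-dimensional point per column (sample)
    return X_low, row_embed, col_embed

def kmeans(points, k, n_iter=20):
    """Plain Lloyd's k-means; a stand-in for any one-dimensional clustering step."""
    # Deterministic farthest-point initialisation (an assumption, for reproducibility).
    centers = [points[0]]
    for _ in range(1, k):
        d = np.min([np.sum((points - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # keep the old centre if a cluster empties
                centers[j] = points[labels == j].mean(axis=0)
    return labels

# Hypothetical synthetic data: 20 "genes" x 12 "samples" with a 2 x 2 block structure.
rng = np.random.default_rng(0)
row_groups = np.repeat([0, 1], 10)
col_groups = np.repeat([0, 1], 6)
block_means = np.array([[5.0, 0.0], [0.0, 5.0]])
clean = block_means[row_groups][:, col_groups]
noisy = clean + 0.5 * rng.standard_normal(clean.shape)

X_low, row_embed, col_embed = svd_preprocess(noisy, rank=2)
row_labels = kmeans(row_embed, k=2)  # rows clustered independently of columns
col_labels = kmeans(col_embed, k=2)  # columns clustered independently of rows

err_noisy = np.linalg.norm(noisy - clean)
err_low = np.linalg.norm(X_low - clean)
print("error before / after low-rank reconstruction:", err_noisy, err_low)
print("row labels:", row_labels)
print("column labels:", col_labels)
```

Clustering the rows and the columns separately on their embeddings is one simple way to realise the reduction to one-dimensional clustering; the reconstructed matrix `X_low` could instead be passed to an existing biclustering algorithm as a denoised input.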

Keywords/Search Tags:

Gene Expression Data, Biclustering, SVD Preprocessing, Lung Cancer Dataset

Related items

1	Bioinformatics Analyses Of Hub Gene And Branched-chain Amino Acid Metabolic Gene Expression Patterns In Non-small Cell Lung Cancer
2	A Comparison And Evaluation Of Five Biclustering Algorithms For Gene Expression Data
3	Bioinformatics Analysis Of Non Small Cell Lung Cancer Related Gene Expression And Correlation Between ASPM And Lung Adenocarcinoma
4	Bioinformatics Analysis Differential Gene Of Non-small Cell Lung Cancer And To Explore Clinical Significance Of MKI67
5	Research On Biclustering Of Gene Expression Data Based On Swarm Intelligence
6	Research On Multi_Objective Optimization Algorithm For Biclustering In Microarry Gene Expression Data
7	Research On Biclustering Algorithm And Its Application In Gene Expression Data Analysis
8	Screening Of Key Genes Of Xuanwei Lung Cancer Based On High-throughput Transcriptome Data
9	Identification Of Potential Causal Genes In Lung Cancer By Integrating GWAS And EQTL Data
10	Study On The Method Of Gene Mutation Detection By Capillary Electrophoresis And Its Application In Lung Cancer