
Research on Dimension Reduction Method of Gene Expression Data Based on Graph Signal Processing

Posted on: 2023-12-02
Degree: Master
Type: Thesis
Country: China
Candidate: W Wang
GTID: 2530306836963159
Subject: Information and Communication Engineering
Abstract/Summary:
With the development of modern society, the incidence of cancer has been rising steadily. Diagnosing cancer rapidly and effectively at an early stage, and thereby improving patient survival, has become an urgent social problem. The rapid development of high-throughput gene sequencing, DNA microarrays and related technologies has made gene expression data available in large quantities, and using these data to analyze and diagnose cancer has become an emerging approach to diagnosis and treatment. A single gene chip, however, tends to measure tens of thousands of genes. At the same time, because genetic testing is expensive and genetic data are susceptible to noise, relatively few valid case samples are available. As a result, gene expression data are characterized by high dimensionality and small sample size, which inevitably leads to the curse of dimensionality. In addition, only a few genes are associated with cancer; most genes in a sample are redundant and unrelated to it. Because of these characteristics, classifying cancers directly from raw gene expression data is inefficient and yields unsatisfactory accuracy. Screening out the genes relevant to cancer classification and recognition from massive gene expression data is therefore a key issue in the analysis and processing of such data. In view of the high complexity and poor classification performance of traditional gene feature selection methods, this thesis proposes two dimension reduction algorithms that reduce the dimension of gene expression data, extract the genes associated with cancer, and screen out the genes with strong classification ability. Numerical experiments on several real gene datasets demonstrate the effectiveness of the proposed algorithms. The main work of this thesis is as follows:

(1) Existing gene feature selection algorithms seldom consider the correlation between samples (patients/normal subjects) and genes, and cannot effectively remove redundancy. To address this problem, a gene selection algorithm based on a graph model and graph smoothness is proposed. Several kinds of graphs for gene selection are constructed by modeling the samples (patients/normal subjects) as graph vertices and the gene data as graph signals. Using the Laplacian matrices of these graphs, a small number of feature genes is determined by selecting the signals with the highest nonsmoothness metrics. Because these metrics can be computed in a distributed manner, dimension reduction can be implemented quickly. Numerical experiments on real-world datasets show that the proposed method performs competitively against existing approaches.

(2) To further capture the correlation among gene data, a gene feature selection algorithm based on C3NET and graph filtering is proposed. First, the gene regulatory network underlying the expression data is inferred with the C3NET algorithm to obtain the regulatory relationships between genes. Second, the genes are modeled as nodes of a graph, the expression data of each gene are modeled as graph signals, and the inferred gene regulatory network is represented as the adjacency matrix. The graph Laplacian matrix and the graph Fourier transform are then computed. Moreover, an evaluation method based on the graph Fourier transform is proposed to measure the classification ability of each gene. Finally, a high-pass filter is designed to filter the gene data, and the genes with high classification ability are screened out. Simulation results indicate that the proposed algorithm achieves higher classification ability than the existing gene selection algorithms it is compared with, and that it maintains high classification accuracy across different classifiers.
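Contribution (1) ranks genes by how much each gene's values vary across the edges of a sample graph. The following is a minimal sketch, not the thesis's implementation. It assumes numpy, the combinatorial Laplacian L = D − W, and a hypothetical "between-class" sample graph that links every pair of samples with different labels; the abstract mentions several kinds of graphs but does not define them. The nonsmoothness score is the Laplacian quadratic form xᵀLx = ½ Σᵢⱼ Wᵢⱼ (xᵢ − xⱼ)². Because this is a sum of per-edge terms, each vertex can accumulate its own share, which is one plausible reading of the "distributed manner" mentioned above.

```python
import numpy as np

def laplacian(W):
    """Combinatorial graph Laplacian L = D - W for a symmetric adjacency matrix W."""
    return np.diag(W.sum(axis=1)) - W

def between_class_graph(labels):
    """Hypothetical sample graph: connect every pair of samples whose labels differ."""
    labels = np.asarray(labels)
    return (labels[:, None] != labels[None, :]).astype(float)

def nonsmoothness_scores(X, W):
    """Laplacian quadratic form x^T L x for every gene.

    X: (n_samples, n_genes); each column is one gene, i.e. one graph signal
    on the sample graph W. Columns are z-scored first so that the ranking
    does not simply follow raw variance.
    """
    Xz = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)
    L = laplacian(W)
    # Column-wise x^T L x, equal to 0.5 * sum_ij W_ij (x_i - x_j)^2
    return np.sum(Xz * (L @ Xz), axis=0)

def select_genes(X, W, k):
    """Indices of the k genes with the highest nonsmoothness, best first."""
    return np.argsort(nonsmoothness_scores(X, W))[::-1][:k]

# Synthetic usage: 20 samples (10 per class), 50 genes, gene 3 shifted by class.
rng = np.random.default_rng(0)
labels = np.array([0] * 10 + [1] * 10)
X = rng.normal(size=(20, 50))
X[:, 3] += 3.0 * labels
W = between_class_graph(labels)
print(select_genes(X, W, k=5))
```

On a between-class graph, a gene whose values separate the two classes produces large differences along most edges and therefore a high score, so the class-shifted gene 3 ranks first in this toy example.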
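The first step of contribution (2) infers a gene regulatory network with C3NET. For each gene, C3NET keeps only the edge with the largest significant mutual information (MI), which gives a sparse backbone with at most one selected edge per gene. The following is a simplified, C3NET-like sketch, not the reference implementation. It assumes numpy and estimates MI from the Pearson correlation under a Gaussian assumption, MI = −½ log(1 − ρ²). It also replaces C3NET's statistical significance test with a hypothetical fixed `mi_threshold`.

```python
import numpy as np

def gaussian_mi(X):
    """Pairwise gene-gene mutual information under a Gaussian assumption.

    X: (n_samples, n_genes). MI = -0.5 * log(1 - rho^2), rho = Pearson correlation.
    """
    rho = np.clip(np.corrcoef(X, rowvar=False), -0.999999, 0.999999)
    mi = -0.5 * np.log(1.0 - rho ** 2)
    np.fill_diagonal(mi, 0.0)  # no self-edges
    return mi

def c3net_like(X, mi_threshold):
    """C3NET-style backbone: each gene keeps only its strongest significant MI edge.

    mi_threshold is a hypothetical stand-in for C3NET's significance test.
    Returns a symmetric 0/1 adjacency matrix.
    """
    mi = gaussian_mi(X)
    mi[mi < mi_threshold] = 0.0  # treat weak MI values as non-significant
    n = mi.shape[0]
    A = np.zeros((n, n))
    for i in range(n):
        j = int(np.argmax(mi[i]))
        if mi[i, j] > 0.0:
            A[i, j] = A[j, i] = 1.0  # undirected edge to the strongest partner
    return A

# Synthetic usage: 40 samples, 6 genes, two strongly dependent pairs.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 6))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=40)   # gene 1 tracks gene 0
X[:, 3] = -X[:, 2] + 0.1 * rng.normal(size=40)  # gene 3 anti-tracks gene 2
A = c3net_like(X, mi_threshold=0.5)
print(np.argwhere(np.triu(A)))
```

The symmetric matrix `A` can serve as the adjacency matrix of the gene graph. In this example only the pairs (0, 1) and (2, 3) survive, and genes 4 and 5 stay isolated.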
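Contribution (2) then treats genes as vertices, computes the graph Fourier transform (GFT) from the Laplacian of the gene network, and applies a high-pass filter. The following sketch assumes numpy, the combinatorial Laplacian L = D − A with eigendecomposition L = UΛUᵀ, and an ideal 0/1 frequency response with a hypothetical `cutoff`. The abstract specifies neither the filter design nor the GFT-based evaluation of classification ability. `class_separation` is therefore a hypothetical stand-in, a standardized class-mean difference on the filtered data, and not the thesis's metric. A 5-gene path graph replaces a real regulatory network.

```python
import numpy as np

def graph_fourier_basis(A):
    """Eigenvalues (graph frequencies) and eigenvectors (Fourier modes) of L = D - A."""
    L = np.diag(A.sum(axis=1)) - A
    return np.linalg.eigh(L)  # eigenvalues in ascending order

def high_pass(X, A, cutoff):
    """Ideal high-pass graph filter applied to every sample.

    X: (n_samples, n_genes); each row is a graph signal on the gene graph A.
    Components whose graph frequency is <= cutoff are removed.
    """
    lam, U = graph_fourier_basis(A)
    h = (lam > cutoff).astype(float)  # frequency response of the filter
    X_hat = X @ U                     # forward GFT of each row (U^T x)
    return (X_hat * h) @ U.T          # filter, then inverse GFT

def class_separation(Xf, labels):
    """Hypothetical per-gene score: |difference of class means| / pooled std."""
    labels = np.asarray(labels)
    a, b = Xf[labels == 0], Xf[labels == 1]
    pooled = np.sqrt(0.5 * (a.var(axis=0) + b.var(axis=0))) + 1e-12
    return np.abs(a.mean(axis=0) - b.mean(axis=0)) / pooled

# Path graph over 5 genes; 30 samples per class.
A = np.diag(np.ones(4), 1) + np.diag(np.ones(4), -1)
rng = np.random.default_rng(2)
labels = np.array([0] * 30 + [1] * 30)
X = rng.normal(size=(60, 5))
X += 2.0 * labels[:, None]  # shift shared by all genes (graph frequency 0)
X[:, 2] += 3.0 * labels     # shift localized on gene 2
Xf = high_pass(X, A, cutoff=1e-9)
print(np.round(class_separation(Xf, labels), 2))
```

In this toy setup, the class shift shared by all genes lies entirely in the zero-frequency (constant) mode, so the filter removes it. The shift localized on gene 2 survives, and gene 2 receives the highest score.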
Keywords: gene selection, feature selection, graph signal processing, gene regulation network, dimension reduction