Research On Dimensionality Reduction Of Gene Expression Data Based On Traditional Feature Extraction And Deep Learning

Posted on:2019-07-29

Degree:Master

Type:Thesis

Country:China

Candidate:Y Y Gao

Full Text:PDF

GTID:2428330572451738

Subject:Engineering

Abstract/Summary:

The spread of DNA microarray technology has generated ever-larger volumes of gene expression data, which carry a considerable amount of information. Analyzing these data improves understanding of the expression differences between tumor cells and normal cells and helps identify genes closely related to tumor formation, which is of great importance for the diagnosis, treatment, and prevention of cancer. Classification is an important tool for cancer diagnosis. However, gene expression data are high-dimensional and contain a large amount of redundant information, so applying a traditional classification method directly to the samples can run into the "curse of dimensionality". Traditional dimensionality reduction methods can relieve the curse of dimensionality, but they may lower the classification accuracy. Selecting an appropriate feature extraction method is therefore a key step before classifying gene expression data.

Deep learning is a feature learning approach that can learn the complex structure of high-dimensional data. This thesis therefore studies the dimensionality reduction performance of the auto-encoder on gene expression data and compares it with traditional feature extraction methods: principal component analysis (PCA), linear discriminant analysis (LDA), and kernel principal component analysis (KPCA). Experimental results on eight gene expression datasets show that the auto-encoder reduces dimensionality better than PCA, LDA, and KPCA, which validates its effectiveness for dimensionality reduction of gene expression data.

Because gene expression data are high-dimensional, however, training an auto-encoder on the original data is relatively complex and time-consuming. To address this problem, this thesis proposes the following improved algorithms based on the auto-encoder:

(1) A feature learning approach based on PCA, KPCA, and the auto-encoder. Feature learning proceeds in two stages: in the first stage, PCA and KPCA extract features from the data; in the second stage, the auto-encoder learns higher-level, more complex features from the PCA and KPCA features for classification.
(2) A feature learning approach based on PCA, LDA, and the auto-encoder.
(3) A feature learning approach based on KPCA, LDA, and the auto-encoder.

Finally, simulation experiments on eight gene expression datasets are carried out in MATLAB to verify the effectiveness of the improved algorithms. The results show that the proposed approaches outperform the comparison methods and greatly reduce the computational cost of having the auto-encoder learn features from the raw data. Comparing the three improved algorithms leads to the following conclusion: the approach based on PCA, LDA, and the auto-encoder has advantages for classifying multi-class data, while the approach based on KPCA, LDA, and the auto-encoder has advantages for classifying two-class data.
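The two-stage idea in approach (1) can be illustrated with a minimal sketch. It is written in Python with NumPy rather than the MATLAB used in the thesis, and it is not the author's implementation. The RBF kernel, the number of components, the hidden-layer size, the learning rate, the single-hidden-layer architecture, and the synthetic data are all assumptions chosen only to keep the example small.

```python
import numpy as np

def pca_features(X, k):
    """Stage 1a: project the rows of X onto the top-k principal components."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

def kpca_features(X, k, gamma):
    """Stage 1b: kernel PCA scores (the RBF kernel is an assumption)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    K = np.exp(-gamma * d2)
    n = K.shape[0]
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one  # centre in feature space
    vals, vecs = np.linalg.eigh(Kc)             # eigenvalues in ascending order
    vals, vecs = vals[::-1][:k], vecs[:, ::-1][:, :k]
    # Training-sample projections are eigenvectors scaled by sqrt(eigenvalue).
    return vecs * np.sqrt(np.maximum(vals, 0.0))

def train_autoencoder(Z, hidden, epochs=500, lr=0.02, seed=0):
    """Stage 2: one-hidden-layer auto-encoder (tanh encoder, linear decoder),
    trained by full-batch gradient descent on the squared reconstruction error."""
    rng = np.random.default_rng(seed)
    n, d = Z.shape
    W1 = rng.normal(0.0, 0.1, (d, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.1, (hidden, d)); b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        H = np.tanh(Z @ W1 + b1)        # encoded features
        err = (H @ W2 + b2) - Z         # reconstruction error
        losses.append(float(np.mean(np.sum(err ** 2, axis=1))))
        gR = 2.0 * err / n              # gradient w.r.t. the reconstruction
        gA = (gR @ W2.T) * (1.0 - H ** 2)  # back-propagate through tanh
        W2 -= lr * (H.T @ gR); b2 -= lr * gR.sum(axis=0)
        W1 -= lr * (Z.T @ gA); b1 -= lr * gA.sum(axis=0)

    def encode(A):
        return np.tanh(A @ W1 + b1)

    return encode, losses

# Synthetic stand-in for a gene expression matrix: 40 samples, 200 "genes".
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 200))
X[20:, :10] += 2.0  # hypothetical second class with shifted expression

# Stage 1: concatenate PCA and KPCA features, then standardise them.
Z = np.hstack([pca_features(X, 5), kpca_features(X, 5, gamma=1.0 / X.shape[1])])
Z = (Z - Z.mean(axis=0)) / (Z.std(axis=0) + 1e-12)

# Stage 2: the auto-encoder learns a compact code from the stage-1 features.
encode, losses = train_autoencoder(Z, hidden=4)
features = encode(Z)
print(features.shape, losses[0], losses[-1])
```

Because the auto-encoder here sees 10 stage-1 features per sample instead of 200 raw values, its weight matrices are much smaller, which is the kind of cost reduction the abstract attributes to the improved algorithms. The resulting `features` array would then be passed to a classifier, which is omitted here.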

Keywords/Search Tags:

Gene Expression Data, Deep Learning, Auto-encoder, Principal Component Analysis, Linear Discriminant Analysis, Kernel Principal Component Analysis

Related items

1	Research On Feature Extraction Based On Principal Component Analysis
2	Construction Method Of Principal Component Networks And Its Application
3	Study On Infrared Target Detection And Tracking Under Complex Backgrounds
4	Research On Kernel Projection Analysis Based Feature Extraction And Applications
5	Certification, Based On The Identity Of The Video Content
6	Research On Appearance-based Statistical Face Recognition
7	Gait Based Human Identification Research For Safety-Critical Environments
8	Research Of License Plate Location Algorithm Based On Principal Component Analysis And Fisher Linear Discriminant
9	Recognition Method Of Metal Fracture Images Based On Empirical Ridgelet And Principal Component Analysis
10	On using block principal component analysis for reducing gene-expression data dimensions