High-dimensional And Sparse Data Classification Based On Deep Learning

Posted on:2020-09-03

Degree:Doctor

Type:Dissertation

Country:China

Candidate:M Y Jiang

GTID:1368330602955533

Subject:Computer Science and Technology

Abstract/Summary:

Big data on the Internet includes a large amount of text, and how to manage and use these data effectively is a research hotspot in information science. At the same time, with the continuous advancement of high-throughput experimental techniques, omics data have grown explosively, and disease characterization based on omics data is a hot topic in biomedical research. Although text and metabolomics data come from different sources, both are high-dimensional and sparse. When traditional machine learning methods are applied to high-dimensional sparse matrices, they often fail to achieve satisfactory results because of the curse of dimensionality. This dissertation proposes deep-learning-based classification methods for high-dimensional sparse data, focusing on their application to text and metabolomics data. The specific research work is as follows:

(1) For high-dimensional sparse text data, a text classification method combining Deep Belief Networks (DBN) with a softmax classifier is proposed. The DBN reduces the dimensionality of the text data, and the softmax classifier classifies the reduced data. In the pre-training phase, the DBN and the softmax classifier are trained separately; in the fine-tuning phase, the two are treated as a single model, and the limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) algorithm is used to adjust the parameters of the whole model. Experiments on the Reuters-21578 and 20 Newsgroups datasets show that the proposed method converges in the fine-tuning phase for text data of different scales, and that its text classification performance is significantly better than that of the K-Nearest Neighbor (KNN) and Support Vector Machine (SVM) algorithms.

(2) For breast hyperplasia metabolomics data, which are high-dimensional and sparse and contain few samples, this dissertation proposes a DBN and softmax classification model that incorporates the dropout strategy. During training, the DBN is first pre-trained on unlabeled data, and L-BFGS is then used to fine-tune the whole model. To reduce over-fitting as much as possible, dropout is applied in both the pre-training and the fine-tuning phases. Five-fold cross-validation experiments on datasets of different scales show that the proposed method outperforms KNN, SVM, and the Back Propagation Neural Network (BPNN), and that its classification results are stable.

(3) This dissertation presents a classification study of dilated cardiomyopathy metabolomics data based on a Stacked Auto Encoder (SAE) and SVM. Because these data have a small sample size and are high-dimensional, nonlinear, and noisy, traditional feature extraction and classification methods struggle to achieve satisfactory results. An SAE applies non-linear transformations through its hidden layers and can therefore learn complex relationships; it has a strong capacity to represent high-order features and can extract more complex features from metabolomics data. Experimental results on real dilated cardiomyopathy metabolomics data show that the proposed model performs better than the other existing algorithms it was compared with.
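Part (1) relies on a DBN whose layers are pre-trained greedily on unlabeled data. The sketch below illustrates that step with Bernoulli restricted Boltzmann machines (RBMs) trained by one-step contrastive divergence (CD-1), the usual way DBN layers are pre-trained. It is a minimal sketch, not the dissertation's code: the layer sizes, learning rate, epoch count, batch size and the random toy document-term matrix are illustrative assumptions, and the softmax layer is left to the fine-tuning stage.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class RBM:
    """Bernoulli-Bernoulli restricted Boltzmann machine trained with CD-1."""

    def __init__(self, n_visible, n_hidden, rng):
        self.rng = rng
        self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        self.b_vis = np.zeros(n_visible)
        self.b_hid = np.zeros(n_hidden)

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_hid)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b_vis)

    def cd1_update(self, v0, lr):
        """One contrastive-divergence step on a mini-batch v0 (rows are samples)."""
        h0 = self.hidden_probs(v0)                                   # positive phase
        h0_sample = (self.rng.random(h0.shape) < h0).astype(float)   # sample binary hidden states
        v1 = self.visible_probs(h0_sample)                           # one-step reconstruction
        h1 = self.hidden_probs(v1)                                   # negative phase
        n = v0.shape[0]
        self.W += lr * (v0.T @ h0 - v1.T @ h1) / n
        self.b_vis += lr * (v0 - v1).mean(axis=0)
        self.b_hid += lr * (h0 - h1).mean(axis=0)

    def reconstruction_error(self, v):
        return float(np.mean((v - self.visible_probs(self.hidden_probs(v))) ** 2))


def pretrain_dbn(X, layer_sizes, epochs=30, lr=0.1, batch_size=20, seed=0):
    """Greedy layer-wise pre-training; returns the RBMs and the top-layer features."""
    rng = np.random.default_rng(seed)
    rbms, data = [], X
    for n_hidden in layer_sizes:
        rbm = RBM(data.shape[1], n_hidden, rng)
        for _ in range(epochs):
            order = rng.permutation(data.shape[0])
            for start in range(0, data.shape[0], batch_size):
                rbm.cd1_update(data[order[start:start + batch_size]], lr)
        rbms.append(rbm)
        data = rbm.hidden_probs(data)   # the next RBM is trained on these activations
    return rbms, data


# Toy stand-in for a sparse binary document-term matrix: 200 documents, 500 terms.
rng = np.random.default_rng(1)
X = (rng.random((200, 500)) < 0.02).astype(float)
rbms, features = pretrain_dbn(X, [128, 32])
print(features.shape)  # (200, 32)
```

Each RBM only ever sees the output of the layer below it, which is what makes the procedure greedy; the 32-dimensional `features` are the reduced representation a classifier would consume.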
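The fine-tuning step of part (1) treats the pre-trained layers and the softmax classifier as one model and optimizes all of their parameters with L-BFGS. A minimal sketch, assuming a single sigmoid hidden layer in place of the full DBN and SciPy's `scipy.optimize.minimize` with `method="L-BFGS-B"` (used without bounds, so it behaves as plain L-BFGS): the cross-entropy loss and its gradient are computed by hand over one flat parameter vector. The toy data, the weight-decay value and the random `W1_pre`/`b1_pre` (hypothetical stand-ins for weights that DBN pre-training would supply) are assumptions, and for brevity the softmax layer starts from small random weights rather than from a separately pre-trained classifier.

```python
import numpy as np
from scipy.optimize import minimize


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def unpack(theta, shapes):
    """Split the flat vector that L-BFGS optimises back into weight arrays."""
    params, offset = [], 0
    for shape in shapes:
        size = int(np.prod(shape))
        params.append(theta[offset:offset + size].reshape(shape))
        offset += size
    return params


def loss_and_grad(theta, shapes, X, Y, weight_decay):
    """Cross-entropy of a sigmoid hidden layer + softmax output, treated as one model."""
    W1, b1, W2, b2 = unpack(theta, shapes)
    n = X.shape[0]
    H = sigmoid(X @ W1 + b1)
    logits = H @ W2 + b2
    logits = logits - logits.max(axis=1, keepdims=True)       # stabilise the exponentials
    log_P = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    P = np.exp(log_P)
    loss = -np.sum(Y * log_P) / n
    loss += 0.5 * weight_decay * (np.sum(W1 ** 2) + np.sum(W2 ** 2))
    # Backpropagate through the softmax layer, then through the sigmoid layer.
    d_logits = (P - Y) / n
    dW2 = H.T @ d_logits + weight_decay * W2
    db2 = d_logits.sum(axis=0)
    d_pre = (d_logits @ W2.T) * H * (1.0 - H)
    dW1 = X.T @ d_pre + weight_decay * W1
    db1 = d_pre.sum(axis=0)
    return loss, np.concatenate([g.ravel() for g in (dW1, db1, dW2, db2)])


def fine_tune(X, y, W1, b1, n_classes, weight_decay=1e-4, maxiter=200, seed=0):
    """Jointly fine-tune the pre-trained layer (W1, b1) and a new softmax layer with L-BFGS."""
    rng = np.random.default_rng(seed)
    W2 = 0.01 * rng.standard_normal((W1.shape[1], n_classes))
    b2 = np.zeros(n_classes)
    shapes = [W1.shape, b1.shape, W2.shape, b2.shape]
    theta0 = np.concatenate([p.ravel() for p in (W1, b1, W2, b2)])
    Y = np.eye(n_classes)[y]                                   # one-hot targets
    result = minimize(loss_and_grad, theta0, args=(shapes, X, Y, weight_decay),
                      jac=True, method="L-BFGS-B", options={"maxiter": maxiter})
    return unpack(result.x, shapes), result


def predict(params, X):
    W1, b1, W2, b2 = params
    return np.argmax(sigmoid(X @ W1 + b1) @ W2 + b2, axis=1)


# Toy sparse data: each of 3 classes switches on its own block of 20 "terms" more often.
rng = np.random.default_rng(0)
n, d, k, n_hidden = 300, 100, 3, 16
y = rng.integers(0, k, size=n)
term_probs = np.full((k, d), 0.01)
for c in range(k):
    term_probs[c, 20 * c:20 * (c + 1)] = 0.3
X = (rng.random((n, d)) < term_probs[y]).astype(float)
W1_pre = 0.1 * rng.standard_normal((d, n_hidden))   # hypothetical "pre-trained" weights
b1_pre = np.zeros(n_hidden)
params, result = fine_tune(X, y, W1_pre, b1_pre, n_classes=k)
print("training accuracy:", np.mean(predict(params, X) == y))
```

L-BFGS builds its curvature estimate from successive gradients of a fixed objective, so this sketch evaluates the loss on the whole training set each time rather than on mini-batches.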
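Part (2) adds dropout to limit over-fitting on the small breast hyperplasia sample. The abstract does not say which variant is used; the sketch below assumes the common "inverted" form, which zeroes each unit with probability `rate` during training and rescales the surviving units by `1 / (1 - rate)`, so that the network can be used unchanged at test time. The function name and toy activations are hypothetical.

```python
import numpy as np


def dropout(activations, rate, rng, training=True):
    """Inverted dropout: drop each unit with probability `rate`, scale kept units by 1/(1 - rate)."""
    if not training or rate == 0.0:
        return activations, np.ones_like(activations)
    keep_prob = 1.0 - rate
    mask = (rng.random(activations.shape) < keep_prob) / keep_prob   # entries are 0 or 1/keep_prob
    return activations * mask, mask


rng = np.random.default_rng(0)
h = np.full((1000, 50), 0.8)                                  # toy hidden-layer activations
h_train, mask = dropout(h, rate=0.5, rng=rng)                 # training: about half the units are zeroed
h_test, _ = dropout(h, rate=0.5, rng=rng, training=False)     # evaluation: activations unchanged
print("mean activation, train vs. test:", h_train.mean(), h_test.mean())
```

The mask is returned so that the backward pass can drop the same units (the gradient of the layer output is multiplied by `mask`). When dropout is combined with L-BFGS, as part (2) describes, one reasonable choice (an assumption, not something the abstract states) is to keep a sampled mask fixed for a whole L-BFGS run, because the line search expects the objective not to change between evaluations.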
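The evaluation in part (2) uses five-fold cross-validation: the samples are split into five folds, each fold serves once as the test set while the other four train the model, and the five accuracies are then summarized. The sketch assumes a simple shuffled (not stratified) split and plugs in a hypothetical nearest-centroid classifier on random toy data purely so the example runs; in the dissertation's setting, `fit_predict` would wrap the DBN-dropout-softmax model or a baseline such as KNN, SVM or BPNN.

```python
import numpy as np


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) for k-fold cross-validation over one random shuffle."""
    order = np.random.default_rng(seed).permutation(n_samples)
    folds = np.array_split(order, k)
    for i in range(k):
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, folds[i]


def cross_validate(X, y, fit_predict, k=5, seed=0):
    """Accuracy on each held-out fold for a fit_predict(X_train, y_train, X_test) callable."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(y), k, seed):
        y_pred = fit_predict(X[train_idx], y[train_idx], X[test_idx])
        scores.append(np.mean(y_pred == y[test_idx]))
    return np.array(scores)


def nearest_centroid(X_train, y_train, X_test):
    """Hypothetical placeholder classifier: assign each test sample to the closest class mean."""
    classes = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    dists = ((X_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[np.argmin(dists, axis=1)]


rng = np.random.default_rng(0)
y = np.repeat([0, 1], 30)
X = rng.standard_normal((60, 200)) + 1.5 * y[:, None]   # toy two-class "metabolite" matrix
scores = cross_validate(X, y, nearest_centroid)
print("fold accuracies:", scores, "mean:", scores.mean(), "std:", scores.std())
```

With only a few samples per class, a stratified split that preserves the class proportions in every fold is often preferable; replacing `k_fold_indices` with a stratified version leaves `cross_validate` unchanged. The spread of the fold scores (`scores.std()`) is one way to quantify the stability the abstract mentions.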
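Part (3) uses a stacked auto encoder to extract features and an SVM to classify them. The following is a minimal sketch under several assumptions that the abstract does not confirm: each autoencoder layer has a sigmoid encoder and a linear decoder; layers are trained greedily (each on the previous layer's codes) with SciPy's L-BFGS-B; the SAE is not fine-tuned with labels; and the classifier is a two-class linear SVM with squared hinge loss (an L2-loss SVM) rather than, for example, a kernel SVM. The layer sizes and the random 60-by-50 toy matrix are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def unpack(theta, shapes):
    out, i = [], 0
    for s in shapes:
        size = int(np.prod(s))
        out.append(theta[i:i + size].reshape(s))
        i += size
    return out


def ae_loss(theta, shapes, X, lam):
    """One autoencoder: sigmoid encoder, linear decoder, mean squared error + weight decay."""
    W, b, V, c = unpack(theta, shapes)
    n = X.shape[0]
    H = sigmoid(X @ W + b)
    E = H @ V + c - X                                  # reconstruction error
    loss = 0.5 * np.sum(E ** 2) / n + 0.5 * lam * (np.sum(W ** 2) + np.sum(V ** 2))
    dE = E / n
    dZ = (dE @ V.T) * H * (1.0 - H)                    # back through the sigmoid encoder
    grads = (X.T @ dZ + lam * W, dZ.sum(axis=0), H.T @ dE + lam * V, dE.sum(axis=0))
    return loss, np.concatenate([g.ravel() for g in grads])


def train_sae(X, layer_sizes, lam=1e-4, maxiter=300, seed=0):
    """Greedy layer-wise SAE training; each decoder is discarded after its layer is trained."""
    encoders, H = [], X
    for depth, n_hidden in enumerate(layer_sizes):
        d = H.shape[1]
        shapes = [(d, n_hidden), (n_hidden,), (n_hidden, d), (d,)]
        theta0 = 0.1 * np.random.default_rng(seed + depth).standard_normal(
            sum(int(np.prod(s)) for s in shapes))
        res = minimize(ae_loss, theta0, args=(shapes, H, lam), jac=True,
                       method="L-BFGS-B", options={"maxiter": maxiter})
        W, b, _, _ = unpack(res.x, shapes)
        encoders.append((W, b))
        H = sigmoid(H @ W + b)                         # these codes train the next autoencoder
    return encoders, H


def svm_loss(theta, F, y, C):
    """Linear SVM: L2 regularisation + squared hinge loss (differentiable, so L-BFGS applies)."""
    w, b = theta[:-1], theta[-1]
    slack = np.maximum(0.0, 1.0 - y * (F @ w + b))
    coef = -2.0 * C * y * slack                        # d(C * slack**2) / d(score) per sample
    loss = 0.5 * w @ w + C * np.sum(slack ** 2)
    return loss, np.append(w + F.T @ coef, coef.sum())


def train_svm(F, y, C=1.0, maxiter=500):
    res = minimize(svm_loss, np.zeros(F.shape[1] + 1), args=(F, y, C), jac=True,
                   method="L-BFGS-B", options={"maxiter": maxiter})
    return res.x[:-1], res.x[-1]


# Toy stand-in for a small metabolomics matrix: 60 samples x 50 metabolites, labels in {-1, +1}.
rng = np.random.default_rng(0)
y = np.repeat([-1.0, 1.0], 30)
X = rng.standard_normal((60, 50)) + 1.0 * (y[:, None] > 0)
X = (X - X.mean(axis=0)) / X.std(axis=0)               # standardise each metabolite
encoders, F = train_sae(X, [16, 8])
w, b = train_svm(F, y)
print("training accuracy:", np.mean(np.sign(F @ w + b) == y))
```

Splitting the pipeline this way keeps the SVM independent of how the features were learned, so a kernel SVM from an existing library could replace `train_svm` without touching `train_sae`.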

Keywords/Search Tags:

deep belief networks, stacked auto encoder, text data, metabolomics data, high-dimensional and sparse

Related items

1	Small Target Detection In The Background Of Sea Clutter Using Deep Learning Method
2	Improvement Of Stacked Auto-Encoders And Its Engineering Application
3	Deep Auto-encoder Framework For SAR Images Change Detection
4	Research On Speech Enhancement Method Based On Deep Learning Neural Networks
5	Memetic Algorithm Based Feature Weighting For High-dimensional Metabolomics Data
6	Design And Implementation Of A New Germplasm Resources Data Warehouse System
7	Research On Text Classification Based On Hybrid Model Of Deep Learning
8	Research On Text Classification Of Deep Learning Mixing Model Based On Map Reduce
9	Deep Learning Algorithm Based On The Sparse Auto-Encoder And Marginalized Denoising Auto-Encoder
10	The Research Of Text Classification Based On Deep Learning