The Research On Cross Domain Text Classification Based On Autoencoders

Posted on:2020-09-29

Degree:Master

Type:Thesis

Country:China

Candidate:S Yang

GTID:2428330575996969

Subject:Computer software and theory

Abstract/Summary:

In recent years, autoencoder models have been widely used for cross-domain text classification. Among them, the denoising autoencoder, which learns abstract and robust feature representations, achieves better performance in cross-domain learning. Previous work set the noise coefficient of the denoising autoencoder to a constant. However, different cross-domain tasks have different sensitivity to the noise coefficient because they follow different data distributions. In addition, feature representation learning based on the autoencoder model does not preserve local geometric structure information, and domain divergence may arise in the new feature space. Both issues pose great challenges to existing cross-domain classification methods based on the denoising autoencoder. In view of these problems, this dissertation studies cross-domain text classification based on the autoencoder model. The main contributions are as follows:

(1) To address the differing sensitivity of cross-domain tasks to the noise coefficient, a marginalized stacked denoising autoencoder with adaptive noise probability (mSDA-AP) is proposed for cross-domain text classification. First, the shared features and the domain-specific features of the source and target domains are selected and weighted to enlarge the proportion of features with strong polarity. Then, the noise coefficient is calculated according to the distribution difference of the shared features between the two domains, and the input data is corrupted with this noise coefficient. Finally, a classifier is constructed on the new feature space obtained by the marginalized stacked denoising autoencoder (mSDA) to classify the unlabeled data in the target domain. Experimental results show that the proposed approach achieves better classification accuracy than several state-of-the-art baseline methods.

(2) Because the autoencoder model measures reconstruction error with the Frobenius norm, which is sensitive to outliers, L2,1-norm stacked autoencoders (SRAAR) are proposed for cross-domain text classification. In this method, the L2,1-norm is used to measure the reconstruction error between the original feature space and the new feature space, and manifold regularization and the maximum mean discrepancy (MMD) are introduced into the objective function to preserve the local geometric structure of the data and to minimize the distribution divergence between the two domains during feature representation learning. A classifier is then trained on the new feature representations to classify target-domain samples. Extensive experiments demonstrate that SRAAR obtains excellent performance on cross-domain text classification tasks.
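
The contrast behind the second contribution can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names, the toy data, the convention that each row is one sample, and the choice of a linear kernel for MMD are all assumptions made for illustration. The abstract does not give the actual objective function or its weighting terms.

```python
import math


def frobenius_sq(residual):
    """Squared Frobenius norm: sum of squared entries of the residual matrix."""
    return sum(v * v for row in residual for v in row)


def l21_norm(residual):
    """L2,1-norm (assumed row-wise): sum of the Euclidean norms of the rows."""
    return sum(math.sqrt(sum(v * v for v in row)) for row in residual)


def linear_mmd_sq(source, target):
    """Squared MMD under a linear kernel (an assumption): the squared
    Euclidean distance between the mean feature vectors of the two domains."""
    dim = len(source[0])
    mean_s = [sum(row[j] for row in source) / len(source) for j in range(dim)]
    mean_t = [sum(row[j] for row in target) / len(target) for j in range(dim)]
    return sum((a - b) ** 2 for a, b in zip(mean_s, mean_t))


# Hypothetical residuals (input minus reconstruction) for single samples.
outlier = [[3.0, 4.0]]          # row norm 5
bigger_outlier = [[30.0, 40.0]]  # same direction, 10 times larger

# Scaling the outlier by 10 multiplies its squared-Frobenius cost by 100,
# but its L2,1 cost only by 10.
print(frobenius_sq(bigger_outlier) / frobenius_sq(outlier))
print(l21_norm(bigger_outlier) / l21_norm(outlier))

# Hypothetical hidden representations of the two domains.
source_repr = [[0.0, 0.0], [2.0, 2.0]]
target_repr = [[1.0, 1.0], [3.0, 3.0]]
print(linear_mmd_sq(source_repr, target_repr))
```

Because each sample contributes its unsquared row norm to the L2,1-norm, a single badly reconstructed sample dominates the loss less than it does under the squared Frobenius norm. The MMD term is zero when the two domains have identical mean representations and grows as the means drift apart.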

Keywords/Search Tags:

Text classification, Cross-domain, Autoencoder, L2,1-norm

Related items

1	Research On Text Classification Algorithm Based On Neural Network And Domain Adaptation
2	Cross-domain Classification Based Sentiment Analysis For Product Reviews
3	Research On Cross-domain Text Classification Based On Multi-topic Spaces
4	Research On Cross-Domain Text Classification
5	Research On Emotional Classification Technology Of Cross - Domain Short Text In Micro - Precision Precision Marketing Platform
6	Research And System Construction Of Cross-domain Sentiment Classification Method Based On Generative Adversarial Networks
7	Research On Cross-Domain Text Classification Method Integrating External Knowledge
8	Research On Cross-domain Recommendation Algorithm Based On Autoencoder
9	Cross-Lingual Text Classification Based On Monolingual Word Embedding Mapping Without Parallel Corpus
10	Research On Image Generation Algorithm Based On Autoencoder