Research on Cross-Domain Text Classification Based on Autoencoders

Posted on: 2020-09-29 | Degree: Master | Type: Thesis
Country: China | Candidate: S Yang
GTID: 2428330575996969 | Subject: Computer software and theory
Abstract/Summary:
In recent years, autoencoder models have been widely used for cross-domain text classification. Among them, the denoising autoencoder can learn abstract and robust feature representations, and it achieves better performance in cross-domain learning. In previous work, however, the denoising autoencoder set the noise coefficient to a constant. Different cross-domain tasks follow different data distributions, so they are sensitive to the noise coefficient to different degrees. In addition, feature representation learning based on autoencoders has two further gaps. It does not preserve the local geometric structure of the data, and it does not address the domain divergence that may arise in the new feature space. These problems pose considerable challenges to existing cross-domain classification methods based on the denoising autoencoder. To address them, this dissertation studies cross-domain text classification based on autoencoder models. The main contributions are as follows:

(1) Different cross-domain tasks are sensitive to the noise coefficient to different degrees. To handle this, a marginalized stacked denoising autoencoder with adaptive noise probability (mSDA-AP) is proposed for cross-domain text classification. First, the shared features and domain-specific features of the source and target domains are selected and weighted to increase the proportion of features with strong polarity. Then, the noise coefficient is calculated from the distribution difference of the shared features between the two domains, and the input data is corrupted with this coefficient. Finally, a classifier is built on the new feature space learned by the marginalized stacked denoising autoencoder (mSDA) to classify the unlabeled data in the target domain. Experimental results show that the proposed approach achieves better classification accuracy than several state-of-the-art baseline methods.

(2) The autoencoder model measures reconstruction error with the Frobenius norm, which is sensitive to outliers. To address this, L2,1-norm stacked autoencoders (SRAAR) are proposed for cross-domain text classification. In this method, the L2,1-norm measures the reconstruction error between the original feature space and the new feature space. Manifold regularization and the maximum mean discrepancy (MMD) are added to the objective function. The first preserves the local geometric structure of the data, and the second minimizes the distribution divergence between the two domains during feature representation learning. A classifier is then trained on the new feature representations to classify target-domain samples. Extensive experiments demonstrate that SRAAR performs well on cross-domain text classification tasks.
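The abstract states that mSDA-AP computes the noise coefficient from the distribution difference of the shared features between the two domains. It does not give the formula. The sketch below is a hypothetical illustration only, not the thesis's method. It assumes the difference is measured by the Jensen-Shannon divergence between the normalized shared-feature frequency vectors of the two domains. It also assumes the divergence is mapped linearly onto a noise probability in [p_min, p_max]. The divergence measure, the direction of the mapping, and the names `adaptive_noise`, `p_min`, and `p_max` are all assumptions.

```python
import math

def normalize(counts):
    """Turn raw shared-feature counts into a probability distribution."""
    total = sum(counts)
    return [c / total for c in counts]

def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log); always lies in [0, ln 2]."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        # Terms with a_i = 0 contribute nothing; b_i > 0 whenever a_i > 0.
        return sum(x * math.log(x / y) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def adaptive_noise(src_counts, tgt_counts, p_min=0.1, p_max=0.9):
    """HYPOTHETICAL: map the shared-feature divergence linearly onto [p_min, p_max]."""
    d = js_divergence(normalize(src_counts), normalize(tgt_counts))
    return p_min + (p_max - p_min) * d / math.log(2)

# Counts of the same shared features in the source and target domains (toy data).
p_same = adaptive_noise([5, 3, 2], [5, 3, 2])
p_shifted = adaptive_noise([3, 1], [1, 3])
```

Under this assumed mapping, identical shared-feature distributions yield p_min and disjoint ones yield p_max. The abstract does not say whether the actual method raises or lowers the noise as the divergence grows.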
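The mSDA step can be illustrated with the standard closed-form solution of a marginalized denoising layer. This layer does not sample corrupted copies of the input. Instead, it minimizes the expected reconstruction loss under feature dropout with probability p analytically, which gives W = P Q^-1 computed from the scatter matrix S = X^T X. The pure-Python sketch below rests on several assumptions:

- samples are stored as rows;
- a constant bias feature is never corrupted;
- a small ridge term `lam` is added for numerical stability;
- tanh is the nonlinearity;
- the final representation concatenates the input with every layer's output.

The function names are illustrative, and a real implementation would use a linear-algebra library instead of the hand-written `solve`.

```python
import math

def solve(A, B):
    """Solve A Z = B (A: m x m, B: m x k) by Gauss-Jordan elimination with partial pivoting."""
    m = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(m)]  # augmented matrix [A | B]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        scale = M[col][col]
        M[col] = [v / scale for v in M[col]]
        for r in range(m):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[m:] for row in M]

def mda_weights(X, p, lam=1e-5):
    """Closed-form weights (d x (d+1), last column = bias) of one marginalized denoising layer."""
    d = len(X[0])
    m = d + 1
    Xb = [list(x) + [1.0] for x in X]             # append the bias feature
    S = [[sum(r[i] * r[j] for r in Xb) for j in range(m)] for i in range(m)]
    q = [1.0 - p] * d + [1.0]                      # survival probabilities; bias is never dropped
    # Expected corrupted scatter E[x~ x~^T] plus a ridge term on the diagonal.
    Q = [[S[i][j] * q[i] * q[j] if i != j else S[i][i] * q[i] + lam
          for j in range(m)] for i in range(m)]
    # Expected cross term E[x x~^T], restricted to the d feature rows.
    P = [[S[i][j] * q[j] for j in range(m)] for i in range(d)]
    # W = P Q^-1; Q is symmetric, so W^T = Q^-1 P^T.
    Pt = [[P[i][j] for i in range(d)] for j in range(m)]
    Wt = solve(Q, Pt)
    return [[Wt[j][i] for j in range(m)] for i in range(d)]

def mda_layer(X, p, lam=1e-5):
    """One layer's hidden output h = tanh(W [x; 1]) for every sample."""
    W = mda_weights(X, p, lam)
    return [[math.tanh(sum(w * v for w, v in zip(row, list(x) + [1.0]))) for row in W]
            for x in X]

def msda(X, p, layers=2, lam=1e-5):
    """Stack layers; each sample's representation is [x, h1, ..., h_layers]."""
    reps = [list(x) for x in X]
    H = X
    for _ in range(layers):
        H = mda_layer(H, p, lam)
        reps = [r + h for r, h in zip(reps, H)]
    return reps

X = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0], [2.0, 1.0, 0.0],
     [1.0, 3.0, 1.0], [0.5, 0.2, 0.1], [3.0, 1.0, 2.0]]
features = msda(X, p=0.5, layers=2)  # 3 input + 2 * 3 hidden values per sample
```

A quick sanity check: with p = 0 and a negligible ridge term, the layer's weights reduce to the identity map on the input features. In mSDA-AP, p would be the adaptively computed noise coefficient rather than a fixed constant.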
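Comparing gradients makes the motivation for SRAAR's L2,1-norm concrete. Take one row of the residual matrix per sample. The squared Frobenius norm sums the squared per-sample errors, so each sample's gradient grows in proportion to its error. The L2,1-norm sums the unsquared per-sample errors, so every nonzero sample's gradient has unit norm and a single outlier cannot dominate the update. The sketch assumes this row-per-sample convention, and the function names are illustrative.

```python
import math

def row_norms(residual):
    """Euclidean norm of each row of the residual matrix (one row per sample)."""
    return [math.sqrt(sum(v * v for v in row)) for row in residual]

def frobenius_sq(residual):
    """Squared Frobenius norm: sum of squared per-sample errors."""
    return sum(n * n for n in row_norms(residual))

def l21(residual):
    """L2,1-norm: sum of unsquared per-sample errors."""
    return sum(row_norms(residual))

def grad_norms_frobenius(residual):
    """Norm of the gradient of frobenius_sq w.r.t. each row e_i, which is 2 * e_i."""
    return [2.0 * n for n in row_norms(residual)]

def grad_norms_l21(residual):
    """Norm of the gradient of l21 w.r.t. each row, e_i / ||e_i|| (subgradient 0 at e_i = 0)."""
    return [1.0 if n > 0 else 0.0 for n in row_norms(residual)]

residual = [[0.1, 0.1], [0.1, 0.1], [0.1, 0.1], [3.0, 4.0]]  # last sample is an outlier
fro_grads = grad_norms_frobenius(residual)
l21_grads = grad_norms_l21(residual)
```

In this toy case, the Frobenius loss gives the outlier a gradient about 35 times larger than the other samples' gradients. The L2,1-norm gives all four samples gradients of the same size.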
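A manifold regularization term like the one in SRAAR's objective is commonly written with a graph Laplacian. The abstract does not specify a construction, so the following common one is assumed:

- build a symmetric k-nearest-neighbour graph W on the input samples;
- form L = D - W;
- penalize tr(H^T L H) on the hidden representations H (one row per sample).

This penalty equals 1/2 * sum_ij W_ij * ||h_i - h_j||^2, so samples that are neighbours in the input space are pushed to stay close in the learned space. The sketch computes both forms to show that they agree. The value of k, the 0/1 edge weights, and the function names are assumptions.

```python
def sqdist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def knn_graph(X, k=2):
    """Symmetric 0/1 adjacency: i and j are linked if either is among the other's k nearest."""
    n = len(X)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        nearest = sorted((sqdist(X[i], X[j]), j) for j in range(n) if j != i)[:k]
        for _, j in nearest:
            W[i][j] = W[j][i] = 1.0
    return W

def laplacian(W):
    """Unnormalized graph Laplacian L = D - W."""
    n = len(W)
    return [[(sum(W[i]) if i == j else 0.0) - W[i][j] for j in range(n)] for i in range(n)]

def trace_form(H, L):
    """tr(H^T L H) with H stored as n rows of d hidden features."""
    n, d = len(H), len(H[0])
    return sum(H[i][c] * L[i][j] * H[j][c]
               for c in range(d) for i in range(n) for j in range(n))

def pairwise_form(H, W):
    """Equivalent form: 1/2 * sum_ij W_ij * ||h_i - h_j||^2."""
    n = len(H)
    return 0.5 * sum(W[i][j] * sqdist(H[i], H[j]) for i in range(n) for j in range(n))

X = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]  # two tight clusters in input space
W = knn_graph(X, k=1)
L = laplacian(W)
H_good = [[0.0], [0.0], [1.0], [1.0]]  # keeps input-space neighbours together
H_bad = [[0.0], [1.0], [0.0], [1.0]]   # splits every neighbour pair
```

Here H_good incurs no penalty, while H_bad is penalized for separating each pair of input-space neighbours.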
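The MMD term measures how far apart the source and target representations are. With a linear kernel, the squared MMD reduces to the squared Euclidean distance between the mean hidden vectors of the two domains. With a Gaussian kernel, the biased empirical estimate compares average within-domain and cross-domain kernel values. The abstract does not say which kernel SRAAR uses, so the sketch below shows both. The `gamma` parameter and the function names are assumptions.

```python
import math

def mean_vec(rows):
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def mmd2_linear(src, tgt):
    """Squared MMD with a linear kernel: ||mean(src) - mean(tgt)||^2."""
    ms, mt = mean_vec(src), mean_vec(tgt)
    return sum((a - b) ** 2 for a, b in zip(ms, mt))

def rbf(x, y, gamma):
    """Gaussian kernel exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def mmd2_rbf(src, tgt, gamma=1.0):
    """Biased empirical squared MMD with a Gaussian (RBF) kernel."""
    def avg_k(A, B):
        return sum(rbf(a, b, gamma) for a in A for b in B) / (len(A) * len(B))
    return avg_k(src, src) + avg_k(tgt, tgt) - 2.0 * avg_k(src, tgt)

src = [[0.0, 0.0], [2.0, 0.0]]  # hidden features of source samples (toy data)
tgt = [[1.0, 1.0], [3.0, 1.0]]  # hidden features of target samples (toy data)
```

Adding either quantity to the training loss pulls the feature distributions of the two domains toward each other. The linear form is cheap to compute and differentiate, which makes it a common choice inside autoencoder objectives.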
Keywords/Search Tags: Text classification, Cross-domain, Autoencoder, L2,1-norm