Research On Text Classification Algorithm Based On Neural Network And Domain Adaptation

Posted on:2021-01-22

Degree:Master

Type:Thesis

Country:China

Candidate:R Chen

Full Text:PDF

GTID:2428330605467920

Subject:Computer Science and Technology

Abstract/Summary:

Text classification is the process of assigning tags or categories to text according to its content. It is one of the fundamental tasks in Natural Language Processing (NLP) and has broad applications. In the era of big data, analyzing and mining useful information from massive text data can save human resources and can also help businesses and governments provide quality services based on that information. Classifying text quickly and effectively therefore has great practical significance. Existing text classification methods are mainly based either on traditional machine learning algorithms or on deep neural networks. Traditional machine learning algorithms rely on feature engineering and have many shortcomings, such as high dimensionality, strong sparsity, poor expressive ability, and the inability to learn features automatically. Neural network-based models have strong feature self-learning capabilities and have made great progress in text classification. However, these models need to be trained on large-scale, high-quality labeled datasets, and such data is scarce because labeling consumes a great deal of labor and time. In addition, text classification is a domain-dependent task: people in different domains use different expressions and vocabularies, and even the same word can convey different semantics in different domains. As a result, models trained in one domain generalize poorly to other domains. Researchers have therefore considered how to train neural network models on related domains that have large amounts of labeled data, so that the models perform well on target-domain datasets with little or no labeling. This is called the domain adaptation problem. It is mainly concerned with transferring knowledge from other domains to the target domain, in order to alleviate the shortage of labeled data and improve the generalization of the model. This thesis studies existing text classification algorithms and domain adaptation. Its research contents are as follows:

(1) An attention-based RNN with joint embedding of words and characters for text classification. Most text classification methods take the word as the basic unit for capturing semantic regularities between words, but when handling previously unseen or rare words, these models may lose some semantic information. To address this issue, we propose an attention-based RNN with joint embedding of words and characters, which combines the merits of character-level and word-level representations. First, a Convolutional Neural Network (CNN) encodes the character embeddings of each word to obtain a character-level representation. Then the character-level and word-level representations are combined and fed into a Bi-directional Gated Recurrent Unit (BGRU) to extract the context information of each word. Finally, an attention mechanism is added to the model to extract the important features.

(2) Correlation alignment with attention mechanism for unsupervised domain adaptation. Not all features of the source domain should be transferred, and aligning untransferable features can cause negative transfer. We propose a correlation alignment model with an attention mechanism for unsupervised domain adaptation. In this model, an attention mechanism is introduced into the transfer process to capture the positively transferable features in the source and target domains. Moreover, a correlation alignment loss is used to minimize the domain discrepancy by aligning the second-order statistics of the positively transferable features extracted by the attention mechanism.

(3) Joint adversarial domain adaptation and correlation alignment for cross-domain sentiment classification. Most existing methods focus mainly on learning domain-invariant representations and ignore domain-specific information. We propose a joint adversarial domain adaptation and correlation alignment model for cross-domain sentiment classification. This model extracts domain-shared and domain-specific representations simultaneously and incorporates domain-shared information from the target domain into the source domain, which improves the generalization of the classifier on the target domain. We introduce adversarial training into the model so that it can extract domain-shared information more effectively.
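Model (1) can be illustrated with a minimal pure-Python sketch of its first and last steps. It assumes that the word vector and the character-level vector are combined by concatenation, that the character CNN is a single ReLU convolution layer with max-over-time pooling, and that attention scores each hidden state as u · tanh(h_t) with a learned context vector u. These choices, the function names, and the toy weights are illustrative assumptions rather than the thesis's exact configuration, and the BiGRU that would sit between the two steps is omitted.

```python
import math
from typing import List

Vector = List[float]


def char_cnn_feature(char_embs: List[Vector], filters: List[List[Vector]]) -> Vector:
    """Character-level word vector: one max-pooled ReLU activation per filter.

    Each filter is a list of `width` rows, each row the size of a character embedding.
    """
    dim = len(char_embs[0])
    features = []
    for filt in filters:
        width = len(filt)
        # Zero-pad words shorter than the filter so it fits at least once.
        padded = char_embs + [[0.0] * dim] * max(0, width - len(char_embs))
        best = 0.0  # ReLU outputs are >= 0, so 0.0 is the floor of the pooled value
        for start in range(len(padded) - width + 1):
            window = padded[start:start + width]
            score = sum(w * c for w_row, c_row in zip(filt, window)
                        for w, c in zip(w_row, c_row))
            best = max(best, score)  # equals the max over positions of relu(score)
        features.append(best)
    return features


def joint_embedding(word_vec: Vector, char_embs: List[Vector],
                    filters: List[List[Vector]]) -> Vector:
    """Assumed combination step: concatenate the word vector and the char-CNN vector."""
    return word_vec + char_cnn_feature(char_embs, filters)


def attention_pool(hidden: List[Vector], context: Vector) -> Vector:
    """Softmax-weighted sum of per-word hidden states (e.g. BiGRU outputs)."""
    scores = [sum(u * math.tanh(h) for u, h in zip(context, h_t)) for h_t in hidden]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [sum(a * h_t[i] for a, h_t in zip(weights, hidden))
            for i in range(len(hidden[0]))]


# Toy example: a three-character word with 2-d character embeddings and two filters.
chars = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
filters = [[[1.0, 0.0], [0.0, 1.0]],  # width-2 filter
           [[-1.0, -1.0]]]            # width-1 filter
print(joint_embedding([0.5, -0.5], chars, filters))  # [0.5, -0.5, 2.0, 0.0]
```

Concatenation keeps word-level and character-level features in separate dimensions, so a rare or unseen word whose word vector carries little information can still contribute through its character filters.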
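The correlation alignment loss in model (2) can be sketched with the Deep CORAL formulation, L_CORAL = ||C_S - C_T||_F^2 / (4d^2), where C_S and C_T are the source and target feature covariance matrices and d is the feature dimension; here it is applied after a per-feature attention gate. The abstract does not say how the attention mechanism is parameterized, so the gate (one sigmoid weight per feature dimension, shared by both domains) and all function names below are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def covariance(x: Matrix) -> Matrix:
    """Unbiased (n - 1) covariance of an n x d feature batch."""
    n, d = len(x), len(x[0])
    means = [sum(row[j] for row in x) / n for j in range(d)]
    return [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in x) / (n - 1)
             for j in range(d)]
            for i in range(d)]


def coral_loss(source: Matrix, target: Matrix) -> float:
    """Squared Frobenius distance between the two covariances, divided by 4 d^2."""
    d = len(source[0])
    cs, ct = covariance(source), covariance(target)
    return sum((cs[i][j] - ct[i][j]) ** 2 for i in range(d) for j in range(d)) / (4 * d * d)


def attend(x: Matrix, gate_logits: List[float]) -> Matrix:
    """Hypothetical feature attention: scale feature j by sigmoid(gate_logits[j])."""
    gates = [1.0 / (1.0 + math.exp(-g)) for g in gate_logits]
    return [[value * gate for value, gate in zip(row, gates)] for row in x]


def attentive_coral_loss(source: Matrix, target: Matrix, gate_logits: List[float]) -> float:
    """Align the second-order statistics of the attended (gated) features only."""
    return coral_loss(attend(source, gate_logits), attend(target, gate_logits))


src = [[1.0, 0.0], [-1.0, 0.0]]  # variance only along feature 0
tgt = [[0.0, 1.0], [0.0, -1.0]]  # variance only along feature 1
print(coral_loss(src, tgt))                        # 0.5
print(attentive_coral_loss(src, tgt, [0.0, 0.0]))  # 0.03125
```

On its own, this loss could be driven toward zero by closing every gate, so in a setup like this the gates would presumably be trained jointly with a classification loss on labeled source data.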
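Adversarial domain adaptation of the kind named in model (3) is commonly implemented with a gradient reversal layer, which is the identity in the forward pass and multiplies the gradient by -lam in the backward pass. The abstract only says that adversarial training is introduced, so this mechanism is an assumption. The toy sketch below uses a scalar shared encoder h = w * x and a logistic domain discriminator with weight v, and computes gradients by hand so it runs without a deep learning framework. The domain-specific (private) encoder, the sentiment classifier, and the correlation alignment term are omitted, and all names are hypothetical.

```python
import math
from typing import List, Tuple

Batch = List[Tuple[float, int]]  # (input, domain label: 1 = source, 0 = target)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def domain_loss(w: float, v: float, batch: Batch) -> float:
    """Mean binary cross-entropy of the discriminator p = sigmoid(v * (w * x))."""
    total = 0.0
    for x, y in batch:
        p = sigmoid(v * (w * x))
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(batch)


def grl_step(w: float, v: float, batch: Batch,
             lr: float = 0.1, lam: float = 1.0) -> Tuple[float, float]:
    """One update through a gradient reversal layer placed between encoder and discriminator."""
    grad_v = grad_w = 0.0
    for x, y in batch:
        h = w * x             # shared feature from the scalar encoder
        p = sigmoid(v * h)    # discriminator's probability that the input is from the source
        dz = p - y            # d(BCE)/d(logit)
        grad_v += dz * h      # d(BCE)/dv
        grad_w += dz * v * x  # d(BCE)/dw, before reversal
    grad_v /= len(batch)
    grad_w /= len(batch)
    v_new = v - lr * grad_v           # discriminator descends on the domain loss
    w_new = w - lr * (-lam * grad_w)  # encoder gets the reversed gradient: ascends on it
    return w_new, v_new


batch = [(1.0, 1), (-1.0, 0)]  # one source sample, one target sample
w1, v1 = grl_step(1.0, 1.0, batch)
print(w1 < 1.0, v1 > 1.0)  # True True
```

After one step the discriminator weight grows so it separates the domains better, while the reversed gradient shrinks the encoder weight and so reduces the domain information in the shared feature; lam controls how strongly the encoder is pushed.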

Keywords/Search Tags:

natural language processing, text classification, domain adaptation, cross-domain sentiment classification, attention mechanism

Related items

1	Research Of Cross-domain Sentiment Classification Methods Based On Domain Space Alignment
2	Cross-domain Classification Based Sentiment Analysis For Product Reviews
3	Research On Corss-Domain Sentiment Analysis Based On Attention Mechanism
4	Research And System Construction Of Cross-domain Sentiment Classification Method Based On Generative Adversarial Networks
5	Research And Application Of Chinese Text Sentiment Analysis Method Based On Transfer Learning
6	Research On Text Sentiment Classification Based On Deep Neural Network
7	Research On Sentiment Classification Based On BiGRU And Aspect Attention Module
8	Research On Short Text Sentiment Classification Model Based On Deep Learning
9	The Research On Domain Adaptation For Sentiment Classification Of Product Reviews
10	Research On Fine-grained Text Sentiment Classification For Social Internet Of Things