Research On Text Classification Algorithm Based On Neural Network And Domain Adaptation

Posted on:2021-01-22
Degree:Master
Type:Thesis
Country:China
Candidate:R Chen
Full Text:PDF
GTID:2428330605467920
Subject:Computer Science and Technology
Abstract/Summary:
Text classification is the process of assigning tags or categories to text according to its content. It is one of the fundamental tasks in Natural Language Processing (NLP) and has broad applications. In the era of big data, analyzing and mining useful information from massive text data not only saves human resources but also helps businesses and governments provide quality services based on that information. Classifying text quickly and effectively therefore has great practical significance. Existing text classification methods are mainly based either on traditional machine learning algorithms or on deep neural networks. Traditional machine learning algorithms rely on feature engineering and suffer from shortcomings such as high dimensionality, strong sparsity, poor expressive ability, and the inability to learn features automatically. Neural network-based models have strong feature self-learning capabilities and have made great progress in text classification. However, these models must be trained on large-scale, high-quality labeled datasets, and such data is scarce because labeling consumes a great deal of labor and time. In addition, text classification is a domain-dependent task: people in different domains use different expressions and vocabularies, and even the same word can convey different semantics in different domains. As a result, models trained in one domain generalize poorly to other domains. Researchers have therefore considered how to train neural network models on related domains that have large amounts of labeled data so that the models perform well on target-domain datasets with little or no labeling. This is known as the domain adaptation problem, which is mainly dedicated to transferring knowledge from other domains to the target domain in order to alleviate the shortage of labeled data and improve the generalization of the model. This paper studies existing text classification algorithms and domain adaptation methods. The research contents of this paper are as follows:
(1) An attention-based RNN network with joint embedding of words and characters for text classification. Most text classification methods take the word as the basic unit for capturing semantic regularities between words, but when handling previously unseen or rare words, these models may lose some semantic information. To address this issue, we propose an attention-based RNN network with joint embedding of words and characters, which combines the merits of character-level and word-level representations. First, a Convolutional Neural Network (CNN) encodes the character embeddings of each word to obtain a character-level representation. Then the character-level and word-level representations are combined and fed into a Bi-directional Gated Recurrent Unit (BGRU) to extract the context information of each word. Finally, an attention mechanism is added to the model to extract the important features.
(2) Correlation alignment with attention mechanism for unsupervised domain adaptation. Not all features of the source domain should be transferred, and aligning untransferable features can cause negative transfer. We propose a correlation alignment model with an attention mechanism for unsupervised domain adaptation. In this model, an attention mechanism is introduced into the transfer process to capture the positively transferable features in the source and target domains. Moreover, a correlation alignment loss is used to minimize the domain discrepancy by aligning the second-order statistics of the positively transferable features extracted by the attention mechanism.
(3) Joint adversarial domain adaptation and correlation alignment for cross-domain sentiment classification. Most existing methods focus on learning domain-invariant representations and ignore domain-specific information. We propose a joint adversarial domain adaptation and correlation alignment model for cross-domain sentiment classification. This model extracts domain-shared and domain-specific representations simultaneously and incorporates the domain-shared information from the target domain into the source domain, which improves the generalization of the classifier on the target domain. We introduce adversarial training into the model so that it extracts domain-shared information more effectively.
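The character-level encoding and attention pooling described in item (1) can be sketched in plain Python as below. This is a minimal illustration, not the thesis's implementation: the filter shapes, the additive (tanh) form of the attention score, and the names `char_cnn`, `joint_embedding` and `attention_pool` are all assumptions. The BGRU itself is omitted, so `attention_pool` simply receives a list of hidden-state vectors that would, in the full model, come from a BGRU run over the joint embeddings.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]


def char_cnn(char_embs: Matrix, filters: List[Matrix]) -> Vector:
    """Encode one word from its character embeddings (hypothetical sketch).

    Each filter is a (width x char_dim) matrix slid over the characters
    (valid 1-D convolution, no bias). ReLU is applied and the result is
    max-pooled over positions, giving one feature per filter.
    """
    features = []
    for f in filters:
        width = len(f)
        best = 0.0  # ReLU outputs are >= 0, so 0.0 is a safe starting maximum
        for start in range(len(char_embs) - width + 1):
            s = sum(f[k][j] * char_embs[start + k][j]
                    for k in range(width) for j in range(len(f[k])))
            best = max(best, max(s, 0.0))
        features.append(best)
    return features


def joint_embedding(word_emb: Vector, char_embs: Matrix,
                    filters: List[Matrix]) -> Vector:
    """Concatenate the word-level vector with the character-level vector."""
    return word_emb + char_cnn(char_embs, filters)


def attention_pool(hidden: Matrix, w: Matrix, b: Vector,
                   u: Vector) -> Tuple[Vector, Vector]:
    """Additive attention over (assumed BGRU) hidden states.

    score_t = u . tanh(W h_t + b); alpha = softmax(scores);
    the pooled vector is sum_t alpha_t * h_t.
    """
    scores = []
    for h in hidden:
        proj = [math.tanh(sum(w_row[j] * h[j] for j in range(len(h))) + b_i)
                for w_row, b_i in zip(w, b)]
        scores.append(sum(p * u_i for p, u_i in zip(proj, u)))
    m = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    alphas = [e / z for e in exps]
    pooled = [sum(a * h[j] for a, h in zip(alphas, hidden))
              for j in range(len(hidden[0]))]
    return pooled, alphas


if __name__ == "__main__":
    chars = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 characters, dim 2
    filters = [[[1.0, 0.0], [0.0, 1.0]]]          # one filter of width 2
    print(joint_embedding([0.5, -0.5], chars, filters))  # [0.5, -0.5, 2.0]
```

Max-pooling over positions is what lets a word of any length map to a fixed-size character vector, which is why rare or unseen words still receive a usable representation.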
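Item (2) minimizes domain discrepancy by aligning second-order statistics. The sketch below follows the standard Deep CORAL loss, the squared Frobenius distance between source and target feature covariances scaled by 1/(4d^2), written in plain Python. The optional `weights` argument is a hypothetical stand-in for the attention mechanism: the abstract does not specify how attention selects transferable features, so here each feature column is simply scaled by a given weight before the covariances are computed.

```python
from typing import List, Optional

Matrix = List[List[float]]


def covariance(x: Matrix) -> Matrix:
    """Unbiased (n - 1) covariance of an n x d batch whose rows are samples."""
    n, d = len(x), len(x[0])
    mean = [sum(row[j] for row in x) / n for j in range(d)]
    return [
        [sum((row[i] - mean[i]) * (row[j] - mean[j]) for row in x) / (n - 1)
         for j in range(d)]
        for i in range(d)
    ]


def coral_loss(source: Matrix, target: Matrix,
               weights: Optional[List[float]] = None) -> float:
    """Deep CORAL loss: ||C_s - C_t||_F^2 / (4 d^2).

    `weights` is a hypothetical per-feature scaling (e.g. attention scores);
    when given, each feature column is multiplied by its weight first.
    """
    d = len(source[0])
    if weights is not None:
        source = [[v * w for v, w in zip(row, weights)] for row in source]
        target = [[v * w for v, w in zip(row, weights)] for row in target]
    cs, ct = covariance(source), covariance(target)
    frob_sq = sum((cs[i][j] - ct[i][j]) ** 2
                  for i in range(d) for j in range(d))
    return frob_sq / (4 * d * d)


if __name__ == "__main__":
    src = [[0.0, 0.0], [2.0, 2.0]]
    tgt = [[0.0, 0.0], [1.0, 1.0]]
    print(coral_loss(src, tgt))              # 0.5625
    print(coral_loss(src, tgt, [1.0, 0.0]))  # 0.140625: only feature 0 is aligned
```

Setting a feature's weight to zero removes it from the alignment entirely, which mirrors the stated goal of not forcing untransferable features to match.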
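For item (3), the abstract does not give the training objective. As an assumption, one common way to realize adversarial training is the gradient-reversal formulation of domain-adversarial neural networks (DANN), shown below with a hypothetical correlation alignment term added. The feature extractor parameters theta_f, sentiment classifier parameters theta_y, domain classifier parameters theta_d, and the weights lambda and mu are illustrative symbols, not the thesis's notation.

```latex
\min_{\theta_f,\,\theta_y}\;\max_{\theta_d}\;
  \mathcal{L}_y(\theta_f,\theta_y)
  \;-\;\lambda\,\mathcal{L}_d(\theta_f,\theta_d)
  \;+\;\mu\,\mathcal{L}_{\mathrm{CORAL}}(\theta_f)
```

Here L_y is the cross-entropy of the sentiment classifier on labeled source examples, L_d is the cross-entropy of the domain classifier predicting source versus target, and L_CORAL is the correlation alignment loss on the shared features. A gradient reversal layer placed between the feature extractor and the domain classifier realizes this saddle point with ordinary gradient descent: it is the identity in the forward pass and multiplies the incoming gradient by -lambda in the backward pass, so the domain classifier learns to separate the domains while the feature extractor is pushed toward domain-shared features.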
Keywords/Search Tags:natural language processing, text classification, domain adaptation, cross-domain sentiment classification, attention mechanism