Research On Cross-Domain Text Classification Method Integrating External Knowledge

Posted on: 2022-05-06 | Degree: Master | Type: Thesis
Country: China | Candidate: K Dai
GTID: 2518306569494654 | Subject: Computer Science and Technology
Abstract/Summary:
Cross-domain text classification (CDTC) aims to use a labeled source-domain dataset to train a classifier that performs well on an unlabeled target-domain dataset. The core task of CDTC is to reduce the discrepancy between the source and target domains so that the model learns knowledge shared by both. However, most current text transfer methods are overly complex, and they fail to handle cross-domain classification of long and short texts effectively. On the one hand, for long texts they cannot resolve polysemy: during transfer, the same word may have completely different meanings in different domains. On the other hand, for short texts they cannot cope with semantic sparsity: a short text carries little effective semantic information. This dissertation focuses on the main difficulties of long-text and short-text cross-domain classification and integrates external knowledge to improve CDTC performance. The main contributions are as follows.

For the polysemy problem in long-text transfer, this dissertation proposes a Sentence-level Attention Transfer Network (SentATN) based on a pre-trained model. SentATN performs transfer training at the sentence level: by capturing the complete semantic information of each sentence, it interprets words in their full sentence context and thus transfers sentiment across domains more reliably. The model uses the pre-trained BERT model to obtain a representation of each sentence, applies a sentence-level attention mechanism to build a hierarchical vector representation of the document, and reduces the domain discrepancy through adversarial training so that the model learns knowledge shared by the two domains. Comprehensive experiments on the extended Amazon review dataset show that this model significantly outperforms previous baseline methods.

For the semantic sparsity problem in short-text transfer, this dissertation proposes an External Knowledge Convolutional Neural Network (ExKCNN) based on an external sentiment dictionary. Short texts are usually too concise and lack useful semantic information, so the source and target domains share little semantic information. We use the external sentiment dictionary to expand the texts of both the source and target domains, and then use a convolutional neural network to learn the relationships among the added sentiment words. In this way, ExKCNN can learn more domain-shared knowledge. Comprehensive experiments on the stance detection dataset show that this model significantly outperforms previous baseline methods.

In summary, this dissertation targets the specific difficulties of long-text and short-text cross-domain classification and integrates external knowledge to address the corresponding problems, achieving fairly good results.
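The two core steps attributed to SentATN, pooling sentence vectors into a document vector with sentence-level attention and aligning domains adversarially, can be illustrated with a minimal plain-Python sketch. It rests on several assumptions not stated in the abstract: the sentence vectors are simply given (in SentATN they would come from BERT); each sentence is scored by a dot product with a hypothetical learned `query` vector, a simplification of typical attention scoring; and adversarial training is shown as a gradient reversal layer in the style of DANN, a common choice that the abstract does not confirm.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    # Shift by the max score so math.exp cannot overflow.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sentence_attention(sentence_vecs: List[Vector], query: Vector) -> Tuple[Vector, List[float]]:
    """Pool per-sentence vectors into one document vector with attention.

    sentence_vecs: one vector per sentence; in SentATN these would come from
        BERT, but here they are simply given (assumption).
    query: a learned context vector (hypothetical parameter) that scores how
        much each sentence contributes to the document.
    Returns the document vector and the attention weight of each sentence.
    """
    if not sentence_vecs:
        raise ValueError("need at least one sentence")
    # Relevance score of each sentence: dot product with the query vector.
    scores = [sum(s_i * q_i for s_i, q_i in zip(s, query)) for s in sentence_vecs]
    weights = softmax(scores)
    dim = len(sentence_vecs[0])
    # Document vector = attention-weighted sum of the sentence vectors.
    doc = [sum(w * s[d] for w, s in zip(weights, sentence_vecs)) for d in range(dim)]
    return doc, weights


def gradient_reversal_backward(grad: Vector, lam: float) -> Vector:
    """Backward pass of a gradient reversal layer (its forward pass is identity).

    The domain classifier trains normally, but the gradient flowing back into
    the shared encoder is multiplied by -lam, pushing the encoder toward
    features the domain classifier cannot separate (source vs. target).
    """
    return [-lam * g for g in grad]


if __name__ == "__main__":
    doc, weights = sentence_attention([[1.0, 0.0], [0.0, 1.0]], query=[1.0, 0.0])
    # The first sentence aligns with the query, so it gets the larger weight
    # (about 0.731 vs. 0.269) and dominates the document vector.
    print(weights, doc)
```

Keeping the attention step and the reversal step separate mirrors the hierarchy described above: the document vector feeds both the sentiment classifier and the domain classifier, and only the domain branch has its gradient reversed before reaching the encoder.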
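The short-text pipeline described for ExKCNN, expanding each text with external sentiment words and then convolving over the result, can be sketched in plain Python. This is a minimal illustration under stated assumptions: the lexicon format (word mapped to related sentiment words), the rule of appending added words after the original tokens, the toy 2-dimensional embeddings, and the single hand-set filter are all hypothetical. The abstract does not give the exact expansion rule, and a trained model would learn many filters over real word embeddings.

```python
from typing import Dict, List

Vector = List[float]


def expand_with_lexicon(tokens: List[str], lexicon: Dict[str, List[str]]) -> List[str]:
    """Append sentiment words linked to tokens of a short text.

    lexicon: a toy stand-in for an external sentiment dictionary, mapping a
        word to related sentiment words (the mapping format is an assumption).
    Each added word appears once, after the original tokens.
    """
    expanded = list(tokens)
    for tok in tokens:
        for extra in lexicon.get(tok, []):
            if extra not in expanded:
                expanded.append(extra)
    return expanded


def conv_max_feature(embeds: List[Vector], kernel: List[Vector], bias: float = 0.0) -> float:
    """Apply one 1-D convolution filter, then ReLU and max-over-time pooling.

    embeds: token embeddings of the (expanded) text, shape (n, d).
    kernel: filter weights, shape (k, d), spanning k consecutive tokens.
    """
    k = len(kernel)
    if len(embeds) < k:
        raise ValueError("text is shorter than the kernel width")
    feats = []
    for start in range(len(embeds) - k + 1):
        window = embeds[start:start + k]
        act = bias + sum(
            w * x for w_row, x_row in zip(kernel, window) for w, x in zip(w_row, x_row)
        )
        feats.append(max(0.0, act))  # ReLU
    return max(feats)  # strongest response anywhere in the text


if __name__ == "__main__":
    lexicon = {"slow": ["bad"]}  # toy dictionary entry
    emb = {"service": [0.0, 0.0], "slow": [0.0, 0.5], "bad": [0.0, 1.0]}  # toy embeddings
    kernel = [[0.0, 1.0], [0.0, 1.0]]  # responds to the 2nd ("negative") dimension

    plain = ["service", "slow"]
    expanded = expand_with_lexicon(plain, lexicon)  # ['service', 'slow', 'bad']
    print(conv_max_feature([emb[t] for t in plain], kernel))     # 0.5
    print(conv_max_feature([emb[t] for t in expanded], kernel))  # 1.5
```

In this toy run the appended word lets a width-2 filter span a text word and its neighboring dictionary word, so the filter fires more strongly; that is one way dictionary words, which are not tied to either domain, can add signal both domains share.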
Keywords/Search Tags: cross-domain text classification, transfer learning, external knowledge