Research On Cross-Domain Text Classification Method Integrating External Knowledge

Posted on:2022-05-06

Degree:Master

Type:Thesis

Country:China

Candidate:K Dai

Full Text:PDF

GTID:2518306569494654

Subject:Computer Science and Technology

Abstract/Summary:


Cross-domain text classification (CDTC) aims to use a source-domain dataset with sufficient labels to train a classifier that performs well on an unlabeled target-domain dataset. The core task of CDTC is to reduce the discrepancy between the source and target domains so that the model learns domain-shared knowledge. However, most current text transfer methods are overly complex, and they fail to handle the cross-domain classification of either long texts or short texts effectively. On the one hand, in transfer learning for long texts, these methods cannot resolve polysemy: the same word may carry completely different meanings in different domains. On the other hand, in transfer learning for short texts, they cannot cope with semantic sparsity: short texts contain little effective semantic information. This dissertation focuses on the main difficulties of long-text and short-text cross-domain classification and integrates external knowledge to improve CDTC performance. The main contributions are as follows.

For the polysemy problem in long-text transfer, this dissertation proposes a Sentence-level Attention Transfer Network (SentATN) built on a pre-trained model. SentATN performs transfer training at the sentence level, capturing the complete semantic information of each sentence so that sentiment transfers better across domains. The model uses the pre-trained BERT model to obtain a representation of each sentence, applies a sentence-level attention mechanism to build a hierarchical vector representation of the document, and reduces the domain discrepancy through adversarial training so that the model learns knowledge shared by both domains. Comprehensive experiments on an extended Amazon review dataset show that the model significantly outperforms previous benchmark methods.

For the semantic sparsity problem in short-text transfer, this dissertation proposes an External Knowledge Convolutional Neural Network (ExKCNN) based on an external sentiment dictionary. Short texts are usually very concise and lack useful semantic information, so the source and target domains share little semantic information. We use the external sentiment dictionary to expand the text of both the source and target domains, and then use a convolutional neural network to learn the relationships among the external sentiment words. In this way, ExKCNN learns more domain-shared knowledge. Comprehensive experiments on a stance detection dataset show that the model significantly outperforms previous benchmark methods.

In summary, this dissertation targets the specific difficulties of long-text and short-text cross-domain classification and integrates external knowledge to address each of them, achieving fairly good results.
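
The sentence-level pooling step of SentATN can be illustrated with a minimal pure-Python sketch. It assumes the per-sentence BERT vectors are already computed and scores each sentence by a dot product with a context vector; the abstract does not state the scoring function, so `sentence_attention`, `context`, and this scoring rule are illustrative assumptions rather than the thesis's implementation. `grad_reverse` shows the backward rule of a gradient reversal layer, one common way to realize domain-adversarial training; the abstract only says "adversarial training", so this choice is also an assumption.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sentence_attention(
    sentence_vecs: Sequence[Sequence[float]], context: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Pool sentence vectors into one document vector with attention.

    sentence_vecs: one vector per sentence (assumed precomputed, e.g. by BERT).
    context: a query vector that would be learned in practice
             (hypothetical; fixed here for illustration).
    Returns (document_vector, attention_weights).
    """
    if not sentence_vecs:
        raise ValueError("a document needs at least one sentence")
    # Assumed scoring rule: dot product between each sentence and the context.
    scores = [sum(s_i * c_i for s_i, c_i in zip(s, context)) for s in sentence_vecs]
    weights = softmax(scores)
    dim = len(sentence_vecs[0])
    # Document vector = attention-weighted sum of the sentence vectors.
    doc = [sum(w * s[d] for w, s in zip(weights, sentence_vecs)) for d in range(dim)]
    return doc, weights


def grad_reverse(grad: Sequence[float], lam: float = 1.0) -> List[float]:
    """Backward rule of a gradient reversal layer (assumed adversarial scheme).

    The forward pass is the identity; on the way back, the gradient coming
    from the domain classifier is scaled by -lam before it reaches the
    feature extractor, pushing features toward domain invariance.
    """
    return [-lam * g for g in grad]


doc_vec, weights = sentence_attention([[1.0, 0.0], [0.0, 1.0]], context=[1.0, 0.0])
print([round(w, 3) for w in weights])  # -> [0.731, 0.269]
```

With these toy inputs the first sentence, which aligns with the context vector, receives weight e/(e+1), so the document vector leans toward it. In a trained model the context vector would be learned jointly with the sentiment classifier and the domain discriminator.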

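The text-expansion idea behind ExKCNN can likewise be sketched in pure Python. The lexicon, the two-dimensional toy embeddings, and the single hand-set convolution filter below are hypothetical stand-ins (the abstract does not give the actual dictionary, embeddings, or network configuration); the sketch only shows how appending dictionary sentiment words to a short text gives a convolution filter more sentiment signal to detect.

```python
from typing import Dict, List, Sequence

# Hypothetical toy sentiment lexicon: word -> related sentiment words to append.
LEXICON: Dict[str, List[str]] = {
    "great": ["good", "positive"],
    "awful": ["bad", "negative"],
}

# Hypothetical 2-d embeddings: dim 0 = positive sentiment, dim 1 = negative.
EMBEDDINGS: Dict[str, List[float]] = {
    "great": [1.0, 0.0], "good": [1.0, 0.0], "positive": [1.0, 0.0],
    "awful": [0.0, 1.0], "bad": [0.0, 1.0], "negative": [0.0, 1.0],
}


def expand_with_lexicon(tokens: Sequence[str], lexicon: Dict[str, List[str]]) -> List[str]:
    """Keep the original tokens and append lexicon entries for each matched token."""
    expanded = list(tokens)
    for tok in tokens:
        expanded.extend(lexicon.get(tok.lower(), []))
    return expanded


def embed(tokens: Sequence[str]) -> List[List[float]]:
    """Look up toy embeddings; unknown words map to the zero vector."""
    return [EMBEDDINGS.get(t.lower(), [0.0, 0.0]) for t in tokens]


def conv1d_max(seq: Sequence[Sequence[float]], kernel: Sequence[Sequence[float]],
               bias: float = 0.0) -> float:
    """Slide one filter over the token sequence, apply ReLU, then max-over-time pooling."""
    width = len(kernel)
    if len(seq) < width:
        raise ValueError("sequence is shorter than the filter width")
    activations = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        act = bias + sum(
            w * x
            for k_row, e_row in zip(kernel, window)
            for w, x in zip(k_row, e_row)
        )
        activations.append(max(0.0, act))  # ReLU
    return max(activations)


# A hand-set filter that fires on two consecutive positive-sentiment tokens.
POS_BIGRAM_FILTER = [[1.0, 0.0], [1.0, 0.0]]

tokens = ["The", "movie", "was", "great"]
expanded = expand_with_lexicon(tokens, LEXICON)
print(expanded)  # -> ['The', 'movie', 'was', 'great', 'good', 'positive']
print(conv1d_max(embed(tokens), POS_BIGRAM_FILTER))    # -> 1.0
print(conv1d_max(embed(expanded), POS_BIGRAM_FILTER))  # -> 2.0
```

The pooled activation rises from 1.0 to 2.0 because the appended words create windows in which both positions carry sentiment. A real model would learn many filters of several widths and feed the pooled features to a classifier.
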
Keywords/Search Tags:

cross-domain text classification, transfer learning, external knowledge


Related items

1	Research On Textual Knowledge Transfer Of Cross-domain Social Events
2	The Research And Application Of Sentiment Classification Based On Transfer Learning
3	Research Of Transfer Learning And Its Application In Classifying Cross-domain Data
4	Research And Application Of Chinese Text Sentiment Analysis Method Based On Transfer Learning
5	Research On Cross-Domain Text Classification Of Tendency Analysis Based On Ensemble Learning
6	Research On Cross-domain Recommendation Based On Transfer Learning
7	Research Of Instance-based And Feature-based Transfer Learning For Text Classification
8	Research On Heterogeneous Machine Learning For Cross-Domain Document Classification
9	Research On Cross-Domain Text Classification
10	Research On Online Transfer Learning Algorithm And Its Application On Text Classification