
Exploiting Topic-based Adversarial Neural Network For Cross-domain Keyphrase Extraction

Posted on: 2021-02-09
Degree: Master
Type: Thesis
Country: China
Candidate: Y N Wang
GTID: 2428330602499098
Subject: Computer application technology
Abstract/Summary:
In today's era of data explosion, concepts such as data, information, and knowledge touch everyone and every industry. However, raw data of any form conveys no information unless it is processed in some intelligent way. Knowing the most important phrases of a textual document provides a condensed representation of it that considerably eases its processing. Keyphrases give a high-level description of a document's content, summarising its core topics, concepts, ideas, or arguments. These descriptive phrases enable algorithms to retrieve relevant information more quickly and effectively, which plays an important role in many areas of document processing, such as document indexing, classification, clustering, and summarization.

However, most documents lack author-provided keyphrases, and manually determining the important phrases for every document in a large collection is tedious, expensive, and often requires expert knowledge. Fortunately, natural language processing techniques can help generate keyphrases for documents automatically. At present, solutions for automatic keyphrase extraction mainly rely on manually selected features, such as frequencies and relative occurrence positions. Such solutions are dataset-dependent and often need to be purposely modified to work for documents of different lengths, discourse modes, and disciplines. This is because the performance of such algorithms relies heavily on the selection of features, which turns the development of automatic keyphrase extraction algorithms into a time-consuming and labor-intensive exercise.

Existing approaches have two further limitations. First, although supervised methods perform well on this task, they require a large amount of labeled data, which is extremely expensive and time-consuming to collect in many application scenarios. Second, most existing methods focus on single-domain keyphrase extraction, which does not fully utilize the data in resource-rich domains. To address these problems, we investigate the under-explored problem of cross-domain keyphrase extraction. The major work and contributions are as follows:

1. We investigate an under-explored problem of cross-domain keyphrase extraction. We show that it is possible to use both labeled data from resource-rich domains and unlabeled data in the source and target domains to improve the performance of keyphrase extraction in the unlabeled target domain.

2. We propose a novel topic-based adversarial neural network that can learn transferable knowledge across domains efficiently by performing adversarial training. To the best of our knowledge, we are the first to exploit the adversarial learning technique for keyphrase extraction.

3. We design a topic correlation layer to incorporate the topic-based representation of the document. Moreover, we also propose to reconstruct the document in the target domain from both forward and backward directions to learn the domain-private features.
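
As an illustration of the manually selected features the abstract mentions (frequencies and relative occurrence positions), the following is a minimal Python sketch. It is not the thesis's method; the function name `handcrafted_features`, the whitespace tokenization, and the example document are assumptions made only for this example.

```python
def handcrafted_features(tokens, candidates):
    """Compute two classic hand-picked features per candidate phrase.

    Returns {phrase: (frequency, relative_first_position)}, where the
    relative position is the index of the first occurrence divided by the
    document length in tokens. Candidates that never occur are skipped.
    (Hypothetical helper, for illustration only.)
    """
    n = len(tokens)
    features = {}
    for phrase in candidates:
        words = phrase.split()
        k = len(words)
        # Every start index where the phrase matches the token sequence.
        positions = [i for i in range(n - k + 1) if tokens[i:i + k] == words]
        if positions:
            features[phrase] = (len(positions), positions[0] / n)
    return features


# Toy document, tokenized on whitespace (an assumption of this sketch).
doc = ("keyphrase extraction helps indexing and "
       "keyphrase extraction helps clustering").split()
feats = handcrafted_features(
    doc, ["keyphrase extraction", "clustering", "summarization"]
)
for phrase, (freq, rel_pos) in feats.items():
    print(f"{phrase}: frequency={freq}, relative_position={rel_pos:.2f}")
```

Features of this kind depend on document length and writing style, which is one way to see why such solutions tend to need re-tuning when they are moved to a different domain.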
Keywords/Search Tags: Adversarial Network, Transfer Learning, Keyphrase Extraction