Research on Topic-Oriented Keyword Extraction Method

Posted on: 2014-06-02  Degree: Doctor  Type: Dissertation
Country: China  Candidate: Z Y Ding  Full Text: PDF
GTID: 1108330434971197  Subject: Computer application technology
Abstract/Summary:
Keyphrases give users a quick way to grasp the content of a document. Automatic keyphrase extraction has important theoretical and practical value in information retrieval and natural language processing. Traditional keyphrase extraction methods simply rank candidate keyphrases by statistical information, without considering the topics of the documents or the overall quality of the extracted keyphrases. In this dissertation, we focus on how to model topics and how to optimize the overall quality of keyphrases. We study three models: an integer linear programming model for keyphrase extraction, a learning to rank model for summary-keyphrase extraction, and a topic-oriented translation model for microblog keyphrase extraction. The three methods are as follows:

(1) Integer linear programming for keyphrase extraction. We first present several criteria for high-quality keyphrases. To integrate those criteria into the keyphrase extraction task, we propose a novel formulation that converts the task into an integer linear programming problem. The formulation can not only encode prior knowledge as constraints but also learn constraints from data. Experimental results demonstrate that our approach performs better than state-of-the-art methods.

(2) Learning to rank for summary-keyphrase extraction. We first present several criteria for high-quality summary keyphrases. To integrate those criteria into the keyphrase extraction task, we propose a novel formulation that converts the task into a learning to rank problem. Our approach involves two phases: selecting candidate keyphrases, and ranking all possible sub-permutations of the candidates. The method is evaluated on a multi-news collection, and the experimental results verify that it is effective at extracting coherent summary keyphrases.

(3) Topic-oriented translation model for microblog keyphrase extraction. We propose a novel topic-oriented translation model for microblog keyphrase extraction. The model combines the advantages of a translation model and a topic model. On the one hand, it addresses the vocabulary gap between the words of a post and its keyphrases; on the other hand, it extracts topic-related keyphrases by modeling topics. We also try two ways of sampling the topic: word-level and document-level. The experimental results show that our model outperforms several baseline methods, including a topic model and a translation model.
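To illustrate the idea behind method (1), here is a minimal sketch of keyphrase selection posed as a 0-1 integer program. It is not the dissertation's formulation: the objective (sum of candidate scores), the two constraints (a cap on the number of keyphrases and mutual exclusion of overlapping candidates), the function name `select_keyphrases`, and all scores are assumptions made for illustration. The toy program is solved by exhaustive enumeration of binary assignments, which is exact only for a handful of candidates; a real system would use an ILP solver.

```python
from itertools import product


def select_keyphrases(scores, overlaps, max_phrases):
    """Solve a toy 0-1 program by exhaustive enumeration.

    maximize   sum_i scores[i] * x_i
    subject to sum_i x_i <= max_phrases
               x_i + x_j <= 1   for every (i, j) in overlaps
               x_i in {0, 1}

    Returns the indices of the selected candidates.
    """
    n = len(scores)
    best_value, best_choice = float("-inf"), ()
    for x in product((0, 1), repeat=n):
        # Constraint: do not select more than max_phrases candidates.
        if sum(x) > max_phrases:
            continue
        # Constraint: overlapping candidates cannot both be selected.
        if any(x[i] + x[j] > 1 for i, j in overlaps):
            continue
        value = sum(s * xi for s, xi in zip(scores, x))
        if value > best_value:
            best_value, best_choice = value, x
    return [i for i, xi in enumerate(best_choice) if xi]


# Hypothetical candidates and scores, for illustration only.
candidates = [
    "keyphrase extraction",
    "extraction",
    "topic model",
    "integer linear programming",
]
scores = [0.9, 0.5, 0.7, 0.8]
overlaps = [(0, 1)]  # "extraction" is contained in "keyphrase extraction"

chosen = select_keyphrases(scores, overlaps, max_phrases=2)
print([candidates[i] for i in chosen])
```

Encoding quality criteria as linear constraints is what lets the selection be optimized jointly over the whole keyphrase set, rather than ranking each candidate independently.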
Keywords/Search Tags: Natural language processing, Keyphrase extraction, Topic model, Learning to rank, Integer linear programming