
Research And Implementation Of Keyword Extraction And Generation

Posted on: 2021-05-29
Degree: Master
Type: Thesis
Country: China
Candidate: T S Huang
Full Text: PDF
GTID: 2428330632962851
Subject: Computer Science and Technology
Abstract/Summary:
Key phrases are a set of words used by media applications when building or using indexes. With the rapid development of the Internet, the volume of online information has continued to grow, and so has network usage, making effective search and management of information increasingly important. As a brief overview of a document, key phrases help to understand, organize, and retrieve text content, and they are widely used in digital libraries and information retrieval. This thesis applies deep learning to the problems of key-phrase extraction and generation.

On the one hand, key-phrase extraction and generation suffer from missing and redundant key phrases. Missing key phrases arise when the model cannot generate words outside its vocabulary; redundant key phrases arise when several generated phrases summarize the same aspect of the text. On the other hand, the attention mechanism in previous deep learning models for key-phrase extraction and generation considers only the relationship between the target key phrases and the encoder's hidden-layer representation of the source text, not the relationship between the target side and each individual word in the source text. Addressing this is another central problem of this thesis.

To address these problems, two deep learning models for key-phrase extraction and generation are proposed. First, this thesis proposes a sequence-to-sequence (encoder-decoder) model for key-phrase generation. The model uses an attention mechanism to compute the weight relationship between the target side and the hidden layer, and it further incorporates a copy mechanism and a coverage mechanism to address the missing and redundant key-phrase problems described above. In addition, this thesis proposes a new word-attention mechanism that overcomes the limitation of traditional attention, which attends only to the hidden-layer representation (subsequence-level information). By attending to the original text and computing the relationship between each individual word in the source representation and the target key phrase, a new attention vector carrying word-level information is obtained. The model is evaluated on several real datasets, which verify its validity and reliability.

Second, this thesis combines word-level information with subsequence-level information in multiple ways, using two different schemes to fuse the traditional attention mechanism with the proposed word-attention mechanism, so that target key phrases can be extracted and generated more effectively. This model is also evaluated on many real datasets, and the experimental results show a relative improvement over the best baseline models.

Finally, the proposed model was deployed in Beijing University of Posts and Telecommunications' Adaptive Personalized Education Platform, developed by the laboratory. The platform currently offers functions such as topic-based learning, personalized recommendations, test-paper exams, full resource retrieval, and comments and replies, and it has been open to students for a trial period. The model is applied to basic data in the system, such as topics and comments, to extract key phrases from each item, allowing users to retrieve and query the information they need more intuitively. This effectively demonstrates the practical value of the algorithm and models studied in this thesis. The proposed models have been verified on multiple public datasets, and the results show that their performance is currently the best. Two papers based on this work have been published at international conferences.
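The two attention levels described above can be illustrated with a minimal numpy sketch: one attention pass scores the decoder state against encoder hidden states (subsequence level), another scores it against raw word embeddings (word level), and the two context vectors are then fused. All names, shapes, and the dot-product scoring and concatenation fusion here are illustrative assumptions, not the thesis's actual trained networks, and the copy and coverage mechanisms are omitted.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention(query, keys, values):
    """Dot-product attention: score `query` against `keys`,
    return the weighted sum of `values` and the weights."""
    scores = keys @ query             # (T,) one score per source position
    weights = softmax(scores)         # (T,) normalized attention weights
    return weights @ values, weights  # context vector (d,), weights (T,)

# Toy dimensions (illustrative, not from the thesis).
T, d = 5, 4                           # source length, hidden/embedding size
rng = np.random.default_rng(0)
enc_hidden = rng.normal(size=(T, d))  # encoder hidden states (subsequence level)
word_embed = rng.normal(size=(T, d))  # raw word embeddings (word level)
dec_state = rng.normal(size=d)        # current decoder state (the "target side")

# Traditional attention: target state vs. the encoder's hidden-layer representation.
ctx_seq, w_seq = attention(dec_state, enc_hidden, enc_hidden)

# Word attention: target state vs. each individual source-word embedding.
ctx_word, w_word = attention(dec_state, word_embed, word_embed)

# One simple fusion of the two levels: concatenate the context vectors.
fused = np.concatenate([ctx_seq, ctx_word])
print(fused.shape)  # (2*d,) = (8,)
```

In a full model the fused vector would feed the decoder's next prediction step; the thesis's two fusion schemes could differ (e.g. gated sums instead of concatenation), but the contrast between hidden-state keys and word-embedding keys is the core idea.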
Keywords/Search Tags: key-phrase extraction and generation, attention mechanism, sequence-to-sequence model