
Research On Keyword Generation Method Based On Self-attention Mechanism And Copy Mechanism

Posted on: 2020-12-09
Degree: Master
Type: Thesis
Country: China
Candidate: Y D Wang
Full Text: PDF
GTID: 2428330623951399
Subject: Computer technology
Abstract/Summary:
With the rapid development of science and technology and the widespread application of information technology, people's daily lives have become closely tied to the Internet. Owing to the explosive growth of online text, people encounter a large amount of textual information every day, yet most have no time to read and understand it in detail. Quickly extracting valuable information from massive data has therefore become an urgent need, and automatic keyword acquisition offers an effective solution. There are currently two main ways to obtain keywords: extraction and generation. Extraction methods rely on word-frequency statistics and ranking, but they cannot reveal the semantic information hidden behind the text. Generation methods are mainly built on recurrent neural networks, which struggle to capture long-distance dependencies between words. To address these problems, this thesis proposes SAM (Self-Attention Model), an encoder-decoder model based on the self-attention mechanism. In addition, to handle the out-of-vocabulary problem, it proposes SACM (Self-Attention Copy Model), a keyword generation model built on SAM that incorporates a copy mechanism and allows the model to copy critical phrases directly from the source text. The detailed research content of this thesis is as follows:

(1) A keyword generation model, SAM, based on the self-attention mechanism is proposed. The model generates keywords from text semantics and can produce keywords that do not appear in the source text. It is an encoder-decoder framework that relies entirely on self-attention to capture global dependencies between input and output. First, the internal dependencies of the input sequence are learned by the multi-layer, multi-head self-attention mechanism in the encoder. These dependencies are then fed into each multi-head self-attention layer of the decoder. Finally, the next keyword token is generated by combining this information with the previous outputs. This model overcomes two weaknesses of earlier generation methods based on recurrent neural networks: their inherently sequential computation cannot be parallelized, and their modeling of dependencies between distant words is weak.

(2) A keyword generation model, SACM, which combines the self-attention mechanism with a copy mechanism is proposed. To improve the generation of out-of-vocabulary items such as long-tail words (phrases consisting of many words), this thesis integrates a copy mechanism into SAM, allowing the model to copy critical information directly from the source text. The model has two modes: copy and generation. The copy mode copies information directly from the source text by combining the copy mechanism with self-attention, while the generation mode is essentially the same as in SAM. SACM combines the two modes to produce the final keywords, enhancing the generation of out-of-vocabulary items such as long-tail words.

(3) Experimental results on predicting keywords, predicting present keywords, and predicting absent keywords show that the proposed models outperform the baseline models. Moreover, SACM, based on both the self-attention mechanism and the copy mechanism, outperforms SAM, which is based on self-attention alone. Finally, the model is applied to the news domain to verify its generalization ability.
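The multi-head self-attention computation that SAM's encoder and decoder rely on can be sketched with the standard scaled dot-product formulation. This is a minimal NumPy illustration of the general technique, not the thesis's actual implementation; the function names, shapes, and unbatched layout are assumptions made for this sketch.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V, weights

def multi_head_self_attention(X, Wq, Wk, Wv, Wo, n_heads):
    # X: (seq_len, d_model); Wq/Wk/Wv/Wo: (d_model, d_model).
    # Every position attends to every other position, so long-distance
    # dependencies are captured in a single, parallelizable step.
    seq_len, d_model = X.shape
    d_head = d_model // n_heads
    def split(W):  # project, then split into heads: (n_heads, seq_len, d_head)
        return (X @ W).reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)
    out, _ = scaled_dot_product_attention(split(Wq), split(Wk), split(Wv))
    out = out.transpose(1, 0, 2).reshape(seq_len, d_model)  # concat heads
    return out @ Wo
```

Because each attention score is computed directly between two positions, the path length between any two words is constant, which is the property the abstract contrasts with recurrent networks.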
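The way SACM's copy and generation modes combine can be illustrated in the style of pointer/copy mechanisms: a scalar generation probability mixes the decoder's distribution over the fixed vocabulary with a copy distribution induced by the attention weights over source positions. All names here (including `p_gen` and the extended-vocabulary layout with extra slots for source-text OOV words) are illustrative assumptions, not the thesis's code.

```python
import numpy as np

def copy_generate_distribution(p_gen, vocab_probs, attn_weights, source_ids, ext_vocab_size):
    """Mix the generation and copy distributions over an extended vocabulary.

    P(w) = p_gen * P_vocab(w) + (1 - p_gen) * sum_{i : x_i = w} a_i
    """
    final = np.zeros(ext_vocab_size)
    final[: len(vocab_probs)] = p_gen * vocab_probs
    for pos, token_id in enumerate(source_ids):
        # Copy mass flows to the source token's (extended) id, so OOV source
        # words (ids >= len(vocab_probs)) can still be produced as output.
        final[token_id] += (1.0 - p_gen) * attn_weights[pos]
    return final
```

For example, with a 5-word vocabulary, two OOV slots, `p_gen = 0.6`, attention weights (0.5, 0.3, 0.2), and source ids (2, 5, 5), the OOV id 5 receives 0.4 × (0.3 + 0.2) = 0.2 probability even though it lies outside the fixed vocabulary, which is how long-tail phrases can be copied verbatim from the source text.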
Keywords/Search Tags: Keyword generation, Self-attention mechanism, Copy mechanism, Encoder-decoder model