
Research On Keyword Extraction Algorithms Based On Semantic Features

Posted on: 2020-06-19
Degree: Master
Type: Thesis
Country: China
Candidate: J Z Zhou
GTID: 2428330590976550
Subject: Software engineering
Abstract/Summary:
Keyword extraction is a widely used technology. Early on, keywords were extracted manually; later, scholars proposed automatic methods, and the geometric growth of information calls for more effective ones. Traditional algorithms rely mainly on statistical methods, and there is no agreed standard for what counts as a keyword. Deep learning methods can now learn the characteristics of data automatically and produce good results, so this thesis uses deep learning to learn the semantic features between keywords and documents and thereby obtain better extraction algorithms. The thesis makes the following main contributions:

1. Using word vectors to improve TextRank. FastText is used to represent the document set as word vectors. The approach builds on the idea of latent topic distribution, which holds that a document is composed of words belonging to different topics and that the central words of different topics differ most from one another. Accordingly, the semantic differences between words are used to modify TextRank's transition probability matrix so that more weight is transferred to words with large semantic differences. This raises the weight of topic-central words and improves on the original algorithm.

2. Constructing document-keyword pairs and transforming keyword extraction into a binary (two-category) classification task. Keyword extraction usually focuses only on the document itself and makes little use of annotated training data. This thesis assumes that some distribution relates a document to its keywords and that the keywords are obtained by sampling from it. By constructing document-keyword pairs and learning this distribution with a model, keyword extraction becomes a binary classification task, and the semantic features between documents and keywords can be learned.

3. Extracting keywords with generative adversarial networks. Generative adversarial networks can learn the true distribution of data well, so the assumption in point 2 can be realized. The generator uses a Seq2Seq model with an attention mechanism to learn the semantic features of words, increasing the likelihood that keywords are extracted. In addition, because keywords are discrete data, the network is trained with gradient updates obtained through the policy gradient method from reinforcement learning.
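
The first contribution can be illustrated with a minimal sketch. The abstract does not give the exact formula used to modify the transition matrix, so the code below assumes one plausible reading: edges between co-occurring words are weighted by the cosine distance of their word vectors, so that rank flows preferentially toward semantically different neighbours. The function names, the window size, the damping factor, and the toy vectors (standing in for FastText embeddings) are all illustrative assumptions, not the thesis's implementation.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def semantic_textrank(tokens, vectors, window=2, damping=0.85, iterations=50):
    """Rank words with a TextRank-style iteration whose transition
    probabilities are proportional to semantic distance (an assumption)."""
    vocab = sorted(set(tokens))

    # Undirected co-occurrence graph: words within `window` positions are linked.
    neighbours = {w: set() for w in vocab}
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            if tokens[j] != w:
                neighbours[w].add(tokens[j])
                neighbours[tokens[j]].add(w)

    # Edge weight = cosine distance; the epsilon keeps every edge usable.
    weight = {
        w: {n: cosine_distance(vectors[w], vectors[n]) + 1e-6 for n in neighbours[w]}
        for w in vocab
    }
    out_sum = {w: sum(weight[w].values()) for w in vocab}

    score = {w: 1.0 / len(vocab) for w in vocab}
    for _ in range(iterations):
        updated = {}
        for w in vocab:
            # Each neighbour n passes on a share of its score proportional
            # to how semantically different w is from n.
            incoming = sum(
                score[n] * weight[n][w] / out_sum[n] for n in neighbours[w]
            )
            updated[w] = (1 - damping) / len(vocab) + damping * incoming
        score = updated
    return sorted(score.items(), key=lambda kv: kv[1], reverse=True)


# Toy 2-dimensional vectors, hypothetical stand-ins for FastText embeddings.
toy_vectors = {
    "keyword": [1.0, 0.1],
    "extraction": [0.9, 0.2],
    "graph": [0.1, 1.0],
    "ranking": [0.2, 0.9],
}
toy_tokens = ["keyword", "extraction", "graph", "ranking", "keyword", "graph"]

for word, value in semantic_textrank(toy_tokens, toy_vectors):
    print(f"{word}: {value:.4f}")
```

In practice the toy vectors would be replaced by embeddings from a trained FastText model, and the token list by a preprocessed document (for example, after stop-word removal and part-of-speech filtering).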
Keywords/Search Tags:Keyword extraction, Semantic features, word vector, two-category, Generative Adversarial Networks