The Research On Keyphrase Extraction Method Of Scientific Literature Based On Feature Representation

Posted on: 2022-09-20
Degree: Master
Type: Thesis
Country: China
Candidate: Y X Xie
Full Text: PDF
GTID: 2518306560455204
Subject: Computer application technology
Abstract/Summary:
With the rapid development of science and technology, the volume of scientific and technological literature is growing ever faster. Such documents are generally longer than other kinds of text, which makes it hard to grasp their core content quickly, so a method for extracting keyphrases from them is urgently needed. Keyphrase extraction means identifying the words or phrases in a text that summarize its core meaning. Most existing keyphrase extraction methods for scientific and technological literature rely on word frequency information and carry too little semantic information. Many of them operate at the word level and ignore the phrase information formed between words, so they perform poorly on long, multi-word keyphrases. To address these problems, this dissertation studies keyphrase extraction based on different feature representations for two types of scientific and technological documents: patent texts and scientific papers. The main contents are as follows:

(1) Because most existing keyword extraction methods for scientific literature rely on word frequency information and lack semantic information, an unsupervised, clustering-based patent keyword extraction method is proposed. First, word vectors are trained on a Chinese patent corpus, and each patent is represented as a patent vector. All patent vectors are then clustered to obtain multiple cluster centers. Finally, the cosine similarity between the word vector of each word in a patent abstract and the cluster center is taken as the importance of that word. Experimental results on multiple Chinese patent datasets demonstrate the effectiveness of this method.

(2) In recent years, word-level keyphrase extraction methods have achieved good results. However, these methods do not make full use of the phrase information generated by the word context, which leads to poor extraction of keyphrases of different lengths. Therefore, a keyphrase extraction method for scientific papers based on multi-size convolution windows is proposed. The method first uses a pre-trained Skip-gram model to obtain the word embedding space. A convolutional neural network with multi-size filters then maps the text into distributed feature vectors that represent the information of phrases of different lengths. Next, a deep recurrent neural network labels the role played by each word. Finally, an attention mechanism further assesses the importance of each phrase. Experimental results on multiple public datasets demonstrate the competitiveness of this method.
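The clustering-based scoring in (1) can be illustrated with a minimal sketch. It is not the thesis implementation: the abstract does not say how a patent vector is built or which cluster center a word is compared against, so the sketch assumes the patent vector is the mean of its word vectors and that each word is scored against the center of the cluster its patent falls into. The toy two-dimensional `word_vectors`, the sample `patents`, and all function names are hypothetical, and a plain k-means stands in for whatever clustering algorithm the thesis uses.

```python
import math

def mean_vector(vectors):
    """Average a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def sq_dist(u, v):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means, seeded with the first k points (deterministic)."""
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = mean_vector(members)
    return centers, labels

def rank_keywords(tokens, center, word_vectors):
    """Score each known word by cosine similarity to the cluster center."""
    words = {w for w in tokens if w in word_vectors}
    scored = [(w, cosine(word_vectors[w], center)) for w in words]
    return sorted(scored, key=lambda item: (-item[1], item[0]))

# Hypothetical toy embeddings; real ones would be trained on a patent corpus.
word_vectors = {
    "battery": [0.9, 0.1],
    "electrode": [0.8, 0.2],
    "antenna": [0.1, 0.9],
    "signal": [0.2, 0.8],
    "method": [0.5, 0.5],
}
# Hypothetical tokenized patent abstracts.
patents = [
    ["battery", "electrode", "method"],
    ["antenna", "signal", "method"],
    ["battery", "electrode"],
    ["antenna", "signal"],
]

# Assumption: a patent vector is the mean of its word vectors.
patent_vectors = [
    mean_vector([word_vectors[w] for w in tokens if w in word_vectors])
    for tokens in patents
]
centers, labels = kmeans(patent_vectors, k=2)

# Assumption: words are scored against the center of their patent's cluster.
ranked = rank_keywords(patents[0], centers[labels[0]], word_vectors)
for word, score in ranked:
    print(f"{word}\t{score:.4f}")
```

In this toy setup the generic word "method" lies farthest in direction from its cluster center and so ranks last for the first patent, which is the behavior the similarity-as-importance idea relies on.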
Keywords/Search Tags: keyphrase extraction, clustering, multi-size convolution window, attention mechanism