The Research On Keyphrase Extraction Method Of Scientific Literature Based On Feature Representation

Posted on:2022-09-20

Degree:Master

Type:Thesis

Country:China

Candidate:Y X Xie

GTID:2518306560455204

Subject:Computer application technology

Abstract/Summary:

As science and technology develop rapidly, the number of scientific and technological documents is growing ever faster. Scientific documents are generally longer than other documents, which makes it hard to grasp their core content quickly, so a method for extracting keyphrases from them is urgently needed. Keyphrase extraction means identifying the words or phrases in a text that summarize its core meaning. Most existing keyphrase extraction methods for scientific and technological literature rely on word frequency information and carry too little semantic information. Many of them also work at the word level and ignore the phrase information that arises between words, so they perform poorly on long, multi-word keyphrases. To address these problems, this thesis studies keyphrase extraction based on different feature representation methods for two kinds of scientific and technological documents: patent texts and scientific papers. The main contents are as follows:

(1) Most existing keyword extraction methods for scientific literature are based on word frequency information and do not contain enough semantic information. Therefore, an unsupervised, clustering-based patent keyword extraction method is proposed. First, word vectors are trained on a Chinese patent corpus, and each patent is represented as a patent vector. All patent vectors are then clustered to obtain multiple cluster centers. Finally, the cosine similarity between the word vector of each word in the patent abstract and the cluster center is taken as the importance of that word. Experimental results on multiple Chinese patent datasets demonstrate the effectiveness of this method.

(2) In recent years, word-level keyphrase extraction methods have achieved good results. However, these methods do not make full use of the phrase information generated by the word context, which leads to poor extraction of keyphrases of different lengths. Therefore, a keyphrase extraction method for scientific papers based on multi-size convolution windows is proposed. The method first uses a pre-trained Skip-gram model to obtain the word embedding space. A convolutional neural network with multi-size filters then maps the text into distributed feature vectors that represent the information of phrases of different lengths. Next, a deep recurrent neural network labels the role played by each word. Finally, an attention mechanism further judges the importance of each phrase. Experimental results on multiple public datasets show that this method is competitive.
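The clustering-based scoring described in (1) can be illustrated with a minimal sketch. It is not the thesis implementation: the two-dimensional toy word vectors, the mean pooling used to build a patent vector, the small hand-written k-means, and the choice to score each word against the center of the patent's own cluster are all assumptions made here for illustration, since the abstract does not specify them.

```python
import numpy as np

# Hypothetical toy word vectors; in the thesis these are trained on a Chinese patent corpus.
word_vecs = {
    "battery":   np.array([1.0, 0.1]),
    "electrode": np.array([0.9, 0.2]),
    "antenna":   np.array([0.1, 1.0]),
    "signal":    np.array([0.2, 0.9]),
    "method":    np.array([0.6, 0.6]),
}

# Hypothetical tokenized patent abstracts.
patents = [
    ["battery", "electrode", "method"],
    ["battery", "electrode"],
    ["antenna", "signal", "method"],
    ["antenna", "signal"],
]

def doc_vector(tokens, vecs):
    """Represent a patent as the mean of its word vectors (assumed pooling)."""
    return np.mean([vecs[t] for t in tokens if t in vecs], axis=0)

def assign(points, centers):
    """Index of the nearest center (Euclidean distance) for every point."""
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(points, k, n_iter=20, seed=0):
    """Plain k-means; stands in for whichever clustering algorithm is used."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(points, centers)
        for j in range(k):
            members = points[labels == j]
            if len(members):  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return centers, assign(points, centers)

def cosine(a, b):
    """Cosine similarity with a small epsilon to avoid division by zero."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def rank_keywords(tokens, vecs, center, top_n=3):
    """Score each distinct word by cosine similarity to the cluster center."""
    scores = {t: cosine(vecs[t], center) for t in set(tokens) if t in vecs}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

patent_vecs = np.stack([doc_vector(p, word_vecs) for p in patents])
centers, labels = kmeans(patent_vecs, k=2)

# Score the words of the first patent against the center of its own cluster.
top = rank_keywords(patents[0], word_vecs, centers[labels[0]])
print(top)
```

In this toy setup the generic word "method" lies between the two topic clusters, so it scores lower than the topic-specific words. That is the intuition behind using similarity to the cluster center as a measure of word importance.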

Keywords/Search Tags:

keyphrase extraction, clustering, multi-size convolution window, attention mechanism

Related items

1	Study On Text Clustering And Keyphrase Extraction Of Patent Document
2	Incorporate Graph Network And Seq2seq For Keyphrase Extraction
3	Research On Software Requirement Clustering Based On Deep Learning
4	Sentiment Analysis Of Comment Text Based On Deep Learning
5	Researches For Lip Reading Based On Lightweight Convolution And Attention Mechanism
6	Research On Event Extraction Method Based On Attention Mechanism
7	Statistic-based Automatic Keyphrase Extraction And Summarization From Multi-document
8	Chinese Keyphrases Extraction Technique
9	Research On The Keyphrase Extraction And Relevant Technology
10	Video Object Detection Based On Adaptive Convolution Network And Visual Attention Mechanism