Keyphrases, which concisely describe the high-level topics discussed in a document, are useful for a wide range of natural language processing (NLP) tasks. Current popular supervised methods for keyphrase extraction generally cannot exploit multi-scale information in text simultaneously, such as phrase representations, local information, and global information. As a result, they cannot build a deep understanding of the document across scales, which limits how well they extract keyphrases. Existing research on many NLP tasks shows that making full use of a document's multi-scale information helps capture the latent semantic information in text sequences at different levels. This paper therefore uses information from different scales of text so that the keyphrase extraction model can extract keyphrases more accurately. The main contributions are as follows.

First, to address the inability of existing sequence-labeling-based keyphrase extraction methods to use long-distance context information effectively, a multi-level memory network with conditional random field (MLM-CRF) model is proposed. The model uses a memory network to capture long-distance contextual information. The multi-level memory network first captures long-distance context at the sentence level and the document level; a conditional random field then extracts the keyphrases in the document. In comparisons with five existing keyphrase extraction methods on two general datasets, the proposed MLM-CRF model obtains better extraction results.

Second, long keyphrases make up a large proportion of the datasets, yet existing sequence-labeling-based methods are not good at extracting them. To address this, a hybrid semi-Markov conditional random field (HSCRF) model is proposed. The model uses a hybrid semi-Markov conditional random field to strengthen the extraction of long keyphrases. It first enhances the representation of long keyphrases by learning the semantic and positional relationships among the words in a phrase. It then labels the target text with phrase-level keyphrase tags through the hybrid semi-Markov conditional random field. At the same time, HSCRF is combined with the word-level conditional random field (the method described in the preceding paragraph) to jointly label the target text with keyphrases. A comparison with the experimental results of existing keyphrase extraction methods shows that the proposed HSCRF algorithm effectively improves keyphrase extraction performance.
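
The phrase-level labeling idea can be illustrated with a minimal sketch of segment-level (semi-Markov) Viterbi decoding. This is not the HSCRF model itself: the scoring functions `seg_score` and `trans_score`, the toy lexicon, the two labels `O` and `KP`, and the maximum segment length are all assumptions made for illustration, and the scores are hand-written rather than learned. The sketch only shows why scoring whole segments lets a multi-word keyphrase be tagged as one unit instead of word by word.

```python
from typing import Callable, List, Optional, Tuple

Segment = Tuple[int, int, str]  # (start, end_exclusive, label)


def semi_markov_decode(
    n: int,
    labels: List[str],
    max_len: int,
    seg_score: Callable[[int, int, str], float],
    trans_score: Callable[[str, str], float],
) -> List[Segment]:
    """Return the highest-scoring segmentation of n tokens (segment-level Viterbi)."""
    neg_inf = float("-inf")
    # best[i][y]: best score of a segmentation of tokens[:i] whose last segment has label y
    best = [{y: neg_inf for y in labels} for _ in range(n + 1)]
    back: List[dict] = [{y: None for y in labels} for _ in range(n + 1)]

    for end in range(1, n + 1):
        for y in labels:
            for length in range(1, min(max_len, end) + 1):
                start = end - length
                score = seg_score(start, end, y)
                prev: Optional[str] = None
                if start > 0:
                    # Best label for the segment that ends where this one starts.
                    prev = max(labels, key=lambda p: best[start][p] + trans_score(p, y))
                    score += best[start][prev] + trans_score(prev, y)
                if score > best[end][y]:
                    best[end][y] = score
                    back[end][y] = (start, prev)

    # Follow back-pointers from the best final label.
    segments: List[Segment] = []
    end, y = n, max(labels, key=lambda l: best[n][l])
    while end > 0:
        start, prev = back[end][y]
        segments.append((start, end, y))
        end, y = start, prev
    segments.reverse()
    return segments


# Toy, hand-written scores (hypothetical; a real model would learn them).
tokens = "we propose a conditional random field model".split()
lexicon = {("conditional", "random", "field")}


def seg_score(start: int, end: int, label: str) -> float:
    span = tuple(tokens[start:end])
    if label == "KP":
        return 2.0 * len(span) if span in lexicon else -1.0
    return 0.0 if len(span) == 1 else -1.0  # "O" segments are single words


def trans_score(prev_label: str, label: str) -> float:
    return 0.0  # no preference between label transitions in this toy setup


segments = semi_markov_decode(len(tokens), ["O", "KP"], 4, seg_score, trans_score)
keyphrases = [" ".join(tokens[s:e]) for s, e, lab in segments if lab == "KP"]
print(keyphrases)
```

In this toy setup the three-word phrase is scored and labeled as a single segment, whereas a word-level tagger would have to get every individual tag right. A joint model in the spirit of the abstract would add word-level scores to the segment scores; that combination is not shown here.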