
Research On Keyphrase Extraction Algorithm Based On Frequent Pattern Mining

Posted on: 2020-02-25
Degree: Master
Type: Thesis
Country: China
Candidate: L L Wang
GTID: 2428330575454502
Subject: Computer Science and Technology
Abstract/Summary:
Keyphrase extraction is the automatic extraction of words or phrases that represent the main topics of a document or a collection of documents. Good keyphrases give a quick summary of a document's content, which supports information retrieval and decision making. Existing keyphrase extraction methods cannot be customized to each specific document: they usually extract keyphrases from a whole collection of documents or a corpus, and they cannot capture flexible semantic relationships. To solve these problems, this thesis proposes two novel models for extracting keyphrases from a single English document.

The first model, called Ke_MSMING, is a document-specific supervised keyphrase extraction method with strong semantic relations. It first finds all keyphrase candidates in the document using sequential pattern mining and a topic model, then uses supervised machine learning to classify each candidate as a keyphrase or not, and finally selects the top-k candidates as the final keyphrases. For training, Ke_MSMING uses not only baseline features and pattern features but also centrality features obtained from a co-occurrence semantic network; such co-occurrence networks provide strong semantic and co-word relations between words for keyphrase extraction.

The second model, called Ke_MSMVec, is a keyphrase extraction algorithm based on frequent pattern mining and word embedding. It first constructs a co-word network and uses DeepWalk to learn latent representations of the vertices in the network. It then computes the mean vector of the words in the document's title and abstract, called the reference vector, which can intuitively be regarded as a vector representation of the document's semantics. Next, it searches the document for keyphrase candidates with MSMING, a sequential pattern mining algorithm with the one-off condition and general gap constraints; MSMING captures not only important high-frequency words but also words with similar meanings in different forms. Finally, it computes the cosine similarity between each candidate's vector and the reference vector, and uses these similarities as features, together with baseline features and pattern features, to train the keyphrase extraction model.

Both models capture the semantic information in a document and effectively address the neglect of semantic relations in traditional methods. Experimental results on two datasets, evaluated with precision (P), recall (R), and the F1-measure, show that Ke_MSMING and Ke_MSMVec outperform other state-of-the-art keyphrase extraction approaches.
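The abstract names MSMING, a sequential pattern mining algorithm with the one-off condition and general gap constraints, but gives no further details. As a rough illustration of that family of techniques only, the sketch below counts a pattern's support in a token sequence under (a) a gap window [min_gap, max_gap] on the number of tokens skipped between consecutive pattern elements and (b) the one-off condition, under which each token position may belong to at most one occurrence. The greedy left-to-right matching, the restriction to a non-negative gap window, and the names `find_occurrence`, `one_off_support` and `mine_patterns` are assumptions for illustration, not the thesis's algorithm.

```python
def find_occurrence(seq, pattern, start, used, min_gap, max_gap):
    """Depth-first search for one occurrence of `pattern` starting at `start`
    that uses only positions not already in `used`.

    Between consecutive matched positions i and k, the number of skipped tokens
    (k - i - 1) must lie in [min_gap, max_gap] (assumed non-negative here).
    Returns the list of matched positions, or None if no occurrence exists.
    """
    positions = [start]

    def extend(j):
        if j == len(pattern):
            return True
        prev = positions[-1]
        lo = prev + min_gap + 1
        hi = min(prev + max_gap + 1, len(seq) - 1)
        for k in range(lo, hi + 1):
            if k not in used and seq[k] == pattern[j]:
                positions.append(k)
                if extend(j + 1):
                    return True
                positions.pop()  # backtrack and try the next admissible position
        return False

    return positions if extend(1) else None


def one_off_support(seq, pattern, min_gap=0, max_gap=2):
    """Greedy count of occurrences under the one-off condition: a position used
    by one occurrence may not be reused. Heuristic, not guaranteed maximal."""
    used = set()
    count = 0
    for i, token in enumerate(seq):
        if token == pattern[0] and i not in used:
            occ = find_occurrence(seq, pattern, i, used, min_gap, max_gap)
            if occ is not None:
                used.update(occ)
                count += 1
    return count


def mine_patterns(seq, min_support=2, max_len=3, min_gap=0, max_gap=2):
    """Level-wise mining: grow frequent patterns one token at a time,
    extending only patterns that already meet `min_support`."""
    singles = [(t,) for t in sorted(set(seq)) if seq.count(t) >= min_support]
    frequent = {p: seq.count(p[0]) for p in singles}
    level = singles
    while level and len(level[0]) < max_len:
        next_level = []
        for pattern in level:
            for (token,) in singles:
                candidate = pattern + (token,)
                support = one_off_support(seq, candidate, min_gap, max_gap)
                if support >= min_support:
                    frequent[candidate] = support
                    next_level.append(candidate)
        level = next_level
    return frequent


tokens = "a b c a b d a c b".split()
print(mine_patterns(tokens, min_support=2, max_len=2, min_gap=0, max_gap=1))
```

The one-off condition is what keeps `("a", "a")` from being counted twice in `a a a`: once positions 0 and 1 form an occurrence, position 1 cannot start a second one.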
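Ke_MSMING's centrality features come from a co-occurrence semantic network, but the abstract does not specify the co-occurrence window or which centrality measures are used. Assuming a sliding-window word co-occurrence graph, the sketch below computes two common choices, degree centrality and weighted PageRank. The window size, damping factor, iteration count and the function names are illustrative assumptions.

```python
from collections import defaultdict


def build_cooccurrence(tokens, window=2):
    """Undirected weighted graph: the edge weight between two distinct words is
    the number of times they appear within `window` positions of each other."""
    graph = defaultdict(lambda: defaultdict(int))
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            v = tokens[j]
            if v != w:
                graph[w][v] += 1
                graph[v][w] += 1
    return graph


def degree_centrality(graph):
    """Number of distinct neighbours of each word, normalised by (n - 1)."""
    n = len(graph)
    if n < 2:
        return {}
    return {w: len(nbrs) / (n - 1) for w, nbrs in graph.items()}


def weighted_pagerank(graph, damping=0.85, iterations=50):
    """Power iteration for PageRank on the weighted co-occurrence graph."""
    nodes = list(graph)
    if not nodes:
        return {}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    strength = {v: sum(graph[v].values()) for v in nodes}  # total edge weight per node
    for _ in range(iterations):
        new_rank = {v: (1 - damping) / n for v in nodes}
        for v in nodes:
            for u, weight in graph[v].items():
                # v passes rank to each neighbour in proportion to the edge weight
                new_rank[u] += damping * rank[v] * weight / strength[v]
        rank = new_rank
    return rank


g = build_cooccurrence("x hub y hub z".split(), window=1)
print(degree_centrality(g))
print(weighted_pagerank(g))
```

Each word's centrality values can then be attached to any candidate containing it, alongside the baseline and pattern features.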
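Ke_MSMVec learns vertex representations of its co-word network with DeepWalk. DeepWalk first generates short uniform random walks from every vertex, shuffling the vertex order on each pass. The sketch below shows only this walk-generation stage on a tiny made-up co-word graph; the graph, the number of walks per vertex, the walk length and the seed are assumptions, not values from the thesis.

```python
import random


def deepwalk_walks(adjacency, walks_per_node=10, walk_length=8, seed=0):
    """Generate truncated uniform random walks as in DeepWalk.

    `adjacency` maps each vertex to a list of its neighbours. In each of the
    `walks_per_node` passes, vertices are visited in a freshly shuffled order and
    one walk of at most `walk_length` vertices is started from each of them.
    """
    rng = random.Random(seed)
    vertices = list(adjacency)
    walks = []
    for _ in range(walks_per_node):
        rng.shuffle(vertices)
        for start in vertices:
            walk = [start]
            while len(walk) < walk_length:
                neighbours = adjacency[walk[-1]]
                if not neighbours:  # dead end: stop this walk early
                    break
                walk.append(rng.choice(neighbours))
            walks.append(walk)
    return walks


# Hypothetical co-word graph, for illustration only.
co_word = {
    "pattern": ["mining", "sequential"],
    "mining": ["pattern", "sequential"],
    "sequential": ["pattern", "mining"],
    "embedding": ["word"],
    "word": ["embedding"],
}
walks = deepwalk_walks(co_word, walks_per_node=2, walk_length=5)
print(walks[0])
```

In DeepWalk these walks are then treated as sentences and used to train a skip-gram model (for example gensim's `Word2Vec` with `sg=1`), which yields one vector per vertex; that training step is omitted here.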
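The reference vector is the mean of the vectors of the words in the document's title and abstract, and each candidate's cosine similarity to it is used as a feature. The abstract does not say how a multi-word candidate's vector is formed; the sketch below assumes it is the mean of the candidate's word vectors and uses made-up two-dimensional toy vectors in place of learned DeepWalk embeddings. Function names are hypothetical.

```python
import math


def mean_vector(words, embeddings):
    """Average the vectors of the words that have an embedding; None if none do."""
    vectors = [embeddings[w] for w in words if w in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_features(candidates, title_abstract_words, embeddings):
    """Cosine similarity of each candidate's mean word vector to the reference vector."""
    reference = mean_vector(title_abstract_words, embeddings)
    features = {}
    for phrase in candidates:
        vec = mean_vector(phrase.split(), embeddings)
        if vec is None or reference is None:
            features[phrase] = 0.0  # no embedding available: neutral feature value
        else:
            features[phrase] = cosine(vec, reference)
    return features


# Toy vectors made up for illustration; real ones would come from DeepWalk.
toy = {"pattern": [1.0, 0.0], "mining": [1.0, 0.0], "weather": [0.0, 1.0]}
print(similarity_features(["pattern mining", "weather"], ["pattern", "mining"], toy))
# {'pattern mining': 1.0, 'weather': 0.0}
```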
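Both models classify candidates with a supervised learner and keep the top-k, but the abstract does not name the classifier. Purely as an illustration, the sketch trains a plain logistic regression by batch gradient descent on hypothetical two-feature candidate vectors (for instance, a pattern feature and a similarity feature) and ranks a document's candidates by predicted keyphrase probability. The classifier, the features, the learning rate and the training data are all assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(X, y, lr=0.5, epochs=500):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for features, label in zip(X, y):
            error = sigmoid(sum(w * f for w, f in zip(weights, features)) + bias) - label
            for j, f in enumerate(features):
                grad_w[j] += error * f
            grad_b += error
        weights = [w - lr * g / len(X) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(X)
    return weights, bias


def top_k_keyphrases(candidates, weights, bias, k):
    """Rank (phrase, features) pairs by predicted probability and keep the top k."""
    scored = [
        (sigmoid(sum(w * f for w, f in zip(weights, feats)) + bias), phrase)
        for phrase, feats in candidates
    ]
    scored.sort(reverse=True)
    return [phrase for _, phrase in scored[:k]]


# Hypothetical training data: two feature values per candidate, label 1 = keyphrase.
X_train = [[0.9, 0.8], [0.8, 0.9], [0.1, 0.2], [0.2, 0.1]]
y_train = [1, 1, 0, 0]
w, b = train_logistic_regression(X_train, y_train)
doc_candidates = [("pattern mining", [0.95, 0.9]), ("the paper", [0.15, 0.1]),
                  ("word embedding", [0.7, 0.6])]
print(top_k_keyphrases(doc_candidates, w, b, k=2))
```

Ranking by probability rather than by the hard 0/1 label is what makes a fixed top-k cut possible even when more or fewer than k candidates are classified as keyphrases.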
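The evaluation uses precision, recall and the F1-measure. For one document with predicted keyphrase set S and gold-standard set G, P = |S ∩ G| / |S|, R = |S ∩ G| / |G|, and F1 = 2PR / (P + R). The sketch below assumes exact matching after lower-casing and trimming whitespace; the abstract does not state the thesis's matching rule (for example, whether stemming is applied) or how scores are averaged across documents.

```python
def precision_recall_f1(predicted, gold):
    """Exact-match P, R and F1 for one document (case- and whitespace-insensitive)."""
    pred = {p.strip().lower() for p in predicted}
    ref = {g.strip().lower() for g in gold}
    hits = len(pred & ref)
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(ref) if ref else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


p, r, f1 = precision_recall_f1(
    ["pattern mining", "Word Embedding", "topic model", "graph"],
    ["pattern mining", "word embedding", "keyphrase extraction"],
)
print(round(p, 3), round(r, 3), round(f1, 3))  # 0.5 0.667 0.571
```

Here two of the four predictions are correct (P = 2/4) and two of the three gold keyphrases are found (R = 2/3), giving F1 = 4/7.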
Keywords/Search Tags: keyphrase extraction, sequential pattern mining, general gap constraints, word embedding, classification