Research On Keyword Extraction Method Based On Document Topical Structure And Word Graph Iteration

Posted on:2020-04-29

Degree:Master

Type:Thesis

Country:China

Candidate:M Z Sun

GTID:2428330590972568

Subject:Management Science and Engineering

Abstract/Summary:

With the rapid development of the internet, online text has grown exponentially, and locating the required information accurately and quickly within this mass of data has become particularly important. A keyword is the smallest unit that can represent the content of a document. It concisely expresses the document's main idea and has become the main tool for grasping that content quickly. Traditionally, keywords were assigned to documents by experts; given today's massive volume of online text, manual labeling has become impractical, and keywords urgently need to be assigned automatically by computer. Automatic keyword extraction has therefore gradually become a research hotspot. It is also widely used in search engines, news services, and other fields for information retrieval, and it underlies text tasks such as automatic summarization, text classification, and clustering. This thesis therefore proposes a keyword extraction method based on document topic structure and word graph iteration to improve the accuracy and recall of keyword extraction.

The thesis first describes the background and significance of the topic and summarizes the state of keyword extraction research in China and abroad. It then briefly introduces the underlying theory: clustering algorithms, the LDA topic model, and the complex network model. Next, using only the internal information of a document, the word clusters of the document are used as nodes in a word graph, and a fully connected network graph is constructed for keyword extraction. This method improves the topic coverage of the extracted keywords to a certain extent and reduces redundancy among candidate words. Because a single document provides limited information, a second method based on multi-document topic structure and word graph iteration is proposed. It considers both multi-document topic information and single-document internal structure information, and uses the results of topic modeling to change the structure of the word graph so that keywords are extracted more effectively. Finally, comparative experiments on crawled web text data are carried out for the two proposed models, verifying their validity and superiority.

The specific innovations are as follows:

(1) Based on the internal information of a single document, the similarity between candidate keywords is calculated with a Wikipedia-based Word2vec model, the candidate words are clustered, and each cluster is used as a node of the word graph. A fully connected network graph is then constructed to rank the nodes. This method reduces the redundancy of candidate keywords to a certain extent and improves the topic coverage and extraction accuracy of the keywords.

(2) The topic model and document structure information are used together: multiple documents are modeled with the topic model, and the results are used to change the weights of the word graph nodes and the random jump probability. This addresses the limited information available in a single document and improves the precision and recall of keyword extraction.
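The abstract does not give the thesis's formulas, so the following is only a minimal sketch of the general idea behind the two innovations, not the author's implementation. It assumes that each graph node (a word or a word cluster) is represented by one embedding vector, that edge weights in the fully connected graph are cosine similarities, and that topic information enters as a non-uniform random jump distribution in a PageRank-style iteration. The function names, the `jump` parameter, and the toy vectors are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_nodes(vectors, jump=None, damping=0.85, iters=100, tol=1e-9):
    """Rank nodes of a fully connected word graph with weighted PageRank.

    vectors: dict mapping node name -> embedding vector (assumed input,
             e.g. a cluster centroid from a Word2vec model).
    jump:    dict mapping node name -> non-negative topic prior (assumption);
             a uniform jump distribution is used when omitted.
    """
    names = list(vectors)
    # Edge weight between every pair of distinct nodes; negatives clipped to 0.
    w = {a: {b: max(cosine(vectors[a], vectors[b]), 0.0)
             for b in names if b != a} for a in names}
    out = {a: sum(w[a].values()) for a in names}
    if jump is None:
        jump = {a: 1.0 for a in names}
    total = sum(jump.values())
    p = {a: jump[a] / total for a in names}  # normalized jump probabilities
    score = dict(p)
    for _ in range(iters):
        new = {}
        for a in names:
            # Score flowing into a from every neighbor, split by edge weight.
            incoming = sum(score[b] * w[b][a] / out[b]
                           for b in names if b != a and out[b] > 0)
            new[a] = (1 - damping) * p[a] + damping * incoming
        delta = sum(abs(new[a] - score[a]) for a in names)
        score = new
        if delta < tol:
            break
    return sorted(score.items(), key=lambda kv: kv[1], reverse=True)

# Toy two-dimensional vectors, invented for illustration only.
toy_vectors = {
    "keyword": [1.0, 0.1],
    "extraction": [0.9, 0.2],
    "graph": [0.1, 1.0],
}
print(rank_nodes(toy_vectors))
print(rank_nodes(toy_vectors, jump={"keyword": 1.0, "extraction": 1.0, "graph": 10.0}))
```

With a uniform jump distribution the ranking depends only on the similarity structure of the graph; giving a node a larger jump weight, as a topic model might for a topically important word, raises that node's score.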

Keywords/Search Tags:

keyword extraction, TextRank, LDA, graph model

Related items

1	Complex Text Keyword Mining Method Based On Graph Embedding Model
2	Automatic Abstract Extraction Based On Keyword And Graph Model
3	Research And Implementation Of News Keyword Extraction Method Based On Semantic Clustering And Weighted TextRank
4	Chinese Text Keyword Extraction Algorithm Based On Graph And LDA
5	Keywords Extraction Based On Word2Vec And TextRank
6	The Research Of Keyword Extraction Technology In Multi-Document
7	Research On The Optimization Of TextRank Keyword Extraction Algorithm And SOM Text Clustering Model
8	TextRank Keyword And Summarization Extraction Algorithm Based On Rough Data-deduction
9	TextRank Algorithm Optimization Based On Markov Model
10	Research On Online Review Oriented Keyword Extraction And Knowledge Association