An Automatic Extraction Method For Chinese Article Keywords Based On TextRank And Similarity Of Word Items

Posted on:2022-09-25

Degree:Master

Type:Thesis

Country:China

Candidate:Z W Li

GTID:2568306326974799

Subject:Applied Statistics

Abstract/Summary:

Since the beginning of the 21st century, the development of information technology and the spread of mobile terminals have led the Internet to generate massive amounts of data every second, and the problem of information overload has become increasingly prominent. In particular, extracting and filtering large volumes of unstructured text data poses a great challenge to researchers and engineers. Automatic keyword extraction is an efficient solution for extracting and filtering text data. It is widely used in information retrieval, search engines, natural language processing and other fields, and it is an important starting point for accurately matching users with information.

The TextRank algorithm is one of the most commonly used techniques for automatic keyword extraction; in essence, it is an undirected, unweighted graph model. Because it is lightweight and performs well, TextRank has attracted considerable attention. The traditional TextRank algorithm builds the topology of the graph from the co-occurrence of word items in the text, and its effectiveness can still be improved. Some scholars have improved TextRank by adding more complex, advanced features or by integrating more textual information.

This paper presents the Sim-TextRank algorithm, which extends TextRank by further incorporating the semantic similarity between words, computed from Word2Vec word vectors, and the topic similarity between words, computed from the LDA topic model. Experimental results on scientific paper and news corpora show that adding topic similarity and semantic similarity information to TextRank improves the accuracy with which the algorithm captures keywords. This paper also proposes suitable hyperparameters for the Sim-TextRank algorithm and compares semantic similarity with topic similarity.
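The idea described above, a co-occurrence graph ranked by TextRank whose edges are additionally informed by semantic and topic similarity, can be illustrated with a minimal sketch. This is not the Sim-TextRank implementation from the thesis: the abstract does not give its exact formula, so the linear blend `alpha * co-occurrence + beta * semantic + gamma * topic`, the window size, the damping factor of 0.85, and all function names (`cooccurrence_edges`, `combine_weights`, `textrank`, `top_keywords`) are assumptions made for illustration. The similarity functions are hypothetical placeholders standing in for Word2Vec cosine similarity and LDA-based topic similarity, and the input tokens are assumed to be already segmented with stop words removed.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

Edge = Tuple[str, str]
SimFn = Callable[[str, str], float]


def cooccurrence_edges(tokens: List[str], window: int = 3) -> Set[Edge]:
    """Undirected edges between distinct words appearing within `window` positions."""
    edges: Set[Edge] = set()
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            v = tokens[j]
            if v != w:
                edges.add((w, v) if w < v else (v, w))  # canonical order: undirected
    return edges


def combine_weights(
    edges: Set[Edge],
    sim_semantic: Optional[SimFn] = None,  # hypothetical stand-in for Word2Vec cosine
    sim_topic: Optional[SimFn] = None,     # hypothetical stand-in for LDA topic similarity
    alpha: float = 1.0,
    beta: float = 0.0,
    gamma: float = 0.0,
) -> Dict[Edge, float]:
    """Assumed linear blend of co-occurrence, semantic and topic information.

    With beta = gamma = 0 every edge gets the same weight, which reduces to
    classic unweighted TextRank. Negative similarities (possible with cosine)
    are clipped to 0 so that edge weights stay non-negative.
    """
    weights: Dict[Edge, float] = {}
    for a, b in edges:
        w = alpha * 1.0
        if sim_semantic is not None:
            w += beta * max(0.0, sim_semantic(a, b))
        if sim_topic is not None:
            w += gamma * max(0.0, sim_topic(a, b))
        weights[(a, b)] = w
    return weights


def textrank(weights: Dict[Edge, float], d: float = 0.85,
             max_iter: int = 100, tol: float = 1e-6) -> Dict[str, float]:
    """Weighted PageRank-style iteration on an undirected word graph."""
    nbrs: Dict[str, Dict[str, float]] = {}
    for (a, b), w in weights.items():
        if w > 0:
            nbrs.setdefault(a, {})[b] = w
            nbrs.setdefault(b, {})[a] = w
    if not nbrs:
        return {}
    strength = {v: sum(n.values()) for v, n in nbrs.items()}  # total edge weight per word
    score = {v: 1.0 for v in nbrs}
    for _ in range(max_iter):
        # Each word receives score from neighbours in proportion to edge weight.
        new = {
            v: (1 - d) + d * sum(w / strength[u] * score[u] for u, w in nbrs[v].items())
            for v in nbrs
        }
        delta = max(abs(new[v] - score[v]) for v in nbrs)
        score = new
        if delta < tol:
            break
    return score


def top_keywords(tokens: List[str], k: int = 5, window: int = 3, **blend) -> List[str]:
    """Rank candidate words and return the k highest-scoring ones (ties broken by word)."""
    scores = textrank(combine_weights(cooccurrence_edges(tokens, window), **blend))
    return sorted(scores, key=lambda word: (-scores[word], word))[:k]


if __name__ == "__main__":
    # Toy, illustrative token list; real input would come from a word segmenter.
    tokens = ["text", "keyword", "extraction", "graph", "keyword", "ranking", "graph", "text"]
    toy_sim = {("extraction", "keyword"): 0.8}  # hypothetical similarity values

    def semantic(a: str, b: str) -> float:
        return toy_sim.get((a, b), toy_sim.get((b, a), 0.0))

    print(top_keywords(tokens, k=3))                                   # plain TextRank
    print(top_keywords(tokens, k=3, sim_semantic=semantic, beta=0.5))  # similarity-augmented
```

Setting `beta` and `gamma` to zero recovers the unweighted co-occurrence graph, so the effect of each similarity source can be compared by switching it on separately, in the spirit of the comparison the abstract describes.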

Keywords/Search Tags:

Keyword extraction, TextRank, Semantic similarity, Topic similarity

Related items

1	Research On Keyword Extraction Algorithm For Chinese Text Based On Document Topic Structure And Semantics
2	Keyword Extraction From News Web Pages
3	Research And Implementation Of News Keyword Extraction Method Based On Semantic Clustering And Weighted TextRank
4	Research On Topic Modeling Method Based On Semantic Distribution Similarity
5	Research On The Calculation Method For Semantic Similarity Of Sentence And Its Application
6	Mongolian Short Text Semantic Similarity Calculation Based On Deep VAE Integrated With Topic Information
7	The Research On Topic Extraction From Web Pages Based On Semantic
8	Research On Mobile Application Information Recommendation Algorithm Based On User Similarity And Topic Similarity
9	Research On Traffic Terminology Similarity Matchment Based On Topic Vertical Search Engine
10	Research And Implementation Of Semantic Similarity Computing By Combining Knowledge-based And Corpus-based Methods