Improvement of the TextRank Algorithm Based on Basic-Level Categories for Chinese Keyword Extraction

Posted on: 2018-02-22  Degree: Master  Type: Thesis
Country: China  Candidate: X G Xiao
GTID: 2335330518977270  Subject: Linguistics and Applied Linguistics
Abstract/Summary:
Automatic keyword extraction is a basic technology underlying text categorization, information retrieval, and automatic summarization, and it is of great practical significance. Drawing on basic-level category theory, this thesis proposes an improved version of the TextRank keyword extraction algorithm and evaluates its extraction performance. The thesis is divided into five parts.

The first part is the preface. It explains the background and significance of the topic, summarizes the current state of research on keyword extraction, briefly introduces basic-level categories and language networks, and explains the source of the article.

The second part explains why basic-level category theory is a reasonable basis for improving the TextRank algorithm and presents the improved scheme. The core of the improved algorithm is a hierarchical thesaurus (lexicon) built around basic-level category words. Each word in the lexicon corresponds to an attribute set that contains hierarchical information, semantic relationships, and a basis weight.

The third part describes in detail the method and process of constructing the lexicon, which mainly involves selecting the basic-level category words and determining the basis weight of each word.

The fourth part evaluates the improved algorithm. Three types of text are used as evaluation material: scientific research papers, web news, and Weibo posts. Keywords are extracted with the TextRank algorithm before and after the improvement. The experimental results show that the improved algorithm outperforms the original algorithm in precision, recall, and F1-measure.

The fifth part is the conclusion, which summarizes the main content of the thesis and briefly discusses directions for further improving the algorithm.
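The abstract does not give the thesis's actual weighting formula, so the following is only a minimal sketch of one plausible reading: per-word basis weights from a lexicon are used as the teleport (prior) distribution in TextRank, and extracted keywords are scored with precision, recall, and F1. The function names, the `base_weight` dictionary, the default weight of 1.0 for words missing from the lexicon, and the window size are all assumptions made for illustration. Input is assumed to be already segmented into tokens.

```python
from collections import defaultdict


def build_graph(words, window=2):
    """Undirected co-occurrence graph: link distinct words within `window` positions."""
    graph = defaultdict(set)
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + window + 1, len(words))):
            if words[j] != w:
                graph[w].add(words[j])
                graph[words[j]].add(w)
    return graph


def weighted_textrank(words, base_weight, window=2, d=0.85, iters=50):
    """TextRank with a non-uniform prior (hypothetical stand-in for basis weights)."""
    graph = build_graph(words, window)
    nodes = list(graph)
    if not nodes:
        return []
    # Assumption: words absent from the lexicon get a neutral weight of 1.0.
    prior = {w: base_weight.get(w, 1.0) for w in nodes}
    total = sum(prior.values())
    prior = {w: p / total for w, p in prior.items()}
    score = dict(prior)
    for _ in range(iters):
        # Each word receives an equal share of every neighbour's score,
        # plus a teleport term proportional to its prior weight.
        score = {
            w: (1 - d) * prior[w]
            + d * sum(score[u] / len(graph[u]) for u in graph[w])
            for w in nodes
        }
    return sorted(score.items(), key=lambda kv: kv[1], reverse=True)


def precision_recall_f1(extracted, reference):
    """Set-based precision, recall, and F1 of extracted vs. reference keywords."""
    extracted, reference = set(extracted), set(reference)
    hits = len(extracted & reference)
    p = hits / len(extracted) if extracted else 0.0
    r = hits / len(reference) if reference else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


# Illustrative tokens only; real input would come from a Chinese word segmenter.
tokens = ["keyword", "extraction", "graph", "keyword",
          "ranking", "graph", "text", "ranking"]
ranked = weighted_textrank(tokens, base_weight={"text": 3.0})
top3 = [w for w, _ in ranked[:3]]
```

With an empty `base_weight` dictionary the prior is uniform and the function reduces to ordinary TextRank on an unweighted graph, which gives a convenient "before improvement" baseline for the same evaluation.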
Keywords/Search Tags: extraction, TextRank, basic-level category