
Research On Keyword Extraction Algorithm For Chinese Text Based On Document Topic Structure And Semantics

Posted on: 2018-01-13
Degree: Master
Type: Thesis
Country: China
Candidate: Z T Xu
GTID: 2428330512495914
Subject: Software engineering
Abstract/Summary:
In the twenty-first century, with the continuous progress of technology and the rapid development of the Internet, the volume of information resources of all kinds has grown rapidly. People want to find the information that is genuinely useful to them, quickly and accurately, within this huge body of material. Keywords summarize the content of a document concisely and reflect its theme, so they are a powerful aid in finding resources. However, most current text resources do not provide keywords. Manually assigned keywords are often more accurate, but they tend to be strongly subjective because annotators differ in background knowledge, depth of understanding, and ability to summarize. Manual tagging also requires time to read and understand the text, so it cannot keep pace with today's rapid growth of information resources. Keyword extraction technology emerged to address this problem: by establishing a unified standard and using the computer's fast processing power to extract keywords automatically, it can greatly reduce the cost in labor and time and reduce the influence of subjectivity.

This dissertation takes keyword extraction for Chinese text as its research object. It expounds the basic concepts of keyword extraction, surveys the state of research at home and abroad, and then studies in detail the method based on document topic structure and the method based on semantics. It analyzes the differences between Chinese and English word segmentation; the former is more complicated and has a greater impact on keyword extraction. To address the difficult problem of new word recognition in Chinese word segmentation, the dissertation dynamically updates the word segmentation dictionary to improve segmentation accuracy. With the help of the vector space model, an improved algorithm is used to find the optimal clustering of continuous text segments, from which the topic structure of the article is constructed. The algorithm based on the document topic structure is improved to extract global keywords. On this basis, the semantic similarity between Chinese words is added to improve the algorithm further, combining statistical methods with semantics to improve keyword extraction.

Precision, recall, and F-measure are taken as the evaluation metrics. Experimental comparison of the improved algorithm with other algorithms indicates that the improved algorithm improves keyword extraction results for Chinese text, which verifies its effectiveness.
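The evaluation metrics named in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas used in the dissertation, so this assumes the standard set-based definitions of precision and recall and the balanced F-measure (F1). The function name `evaluate_keywords` and the sample keyword lists are hypothetical and chosen only for illustration.

```python
def evaluate_keywords(extracted, reference):
    """Return (precision, recall, f1) for one document.

    Assumption: keywords are compared by exact string match, and the
    F-measure is the balanced F1. The dissertation may define these
    differently.
    """
    extracted_set = set(extracted)
    reference_set = set(reference)
    # Keywords that the algorithm extracted and that annotators also chose.
    correct = len(extracted_set & reference_set)

    precision = correct / len(extracted_set) if extracted_set else 0.0
    recall = correct / len(reference_set) if reference_set else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: 4 extracted keywords, 3 reference keywords, 2 in common.
extracted = ["关键词", "抽取", "主题", "网络"]
reference = ["关键词", "抽取", "语义"]
p, r, f = evaluate_keywords(extracted, reference)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

With two of four extracted keywords correct and two of three reference keywords recovered, precision is 0.5, recall is about 0.667, and F1 is about 0.571. In practice these values would be averaged over a test corpus rather than computed for a single document.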
Keywords/Search Tags:Extraction, Topic Structure, Semantic Similarity