Research On Keyword Extraction Algorithm For Chinese Text Based On Document Topic Structure And Semantics

Posted on:2018-01-13

Degree:Master

Type:Thesis

Country:China

Candidate:Z T Xu

Full Text:PDF

GTID:2428330512495914

Subject:Software engineering

Abstract/Summary:

In the twenty-first century, with the continuous progress of technology and the rapid development of the Internet, information resources of all kinds have grown rapidly. People want to find the information that is genuinely useful to them, quickly and accurately, within this huge volume of sources. Keywords summarize the content of a document concisely and reflect its theme, so they are a powerful aid in locating resources. However, most existing text resources do not provide keywords. Although manually assigned keywords are often more accurate, they tend to be strongly subjective because annotators differ in background knowledge, depth of understanding, and ability to summarize. Manual tagging also requires time to read and understand each text, so it cannot keep pace with today's rapid growth of information resources. Keyword extraction technology emerged to address this problem: by establishing a unified standard and using the computer's fast processing power to extract keywords automatically, it greatly reduces the cost in labor and time and lessens the impact of subjectivity.

This dissertation takes keyword extraction for Chinese text as its research object. It expounds the basic concepts of keyword extraction and surveys the state of research at home and abroad. It then studies in detail the method based on document topic structure and the method based on semantics. The dissertation analyzes the differences between Chinese and English word segmentation; the former is more complicated and has a greater impact on keyword extraction. To address the difficulty of recognizing new words in Chinese word segmentation, the dissertation dynamically updates the word segmentation dictionary to improve segmentation accuracy. At the same time, with the help of the vector space model, an improved algorithm is used to find the optimal clustering of continuous text segments, from which the topic structure of the article is constructed. The algorithm based on document topic structure is then improved to extract global keywords. On this basis, the semantic similarity between Chinese words is added to improve the algorithm further, combining statistical methods with semantics to improve the effect of keyword extraction.

The accuracy rate, recall rate, and F-measure are taken as the evaluation indexes. Experimental comparison of the improved algorithm with other algorithms indicates that it improves keyword extraction results for Chinese text, which verifies its effectiveness.
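
The evaluation indexes named in the abstract can be illustrated with a minimal Python sketch. It assumes that "accuracy rate" refers to precision, as is common in keyword extraction evaluation, and that the F metric is the balanced F1 score; the abstract does not state either point. The function name `evaluate_keywords` and the sample keyword lists are hypothetical and are not taken from the dissertation.

```python
def evaluate_keywords(extracted, reference):
    """Return (precision, recall, f1) for one document.

    extracted: keywords produced by an extraction algorithm
    reference: manually assigned (gold standard) keywords
    Assumption: exact string match counts as a correct keyword.
    """
    extracted_set = set(extracted)
    reference_set = set(reference)
    correct = len(extracted_set & reference_set)

    # Precision: share of extracted keywords that are correct.
    precision = correct / len(extracted_set) if extracted_set else 0.0
    # Recall: share of reference keywords that were found.
    recall = correct / len(reference_set) if reference_set else 0.0

    # F1: harmonic mean of precision and recall (0 when both are 0).
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical example: 4 extracted keywords, 3 reference keywords, 2 in common.
    p, r, f = evaluate_keywords(
        ["关键词", "主题", "语义", "聚类"],
        ["关键词", "主题", "分词"],
    )
    print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

With two of four extracted keywords correct and two of three reference keywords found, precision is 1/2, recall is 2/3, and F1 is 4/7. In practice these values would be averaged over a test collection, and how the dissertation averages them is not stated in the abstract.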

Keywords/Search Tags:

Extraction, Topic Structure, Semantic Similarity

Related items

1	Research On Topic Modeling Method Based On Semantic Distribution Similarity
2	The Research On Topic Extraction From Web Pages Based On Semantic
3	Mongolian Short Text Semantic Similarity Calculation Based On Deep VAE Integrated With Topic Information
4	Research On The Calculation Method For Semantic Similarity Of Sentence And Its Application
5	Research On Hot Topic Detection Technology Of Netnews
6	Research On Semantic Similarity Measure Method For RDF Graphs
7	Research Of Text Extraction Algorithm Based On Visual Semantic Block
8	Calculating Phenotypic Similarity Between Genes Using Hierarchical Structure Data Based On Semantic Similarity
9	Research On BBS Topic Detection And Tracking
10	Hot Topic Extraction From Microblogs