Analysis Of Internet Hot Topics Based On Key Phrases

Posted on: 2019-08-06
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Li
GTID: 2438330545987973
Subject: Computer Science and Technology
Abstract/Summary:
Hot topic analysis is a key technology for text classification, topic extraction, tracking of Internet hot spots, and related processing tasks, and it is widely applied in text categorization, search engines, news recommendation, and other fields. A key phrase is a word string made up of N words (an n-gram). Because its semantic context is relatively complete and its text features are comparatively rich, a key phrase expresses the topical information of a text more clearly than an isolated keyword. Research on key phrase extraction therefore helps make topic analysis more practical. SegPhrase is a recent key phrase extraction algorithm whose results achieve higher precision and recall than traditional methods. Our analysis and practical experience, however, show that SegPhrase has shortcomings that need to be addressed: when generating the phrase candidate set, it relies on frequency statistics alone; when evaluating phrase quality, it does not fully account for the fact that different features affect phrase importance to different degrees; and it does not support key phrase extraction from Chinese text well.

To improve topic analysis of Chinese documents, this thesis improves the SegPhrase algorithm. When generating the phrase candidate set, it uses the mutual information between word strings to retain phrases that are infrequent but important, overcoming the limitation of selecting candidates by frequency alone. When evaluating phrase quality, it exploits the differences among phrase features: it categorizes the text features with the out-of-bag (OOB) error method and assigns different weights to different features, so that the combined quality score better reflects how phrases are used in real contexts. In addition, because keyword-based topic analysis lacks context, this thesis proposes a hot topic analysis method based on key phrases, which analyzes the hot topics of a text using phrases that carry relatively rich grammatical and semantic information.

The experimental data are documents crawled continuously over one month from major Chinese web portals. Experiments show that the improved SegPhrase algorithm achieves higher recall and precision than the original method, and that key-phrase-based topic analysis expresses current Internet hot topics more clearly and accurately than keyword-based topic analysis.
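The abstract states that candidate generation is improved by using mutual information between word strings to keep low-frequency but important phrases that a frequency-only rule would discard. The snippet below is a minimal sketch of that idea, not the thesis's implementation or SegPhrase's code. It assumes candidates are adjacent word pairs (bigrams) taken from already-segmented text and scores them with pointwise mutual information, PMI(x, y) = log2(p(x, y) / (p(x) p(y))). The function names `bigram_pmi` and `candidate_phrases` and all thresholds are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def bigram_pmi(tokens):
    """Count adjacent word pairs and score each pair by pointwise mutual information (base 2)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)  # guard against division by zero on tiny inputs
    scores = {}
    for (w1, w2), count in bigrams.items():
        p_xy = count / n_bi
        p_x = unigrams[w1] / n_uni
        p_y = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    return bigrams, scores


def candidate_phrases(tokens, min_freq=4, min_mi_freq=2, min_pmi=3.0):
    """Keep a pair if it is frequent, OR if it is rarer but strongly associated.

    min_freq, min_mi_freq and min_pmi are hypothetical thresholds,
    not values taken from the thesis.
    """
    bigrams, scores = bigram_pmi(tokens)
    kept = set()
    for pair, count in bigrams.items():
        if count >= min_freq:
            kept.add(pair)  # frequency-only rule
        elif count >= min_mi_freq and scores[pair] >= min_pmi:
            kept.add(pair)  # low-frequency pair retained because of high mutual information
    return kept


# Toy, pre-segmented token stream (Chinese text would first go through a word segmenter).
tokens = (["the", "news"] * 4 + ["the", "cat"] * 2
          + ["machine", "learning"] * 2 + ["the", "dog"] * 4)
print(sorted(candidate_phrases(tokens)))
# → [('machine', 'learning'), ('news', 'the'), ('the', 'dog'), ('the', 'news')]
```

With frequency alone (threshold 4), "machine learning" (2 occurrences, PMI about 3.65) would be dropped; the mutual-information rule keeps it, while the weakly associated "the cat" (PMI about 1.32) is still discarded. Two design notes: PMI is known to overrate pairs seen only once, which is why the sketch also requires a minimum count (`min_mi_freq`); and frequent but meaningless pairs such as "news the" still pass, which is the job of the subsequent phrase-quality evaluation step.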
Keywords: network hot topic, text feature, mutual information, key phrase extraction