Research And Application Of The Hot Topic Extraction Technology Based On Clustering Algorithms

Posted on: 2017-12-16; Degree: Master; Type: Thesis
Country: China; Candidate: Y H Lu; Full Text: PDF
GTID: 2348330503978282; Subject: Software engineering
Abstract/Summary:
Clustering algorithms have attracted considerable research attention in recent years, in areas such as structural clustering, distributed clustering, and spectral clustering, and this work has produced algorithms including hierarchical clustering, DBSCAN, QT, BIRCH, and mean shift clustering. These algorithms cluster texts well. However, when applied to hotspot extraction they show limitations, for example in similarity calculation and keyword extraction.

This thesis is set against the background of an enterprise innovation resource management and analysis platform, for which it extracts hotspots from network news and from science and technology papers. It first analyzes the writing structure of network news and of science and technology papers. Based on that analysis, a weight is assigned to each word. The DBSCAN clustering algorithm is then used to cluster the texts. Finally, hotspot words are extracted from the clusters and displayed on the web site. The main work of this thesis is as follows:

The quality of the extracted keywords depends on how closely they match the main points of an article. Understanding the meaning of the article and the role of each word in the text helps to extract keywords effectively. To improve the quality of hot-word extraction, the author extracts feature words according to the narrative structure of the text.

Clustering texts requires computing the similarity between the features of two texts. To reflect the writing structure and the distribution of terms within an article, the author incorporates the text feature weights into the similarity formula.

Building on this work, the author extracts hotspot words from each cluster. The weight of a hotspot word depends on the size of its cluster, and this weight controls the color and font size of the word when it is displayed.
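The pipeline described above (structure-based word weights, a weighted similarity, DBSCAN clustering, and hot words weighted by cluster size) can be illustrated with a minimal sketch. The abstract does not give the thesis's actual weights, similarity formula, or parameters, so everything specific here is an assumption made for illustration: the section weights in `SECTION_WEIGHTS`, the use of cosine similarity, the `eps` and `min_pts` values, the font-size range, and the sample documents.

```python
import math
from collections import Counter

# Hypothetical position weights: a term in the title counts more than one in the body.
SECTION_WEIGHTS = {"title": 3.0, "body": 1.0}

def weighted_features(doc):
    """Map each term to a weight summed over the sections it appears in."""
    features = Counter()
    for section, text in doc.items():
        for term in text.lower().split():
            features[term] += SECTION_WEIGHTS.get(section, 1.0)
    return features

def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-weight vectors (assumed measure)."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0

def dbscan(vectors, eps, min_pts):
    """Plain DBSCAN over the distance 1 - cosine similarity; label -1 marks noise."""
    n = len(vectors)

    def neighbors(i):
        return [j for j in range(n)
                if 1.0 - cosine_similarity(vectors[i], vectors[j]) <= eps]

    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:  # j is a core point, so keep expanding
                queue.extend(reachable)
    return labels

def hot_words(vectors, labels):
    """For each cluster, return (heaviest term, cluster size); the size is the hot-word weight."""
    result = {}
    for c in sorted(set(labels) - {-1}):
        members = [v for v, label in zip(vectors, labels) if label == c]
        total = Counter()
        for v in members:
            total.update(v)
        result[c] = (total.most_common(1)[0][0], len(members))
    return result

def font_size(weight, max_weight, smallest=12, largest=48):
    """Linearly scale a hot-word weight to a font size in points (assumed range)."""
    return smallest + (largest - smallest) * weight / max_weight

# Made-up sample documents, split into title and body.
docs = [
    {"title": "graphene battery", "body": "new graphene battery charges fast"},
    {"title": "graphene battery breakthrough", "body": "researchers improve graphene battery life"},
    {"title": "graphene research", "body": "graphene battery capacity doubles"},
    {"title": "football final", "body": "team wins football final"},
    {"title": "football transfer", "body": "club signs football star"},
    {"title": "weather report", "body": "rain expected tomorrow"},
]
vectors = [weighted_features(d) for d in docs]
labels = dbscan(vectors, eps=0.6, min_pts=2)
hot = hot_words(vectors, labels)
print(labels)  # [0, 0, 0, 1, 1, -1]
for word, size in hot.values():
    print(word, size, font_size(size, max(s for _, s in hot.values())))
```

In this sketch the larger cluster yields the hot word drawn at the largest size, which mirrors the abstract's statement that a hot word's weight follows the size of its cluster. A production system would more likely use a library DBSCAN implementation and a proper tokenizer; the hand-written version is used here only to keep the example self-contained.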
Keywords/Search Tags: Clustering Algorithms, Public opinion system, Data visualization, Keywords Weight