Design And Implementation Of News Hotspot Discovery Based On Data Mining

Posted on: 2020-11-01
Degree: Master
Type: Thesis
Country: China
Candidate: Y Q Tong
Full Text: PDF
GTID: 2428330572472242
Subject: Computer technology
Abstract/Summary:
With the rapid development of Internet technology, online media has become an important channel through which people get news. Because online news spreads quickly and widely, it generates an enormous amount of news data every day. Research on data mining strategies for news hotspots therefore has important theoretical and practical value for recommending high-quality, high-value news content. In addition, a good news hotspot discovery system can help journalists and government departments track public opinion on news and carry out related work. At present, research on news hotspot discovery systems is still in its infancy, and there are few related algorithms and systems. For these reasons, this thesis designs the key algorithm models of a news hotspot discovery system and implements such a system based on data mining algorithms.

1) Based on the features of news text and a word vectorization algorithm, this thesis proposes a text label vector model with the label vector at its core, and designs a formula for calculating the similarity of news texts.
2) Based on the text label vector model, this thesis designs the Label-Vec density clustering algorithm, an improvement on the DBSCAN algorithm. Label-Vec uses hash buckets to partition the text space, which effectively reduces the number of comparisons between core objects during clustering and improves performance while lowering the algorithm's complexity.
3) Based on the LDA topic model, this thesis designs a hot topic discovery model. It applies LDA within each cluster to obtain the cluster's key topics, generates hot topics from them, and uses a corresponding heat value formula to display the heat of news hot topics intuitively.

A series of tests was carried out on the news hotspot discovery system. The tests show that the system can mine hot topics from massive volumes of news text. Compared with the traditional Single-Pass and K-Means clustering algorithms, the Label-Vec algorithm takes less clustering time and achieves better internal and external clustering performance.
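
The hash-bucket idea in item 2) can be illustrated with a minimal sketch. The abstract does not give the details of Label-Vec, so this is not the thesis's algorithm: it is a generic MinHash signature with banded buckets (MinHash appears in the keywords), and every name and parameter here (`minhash_signature`, `bucket_candidates`, `num_hashes`, `bands`, the sample documents) is a hypothetical chosen for illustration. It shows how bucketing limits similarity checks to documents that share a bucket instead of comparing every pair.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def _hash(token: str, seed: int) -> int:
    """Deterministic 64-bit hash of a token under a given seed."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(tags, num_hashes=16):
    """MinHash signature of a non-empty tag set: one minimum per seeded hash."""
    return tuple(min(_hash(t, seed) for t in tags) for seed in range(num_hashes))


def bucket_candidates(docs, num_hashes=16, bands=8):
    """Return pairs of document ids that share at least one band bucket.

    Assumption: num_hashes is divisible by bands.
    """
    rows = num_hashes // bands
    buckets = defaultdict(set)
    for doc_id, tags in docs.items():
        sig = minhash_signature(tags, num_hashes)
        for b in range(bands):
            # Documents whose signatures agree on a whole band land together.
            buckets[(b, sig[b * rows:(b + 1) * rows])].add(doc_id)
    pairs = set()
    for members in buckets.values():
        pairs.update(combinations(sorted(members), 2))
    return pairs


def jaccard(a, b):
    """Exact Jaccard similarity, computed only for candidate pairs."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


# Hypothetical label sets for three news items.
docs = {
    "n1": {"election", "vote", "president", "campaign"},
    "n2": {"election", "vote", "president", "campaign"},
    "n3": {"football", "league", "goal", "transfer"},
}
pairs = bucket_candidates(docs)
similar = {p: jaccard(docs[p[0]], docs[p[1]]) for p in pairs}
```

Documents with identical label sets always produce identical signatures and so always become candidates, while documents with disjoint label sets share a bucket only through a hash collision. A density clustering step could then restrict its neighborhood search to these candidates. How Label-Vec actually combines buckets with DBSCAN is not stated in the abstract.
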
Keywords/Search Tags: MinHash, Document Cluster, LDA