Research On Hot Spot Detection Of Network Public Opinion Based On Hadoop Platform

Posted on: 2020-05-13  Degree: Master  Type: Thesis
Country: China  Candidate: Q Zhao  Full Text: PDF
GTID: 2428330623956685  Subject: Software engineering
Abstract/Summary:
As the "fourth media" after newspapers, radio, and television, the Internet has become an important window through which the public expresses its views and makes its voice heard. Compared with traditional media, the network has natural advantages in communication, so public opinion spreads "virally" once a hot spot or issue emerges online. In this context, collecting, processing, and analyzing online news data to obtain information about online public opinion has become a hot research topic in fields such as natural language processing and data mining. This thesis focuses on using an incremental clustering algorithm to detect hot topics of public opinion in cyberspace, and on extracting keywords from the text by combining the semantic and statistical information of words. In text modeling, to better distinguish the news events described by different news documents, the thesis adds weights for specific parts of speech and for word length to the traditional TF-IDF algorithm. Given the dynamic, time-dependent nature of online news reports, it divides the hotspot detection task into multiple time segments in chronological order and obtains the final hot topics through initial detection followed by topic merging. The improved single-pass clustering algorithm computes similarity against the centroid vector of each cluster, which reduces the computation in the clustering process. In addition, to address the shortcomings of traditional keyword extraction techniques when applied to network public opinion, the thesis builds on topic-model-based keyword extraction and adds statistical features to select the keywords that best express the content of network news. The hotspot detection described here is built on the Hadoop platform. HDFS, the MapReduce parallel computing framework, and other big data processing technologies are used in data collection, preprocessing, hot spot detection (cluster analysis), and other stages, effectively addressing the shortcomings of traditional technologies in processing massive text data.
Keywords/Search Tags:Hotspot detection, Text modeling, Single-pass algorithm, Keyword Extraction, Hadoop
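
The abstract does not give the details of the improved single-pass algorithm, so the following is only a minimal sketch of the general idea it names: each incoming document vector is compared with one centroid per cluster instead of with every earlier document. The function names, the dictionary-based sparse vectors, the cosine measure, and the threshold value are all assumptions made for illustration and are not taken from the thesis.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def single_pass(docs, threshold=0.5):
    """Assign each document to the most similar cluster centroid,
    or open a new cluster when no centroid reaches the threshold.
    The threshold is a hypothetical value, not one from the thesis."""
    clusters = []  # each cluster keeps a running term-weight sum and member ids
    for i, vec in enumerate(docs):
        best, best_sim = None, threshold
        for c in clusters:
            n = len(c["members"])
            centroid = {t: w / n for t, w in c["sum"].items()}
            sim = cosine(vec, centroid)
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            best = {"sum": Counter(), "members": []}
            clusters.append(best)
        best["sum"].update(vec)      # Counter.update adds the weights
        best["members"].append(i)
    return [c["members"] for c in clusters]

# Toy term-weight vectors standing in for (weighted) TF-IDF vectors.
docs = [
    {"flood": 1.0, "rescue": 1.0},
    {"flood": 1.0, "rescue": 0.8},
    {"election": 1.0, "vote": 1.0},
]
print(single_pass(docs))
```

Because only one comparison per cluster is needed for each new document, the cost grows with the number of clusters and not with the number of documents already seen, which is the saving the abstract attributes to using centroid vectors.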