Research And Implementation Of News Real-time Topic Analysis System

Posted on: 2020-11-16
Degree: Master
Type: Thesis
Country: China
Candidate: Y P Liu
Full Text: PDF
GTID: 2428330575987080
Subject: Software engineering
Abstract/Summary:
With the continuous development of the Internet and news technology, big data technology is steadily gaining momentum, and people now live in an era of data explosion. The Internet offers rapid dissemination, comprehensive information transmission, wide communication channels, abundant media content, and freedom from geographical restrictions. It has gradually become the main medium of news communication, and the amount of information the public obtains through it keeps growing. Faced with the explosive growth of network data, users want to efficiently discover news topics that interest them and that attract wide public attention. This thesis studies hot topic discovery technology to help users find hot news within the massive volume of online news. Based on the characteristics of major news websites and a study of the principles and techniques of mainstream news topic discovery, a news topic discovery system is designed and implemented on the Spark platform. The system's main functions are news content crawling, topic generation and management, topic heat assessment, and hot topic keyword extraction. The system stores the news data crawled from news websites in the distributed file system HDFS. Exploiting the in-memory computing characteristics of Spark RDDs, a parallelized K-Means algorithm is designed to provide efficient clustering analysis for the news topic generation service. To help users quickly grasp the main content of a topic, the system provides a concise description of each hot topic by extracting keywords from the news items under it. Hot topic keyword extraction uses an FP-growth-based algorithm to mine frequent news feature words, and uses TF-IDF to weight those frequent feature words and select the keywords. To test the effectiveness of the algorithm, a number of experiments were designed to evaluate clustering efficiency and clustering quality. The experimental results show that the execution efficiency of the improved algorithm is clearly improved, and that clustering quality is also improved to some extent. This indicates that the system can basically meet the functional and performance needs of a news topic generation system.
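The abstract does not give the details of the parallelized K-Means algorithm, so the following is only a minimal sketch of the general pattern such an algorithm usually follows: each data partition computes per-cluster partial sums independently (the "map" side), and the partial results are merged to update the centroids (the "reduce" side). It is written in plain Python with no Spark calls; the function names (`partial_sums`, `merge`, `kmeans_partitioned`) and the use of nested lists to stand in for RDD partitions are assumptions made for illustration, and none of the thesis's own improvements are reproduced here.

```python
from functools import reduce


def closest(point, centroids):
    """Return the index of the centroid nearest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def partial_sums(partition, centroids):
    """Map side: for one partition, accumulate (vector sum, count) per cluster index."""
    acc = {}
    for point in partition:
        i = closest(point, centroids)
        total, count = acc.get(i, ([0.0] * len(point), 0))
        acc[i] = ([t + p for t, p in zip(total, point)], count + 1)
    return acc


def merge(a, b):
    """Reduce side: combine two per-cluster (vector sum, count) dictionaries."""
    out = dict(a)
    for i, (total, count) in b.items():
        if i in out:
            prev_total, prev_count = out[i]
            out[i] = ([x + y for x, y in zip(prev_total, total)], prev_count + count)
        else:
            out[i] = (total, count)
    return out


def kmeans_partitioned(partitions, centroids, iterations=10):
    """Run a fixed number of K-Means iterations over pre-partitioned data.

    `partitions` is a list of lists of points, standing in for the partitions
    of a distributed dataset (hypothetical stand-in, not a Spark RDD).
    """
    for _ in range(iterations):
        # Each partition is processed independently; in a cluster this step
        # would run in parallel on the workers.
        partials = [partial_sums(part, centroids) for part in partitions]
        merged = reduce(merge, partials, {})
        # New centroid = mean of assigned points; an empty cluster keeps its old centroid.
        centroids = [
            [s / merged[i][1] for s in merged[i][0]] if i in merged else centroids[i]
            for i in range(len(centroids))
        ]
    return centroids
```

Only the small per-cluster sums and counts cross partition boundaries, not the points themselves, which is what makes this pattern a natural fit for in-memory, partitioned data.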
Keywords/Search Tags: News topic, Spark, Text clustering, Keyword extraction