
Research And Implementation Of Multi Domain Hotspots Mining Technology Based On Spark

Posted on: 2020-11-01 | Degree: Master | Type: Thesis
Country: China | Candidate: X S Li | Full Text: PDF
GTID: 2428330572973642 | Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of information technology, information is being generated and disseminated ever faster. At the same time, advances in mobile Internet technology and the large-scale use of smart terminals make it easy for Internet users to obtain the latest information. In this context, the influence of conventional news platforms such as newspapers and television has gradually declined, while newer platforms such as portal websites, news websites and social media have become increasingly influential. However, the number of news platforms is now extremely large and the volume of news reports is growing dramatically, so online news has become increasingly cluttered. An individual user can obtain trending stories through the personalized recommendations of a news app, but such "tailor-made" news is highly homogeneous and restricts the user's freedom to read and choose. Faced with an ever-increasing daily volume of news, users therefore find it difficult to identify hotspots in their areas of interest and to track how those hotspots develop. Applying big data technology to process and analyze large amounts of news, and presenting users with hot topics in different domains, is therefore of significant research value.

Based on this analysis, this thesis combines the characteristics of hot topics in online news with the advantages of a big data processing platform. It independently designs and implements a Spark-based multi-domain online hotspot mining system that improves the efficiency of news hotspot mining and tracking. The main contributions are:
(1) For multi-domain news classification, an ensemble fastText news classification model based on hybrid sampling is proposed. It lets users select their areas of interest (such as technology, sports and entertainment) and focus on the hotspots in those areas;
(2) For hotspot detection, the traditional text feature representation model is improved by combining NE-LDA and Word2Vec, and the Single-Pass clustering algorithm is then used to discover hotspots automatically;
(3) For hotspot display and tracking, the entropy weight method is used to evaluate hotspots comprehensively and objectively from the perspectives of time, media and users, and hot topics are recommended based on the user's location attribute;
(4) A Spark-based multi-domain network hotspot mining system is designed and implemented. It comprises data collection and storage, news classification, and hotspot mining and display modules, and it verifies that the proposed methods are feasible and effective.
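The abstract does not say how the "hybrid sampling" behind the ensemble fastText classifier works. One common reading is the following: classes larger than a target size are randomly undersampled, and smaller classes are randomly oversampled up to that target. The sketch below implements that reading only. The function name `hybrid_sample`, the `target` parameter and the fixed seed are hypothetical. No fastText training is shown.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def hybrid_sample(samples: Sequence[T], labels: Sequence[str],
                  target: int, seed: int = 0) -> List[Tuple[T, str]]:
    """Hypothetical hybrid resampling: bring every class to exactly `target` items.

    Classes with at least `target` items are undersampled without replacement;
    smaller classes keep all originals plus random duplicates (oversampling).
    """
    if target < 1:
        raise ValueError("target must be >= 1")
    rng = random.Random(seed)
    by_label: Dict[str, List[T]] = defaultdict(list)
    for s, y in zip(samples, labels):
        by_label[y].append(s)

    out: List[Tuple[T, str]] = []
    for y, items in by_label.items():
        if len(items) >= target:
            chosen = rng.sample(items, target)  # undersample
        else:
            extra = [rng.choice(items) for _ in range(target - len(items))]
            chosen = items + extra  # oversample by duplication
        out.extend((s, y) for s in chosen)
    rng.shuffle(out)  # avoid class-ordered training batches
    return out


# Usage: balance a skewed toy set of 6/3/1 items per class to 3 per class.
data = list(range(10))
tags = ["sports"] * 6 + ["tech"] * 3 + ["finance"] * 1
balanced = hybrid_sample(data, tags, target=3)
```

In an ensemble, each base classifier could be trained on a resample drawn with a different `seed`. This is an assumption about how the ensemble is built; the abstract does not specify it.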
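Single-Pass clustering, named above for hotspot detection, reads each document once in arrival order. A document joins the most similar existing cluster if its similarity reaches a threshold; otherwise it starts a new cluster. Below is a minimal sketch under stated assumptions. Documents are already dense vectors, standing in for the NE-LDA and Word2Vec representation, whose exact combination is not given here. Similarity is cosine similarity to a running-mean centroid. The threshold value is hypothetical.

```python
import math
from typing import List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length dense vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def single_pass(vectors: List[List[float]], threshold: float) -> List[int]:
    """Return a cluster id for each vector, processed once in input order."""
    centroids: List[List[float]] = []
    sizes: List[int] = []
    labels: List[int] = []
    for v in vectors:
        # Find the existing cluster whose centroid is most similar to v.
        best, best_sim = -1, -1.0
        for cid, c in enumerate(centroids):
            sim = cosine(v, c)
            if sim > best_sim:
                best, best_sim = cid, sim
        if best >= 0 and best_sim >= threshold:
            # Join it and update the centroid as a running mean.
            n = sizes[best]
            centroids[best] = [(ci * n + vi) / (n + 1)
                               for ci, vi in zip(centroids[best], v)]
            sizes[best] = n + 1
            labels.append(best)
        else:
            # Too dissimilar from every cluster: start a new one (a new topic).
            centroids.append(list(v))
            sizes.append(1)
            labels.append(len(centroids) - 1)
    return labels


# Usage: two obvious topics in 2-D toy vectors.
print(single_pass([[1, 0], [0.9, 0.1], [0, 1], [0.1, 0.95]], threshold=0.8))
# prints [0, 0, 1, 1]
```

The result depends on input order and on the threshold, which is the usual trade-off of Single-Pass: it is fast and incremental, which suits a news stream, but it is not order-invariant.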
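The entropy weight method, used above to score hotspots from time, media and user perspectives, gives more weight to indicators whose values vary more across items. The steps are:

1. Normalize each indicator.
2. Compute p_ij = x_ij / sum_i x_ij.
3. Compute the entropy e_j = -(1/ln m) sum_i p_ij ln p_ij, using 0 ln 0 = 0.
4. Set the weights w_j = (1 - e_j) / sum_k (1 - e_k).

The sketch below follows that standard form. It assumes min-max normalization and treats every indicator as "larger is hotter". The concrete indicators in the usage lines are hypothetical.

```python
import math
from typing import List, Tuple


def entropy_weight_scores(matrix: List[List[float]]) -> Tuple[List[float], List[float]]:
    """Entropy weight method: rows are items (hotspots), columns are indicators.

    Returns (weights per indicator, composite score per item).
    """
    m, n = len(matrix), len(matrix[0])
    if m < 2:
        raise ValueError("need at least two items to compute entropy")

    # Min-max normalisation per column (benefit-type indicators assumed).
    norm = [[0.0] * n for _ in range(m)]
    for j in range(n):
        col = [row[j] for row in matrix]
        lo, hi = min(col), max(col)
        for i in range(m):
            norm[i][j] = (matrix[i][j] - lo) / (hi - lo) if hi > lo else 1.0

    k = 1.0 / math.log(m)
    divergence = []
    for j in range(n):
        col_sum = sum(norm[i][j] for i in range(m))
        e = 0.0
        for i in range(m):
            p = norm[i][j] / col_sum if col_sum else 1.0 / m
            if p > 0:  # convention: 0 * ln 0 = 0
                e -= k * p * math.log(p)
        divergence.append(1.0 - e)  # degree of diversification

    total = sum(divergence)
    weights = [d / total for d in divergence] if total else [1.0 / n] * n
    scores = [sum(w * x for w, x in zip(weights, row)) for row in norm]
    return weights, scores


# Usage with hypothetical indicators: [recency, media coverage, user comments].
w, s = entropy_weight_scores([[1, 5, 3], [2, 5, 1], [3, 5, 2]])
```

In this toy input the media-coverage column is constant. It therefore has entropy 1 and gets a weight of (numerically) zero, which is the intended behavior: an indicator that does not distinguish items should not influence the ranking.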
Keywords/Search Tags: fastText, topic detection and tracking, Single-Pass, Spark