The Design And Implementation Of Real-Time News Recommendation System Based On Spark Streaming

Posted on: 2019-04-06
Degree: Master
Type: Thesis
Country: China
Candidate: B Cui
Full Text: PDF
GTID: 2428330542496641
Subject: Software engineering
Abstract/Summary:
According to statistics, smartphone shipments continue to grow worldwide, and China's shipments of smartphones, including tablet devices, rank first in the world, having reached 1.3 billion. This growth has driven the development of the mobile Internet, which has removed the limits of time and place on the production and transmission of information, so that about 6 EB of data is now generated on the Internet every day. Faced with this volume of data, users often find it difficult to locate what they are actually interested in at a given moment. How to process big data in real time and give users accurate access to online data has therefore become a problem that needs to be solved.

This thesis addresses the real-time aspect of big data processing with a hybrid architecture that combines the Spark Streaming stream processing framework with the Hadoop batch processing framework. On this architecture it builds a personalized news recommendation system that recommends filtered information to users, easing their difficulty in obtaining information. The Hadoop batch processing framework handles offline computation: it periodically processes the user behavior logs accumulated over a period of time, and the resulting offline user model assists the online Spark Streaming computation in producing an accurate real-time user model. The system uses both a collaborative filtering recommendation algorithm and a content-based recommendation algorithm, so that high-quality news on the Internet is recommended accurately to the users who need it and the accuracy of news recommendation improves.

Divided by function, the system implements a data collection module, a recommendation calculation module, and a storage module. The data collection module obtains the users' historical behavior logs and the news data through a Kafka message queue and forwards them to the model building module, which constructs the user models and news models. The recommendation calculation module consists mainly of user model construction, news model construction, the recommendation engine, and the distribution system. The model building module handles the initial construction of the models and their subsequent updates. The recommendation engine applies the news recommendation algorithms within the system, improving the accuracy of news recommendation. The distribution system filters, sorts, and otherwise post-processes the news recalled by the recommendation algorithms, and its processing strategy differs from one recommendation algorithm to another. The storage module mainly uses MolaDB, Redis, and HBase to store user data and news data. MolaDB, a storage platform developed in-house at Baidu, mainly stores the user models and news models; Redis stores the click and impression information for the news recalled by the distribution module; and HBase stores the full user behavior logs and the crawled news.

For new users, no historical behavior log exists, so their points of interest cannot be mined. In this case the system falls back on hot news recommendation and recommends currently popular news. For existing users, the system builds a user model from the user's historical behavior log and recalls news with each of the recommendation algorithms applied in the system. The distribution module then filters and sorts the recalled news using the computing strategy that corresponds to each recommendation algorithm. Finally, the result processing module assigns a different weight parameter to the recalled news of each queue, fuses the weighted queues into a single recommendation list, and returns the top-N news items to the user.

Functional test cases were designed for each functional module and each recommendation algorithm of the system, verifying that the modules implemented in this thesis meet the expected functional requirements. Performance test cases were used to test the stability of the system under high traffic and the real-time performance of news recommendation. The results show that the system has high stability and scalability in a real environment.
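The final fusion step, together with the hot-news fallback for new users, can be illustrated with a minimal Python sketch. The abstract does not state how the per-queue weights are combined, so the weighted sum of scores used here is an assumption, and the names `fuse_queues`, `recommend`, and the queue labels `cf` and `content` are hypothetical rather than taken from the thesis.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# One recall queue: (news_id, score) pairs produced by one recommendation algorithm.
Queue = List[Tuple[str, float]]


def fuse_queues(queues: Dict[str, Queue], weights: Dict[str, float], top_n: int) -> List[str]:
    """Fuse several recall queues into one list of at most top_n news ids.

    Assumption: a news item's fused score is the sum, over the queues that
    recalled it, of queue weight * item score. The thesis may use another rule.
    """
    fused: Dict[str, float] = defaultdict(float)
    for name, queue in queues.items():
        weight = weights.get(name, 0.0)  # queues without a weight contribute nothing
        for news_id, score in queue:
            fused[news_id] += weight * score
    # Highest fused score first; ties are broken by news id for determinism.
    ranked = sorted(fused.items(), key=lambda item: (-item[1], item[0]))
    return [news_id for news_id, _ in ranked[:top_n]]


def recommend(history: List[str], queues: Dict[str, Queue], weights: Dict[str, float],
              hot_news: List[str], top_n: int) -> List[str]:
    """Return hot news for users with no behavior history, fused results otherwise."""
    if not history:
        return hot_news[:top_n]
    return fuse_queues(queues, weights, top_n)


if __name__ == "__main__":
    queues = {
        "cf": [("n1", 0.9), ("n2", 0.5)],       # collaborative filtering recall
        "content": [("n2", 0.8), ("n3", 0.7)],  # content-based recall
    }
    weights = {"cf": 0.6, "content": 0.4}
    print(recommend(["click:n7"], queues, weights, ["h1", "h2", "h3"], top_n=2))  # ['n2', 'n1']
    print(recommend([], queues, weights, ["h1", "h2", "h3"], top_n=2))            # ['h1', 'h2']
```

In this example `n2` ranks first because both queues recalled it, which is one reason a summed score is a plausible choice: agreement between algorithms raises an item's rank.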
Keywords/Search Tags: News recommendation system, Distribution module, Spark Streaming, Model construction