Research And Design Of Real-time Recommendation System Based On Flow Computing

Posted on: 2021-04-28
Degree: Master
Type: Thesis
Country: China
Candidate: K Wang
Full Text: PDF
GTID: 2428330611967543
Subject: Control engineering
Abstract/Summary:
Information overload gave rise to recommendation systems. With the rapid development of the Internet, traditional offline recommendation increasingly fails to meet user needs, and users expect ever more timely recommendations. Current research on recommendation systems, in China and abroad, mainly updates the recommendation model from offline data and then serves the offline results at request time. This thesis builds a Storm-based real-time recommendation system on top of an offline recommendation system. It comprises a data collection module based on Flume, a data cache module based on Kafka, an offline computing module based on collaborative filtering and an improved ensemble learning algorithm, a real-time computing module based on Storm, and a data storage module.

Flume is a highly available, distributed system for collecting large volumes of logs. It is compatible with multiple data sources and can write collected data to multiple external storage systems, so log collection requires only simple configuration. A distributed Kafka message queue relieves the backlog that occurs when the volume of real-time data is too large. Compared with the publish-subscribe model, Kafka's producer-consumer model makes data caching easier to implement.

For the offline computing phase, the recall and ranking algorithms used in recommendation systems were compared, and a collaborative filtering algorithm and an improved ensemble learning algorithm were selected. Because the recall stage sets the upper bound on what the recommender can achieve, it uses item-based collaborative filtering, the approach most commonly used in industry. Given the scale of the offline data, the offline item similarity calculation is implemented with Hadoop MapReduce: item similarity is updated incrementally by computing the item co-occurrence matrix. The ensemble learning (stacking) algorithm consists mainly of two layers. The first layer contains the Boosting-based GBDT algorithm and the deep-learning-based Wide & Deep (WD) algorithm. The idea behind Boosting is that, in practical machine learning tasks, learning a single strong model is difficult, whereas learning many weak classifiers and combining them is relatively simple. GBDT combines gradient boosting with decision trees: each individual learner in the gradient boosting process is a decision tree. The WD model jointly trains an LR model and a DNN model, which gives the recommendation system both memorization and generalization ability. The stacking model feeds the outputs of the GBDT and WD models into a second-layer LR model, which produces the final recommendation. The item similarity matrix and the final recommendation results computed offline are saved to Redis, where the real-time computation can read them.

For the real-time phase, current stream computing frameworks were compared, and Storm, a pure stream-processing framework, was selected. During real-time recommendation, Storm keeps user behaviors within a time window, and behaviors older than the window's specified duration are excluded from the calculation. An update rule then refreshes the item similarity in real time. Combining the offline computing framework with the real-time computing framework ensures both the accuracy and the timeliness of recommendations. Finally, the feasibility of the design is demonstrated by comparing the recall, accuracy, recommendation time, and AUC of each recommendation algorithm.
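
The abstract does not give the update rule itself, so the following is only a minimal sketch of the general idea: item similarity derived from co-occurrence counts, updated incrementally per event, with a per-user time window that keeps older behaviors out of the calculation. The class name, method names, and the normalization by item popularity are assumptions made for illustration. In-memory dictionaries stand in for the Redis store and the Storm topology state described above.

```python
from collections import defaultdict, deque
from math import sqrt


class WindowedItemSimilarity:
    """Hypothetical sketch: incremental item co-occurrence with a time window."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.item_count = defaultdict(int)   # item -> number of counted behaviors
        self.pair_count = defaultdict(int)   # (item_a, item_b) -> co-occurrences
        self.recent = defaultdict(deque)     # user -> (timestamp, item) in window

    @staticmethod
    def _key(a, b):
        # Store each unordered pair under one canonical key.
        return (a, b) if a <= b else (b, a)

    def observe(self, user, item, timestamp):
        """Process one user behavior and update the counts incrementally."""
        window = self.recent[user]
        # Behaviors older than the window no longer take part in updates.
        while window and timestamp - window[0][0] > self.window_seconds:
            window.popleft()
        seen = {other for _, other in window}
        if item in seen:
            return  # repeated item inside the window: do not count it twice
        self.item_count[item] += 1
        for other in seen:
            self.pair_count[self._key(item, other)] += 1
        window.append((timestamp, item))

    def similarity(self, a, b):
        """Co-occurrence count normalized by the popularity of both items."""
        co = self.pair_count.get(self._key(a, b), 0)
        if co == 0:
            return 0.0
        return co / sqrt(self.item_count[a] * self.item_count[b])

    def most_similar(self, item, k=5):
        """Return up to k (item, score) pairs, highest score first."""
        scores = {}
        for a, b in self.pair_count:
            if a == item:
                scores[b] = self.similarity(a, b)
            elif b == item:
                scores[a] = self.similarity(a, b)
        return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


if __name__ == "__main__":
    sim = WindowedItemSimilarity(window_seconds=60)
    sim.observe("u1", "A", 0)
    sim.observe("u1", "B", 10)   # A is still in u1's window, so (A, B) co-occurs
    sim.observe("u2", "A", 20)
    sim.observe("u2", "C", 200)  # A has left u2's window, so no (A, C) pair
    print(sim.most_similar("A"))
```

Because the counts are cumulative and the window only limits which pairs are formed, the score is a heuristic and is not guaranteed to stay within the range of a true cosine similarity. A production version would also need to merge these counts with the offline co-occurrence matrix rather than start from empty dictionaries.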
Keywords/Search Tags: Storm, Collaborative filtering, Ensemble learning, Time window, Real-time update