Research And Implementation Of Distributed Real-Time Recommendation Algorithm Based On Data Stream

Posted on:2019-02-21

Degree:Master

Type:Thesis

Country:China

Candidate:Y H Cong

GTID:2348330545955621

Subject:Computer Science and Technology

Abstract/Summary:

With the explosive growth of information in the Internet era, users find it difficult to locate the information that interests them in the vast amount of available data. Recommendation systems provide users with personalized recommendations by mining valuable information from massive data, and therefore have high academic and commercial value. Collaborative filtering based on matrix factorization is widely used and is one of the mainstream methods in recommendation systems. This method has the following technical difficulties and needs further research and optimization. Traditional matrix factorization is usually trained in batch mode: the cost of updating the model is relatively high and all training data are generated before training, so the recommendation results cannot respond quickly to changes in user preferences and the algorithm is inefficient. Solvers for matrix factorization, such as classical stochastic gradient descent, are essentially iterative-convergent algorithms. Recent work relies on a parameter server to implement the algorithm in a distributed environment. However, the parameter server approach leads to delayed-gradient and "straggler" problems during training, so convergence of the loss function cannot be guaranteed, and the cost of parameter synchronization has a large impact on training efficiency.

In view of the above problems, this paper proposes a method for matrix factorization in a distributed data stream environment that uses stochastic gradient descent and peer-to-peer parameter exchange. It has the following innovations. First, it uses an iterative data stream and peer-to-peer parameter exchange instead of a parameter server, which effectively reduces the communication and parameter synchronization overhead of distributed model training and addresses the delayed-gradient problem. Second, it greatly reduces the number of parameter exchanges by reasonably increasing the parameter exchange step size during training, which reduces the impact of the "straggler" problem. Third, by introducing a forgetting strategy and anomaly detection, the algorithm can respond to user interest drift, and prediction accuracy is improved to some extent. Fourth, by redesigning the sharding of the data set and the model around consistent hashing, the system can adapt to the data stream environment and to dynamic changes in the cluster's computing capacity.

The improved algorithm is implemented on Flink. The experimental results show that it meets the requirements of a distributed data stream environment and reduces communication overhead while maintaining prediction accuracy.
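The core idea of the abstract, updating a matrix factorization model one rating at a time with stochastic gradient descent while discounting older behaviour, can be illustrated with a minimal single-process sketch. This is not the thesis's Flink implementation: the class name `StreamingMF`, the hyperparameter values, and the `decay` parameter (a simple multiplicative forgetting factor standing in for the unspecified forgetting strategy) are all assumptions made for illustration. Peer-to-peer parameter exchange, anomaly detection, and consistent-hash sharding are omitted.

```python
import random


class StreamingMF:
    """Incremental matrix factorization trained one rating event at a time.

    Illustrative sketch only; names and defaults are assumptions,
    not taken from the thesis.
    """

    def __init__(self, n_factors=8, lr=0.05, reg=0.02, decay=1.0, seed=0):
        self.n_factors = n_factors
        self.lr = lr        # SGD learning rate (assumed value)
        self.reg = reg      # L2 regularization strength (assumed value)
        self.decay = decay  # hypothetical forgetting factor; 1.0 disables it
        self.rng = random.Random(seed)
        self.user_vecs = {}
        self.item_vecs = {}

    def _vec(self, table, key):
        # Lazily create a small random factor vector for unseen users/items,
        # since a stream has no fixed, known-in-advance set of ids.
        if key not in table:
            table[key] = [self.rng.uniform(-0.1, 0.1)
                          for _ in range(self.n_factors)]
        return table[key]

    def predict(self, user, item):
        p = self._vec(self.user_vecs, user)
        q = self._vec(self.item_vecs, item)
        return sum(a * b for a, b in zip(p, q))

    def update(self, user, item, rating):
        """Apply one SGD step for a single (user, item, rating) event.

        Returns the prediction error measured just before the gradient step.
        """
        p = self._vec(self.user_vecs, user)
        q = self._vec(self.item_vecs, item)
        # Shrink the user's existing factors so recent events weigh more.
        for k in range(self.n_factors):
            p[k] *= self.decay
        err = rating - sum(a * b for a, b in zip(p, q))
        for k in range(self.n_factors):
            pk, qk = p[k], q[k]
            p[k] += self.lr * (err * qk - self.reg * pk)
            q[k] += self.lr * (err * pk - self.reg * qk)
        return err


if __name__ == "__main__":
    model = StreamingMF(decay=0.99)
    # A tiny hypothetical event stream of (user, item, rating) tuples.
    stream = [("u1", "i1", 4.0), ("u2", "i1", 3.0), ("u1", "i2", 5.0)] * 100
    for user, item, rating in stream:
        model.update(user, item, rating)
    print(round(model.predict("u1", "i1"), 2))
```

In a distributed setting, each worker would hold only a shard of `user_vecs` and `item_vecs` and would periodically exchange factor vectors with peers. Exchanging every several updates rather than after each one is the kind of trade-off the abstract describes as increasing the parameter exchange step size.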

Keywords/Search Tags:

matrix factorization, stream computing, distributed collaborative filtering, peer-to-peer parameter exchange, interest drift

Related items

1	Research On Interest-based Search Mechanism In Unstructed Peer-to-Peer System
2	Application Of Peer-to-Peer On Service-Oriented Distributed Computing System
3	Research On Some Key Soft Security Problems Of Peer-To-Peer Systems
4	P2p Collaborative Communication Model And Derivative To The Grid Collaborative Communication Protocols
5	Community Models In Peer-to-Peer Networks And Their Applications In Search
6	Research And Improvement Of Recommendation Algorithm Based On Clustering And Matrix Factorization
7	A Study On Large Scale Peer-to-Peer Search And Applications
8	Research On Security Control Mechanisms Of Peer-to-Peer Cloud Storage Service Systems
9	Research On Collaborative Filtering Recommendation Algorithm Based On Improved Clustering And Matrix Factorization
10	Design And Implement A Peer-to-Peer Media Streaming Live System