Research And Implementation Of Distributed Real-Time Recommendation Algorithm Based On Data Stream

Posted on: 2019-02-21 | Degree: Master | Type: Thesis
Country: China | Candidate: Y H Cong | Full Text: PDF
GTID: 2348330545955621 | Subject: Computer Science and Technology
Abstract/Summary:
With the explosive growth of information in the Internet era, users find it difficult to locate the information that interests them in the vast amount of available data. Recommendation systems provide personalized recommendations by mining valuable information from massive data, and therefore have high academic and commercial value. Collaborative filtering based on matrix factorization is widely used and is one of the mainstream methods in recommendation systems. It nevertheless has technical difficulties that call for further research and optimization. Traditional matrix factorization is usually trained in batch mode: all training data are generated before training and updating the model is costly, so the recommendation results cannot respond quickly to changes in user preferences and the algorithm is inefficient. Solvers for matrix factorization, such as classical stochastic gradient descent, are essentially iterative-convergent algorithms, and recent work relies on a parameter server to run them in a distributed environment. However, the parameter server approach suffers from delayed gradients and "stragglers" during training, so convergence of the loss function cannot be guaranteed, and the cost of parameter synchronization significantly reduces training efficiency.

To address these problems, this thesis proposes a matrix factorization method for distributed data stream environments that uses stochastic gradient descent and peer-to-peer parameter exchange. Its contributions are as follows. First, iterative data streams and peer-to-peer parameter exchange replace the parameter server, which reduces the communication and parameter synchronization overhead of distributed model training and solves the delayed-gradient problem. Second, reasonably increasing the parameter exchange step size during training greatly reduces the number of parameter exchanges and mitigates the impact of stragglers. Third, a forgetting strategy and anomaly detection allow the algorithm to respond to user interest drift, and prediction accuracy is improved to some extent. Fourth, the sharding of the data set and the model is redesigned around consistent hashing, so that the system adapts to the data stream environment and to dynamic changes in the cluster's computing capacity.

The improved algorithm is implemented on Flink. Experimental results show that it meets the requirements of a distributed data stream environment and efficiently reduces communication overhead while preserving prediction accuracy.
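The core idea of updating a matrix factorization model from a stream of ratings, one stochastic gradient step per event, can be illustrated with a small single-process sketch. This is not the thesis's Flink implementation. The class name `StreamingMF`, the hyperparameters, and in particular the `decay` factor used as a stand-in for a forgetting strategy are all assumptions made for illustration, since the abstract does not specify how forgetting is performed. Peer-to-peer parameter exchange, anomaly detection, and consistent-hash sharding are omitted.

```python
import random


class StreamingMF:
    """Incremental matrix factorization, updated one rating at a time (illustrative)."""

    def __init__(self, n_factors=8, lr=0.05, reg=0.02, decay=1.0, seed=0):
        self.k = n_factors
        self.lr = lr        # SGD learning rate
        self.reg = reg      # L2 regularization strength
        # Hypothetical forgetting factor in (0, 1]; 1.0 disables forgetting.
        # The thesis's actual forgetting strategy is not given in the abstract.
        self.decay = decay
        self.rng = random.Random(seed)
        self.P = {}         # user id -> latent vector
        self.Q = {}         # item id -> latent vector

    def _vec(self, table, key):
        # Lazily create a small random vector for unseen users or items,
        # since a stream can introduce new ids at any time.
        if key not in table:
            table[key] = [self.rng.gauss(0.0, 0.1) for _ in range(self.k)]
        return table[key]

    def predict(self, user, item):
        p = self._vec(self.P, user)
        q = self._vec(self.Q, item)
        return sum(a * b for a, b in zip(p, q))

    def update(self, user, item, rating):
        """Apply one SGD step for a single (user, item, rating) event.

        Returns the prediction error measured before the step.
        """
        p = self._vec(self.P, user)
        q = self._vec(self.Q, item)
        err = rating - sum(a * b for a, b in zip(p, q))
        for f in range(self.k):
            pf, qf = p[f], q[f]
            # Shrink the old user factor first (assumed forgetting),
            # then take the regularized gradient step.
            p[f] = self.decay * pf + self.lr * (err * qf - self.reg * pf)
            q[f] = qf + self.lr * (err * pf - self.reg * qf)
        return err


if __name__ == "__main__":
    model = StreamingMF(decay=0.99)
    stream = [("u1", "i1", 5.0), ("u1", "i2", 1.0), ("u2", "i1", 4.0)] * 100
    for user, item, rating in stream:
        model.update(user, item, rating)
    print(model.predict("u1", "i1"))
```

Because each event touches only one user vector and one item vector, the model never needs the full rating matrix in memory, which is what makes this style of update suitable for a data stream. With `decay` below 1.0, a user's vector shrinks slightly every time that user produces a new event, so older preferences carry less weight; this is only one simple way to approximate a response to interest drift.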
Keywords/Search Tags:matrix factorization, stream computing, distributed collaborative filtering, peer-to-peer parameter exchange, interest drift