
A Research On Personalized PageRank Based On MapReduce

Posted on: 2014-09-30
Degree: Master
Type: Thesis
Country: China
Candidate: J Y Liu
GTID: 2268330425466233
Subject: Computer software and theory
Abstract/Summary:
In recent years, with the continuous improvement of computer processing capability, Internet technology has achieved unprecedented development. With the emergence of network applications based on Web 2.0 technologies, massive amounts of data accumulate on the network every day, and the number of web pages has shot up. In the information age, huge amounts of data are a precious and important asset to society. However, as the amount of information on the Internet grows exponentially, junk mail and redundant information are everywhere, and people must spend a large amount of time to find useful information. Redundant information lowers the efficiency of information search, so fast, convenient, and efficient access to the necessary information has become a focus of concern for a growing number of users and operators. With the development of the Internet, information retrieval has become deeply embedded in the daily life of ordinary people.
First, this paper studies the relevant background and theory of personalized PageRank based on MapReduce, and analyzes and summarizes the state of research on personalized PageRank. On this basis, it further studies the personalized PageRank algorithm based on MapReduce, focusing on the bottleneck factors that affect the performance and effectiveness of the algorithm, namely that the number of iterations and the I/O cost are not optimal. It then presents a new personalized PageRank algorithm based on MapReduce, called the Merging algorithm, and analyzes its correctness, number of iterations, and I/O cost. The analysis shows that the number of MapReduce iterations used by this algorithm is optimal among a broad family of random walk algorithms for the problem, and that its I/O efficiency is much better than that of the rounding algorithm and the SQRT algorithm.
Finally, this paper implements the personalized PageRank algorithm and its improved algorithm in the MapReduce programming model on the SougouQ data set, and then compares and analyzes the experimental data obtained. The experimental results show that the proposed algorithm is more efficient than current algorithms: it not only requires the fewest MapReduce iterations, but also yields a lower error.
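For orientation, the general shape of a personalized PageRank computation expressed as repeated map and reduce steps can be sketched as follows. This is a minimal in-memory illustration using plain power iteration, not the Merging algorithm or any of the random walk algorithms discussed in the thesis, whose details are not given in this abstract. The function names, the teleport probability `alpha`, the fixed iteration count, and the choice to return dangling-node mass to the source are all assumptions made for the example.

```python
from collections import defaultdict


def map_phase(graph, ranks, alpha):
    """Map step: each node emits (target, share) pairs along its out-links."""
    for node, rank in ranks.items():
        out_links = graph.get(node, [])
        if not out_links:
            continue  # dangling nodes are handled by the driver loop
        share = (1.0 - alpha) * rank / len(out_links)
        for target in out_links:
            yield target, share


def reduce_phase(pairs):
    """Reduce step: sum all contributions that arrive at the same node."""
    totals = defaultdict(float)
    for key, value in pairs:
        totals[key] += value
    return totals


def personalized_pagerank(graph, source, alpha=0.15, iterations=50):
    """Power iteration; every teleport jumps back to `source` (assumption)."""
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    nodes.add(source)
    ranks = {n: 0.0 for n in nodes}
    ranks[source] = 1.0
    for _ in range(iterations):  # one loop pass stands in for one MapReduce job
        totals = reduce_phase(map_phase(graph, ranks, alpha))
        dangling = sum(ranks[n] for n in nodes if not graph.get(n))
        new_ranks = {n: totals.get(n, 0.0) for n in nodes}
        # Teleport mass, plus mass stranded at dangling nodes, returns to source.
        new_ranks[source] += alpha + (1.0 - alpha) * dangling
        ranks = new_ranks
    return ranks


if __name__ == "__main__":
    toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}  # hypothetical data
    for node, score in sorted(personalized_pagerank(toy_graph, "a").items()):
        print(node, round(score, 4))
```

In this baseline each pass over the graph corresponds to one MapReduce job, so the number of jobs grows with the number of iterations. That iteration count and the data shuffled per job are the kind of costs the abstract identifies as bottlenecks.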
Keywords/Search Tags: personalized PageRank, MapReduce, Merging algorithm