Research on Web Link Structure Analysis Algorithms Based on MapReduce

Posted on: 2015-01-10
Degree: Master
Type: Thesis
Country: China
Candidate: C Y Yin
GTID: 2268330428982476
Subject: Computer application technology

Abstract/Summary:
With the rapid development of Internet technology, the volume of Web page information is growing exponentially. Faced with such a vast information resource, users rely on search engines as an important tool for finding information on the network. Web link structure analysis algorithms, an important component of Internet search engines, evaluate the potential importance of Web pages by analyzing the link structure between them, which greatly improves the quality of the results returned for user queries. However, as the amount of computation increases, link structure analysis algorithms built on a traditional centralized architecture run into bottlenecks in computation and storage, as well as problems with system stability and scalability. In recent years Hadoop, a distributed platform for processing massive amounts of data, has become an active topic of academic research thanks to its reliability, efficiency, and scalability. This thesis studies in depth the classic Web link structure analysis algorithms (PageRank and HITS) together with the theory behind the Hadoop distributed platform and the MapReduce programming framework, combines the link structure algorithms with the Hadoop platform, and does the following work:

1. On the Hadoop platform, each PageRank iteration accesses HDFS multiple times, which consumes a large amount of I/O, and the shuffle and sort phases of each operation must handle a large number of keys, so the algorithm is inefficient. This thesis proposes a method based on block structure division that converts the link relations between Web pages into link relations between blocks. This greatly reduces the number of keys the iterative algorithm has to process, lowers the number of Map and Reduce calls, cuts the overhead caused by I/O transfer, and improves the efficiency of the algorithm.

2. On the Hadoop distributed platform, the thesis examines the matrix storage of the link structure used by the traditional HITS algorithm and the low efficiency of its normalization step. Based on the characteristics of the Hadoop platform, it redesigns the HITS algorithm in a graph-based form, changes the information stored for each node, and breaks the complicated coupling between Hub values and Authority values found in the traditional algorithm, thereby improving HITS.
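For readers unfamiliar with how PageRank and HITS map onto the MapReduce model, the sketch below shows the plain, unoptimized formulation of both, simulated in memory rather than on Hadoop. It is not the block-division method or the redesigned HITS described above, whose details the abstract does not give. The helper names (`run_job`, `pagerank_step`, `hits_step`), the damping factor 0.85, and the three-page graph are assumptions made purely for illustration, and every link target is assumed to be a known page with at least one outgoing link.

```python
from collections import defaultdict
from math import sqrt


def run_job(records, mapper, reducer):
    """Simulate one MapReduce job in memory: map, group by key, reduce."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)
    return [reducer(key, values) for key, values in sorted(groups.items())]


def pagerank_step(records, damping=0.85):
    """One PageRank iteration over records of (page, (rank, outlinks))."""
    n = len(records)

    def mapper(page, state):
        rank, outlinks = state
        yield page, ("links", outlinks)  # carry the graph structure forward
        for target in outlinks:
            yield target, ("share", rank / len(outlinks))

    def reducer(page, values):
        outlinks, total = [], 0.0
        for tag, payload in values:
            if tag == "links":
                outlinks = payload
            else:
                total += payload
        return page, ((1 - damping) / n + damping * total, outlinks)

    return run_job(records, mapper, reducer)


def hits_step(records):
    """One HITS iteration over records of (page, (hub, auth, outlinks, inlinks))."""

    def make_job(score_idx, neighbours_idx, target_idx):
        # Send state[score_idx] to every page in state[neighbours_idx];
        # each receiver stores the sum it gets in position target_idx.
        def mapper(page, state):
            yield page, ("node", state)
            for other in state[neighbours_idx]:
                yield other, ("score", state[score_idx])

        def reducer(page, values):
            state = list(next(v for tag, v in values if tag == "node"))
            state[target_idx] = sum(v for tag, v in values if tag == "score")
            return page, tuple(state)

        return mapper, reducer

    records = run_job(records, *make_job(0, 2, 1))  # authority = sum of in-neighbour hubs
    records = run_job(records, *make_job(1, 3, 0))  # hub = sum of out-neighbour authorities
    hub_norm = sqrt(sum(s[0] ** 2 for _, s in records)) or 1.0
    auth_norm = sqrt(sum(s[1] ** 2 for _, s in records)) or 1.0
    return [(p, (h / hub_norm, a / auth_norm, out, inn))
            for p, (h, a, out, inn) in records]


# Hypothetical three-page graph: A -> B, A -> C, B -> C, C -> A.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
inlinks = {"A": ["C"], "B": ["A"], "C": ["A", "B"]}

pr = [(page, (1 / len(links), out)) for page, out in links.items()]
hits = [(page, (1.0, 1.0, links[page], inlinks[page])) for page in links]
for _ in range(20):
    pr = pagerank_step(pr)
    hits = hits_step(hits)

for page, (rank, _) in pr:
    print(page, round(rank, 4))
```

Each call to `run_job` stands in for one full MapReduce pass, so a single HITS iteration costs two passes plus a global normalization, and every PageRank iteration re-emits the whole link structure alongside the rank shares. That per-iteration traffic is the kind of overhead the two contributions above set out to reduce.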
Keywords/Search Tags: Web link structure analysis, Hadoop distributed platform, MapReduce, PageRank algorithm, HITS algorithm