Research On Web Structure Mining Algorithm In Cloud Computing

Posted on: 2016-10-08  Degree: Master  Type: Thesis
Country: China  Candidate: J Xu  Full Text: PDF
GTID: 2308330464471634  Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet, people can publish and access large amounts of information frequently and easily, and web pages have become a main source of data. This volume of information makes data analysis and mining challenging. Web structure mining discovers potentially useful information hidden in the Web by studying the link relationships between web pages. As the amount of information grows, improving the performance of Web structure mining has become a subject of extensive research.

Cloud computing offers a solution: clusters provide strong computing and storage capacity, can be deployed on inexpensive commodity computers, and process data in parallel. Web structure mining can therefore be implemented well in a cloud computing environment.

This thesis first gives an overview of Web mining, Web structure mining and cloud computing. It then introduces the classical Web structure mining algorithm, PageRank, and the cloud computing platform Hadoop. On this basis, the main work is as follows:

(1) It analyzes methods for computing PageRank in parallel: the inner product method, the outer product method and the block matrix method. It studies how to implement the parallel PageRank algorithm on the Hadoop platform using the MapReduce framework and traditional matrix blocking.
(2) The Gauss-Seidel method can reduce the number of iterations. To exploit this advantage, the thesis replaces the power iteration used by the original PageRank algorithm with Gauss-Seidel iteration to improve the calculation.
(3) Matrix blocking is a common way to improve the efficiency of PageRank computation, but the blocking rules are hard to determine and the subsequent calculation is complicated. To achieve good performance, the thesis proposes an improved method that implements PageRank based on MapReduce and minimum blocking. As a result, the calculation is simpler and the I/O transfer cost is reduced, which improves performance.
(4) Finally, a Hadoop platform is built, and matrices of different scales and sparsity are used to compare the performance of the traditional blocking method and the improved one. The results show that the proposed method has better calculation efficiency.
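To make the contrast in point (2) concrete, here is a minimal single-machine sketch in Python. It is not the thesis's Hadoop implementation. The function names, the example graph and the parameter values (damping factor 0.85, tolerance 1e-10) are assumptions chosen for illustration. The sketch also assumes every page has at least one out-link, no page links to itself, and no link is listed twice. Power iteration computes each sweep from the previous sweep's ranks only, while Gauss-Seidel reuses ranks already updated in the current sweep.

```python
def build_in_links(out_links):
    """Invert an adjacency list: page -> list of pages that link to it."""
    in_links = {page: [] for page in out_links}
    for src, targets in out_links.items():
        for dst in targets:
            in_links[dst].append(src)
    return in_links


def pagerank(out_links, damping=0.85, tol=1e-10, max_iter=1000,
             gauss_seidel=False):
    """Return (ranks, sweeps) for a graph given as page -> list of out-links.

    Assumes no dangling pages, no self-links and no duplicate links
    (illustrative simplifications, not requirements stated in the thesis).
    """
    pages = sorted(out_links)
    n = len(pages)
    in_links = build_in_links(out_links)
    rank = {page: 1.0 / n for page in pages}
    teleport = (1.0 - damping) / n

    for sweep in range(1, max_iter + 1):
        # Power iteration reads a frozen copy of the previous sweep;
        # Gauss-Seidel reads the dict that is being updated in place.
        source = rank if gauss_seidel else dict(rank)
        delta = 0.0
        for page in pages:
            incoming = sum(source[q] / len(out_links[q])
                           for q in in_links[page])
            new_value = teleport + damping * incoming
            delta += abs(new_value - rank[page])  # rank[page] is still old here
            rank[page] = new_value
        if delta < tol:  # L1 change between successive sweeps
            return rank, sweep
    return rank, max_iter


# Hypothetical four-page graph used only for this example.
EXAMPLE_GRAPH = {"A": ["B", "C"], "B": ["C"], "C": ["D"], "D": ["A"]}

if __name__ == "__main__":
    for use_gs in (False, True):
        ranks, sweeps = pagerank(EXAMPLE_GRAPH, gauss_seidel=use_gs)
        name = "Gauss-Seidel" if use_gs else "power iteration"
        print(name, sweeps, {p: round(r, 6) for p, r in ranks.items()})
```

On this small graph both variants converge to the same ranks, and the Gauss-Seidel variant needs fewer sweeps. How much is saved depends on the graph and on the order in which pages are visited, so the sketch should not be read as a general performance claim.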
Keywords/Search Tags: Cloud computing, Hadoop, Minimum blocking, PageRank, Web structure mining