Search Engine Optimization Method Based On PageRank

Posted on: 2013-10-30 | Degree: Master | Type: Thesis
Country: China | Candidate: Q B Guo
GTID: 2248330371969300 | Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet, the amount of online information is growing quickly. This enriches the Internet, but it also makes it increasingly difficult for users to find the information they need. Search engine technology gives people a convenient channel for obtaining what they need from the Internet. Today people are very familiar with search engines, and many of them open a search engine's home page first whenever they go online. Search engines therefore keep exploring how to serve users better. The performance and quality of a search engine are reflected in user satisfaction, and users tend to choose the top-ranked pages in the results, so ranking search results sensibly significantly improves a search engine's quality. The two most important algorithms for ranking search results are PageRank and HITS. PageRank is computed offline, so its query-time performance is higher than that of HITS, which is why PageRank is more common in practice. However, the traditional PageRank algorithm ignores several factors that may affect a page's importance, which gives it a number of shortcomings. To address them, this paper proposes three improvements to the traditional algorithm.
First, the traditional PageRank algorithm judges a page's importance only from its links and ignores how related the contents of linked pages are, which leads to topic drift. This paper computes the relevance between page contents with the vector space model and then converts the result into relative weights.
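The relevance step described above can be illustrated with a small vector space model. This is a minimal sketch, not the thesis's implementation: it assumes raw term-frequency vectors over lower-cased, whitespace-separated tokens (the thesis may use TF-IDF or other preprocessing), and the function names `tf_vector`, `cosine` and `relative_weights` are hypothetical. It turns the cosine similarity between a page and each page it links to into relative weights that sum to 1.

```python
import math
from collections import Counter


def tf_vector(text):
    """Raw term-frequency vector of a lower-cased, whitespace-tokenized text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())  # missing terms count as 0
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def relative_weights(source_text, target_texts):
    """Convert source-to-target similarities into weights that sum to 1.

    target_texts maps each link target's name to its text. If no target is
    related at all, fall back to an equal split (traditional PageRank).
    """
    if not target_texts:
        return {}
    src = tf_vector(source_text)
    sims = {name: cosine(src, tf_vector(text)) for name, text in target_texts.items()}
    total = sum(sims.values())
    if total == 0:
        return {name: 1.0 / len(sims) for name in sims}
    return {name: s / total for name, s in sims.items()}
```

For instance, under this sketch a page about "pagerank search engine ranking" that links to one related page and one page about pasta recipes passes all of its weight to the related page, whereas traditional PageRank would split it evenly.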
In the improved algorithm, these topic-relevance weights determine how a page's authority value is distributed: the more relevant a page is to the page that links to it, the more authority it receives through that link.
Second, the authority value computed by the traditional PageRank algorithm grows with the number of citations (incoming links). A new page has been online only briefly, so few other pages link to it yet, and it therefore ranks low. To let important new pages rise in the rankings faster, this paper proposes an improved time-weighted feedback method.
Third, a search engine in operation records a large amount of user behavior data that reflects users' search tendencies, and making sensible use of this data can improve search quality. This paper counts and processes users' clicks, treating each click as a user's vote for the page. Integrating these votes into the ranking algorithm makes the final ranking reflect users' subjective choices among pages.
To improve search engine performance by optimizing PageRank, this paper combines topic relevance, time feedback and user feedback so that all three factors influence how a page's authority is allocated. The improved algorithm is called the Multi-PageRank algorithm.
In the experiments, the open-source search engine Nutch is used to crawl the web, and Nutch's query results are then ranked with the improved algorithm and with the traditional algorithm. Analysis of the rankings and query tests shows that the ranking produced by the improved algorithm reflects users' subjective behavior and that its precision is higher. In particular, query precision for new pages is significantly higher than with the traditional algorithm.
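The first improvement replaces PageRank's equal split of a page's authority over its out-links with a split driven by topic relevance. The sketch below is a minimal power-iteration version under stated assumptions: the damping factor 0.85, the convergence tolerance, spreading a dangling page's rank evenly over all pages, and the `weights` dictionary format are common conventions or illustrative choices, not details taken from the thesis, and `weighted_pagerank` is a hypothetical name.

```python
def weighted_pagerank(weights, d=0.85, tol=1e-12, max_iter=200):
    """PageRank in which each page splits its authority by per-link weights.

    weights[src][dst] >= 0 is the relative weight of the link src -> dst
    (e.g. a topic-relevance weight). Weights are normalized per source page,
    so giving every link the same weight reproduces traditional PageRank.
    """
    pages = set(weights)
    for targets in weights.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    if n == 0:
        return {}
    out_total = {src: sum(targets.values()) for src, targets in weights.items()}
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Pages without usable out-links spread their rank evenly over all pages.
        dangling = sum(rank[p] for p in pages if out_total.get(p, 0.0) <= 0)
        new = {p: (1.0 - d) / n + d * dangling / n for p in pages}
        for src, targets in weights.items():
            if out_total[src] <= 0:
                continue
            for dst, w in targets.items():
                # Authority flows along each link in proportion to its weight.
                new[dst] += d * rank[src] * w / out_total[src]
        delta = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if delta < tol:
            break
    return rank
```

With `{"A": {"B": 0.9, "C": 0.1}, "B": {"A": 1.0}, "C": {"A": 1.0}}`, page B ends up with more authority than C because A's link to B carries the higher relevance weight; with equal weights the two would tie.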
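The abstract does not give the formulas behind the time-weighted feedback or the click votes, so the following is only one hypothetical way the three factors could be folded into a single link weight before a weighted PageRank run: topic relevance is multiplied by a freshness boost that halves with page age and by a boost proportional to the page's share of recorded clicks. The functions `time_factor`, `click_factor` and `combined_link_weights` and the parameters `alpha`, `beta` and `half_life_days` are invented for illustration and are not the thesis's Multi-PageRank definitions.

```python
def time_factor(age_days, alpha=1.0, half_life_days=30.0):
    """Hypothetical freshness boost: 1 + alpha for a brand-new page, halving every half_life_days."""
    return 1.0 + alpha * 0.5 ** (age_days / half_life_days)


def click_factor(clicks, total_clicks, beta=1.0):
    """Hypothetical user-vote boost: 1 plus beta times the page's share of all clicks."""
    if total_clicks <= 0:
        return 1.0
    return 1.0 + beta * clicks / total_clicks


def combined_link_weights(relevance, age_days, clicks):
    """Combine topic relevance, time feedback and click feedback for one page's out-links.

    relevance[dst]: topic relevance of the linking page to dst (e.g. cosine similarity)
    age_days[dst]:  days since dst was first seen (missing means old, so no boost)
    clicks[dst]:    recorded user clicks on dst (missing means 0)
    Returns weights that sum to 1, or an equal split if every raw weight is 0.
    """
    if not relevance:
        return {}
    total_clicks = sum(clicks.values())
    raw = {
        dst: rel
        * time_factor(age_days.get(dst, float("inf")))
        * click_factor(clicks.get(dst, 0), total_clicks)
        for dst, rel in relevance.items()
    }
    total = sum(raw.values())
    if total == 0:
        return {dst: 1.0 / len(raw) for dst in raw}
    return {dst: w / total for dst, w in raw.items()}
```

The multiplicative form keeps a link with zero topic relevance at zero weight regardless of freshness or clicks; an additive blend would be an equally plausible design if the goal were to let user votes override content relevance.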
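The experiments compare query precision between the two rankings, but the abstract does not say which precision measure is used. Precision at k (the fraction of the top k results judged relevant), averaged over a set of test queries, is one common choice and is sketched here as an assumption; the function names are hypothetical.

```python
def precision_at_k(ranked_results, relevant, k):
    """Fraction of the top-k ranked results that are judged relevant (P@k)."""
    if k <= 0:
        raise ValueError("k must be positive")
    top = ranked_results[:k]
    hits = sum(1 for doc in top if doc in relevant)
    return hits / k


def mean_precision_at_k(runs, k):
    """Average P@k over queries; runs is a list of (ranked_results, relevant_set) pairs."""
    if not runs:
        return 0.0
    return sum(precision_at_k(ranked, rel, k) for ranked, rel in runs) / len(runs)
```

Running the same queries through both rankings and comparing their mean P@k (and the same figure restricted to queries whose relevant pages are new) is one way to reproduce the kind of comparison the abstract describes.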
Keywords/Search Tags: Search Engine, PageRank, Topic Relevance, Time Feedback, User Feedback, Multi-PageRank