
Search Engine Ranking Algorithm Based on User Behavior Log Analysis

Posted on: 2012-02-25
Degree: Master
Type: Thesis
Country: China
Candidate: S J Zhan
GTID: 2178330332995934
Subject: Computer application technology
Abstract/Summary:
Search engines have become the primary tool users rely on to access network resources. An ideal search engine should return, for each user's query terms, information that is relevant to that user's interests, which means the search engine must take user behavior into account. Accounting for user interest and performing targeted information retrieval is therefore an important issue. This thesis presents an improved ranking algorithm, N-PageRank. By mining search logs, exploiting the information they contain, and analyzing the characteristics of user behavior, the algorithm combines the classical PageRank model with a user behavior feedback model to build an improved ranking model. It summarizes and describes a range of surface-level behavior patterns, reveals users' search intent, and identifies users' interests and search patterns, thereby improving the accuracy of the ranking results and ensuring that the returned results are the ones users want to see. Experimental results show that the algorithm effectively reduces the influence of objective factors on page ranking, gives full weight to users' evaluation of website quality, and produces rankings that better meet users' needs.

The main work of this thesis is as follows:
(1) The thesis uses the N-PageRank algorithm. Based on how often users access and click web pages and on other user behaviors, it applies a suitable data model, accounts for the proportion that user behavior contributes to page rank, computes a comprehensive weight, and finally returns ranking results associated with user behavior.
The user behavior feedback model is the focus of this thesis. It is based mainly on five factors:
① the text similarity of web pages that are connected by links;
② user behavior factors;
③ the browsing-time vector recorded when users visit web pages;
④ the traditional PageRank value;
⑤ the implied relevance of web pages derived from user click data.
(2) The thesis simulates search engine data acquisition, storage, analysis, and output, and verifies and compares the results of the PageRank algorithm with those of the improved N-PageRank algorithm. We used the MATLAB urlread function to build a web crawler, crawled the NetEase 163 news channel for 24 hours, and obtained 2000 web pages. We analyzed the similarity between the experimental data and large-scale search engine log data; the experimental data proved equally comprehensive and reflect the interest trends of the general user population. The conclusions are as follows:
① The ranked results are highly relevant to users' interests and behavior.
② Users click on only a limited amount of data within a session, typically just one or two result pages.
③ Ranking results from the improved formula are closer to users' actual needs and considerably better than the search engine's results.
④ Search results optimized by the improved algorithm are closer to user intent, and a page's popularity directly affects its rank in the returned results.
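The abstract does not give the N-PageRank formula itself. As a minimal sketch of the general idea in item (1), the Python below computes classical PageRank by power iteration and then blends it linearly with each page's share of user clicks to form a "comprehensive weight". The linear blend, the weight `alpha`, the click-share behavior score, and the helper names `pagerank` and `combined_score` are all assumptions for illustration, not the thesis's actual method.

```python
"""Sketch: classical PageRank blended with a user-behavior (click) score.

Assumptions (not from the thesis): the comprehensive weight is a linear
mix controlled by the hypothetical parameter `alpha`, and the behavior
score is each page's click count normalized to sum to 1.
"""
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], d: float = 0.85,
             tol: float = 1e-10, max_iter: int = 100) -> Dict[str, float]:
    """Power iteration for classical PageRank with damping factor d."""
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Rank held by pages without out-links is spread uniformly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1.0 - d) / n + d * dangling / n for p in pages}
        for p, outs in links.items():
            for q in outs:
                new[q] += d * rank[p] / len(outs)
        converged = sum(abs(new[p] - rank[p]) for p in pages) < tol
        rank = new
        if converged:
            break
    return rank


def combined_score(links: Dict[str, List[str]], clicks: Dict[str, int],
                   alpha: float = 0.3) -> Dict[str, float]:
    """Hypothetical comprehensive weight: (1 - alpha) * PageRank + alpha * click share."""
    pr = pagerank(links)
    total = sum(clicks.values()) or 1
    return {p: (1 - alpha) * pr[p] + alpha * clicks.get(p, 0) / total for p in pr}


if __name__ == "__main__":
    links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
    clicks = {"a": 5, "b": 1, "c": 4}  # hypothetical click counts from a log
    scores = combined_score(links, clicks)
    print(sorted(scores, key=scores.get, reverse=True))
```

Setting `alpha` to 0 reduces the score to plain PageRank; larger values give the click signal more say, which is how a heavily clicked page can overtake one with slightly stronger link structure.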
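The abstract names the five inputs of the feedback model but not how each is computed or combined. The Python below is one hedged reading: factor ① as the mean cosine similarity between a page and the pages linking to it, ③ as log-scaled total browsing time (a scalar standing in for the browsing-time vector), and ⑤ as the share of click sessions that contain the page, with ② and ④ supplied by the caller as precomputed per-page values. Every factor definition, the min-max normalization, the equal weights in `WEIGHTS`, and all function names are assumptions for illustration.

```python
"""Sketch of a per-page user-behavior feedback score built from the five
factors listed in the abstract. Factor definitions, min-max normalization,
and the equal weights in WEIGHTS are illustrative assumptions only."""
import math
from collections import Counter
from typing import Dict, List

# Hypothetical equal weights; the abstract does not state the real ones.
WEIGHTS = {"similarity": 0.2, "behavior": 0.2, "dwell": 0.2,
           "pagerank": 0.2, "implied": 0.2}


def cosine(text_a: str, text_b: str) -> float:
    """Cosine similarity of two bag-of-words term-frequency vectors."""
    ta, tb = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(count * tb[word] for word, count in ta.items())
    norm = (math.sqrt(sum(v * v for v in ta.values()))
            * math.sqrt(sum(v * v for v in tb.values())))
    return dot / norm if norm else 0.0


def link_similarity(page: str, texts: Dict[str, str],
                    links: Dict[str, List[str]]) -> float:
    """Factor 1 (assumed form): mean text similarity to pages that link here."""
    sources = [p for p, outs in links.items() if page in outs]
    if not sources:
        return 0.0
    return sum(cosine(texts[p], texts[page]) for p in sources) / len(sources)


def implied_relevance(page: str, sessions: List[List[str]]) -> float:
    """Factor 5 (assumed form): fraction of click sessions that include the page."""
    if not sessions:
        return 0.0
    return sum(page in s for s in sessions) / len(sessions)


def min_max(values: Dict[str, float]) -> Dict[str, float]:
    """Rescale values to [0, 1]; a constant factor maps to 0."""
    lo, hi = min(values.values()), max(values.values())
    return {k: (v - lo) / (hi - lo) if hi > lo else 0.0 for k, v in values.items()}


def feedback_scores(texts: Dict[str, str], links: Dict[str, List[str]],
                    behavior: Dict[str, float], dwell_seconds: Dict[str, float],
                    pagerank_values: Dict[str, float],
                    sessions: List[List[str]]) -> Dict[str, float]:
    """Weighted sum of the five normalized factors for every page in `texts`."""
    pages = list(texts)
    raw = {
        "similarity": {p: link_similarity(p, texts, links) for p in pages},
        "behavior": {p: behavior.get(p, 0.0) for p in pages},  # factor 2, caller-supplied
        "dwell": {p: math.log1p(dwell_seconds.get(p, 0.0)) for p in pages},  # factor 3
        "pagerank": {p: pagerank_values.get(p, 0.0) for p in pages},  # factor 4
        "implied": {p: implied_relevance(p, sessions) for p in pages},
    }
    norm = {f: min_max(v) for f, v in raw.items()}
    return {p: sum(WEIGHTS[f] * norm[f][p] for f in WEIGHTS) for p in pages}
```

Min-max scaling puts factors measured in different units (seconds, probabilities, similarities) on a common 0 to 1 range before the weighted sum, so no single factor dominates merely because of its scale.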
Keywords/Search Tags: Search engine, User log, Data mining, User behavior feedback model, Web implied relevancy