Research and Applications of User Behavior Based on Search Logs

Posted on: 2009-12-25
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H T Chen
Full Text: PDF
GTID: 1118360245469617
Subject: Computer application technology
Abstract/Summary:
The worldwide adoption of search engines is a milestone in the evolution of the Internet. More and more people choose a search engine as their primary tool for finding resources online, but search engine performance is not always satisfactory. A query usually returns thousands of related web pages, yet few of those pages are useful and few users look beyond the third page of results. In addition, because users differ in background knowledge, many cannot formulate high-quality queries that express their search intent clearly. Data mining on query logs may help to solve these problems. This dissertation focuses on the analysis and application of query logs, covering a search behavior model, query classification, search result ranking optimization, and anomalous search detection. The main contributions are as follows:

(1) Starting from a basic analysis of the query log, including the attributes of a search and the relations among them, the search characteristics of the users of a large-scale Chinese search engine are described. A Chinese word segmentation algorithm is introduced into query log mining, making the analysis results more accurate. A relation between the depth of a web page URL and how often the page is visited is identified, and trends in search behavior are found by comparing query logs recorded in different periods.

(2) Search behavior is classified from both an abstract and a specific perspective. A search behavior model is presented, which is central to research related to search behavior. Search history is also taken into account, and an impact factor is calculated for each query word.

(3) A query classification algorithm is described that assigns queries to a predefined taxonomy. The algorithm is based on a naive Bayesian network, so that the classification reflects the users' search intent. It reduces the misclassification caused by the small number of words in a typical query and by words with multiple meanings. Classification accuracy is further improved by using the query history.

(4) An optimization of the result page ranking algorithm is presented. The algorithm uses a hybrid frequent pattern tree to store the queries and optimizes the ordering of the first n original search results. This preserves the relevance and coverage of the search results while reflecting the search intent of the search engine's users.

(5) Anomalous search behavior that can assist malicious intent is described. A definition of anomalous search types is given, comprising content-based anomalous search and traffic-based anomalous search. An anomalous search detection framework is proposed, and an optimized decision tree algorithm is used to detect anomalous search behavior.
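The idea in contribution (3), that query history can resolve a short or ambiguous query, can be illustrated with a small sketch. This is not the dissertation's algorithm: it is a plain multinomial naive Bayes classifier with Laplace smoothing, and the class name, the toy training queries, the two categories, and the choice to simply append history words to the current query are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesQueryClassifier:
    """Multinomial naive Bayes over query words (illustrative sketch only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                      # Laplace smoothing constant
        self.class_counts = Counter()           # number of queries per category
        self.word_counts = defaultdict(Counter) # word frequencies per category
        self.vocab = set()

    def fit(self, labeled_queries):
        for query, label in labeled_queries:
            words = query.lower().split()
            self.class_counts[label] += 1
            self.word_counts[label].update(words)
            self.vocab.update(words)

    def log_scores(self, words):
        total = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for label, n in self.class_counts.items():
            denom = sum(self.word_counts[label].values()) + self.alpha * vocab_size
            score = math.log(n / total)  # log prior of the category
            for w in words:
                # smoothed log likelihood of each word given the category
                score += math.log((self.word_counts[label][w] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, query, history=()):
        # Assumption: earlier queries in the session are treated as extra evidence
        # by appending their words to the current query.
        words = query.lower().split()
        for past in history:
            words += past.lower().split()
        scores = self.log_scores(words)
        return max(scores, key=scores.get)


# Hypothetical toy training data; "python" and "java" are deliberately ambiguous.
TRAINING = [
    ("python tutorial download", "computing"),
    ("java compiler error", "computing"),
    ("linux kernel source", "computing"),
    ("python snake habitat", "animals"),
    ("java island wildlife", "animals"),
    ("tiger habitat map", "animals"),
]

clf = NaiveBayesQueryClassifier()
clf.fit(TRAINING)
print(clf.predict("python", history=["tiger habitat"]))
print(clf.predict("java", history=["compiler error"]))
```

In this toy setup the single word "python" scores equally under both categories, so the prediction is decided by the history words. A real system would need Chinese word segmentation in place of whitespace splitting, and probably a weighting scheme for older queries.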
Keywords/Search Tags: Search Engine, Data Mining, Query Log, Query Classification, Page Sorting, Search Behavior Model, Anomaly Search Detection