A Study Of Large Scale Search Log Mining Based Context-Aware Search

Posted on: 2010-01-06
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H H Cao
GTID: 1118360302971447
Subject: Computer application technology
Abstract/Summary:
In recent years, search engines have become the major tool Web users rely on to retrieve information. However, understanding users' information needs well remains a problem, since user queries are usually short and ambiguous. Context-aware search is a novel technology for improving search. Here "context" refers to session context. The technology rests on a common observation: the queries and clicked URLs in the same search session are usually related to each other. In this thesis, we make a systematic study of using session context to better understand users' information needs, and thus to enhance multiple search services.

First, we propose a novel context-aware query suggestion approach. Query suggestion plays an important role in improving the usability of search engines. Although some recently proposed methods suggest queries by mining query patterns from search logs, none of them is context-aware: they do not take the immediately preceding queries into account as context when suggesting queries. Our approach has two steps. In the offline model-learning step, to address data sparseness, queries are summarized into concepts by clustering a click-through bipartite graph. Then a concept sequence suffix tree is constructed from session data to serve as the query suggestion model. In the online query suggestion step, a user's search context is captured by mapping the query sequence submitted by the user to a sequence of concepts. By looking up this context in the concept sequence suffix tree, we suggest context-aware queries to the user. We test our approach on large-scale search logs of a commercial search engine containing 1.8 billion search queries, 2.6 billion clicks, and 840 million query sessions. The experimental results clearly show that our approach outperforms two baseline methods in both coverage and quality of suggestions.

Second, we propose an approach to context-aware query classification. Web query classification (QC) has been widely studied as a way to understand users' information needs. Most previous QC algorithms classify individual queries without considering their context. However, many Web queries are short and ambiguous, and their real meanings are uncertain without context information. We incorporate context information into query classification by using conditional random field (CRF) models. We perform extensive experiments on real-world search logs and validate the effectiveness and efficiency of our approach. We show that it improves the F1 score by 52% compared with state-of-the-art baselines.

Last, we propose an approach to context-aware ranking. Ranking is one of the core technologies of search engines, and a context-aware approach to ranking may improve users' search experience substantially. To capture the contexts of queries, we learn a variable length Hidden Markov Model (vlHMM) from search sessions extracted from log data. Although the mathematical model is intuitive, learning a large vlHMM with millions of states from hundreds of millions of search sessions poses a grand challenge. We develop a strategy for parameter initialization in vlHMM learning that can greatly reduce the number of parameters to be estimated in practice. We also devise a method for distributed vlHMM learning under the map-reduce model. We test our approach on a real data set and evaluate the effectiveness of the learned vlHMM on three search applications: document re-ranking, query suggestion, and URL recommendation. The experimental results show that our approach is both effective and efficient.
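The two-step query suggestion approach described in the abstract can be illustrated with a small sketch. This is not the thesis implementation. It assumes a hand-written query-to-concept mapping (`QUERY_TO_CONCEPT`, hypothetical) in place of the click-through clustering, and it uses a flat dictionary keyed by context suffixes instead of a real concept sequence suffix tree. All names, the toy sessions, and the `max_context` parameter are assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical mapping from queries to concepts. In the thesis this is
# learned offline by clustering a click-through bipartite graph.
QUERY_TO_CONCEPT = {
    "jaguar": "jaguar",
    "jaguar car": "car",
    "jaguar price": "car",
    "jaguar animal": "animal",
    "big cats": "animal",
    "car dealers": "dealers",
    "zoo tickets": "zoo",
}

# Toy search sessions (ordered lists of queries); assumed data.
SESSIONS = [
    ["jaguar", "jaguar car", "car dealers"],
    ["jaguar", "jaguar price", "car dealers"],
    ["jaguar", "jaguar animal", "zoo tickets"],
]


def to_concepts(queries):
    """Map a query sequence to a concept sequence, skipping unknown queries."""
    return [QUERY_TO_CONCEPT[q] for q in queries if q in QUERY_TO_CONCEPT]


def build_model(sessions, max_context=3):
    """Offline step: count which concept follows each context.

    A context is a tuple of up to max_context immediately preceding concepts.
    """
    model = defaultdict(Counter)
    for session in sessions:
        concepts = to_concepts(session)
        for i in range(1, len(concepts)):
            for k in range(1, min(max_context, i) + 1):
                context = tuple(concepts[i - k:i])
                model[context][concepts[i]] += 1
    return model


def suggest(model, recent_queries, max_context=3, top_n=2):
    """Online step: look up the longest known suffix of the user's context."""
    concepts = to_concepts(recent_queries)
    for k in range(min(max_context, len(concepts)), 0, -1):
        context = tuple(concepts[-k:])
        if context in model:
            return [c for c, _ in model[context].most_common(top_n)]
    return []  # no known context, so no suggestion


MODEL = build_model(SESSIONS)
print(suggest(MODEL, ["jaguar"]))                   # most frequent follow-up concepts
print(suggest(MODEL, ["jaguar", "jaguar animal"]))  # longer context narrows the result
```

Backing off from the longest context to shorter ones mirrors the role of the suffix tree lookup: when the full context was never seen in the logs, a shorter suffix can still yield a suggestion. A production system would then map each suggested concept back to representative queries.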
Keywords/Search Tags: Session context, search, query suggestion, document ranking, query classification, search log mining