Addressing the Challenges of Underspecification in Web Search

Posted on: 2011-12-24
Degree: Ph.D.
Type: Dissertation
University: University of California, Los Angeles
Candidate: Welch, Michael Jason
Full Text: PDF
GTID: 1448390002458494
Subject: Web Studies
Abstract/Summary:
The World Wide Web contains information on a scale far beyond the capacity of manual organization methods. Web search engines help users sift through that information to find data of interest through keyword searches, while also driving a multi-billion dollar advertising industry on the Web. Searching through all of the data on the Web to find the most relevant content is an enormous task, often exacerbated by underspecification and ambiguity in the queries posed by users or in the underlying data itself. Users frequently omit relevant context or submit multifaceted queries, authors rarely provide explicit keywords or categorizations, and content is often missing relevant keywords. This uncertainty makes it inherently difficult for search engines to find the best information for a particular user and query.

We investigate these problems and propose techniques to effectively satisfy the needs of users and advertisers when a search engine encounters such uncertainty. The main challenges we address are:

(1) Discovering which queries or keywords may benefit from contextualization. We propose a framework for automatically identifying geo-localizable queries, establishing several features measurable from search query logs which enable traditional machine learning algorithms to classify queries with high accuracy.

(2) Given an ambiguous query, determining the most likely user requirements for each of the possible subtopics and then selecting a diverse set of pages to satisfy the greatest number of users. We describe a model for user satisfaction with a returned set of pages and propose a greedy algorithm for diversifying search results tailored towards the requirements of informational queries, where users frequently require more than one relevant result. We demonstrate notable improvement over current ranking strategies.

(3) Identifying the pertinent keywords from sparse or imprecise content. We study two approaches for generating keywords from the text content of videos and investigate related term mining approaches to overcome potential mismatches between these keywords and the keywords chosen by searchers or advertisers. We perform extensive evaluations to highlight the conditions under which each method generates the most relevant keywords.

This dissertation presents and evaluates methods and algorithms which may benefit search engines, their users, and their advertising partners for a significant fraction of search instances and exabytes of data.
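
As an illustration of the greedy diversification idea in challenge (2), the following is a minimal Python sketch and not the dissertation's actual model or algorithm. It assumes a simplified setting invented for this example: each subtopic has a known probability, each page is marked as relevant to a set of subtopics, and a user interested in a subtopic benefits from up to `pages_wanted` relevant pages. The function name `greedy_diversify` and all of its parameters are hypothetical.

```python
from typing import Dict, List, Set


def greedy_diversify(
    subtopic_prob: Dict[str, float],   # assumed P(subtopic | query)
    page_topics: Dict[str, Set[str]],  # assumed binary relevance: page -> subtopics
    k: int,                            # number of results to return
    pages_wanted: int = 2,             # assumed pages each user wants per subtopic
) -> List[str]:
    """Select up to k pages, always taking the largest marginal gain next."""
    covered = {t: 0 for t in subtopic_prob}  # pages already chosen per subtopic
    chosen: List[str] = []
    remaining = sorted(page_topics)  # sorted so ties break deterministically

    def gain(page: str) -> float:
        # A page only helps subtopics whose users still want more pages.
        return sum(
            subtopic_prob[t]
            for t in page_topics[page]
            if t in covered and covered[t] < pages_wanted
        )

    while remaining and len(chosen) < k:
        best = max(remaining, key=gain)
        chosen.append(best)
        remaining.remove(best)
        for t in page_topics[best]:
            if t in covered:
                covered[t] += 1
    return chosen


# Hypothetical ambiguous query "jaguar" with two subtopics.
probs = {"car": 0.6, "animal": 0.4}
pages = {"c1": {"car"}, "c2": {"car"}, "c3": {"car"},
         "a1": {"animal"}, "a2": {"animal"}}
print(greedy_diversify(probs, pages, k=4))  # ['c1', 'c2', 'a1', 'a2']
```

In this toy setting, a ranking by subtopic popularity alone would return three car pages before any animal page. The greedy selection stops adding car pages once the assumed need of two pages is met and then serves the less likely subtopic.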
Keywords/Search Tags: Search, Web, Users, Keywords, Data