Font Size: a A A

Faceted Search Over Large-scale Heterogeneous Web

Posted on:2013-06-21Degree:DoctorType:Dissertation
Country:ChinaCandidate:F W ZhuFull Text:PDF
GTID:1228330395989255Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
As the Web becomes increasingly popular and diverse, we have witnessed an explosive growth of Web information. While such large amount of data published and shared on the Web provides us a various and enormous information repository, the scale and diversity of Web data also increases the difficulty in Web information retrieval. Therefore, how to develop an effective and efficient Web search approach becomes a practical research topic.Combining free-text search and faceted navigation, faceted search guides a user’s search by providing valid query refinements iteratively so that the user can flexibly navigate through the result set without feeling lost or reaching a "dead end". However, the successful development of faceted search approach requires a well-defined faceted classification to organize and classify the dataset, and thus it is difficult to apply faceted search technique to the largest corpus:the Web. In addition, the heterogeneous and unstructured characteristics of Web data also make faceted search problem more challenging.To tackle the challenges in faceted web search, in this paper, we thoroughly study the main issues in faceted Web search and integrate the proposed techniques to generate a systematic solution to faceted Web search, including the data model, search paradigm and the ranking algorithms. We also develop a faceted search prototype, FacetedWeb, over the real Web corpus.We first propose to leverage the named entities in the Web to model the unstructured Web data as a structured entity tuple database so that an entity tuple database could be built conceptually to support fine-grained faceted search. We integrate the real web uncertainties such as entity extraction uncertainty and tuple construction uncertainty to develop a minimum-cost faceted search framework. Then, we analyze the main techniques in implementing faceted search system over the large-scale Web corpus, and propose correspondent search model and ranking algorithms, as follows: 1) Expansible search paradigm:Considering the drawbacks of traditional refined search paradigm, i.e., search within search, we propose a new expansible search paradigm to capture the changing user intent in heterogeneous Web search and improve the efficiency of the search system. The proposed search paradigm aims at returning the most promising results, which appear in the neighborhood of the query node, to efficiently construct an initial result set, and dynamically re-search the data set according the change of user intent, by filtering uninterested results as well as expanding new relevant answers.2) Fast entity ranking algorithm:To improve the efficiency of entity ranking, we propose FastPPV, an approximate PPV computation method that is incremental and accuracy-aware. The computation is partitioned and will be scheduled for processing in an "organized" way, such that we can gradually improve our PPV approximation and quantify the accuracy of our estimation at query time. We also develop a hub based solution to efficiently partition and prioritize computation so that the shared sub-structure between different tour partitions can be reused to speed up computation.3) Dynamic facet ranking algorithm:To ensure that the iterative faceted search is query-intent aware and the initially missing entities can be retrieved if needed in subsequent faceted search, we propose a dynamic facet ranking approach for iterative faceted search. Our approach re-ranks the facets by their relevance w.r.t. both the initial query and the iteratively chosen facets. We propose a hybrid model which combines the local relevance and the global relevance in computation to achieve the best performance.We conduct comprehensive experiments on FacetedWeb, and the results validate the effectiveness and efficiency of the proposed techniques and algorithms.
Keywords/Search Tags:faceted entity search, large-scale Web search, faceted metadata, expansible search paradigm, efficient PPV algorithm, dynamic facet ranking
PDF Full Text Request
Related items