
Research On Web Search Result Clustering With Wikipedia Disambiguation Pages

Posted on: 2016-04-10
Degree: Master
Type: Thesis
Country: China
Candidate: Z Huang
GTID: 2308330476954974
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, the rapid development of the Web has made "information overload" an increasingly serious problem, and retrieving Web information effectively has become a new challenge for information retrieval. Search engines let users search the entire Web with a query. However, because a query is usually a single word or a short phrase, the query itself may be ambiguous and cover several topics. Clustering the search results by topic can make Web search more efficient for ambiguous queries.

This paper reviews the current state of Web search result clustering and related technologies, and proposes a novel method that clusters Web search results using Wikipedia disambiguation pages. Given an ambiguous query, the method clusters the corresponding search results in two steps:
1) Constructing the clustering structure, namely the topics of the query and their concepts, from Wikipedia disambiguation pages. To obtain a set of related concepts for each topic, this paper proposes a concept filtering algorithm that removes noisy concepts based on their similarity to the corresponding topic and to the query.
2) Assigning search results to relevant topics. This paper proposes the TKFR (Top K Full Relations) algorithm for this step; it can compute the similarity between a topic and a search result from only a few words or concepts.

The results show that the proposed method improves on the clustering results of WSD (word sense disambiguation) based methods and works for ambiguous queries of various lengths. Because the topics of WSD-based methods are edited by humans and can form a taxonomy, our method can provide predictive topic labels and improve the efficiency of information search for ambiguous queries.
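The two-step pipeline described above can be illustrated with a minimal sketch. This is not the thesis's implementation. The abstract does not specify the similarity measure, the exact filtering rule, or how TKFR combines relations, so several details here are assumptions: word-set Jaccard overlap stands in for a real relatedness measure, concepts are kept when their averaged topic and query similarity reaches a threshold, and a topic's score for a result is the mean of its k strongest result-concept similarities. The names `filter_concepts`, `top_k_score`, and `assign_results` are hypothetical.

```python
from typing import Callable, Dict, List

# A similarity function maps two strings to a score in [0, 1].
Similarity = Callable[[str, str], float]


def jaccard(a: str, b: str) -> float:
    """Toy similarity: Jaccard overlap of lowercase word sets (an assumed stand-in)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def filter_concepts(query: str, topics: Dict[str, List[str]],
                    sim: Similarity = jaccard,
                    threshold: float = 0.1) -> Dict[str, List[str]]:
    """Step 1 (sketch): drop concepts that are dissimilar to both their topic and the query.

    Averaging the two similarities and the threshold value are assumptions.
    """
    kept: Dict[str, List[str]] = {}
    for topic, concepts in topics.items():
        kept[topic] = [c for c in concepts
                       if (sim(c, topic) + sim(c, query)) / 2 >= threshold]
    return kept


def top_k_score(result: str, concepts: List[str], sim: Similarity, k: int) -> float:
    """Mean of the k strongest result-concept similarities (hypothetical reading of 'top K')."""
    scores = sorted((sim(result, c) for c in concepts), reverse=True)[:k]
    return sum(scores) / len(scores) if scores else 0.0


def assign_results(results: List[str], topics: Dict[str, List[str]],
                   sim: Similarity = jaccard, k: int = 2) -> Dict[str, List[str]]:
    """Step 2 (sketch): put each result under its best-scoring topic, or 'other' if none match."""
    clusters: Dict[str, List[str]] = {topic: [] for topic in topics}
    clusters["other"] = []
    for r in results:
        best_topic, best = "other", 0.0
        for topic, concepts in topics.items():
            # The topic label is treated as one more concept (an assumption).
            s = top_k_score(r, concepts + [topic], sim, k)
            if s > best:
                best_topic, best = topic, s
        clusters[best_topic].append(r)
    return clusters
```

For example, with the query `"apple"`, the topics `"apple inc"` and `"apple fruit"`, and a noisy concept such as `"granny smith"` under `"apple inc"`, the filter drops the noisy concept. The snippet `"how to grow an apple tree"` then lands under `"apple fruit"`. Passing a stronger relatedness function as `sim`, such as one based on Wikipedia link structure, changes the scores without changing the pipeline. The catch-all `other` bucket is also an assumption.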
Keywords/Search Tags: ambiguous query, Web search result clustering, Wikipedia disambiguation pages, concept filtering