
Exploiting Key Issues On Temporal Web Information Retrieval

Posted on: 2012-11-20
Degree: Master
Type: Thesis
Country: China
Candidate: X W Li
GTID: 2178330338992029
Subject: Computer software and theory
Abstract/Summary:
Time is an important dimension of the information space. It plays an important role in Web search and Web clustering, because most Web pages contain time information and many Web queries are time-related. Time-related Web search, also called temporal Web search, has therefore become a research focus in Web information retrieval. However, traditional search engines pay little attention to the time information in Web pages, especially for time-aware ranking and clustering. Consequently, integrating time semantics into Web search will not only improve the effectiveness of search results but also advance research on Web information retrieval.

Exploiting the temporal information in Web pages is currently a hot topic in Web search research. First, for ranking, traditional search engines rank Web pages by their text relevance to the given query. Text retrieval, however, treats temporal expressions as ordinary terms and thus ignores the inherent relationships among them. Second, for clustering, traditional algorithms are usually based on the common phrases of Web pages and make little use of their temporal information, which does not meet users' needs. Taking full advantage of temporal information in the ranking and clustering of Web pages can therefore greatly improve the performance of search engines.

In this thesis, we study some key issues in temporal information retrieval, focusing on the ranking and topic clustering of Web pages based on temporal information. The main contributions can be summarized as follows:

(1) We design an algorithm that maps keywords to content time for each Web page, based on an analysis of the association between keywords and content time in Web pages. The algorithm maps every keyword of a Web page to a corresponding content time period. For implicit time expressions in Web pages, it finds the reference time using a backtracking algorithm and then converts the expression into an explicit time period. This mapping algorithm is the basis of the ranking algorithms that follow.

(2) We propose two ranking algorithms, CT-Rank and NTLM, both of which consider the temporal relevance and the text relevance of Web pages to the query. CT-Rank is an experience-based ranking algorithm that consists of an offline stage and an online stage. In the offline stage, we compute the time-constrained tf-idf value for each keyword using the set of keyword and content-time pairs. In the online stage, that is, the query processing stage, the algorithm sorts search results using three factors of a Web page: the PageRank value, the title ranking score, and the time-constrained keyword ranking score. NTLM is a ranking algorithm based on a temporal language model. It integrates the temporal information of Web pages into the language model and ranks Web pages according to the probability of the text part and the temporal part of the user's query, as induced by the keywords and temporal information of the Web pages. Experimental results show that both ranking algorithms outperform the competing algorithms.

(3) We propose a topical temporal clustering algorithm for news Web pages. It improves on traditional algorithms, which consider only text clustering, by applying temporal clustering to every cluster they produce, that is, by arranging the Web pages of each cluster along a timeline. Experimental results show that, with these clustering results, users can easily follow how news evolves over time and quickly find the news topics that interest them.
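To make the online stage of CT-Rank in contribution (2) more concrete, the following is a minimal sketch and not the thesis's implementation. The abstract names three factors (PageRank value, title ranking score, time-constrained keyword ranking score) but does not say how they are combined. The weighted linear sum, the weight values, the `Page` structure, the title-score definition, and the overlap test used as the time constraint are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical representation of a content time period: inclusive (start_year, end_year).
Period = tuple[int, int]


@dataclass
class Page:
    url: str
    pagerank: float  # assumed to be precomputed and normalized
    title: str
    # Assumed offline-stage output: keyword -> list of (content period, tf-idf weight).
    keyword_times: dict[str, list[tuple[Period, float]]] = field(default_factory=dict)


def overlaps(a: Period, b: Period) -> bool:
    """Return True if two inclusive periods share at least one year."""
    return a[0] <= b[1] and b[0] <= a[1]


def title_score(page: Page, terms: list[str]) -> float:
    """Fraction of (lowercase) query terms found in the title; an assumed definition."""
    if not terms:
        return 0.0
    words = page.title.lower().split()
    return sum(term in words for term in terms) / len(terms)


def temporal_keyword_score(page: Page, terms: list[str], period: Period) -> float:
    """Sum the tf-idf weights of query terms whose content period overlaps the query period."""
    total = 0.0
    for term in terms:
        for content_period, weight in page.keyword_times.get(term, []):
            if overlaps(content_period, period):
                total += weight
    return total


def ct_rank_score(page: Page, terms: list[str], period: Period,
                  weights: tuple[float, float, float] = (0.2, 0.3, 0.5)) -> float:
    """Combine the three factors with an assumed weighted linear sum."""
    w_pr, w_title, w_time = weights
    return (w_pr * page.pagerank
            + w_title * title_score(page, terms)
            + w_time * temporal_keyword_score(page, terms, period))


def rank(pages: list[Page], terms: list[str], period: Period) -> list[Page]:
    """Sort pages by descending combined score."""
    return sorted(pages, key=lambda p: ct_rank_score(p, terms, period), reverse=True)


if __name__ == "__main__":
    pages = [
        Page("b", 0.9, "Olympic Games news", {"olympic": [((2012, 2012), 2.0)]}),
        Page("a", 0.5, "Olympic Games history", {"olympic": [((2008, 2008), 2.0)]}),
    ]
    for page in rank(pages, ["olympic"], (2008, 2008)):
        print(page.url, round(ct_rank_score(page, ["olympic"], (2008, 2008)), 2))
```

In this sketch, a page whose keyword is tied to the queried period can outrank a page with a higher PageRank value whose keyword is tied to a different period, which is the effect a time-aware ranking aims for.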
Keywords/Search Tags: information retrieval, temporal Web page ranking, topical temporal clustering