Font Size: a A A

Comparing top-k algorithms in summary-based XML retrieval

Posted on:2008-03-20Degree:M.ScType:Thesis
University:University of Toronto (Canada)Candidate:Gu, XinFull Text:PDF
GTID:2448390005968150Subject:Computer Science
Abstract/Summary:
The semi-structural feature of XML raises the challenge in XML Information Retrieval (XML-IR)---how to efficiently exploit the additional structural information. In this thesis, strategies for XML-IR are studied, with particular emphasis on strategies for top-k computations. Structural summaries are exploited for efficiently evaluating structural constraints in the query. For top-k computations, both a straightforward Merge Algorithm (MA) and a baseline Threshold Algorithm ( TA) are implemented. In order to obtain the pre-computed scores for the relevant elements that are required as input by the top-k computation algorithms, an Exhaustive Retrieval Algorithm (ERA) is implemented. Experiments are conducted with an XML-IR system named TReX using NEXI queries over IEEE Journal and Wikipedia XML collections. The running times of the ERA, TA and MA algorithms are reported and compared. The experimental results show that neither TA nor MA outperforms the other for all queries.
Keywords/Search Tags:XML, Top-k, Algorithms
Related items