
Context-based publication search paradigm in literature digital libraries

Posted on: 2009-09-24 | Degree: Ph.D. | Type: Thesis
University: Case Western Reserve University | Candidate: Ratprasartporn, Nattakarn | Full Text: PDF
GTID: 2448390002491402 | Subject: Computer Science
Abstract/Summary:
This thesis identifies two problems with searching literature digital libraries. (a) There are no effective mechanisms for scoring and ranking papers. Without them, users are often forced to scan a large and diverse set of publications listed as search results, and they may miss the important ones. (b) Topic diffusion is common: the publications returned by a keyword-based query often fall into multiple topic areas, and not all of those areas interest the user.

In response, this thesis proposes a new search paradigm for literature digital libraries, called context-based search. It ranks search outputs effectively and controls the topic diversity of keyword-based query outputs. The approach works as follows. Before any query is posed, publications are classified into pre-specified, ontology-based contexts. Each paper then receives a query-independent context score with respect to each of its assigned contexts. When a query is posed, the system selects the relevant contexts and performs the search within them. It then revises each publication's context score into a relevancy score, with respect to both the query at hand and the context the publication is in. Finally, it ranks the query outputs within each relevant context. With context-based search, (1) the topic diversity of query outputs is minimized, (2) query output size is reduced, (3) the time users spend scanning query results decreases, and (4) query output ranking accuracy increases.

Besides keyword-based search, an important feature of literature digital libraries is finding the "related publications" of a given publication. Existing approaches ignore publication topics when computing relatedness, which allows topic diffusion to permeate the output publications. In this thesis, we propose a new way to measure "relatedness" that incorporates the "contexts" of publications.
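The context-based search procedure summarized earlier has five steps: pre-query classification with context scores, then query-time context selection, in-context search, score revision, and per-context ranking. The sketch below illustrates these steps under stated assumptions. The abstract does not give the actual scoring formulas. Three rules here are hypothetical stand-ins: the keyword weight (`keyword_match`, the fraction of query terms present), the context-selection rule (the contexts with the most matching papers), and the score revision (context score times keyword weight). All class and function names are likewise illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class Paper:
    paper_id: str
    text: str  # title/abstract text used for keyword matching


@dataclass
class ContextIndex:
    # Built before querying: context -> {paper_id: query-independent context score}
    context_scores: dict = field(default_factory=dict)

    def assign(self, context: str, paper_id: str, score: float) -> None:
        """Pre-querying: attach a context score to a paper in an assigned context."""
        self.context_scores.setdefault(context, {})[paper_id] = score


def keyword_match(text: str, terms: list) -> float:
    """Hypothetical query weight: fraction of query terms that occur in the text."""
    words = set(text.lower().split())
    return sum(t.lower() in words for t in terms) / len(terms)


def context_search(index: ContextIndex, papers: dict, query: str,
                   max_contexts: int = 1) -> dict:
    """Return {context: [(paper_id, relevancy), ...]}, ranked within each context."""
    terms = query.split()
    if not terms:
        return {}
    # 1. Select relevant contexts (hypothetical rule: the contexts holding the
    #    most keyword-matching papers; ties broken by context name).
    hits = {}
    for ctx, scores in index.context_scores.items():
        n = sum(keyword_match(papers[pid].text, terms) > 0 for pid in scores)
        if n:
            hits[ctx] = n
    selected = sorted(hits, key=lambda c: (-hits[c], c))[:max_contexts]

    results = {}
    for ctx in selected:
        ranked = []
        # 2. Search only inside the selected context.
        for pid, context_score in index.context_scores[ctx].items():
            match = keyword_match(papers[pid].text, terms)
            if match > 0:
                # 3. Revise the query-independent context score into a
                #    query-dependent relevancy score (hypothetical product rule).
                ranked.append((pid, context_score * match))
        # 4. Rank within this context, highest relevancy first.
        ranked.sort(key=lambda item: (-item[1], item[0]))
        results[ctx] = ranked
    return results


if __name__ == "__main__":
    papers = {p.paper_id: p for p in [
        Paper("p1", "gene expression in the yeast cell cycle"),
        Paper("p2", "protein folding kinetics"),
        Paper("p3", "gene regulation network inference"),
    ]}
    index = ContextIndex()
    index.assign("gene expression", "p1", 0.9)
    index.assign("gene expression", "p3", 0.6)
    index.assign("protein folding", "p2", 0.8)
    index.assign("protein folding", "p1", 0.2)
    print(context_search(index, papers, "gene expression"))
```

In this toy run, p1 also matches the query through its weak "protein folding" assignment. With `max_contexts=1`, however, only the "gene expression" context is returned, with p1 ranked above p3. Restricting output to the selected contexts is how this sketch mimics the topic-diffusion control described above.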
We define three context-based relatedness measures. (a) Relatedness between two contexts (context-to-context relatedness) is computed from the publications assigned to the contexts and from the structure of the context hierarchy. (b) Relatedness between a context and a paper (paper-to-context relatedness) is used to rank contexts by their relatedness to a given paper. (c) Relatedness between two papers (paper-to-paper relatedness) is computed from both paper-to-context and context-to-context relatedness.

We ran experiments that use existing biomedical ontology terms as contexts for genomics-oriented publications. The results indicate that the context-based approach is highly accurate and effectively solves the topic diffusion problem across search results.
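The abstract names three relatedness measures but does not give their formulas. The sketch below shows one hypothetical way to realize them, under these assumptions:

- The context hierarchy is a tree given as a child-to-parent map.
- Context-to-context relatedness blends the Jaccard overlap of the contexts' assigned papers with hierarchy proximity, 1 / (1 + edge distance), using a weight `alpha`.
- Paper-to-context relatedness takes the strongest route from one of the paper's own contexts.
- Paper-to-paper relatedness takes the strongest paper -> context -> paper chain.

None of these choices, nor the function names, come from the thesis.

```python
def ancestors(ctx, parent):
    """Contexts on the path from ctx up to the hierarchy root, inclusive."""
    path = [ctx]
    while ctx in parent:
        ctx = parent[ctx]
        path.append(ctx)
    return path


def tree_distance(a, b, parent):
    """Number of edges between contexts a and b via their lowest common ancestor."""
    depth_from_a = {c: i for i, c in enumerate(ancestors(a, parent))}
    for j, c in enumerate(ancestors(b, parent)):
        if c in depth_from_a:
            return depth_from_a[c] + j
    raise ValueError(f"{a!r} and {b!r} share no ancestor")


def members_from(assignments):
    """Invert paper -> {context: score} into context -> set of papers."""
    members = {}
    for paper, contexts in assignments.items():
        for ctx in contexts:
            members.setdefault(ctx, set()).add(paper)
    return members


def context_to_context(a, b, members, parent, alpha=0.5):
    """(a) Hypothetical blend of shared papers (Jaccard) and hierarchy proximity."""
    if a == b:
        return 1.0
    pa, pb = members.get(a, set()), members.get(b, set())
    union = pa | pb
    overlap = len(pa & pb) / len(union) if union else 0.0
    proximity = 1.0 / (1 + tree_distance(a, b, parent))
    return alpha * overlap + (1 - alpha) * proximity


def paper_to_context(paper, ctx, assignments, members, parent):
    """(b) Hypothetical: strongest route from one of the paper's contexts to ctx."""
    return max((score * context_to_context(c, ctx, members, parent)
                for c, score in assignments[paper].items()), default=0.0)


def rank_contexts(paper, assignments, members, parent):
    """Use (b) to rank every populated context by relatedness to a paper."""
    return sorted(members, key=lambda c: (
        -paper_to_context(paper, c, assignments, members, parent), c))


def paper_to_paper(p, q, assignments, members, parent):
    """(c) Hypothetical: strongest p -> context -> q chain, built on (b) and (a)."""
    return max((paper_to_context(p, c, assignments, members, parent) * score
                for c, score in assignments[q].items()), default=0.0)


if __name__ == "__main__":
    parent = {"gene expression": "genomics", "gene regulation": "genomics",
              "protein folding": "proteomics",
              "genomics": "root", "proteomics": "root"}
    assignments = {"p1": {"gene expression": 0.9},
                   "p2": {"gene expression": 0.7, "gene regulation": 0.5},
                   "p3": {"protein folding": 0.8}}
    members = members_from(assignments)
    print(rank_contexts("p1", assignments, members, parent))
    print(paper_to_paper("p1", "p2", assignments, members, parent))
    print(paper_to_paper("p1", "p3", assignments, members, parent))
```

In the toy data, p2 comes out far more related to p1 than p3 does, because p2 shares p1's context. The max-product chaining is only one plausible combination rule. A sum over all context pairs would be an equally reasonable assumption.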
Keywords/Search Tags: Search, Literature digital, Context, Topic diffusion, Relatedness, Query, Publication