
The Role of Document Structure and Citation Analysis in Literature Information Retrieval

Posted on: 2016-04-20 | Degree: Ph.D | Type: Thesis
University: Drexel University | Candidate: Zhao, Haozhen | Full Text: PDF
GTID: 2478390017983865 | Subject: Information Science
Abstract/Summary:
Literature Information Retrieval (IR) is the task of searching for relevant publications given a particular information need expressed as a set of queries. With the staggering growth of the scientific literature, it is critical to design effective retrieval solutions that facilitate efficient access to it. We hypothesize that genre-specific characteristics of scientific literature, such as metadata and citations, can help enhance scientific literature search. We conducted systematic and extensive IR experiments on open information retrieval test collections to investigate their roles in enhancing literature retrieval effectiveness.

This thesis consists of three major studies. First, we examined the role of document structure in literature search through comprehensive studies of the retrieval effectiveness of a set of structure-aware retrieval models on ad hoc scientific literature search tasks. Second, within the language modeling retrieval framework, we studied exploiting citation and co-citation analysis results as sources of evidence for enhancing literature search. Specifically, we examined relevant-document distribution patterns over partitioned clusters of document citation and co-citation graphs; we examined seven ways of modeling a document's prior probability of being relevant based on citation and co-citation analysis; and we studied the effectiveness of boosting retrieved documents with the scores of their neighboring documents in terms of co-citation counts, co-citation similarities, and Howard White's pennant scores. Third, we combined structured retrieval features and citation-related features to develop machine-learned retrieval models for literature search, and we assessed the effectiveness of learning-to-rank algorithms and various literature-specific features.

Our major findings are as follows. State-of-the-art structure-aware retrieval models, though reported to perform well on known-item finding tasks, do not significantly outperform non-fielded baseline retrieval models in ad hoc literature information retrieval. Although the distributions of relevant documents over citation and co-citation network graph partitions reveal favorable patterns, citation and co-citation analysis results on the current iSearch test collection only modestly improve retrieval effectiveness. However, priors derived from co-citation analysis outperform those derived from citation analysis, and pennant scores for document expansion outperform raw co-citation counts or the cosine similarity of co-citation counts. Our learning-to-rank experiments show that, in a heterogeneous collection setting, citation-related features can significantly outperform baselines.
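To make the second study's setup concrete, the following is a minimal Python sketch of citation-based document priors combined with query log-likelihood scores, plus a co-citation neighborhood boost. It is an illustration only, not the thesis code: the function names, the smoothing constant, the toy data, and the tf*idf-style form of the pennant weight are all assumptions layered on the standard query-likelihood-with-prior formulation.

import math
from collections import defaultdict


def citation_prior(cited_by, num_docs, mu=1.0):
    """P(d) proportional to smoothed in-citation count (one count-based prior)."""
    total = sum(cited_by.get(d, 0) + mu for d in range(num_docs))
    return {d: (cited_by.get(d, 0) + mu) / total for d in range(num_docs)}


def cocitation_counts(citing_lists):
    """Two documents are co-cited whenever a later paper cites both of them."""
    cocit = defaultdict(int)
    for refs in citing_lists:
        refs = sorted(set(refs))
        for i in range(len(refs)):
            for j in range(i + 1, len(refs)):
                cocit[(refs[i], refs[j])] += 1
                cocit[(refs[j], refs[i])] += 1
    return cocit


def pennant_weight(seed, cand, cocit, cited_by, num_docs):
    """tf*idf-style weight in the spirit of White's pennant scores:
    co-citation with the seed acts like tf, the candidate's overall
    citedness like a document frequency. The exact form is an assumption."""
    tf = math.log(1 + cocit.get((seed, cand), 0))
    idf = math.log(num_docs / (1 + cited_by.get(cand, 0)))
    return tf * idf


def rerank(query_loglik, priors, top_doc, cocit, cited_by, num_docs, alpha=0.1):
    """Add log-priors to query log-likelihoods, then boost documents that are
    strongly co-cited with the current top-ranked document."""
    scores = {}
    for d, loglik in query_loglik.items():
        score = loglik + math.log(priors[d])
        score += alpha * pennant_weight(top_doc, d, cocit, cited_by, num_docs)
        scores[d] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Toy collection: 4 documents; each inner list holds the references of one citing paper.
    citing_lists = [[0, 1], [0, 1, 2], [1, 3]]
    cited_by = defaultdict(int)
    for refs in citing_lists:
        for d in set(refs):
            cited_by[d] += 1
    priors = citation_prior(cited_by, num_docs=4)
    cocit = cocitation_counts(citing_lists)
    query_loglik = {0: -2.1, 1: -2.3, 2: -2.0, 3: -2.4}  # made-up query log-likelihoods
    print(rerank(query_loglik, priors, top_doc=2, cocit=cocit, cited_by=cited_by, num_docs=4))

The separation of the prior from the query likelihood mirrors the abstract's framing: the text-based ranking stays fixed while citation-derived evidence enters either as a document prior or as a post-retrieval neighborhood boost.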
Keywords/Search Tags: Retrieval, Literature, Citation, Document, Search, Relevant, Features