An Improved Relation-Based Information Retrieval Technology

Posted on: 2008-04-01  Degree: Master  Type: Thesis
Country: China  Candidate: Y Li  Full Text: PDF
GTID: 2178360242999045  Subject: Computer Science and Technology
Abstract/Summary:
One limitation of traditional relationship-based IR methods is that a relation is usually recorded in a binary form, such as R(First Term, Second Term), which carries only general information about a pair of terms that are semantically and syntactically related to each other. To tackle this problem, we explore an improved technique that uses triples in information retrieval for precision-focused biomedical literature search. In this paper, a triple is defined as a data structure that integrates a pair of concepts with the relation between them, which is a verb phrase or sometimes a special noun extracted from the sentence; a triple therefore stores both relation and concept information. Unlike the traditional relationship-based model, our model represents a document or a query by a set of triples of the form R(relation)[First Concept, Second Concept]. Since semantic and syntactic exceptions occur in documents and queries, different types of triple are permitted. For example, the query "What does the mad cow disease come from?" yields the triple R(come from)[First Concept(mad cow disease), Unknown]. We can therefore obtain the "answer" for the unknown element of the query if some documents have matching triples in the index. We also apply an advanced ontology-based approach that uses both UMLS and WordNet to extract generic concepts and their relations, and we have implemented a new approach to ranking retrieved passages from the same or different documents, in accordance with the performance measurement protocol of the TREC 2007 Genomics Track. A new version (which we call IRIRS) of the relation-based IR system RIRS, developed by the DM & Bioinformatics Lab of Drexel University in 2004, is then built for improved relation-based search in biomedical literature IR and DM.
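The triple form R(relation)[First Concept, Second Concept] and the matching of a query triple with an unknown slot can be illustrated with a minimal sketch. The class name `Triple`, the function `answers`, the exact-string matching rule, and the indexed triples are all assumptions made for illustration; the thesis does not describe its implementation at this level, and the real system extracts concepts through UMLS and WordNet rather than comparing raw strings.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Triple:
    """R(relation)[First Concept, Second Concept]; None stands for 'Unknown'."""
    relation: str
    first: Optional[str]
    second: Optional[str]


def answers(query: Triple, index: List[Triple]) -> List[str]:
    """Return the concepts that fill the query's unknown slot.

    Hypothetical matching rule: the relation and the known concept must be
    equal as strings, and the indexed triple must have the other slot filled.
    """
    found = []
    for doc in index:
        if doc.relation != query.relation:
            continue
        if query.first is not None and query.second is None:
            if doc.first == query.first and doc.second is not None:
                found.append(doc.second)
        elif query.second is not None and query.first is None:
            if doc.second == query.second and doc.first is not None:
                found.append(doc.first)
    return found


# Made-up index entries; the second concept is a placeholder, not a finding.
index = [
    Triple("come from", "mad cow disease", "hypothetical source concept"),
    Triple("cause", "another concept", "mad cow disease"),
]

# Query triple from the abstract: R(come from)[mad cow disease, Unknown].
query = Triple("come from", "mad cow disease", None)
print(answers(query, index))
```

Keeping the unknown slot as an explicit `None` lets the same structure represent both fully specified document triples and partially specified query triples.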
We use IRIRS to improve retrieval results on tests of English reading comprehension. Experiments on the different collections show more promising performance for IRIRS than for RIRS. Mean average passage precision (MAPP), the character-based MAP that measures passage-level retrieval performance, rises significantly for 64 topics from 64.44% (the result of RIRS) to 74.28%. We also use the improved relation-based IR system to improve on the retrieval results of all official runs in the TREC 2004 Genomics Track. The average P@100 (the precision of the top 100 documents) for 50 topics rises significantly from 26.73% (TREC) and 53.69% (RIRS) to 63.93% (the P@100 of the best run of TREC 2004 is 42.10%), while the MAP (Mean Average Precision) is kept at an above-average level of 26.59%, which is raised from 21.71% (TREC) and 37.58% (RIRS) to 40.14%. Furthermore, the experiments show that relations and the triple structure are more expressive for representing information needs, especially in the area of biomedical literature.
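For reference, the document-level measures named above, P@k and MAP, follow their standard definitions, sketched below. The function names and the toy ranking are illustrative assumptions, and the sketch does not reproduce the character-based passage measure (MAPP) or the official TREC evaluation tooling.

```python
from typing import List, Set, Tuple


def precision_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k ranked documents that are relevant."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Mean of the precision values at each rank where a relevant document appears."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(runs: List[Tuple[List[str], Set[str]]]) -> float:
    """Average of per-topic average precision over all topics."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Toy ranking for one topic (made-up document ids).
ranked = ["d1", "d2", "d3", "d4"]
relevant = {"d1", "d3"}
print(precision_at_k(ranked, relevant, 2))
print(average_precision(ranked, relevant))
```

P@100 as reported in the abstract corresponds to `precision_at_k` with `k=100`, averaged over the 50 topics.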
Keywords/Search Tags:relationship-based information retrieval, relation extraction, query parsing, triple integration