
Research On Entity-based Information Retrieval Models

Posted on: 2021-05-30
Degree: Master
Type: Thesis
Country: China
Candidate: Q X Gong
GTID: 2428330605461315
Subject: Computer application technology
Abstract
Traditional information retrieval approaches usually represent text with the bag-of-words model, which has clear limitations: the terms in a text carry complex semantic information such as polysemy, semantic relatedness and synonymy, but the bag-of-words model cannot capture it. Entities in a knowledge base (e.g., Wikipedia) carry rich semantic information, so many researchers have used knowledge-base entities to model text and have proposed a number of entity-based information retrieval models. In this paper, we propose entity-based retrieval models and entity-based pseudo-relevance feedback models under the language modeling framework. Compared with existing entity-based retrieval models, the proposed approaches have two advantages: (1) they use the entity-linking tool TagMe to extract entities from queries and documents, and TagMe outperforms existing entity-linking tools in both accuracy and efficiency; (2) they use both entities and terms under the language modeling framework and account for the relative importance of terms and entities.

The main research work in this paper is as follows.

1) We propose an entity-based language modeling retrieval framework that uses entity information to improve retrieval performance under the language modeling framework. Terms and entities are combined both at the language model level and at the ranking score level. We propose four entity-based retrieval models: TSE, TAE, TS-TSE and TS-TAE. Experiments on AP90, AP, DISK1-2, DISK4-5(-CR), WT2G and WT10G show the effectiveness and feasibility of these four models.

2) We apply the entity-based language model to pseudo-relevance feedback. When computing the importance of candidate terms in the feedback documents, traditional pseudo-relevance feedback approaches consider term frequency and inverse document frequency but ignore the complex semantic information of the terms in the text, so the candidate expansion terms may not be semantically related to the original query. We therefore propose an entity-based pseudo-relevance feedback framework that incorporates entity information into the relevance model RM3 to improve retrieval performance, and we propose two entity-based pseudo-relevance feedback retrieval models: RM-TAE and RM-TAE-TAE. Experiments on the same six collections (AP90, AP, DISK1-2, DISK4-5(-CR), WT2G and WT10G) show the effectiveness and feasibility of RM-TAE and RM-TAE-TAE.
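The abstract does not give the formulas behind TSE, TAE, TS-TSE or TS-TAE, so the following is only a minimal sketch of the general idea in point 1 of combining terms and entities at the ranking score level, not the thesis's models. It assumes Dirichlet-smoothed query likelihood computed separately over terms and over entity IDs, then linearly interpolated with a weight `lam`. The helper names, `lam`, `mu`, the 0.5 pseudo-count for unseen items, and the toy documents with hand-written entity annotations (standing in for entity-linker output such as TagMe's) are all hypothetical.

```python
import math
from collections import Counter


def dirichlet_log_likelihood(query, doc, collection, mu=1000.0):
    """Sum of log P(q|D) over the query items, using Dirichlet smoothing.

    query, doc: lists of items (terms or entity IDs).
    collection: Counter of item frequencies over the whole corpus.
    """
    doc_tf = Counter(doc)
    coll_len = sum(collection.values())
    score = 0.0
    for q in query:
        # Assumption: items unseen in the collection get a pseudo-count of 0.5
        # so that log(0) never occurs.
        p_coll = max(collection[q], 0.5) / coll_len
        score += math.log((doc_tf[q] + mu * p_coll) / (len(doc) + mu))
    return score


def interpolated_score(q_terms, q_ents, d_terms, d_ents,
                       term_coll, ent_coll, lam=0.7, mu=1000.0):
    """Score-level combination: lam * term score + (1 - lam) * entity score."""
    term_score = dirichlet_log_likelihood(q_terms, d_terms, term_coll, mu)
    # An empty entity query contributes 0, so ranking falls back to terms alone.
    ent_score = dirichlet_log_likelihood(q_ents, d_ents, ent_coll, mu)
    return lam * term_score + (1 - lam) * ent_score


# Toy corpus of (terms, entities); entity lists are hypothetical annotations.
docs = {
    "d1": (["apple", "shares", "rose", "stock", "market"], ["Apple_Inc.", "Stock_market"]),
    "d2": (["apple", "pie", "recipe", "fruit", "stock"], ["Apple", "Pie"]),
}
term_coll = Counter(t for terms, _ in docs.values() for t in terms)
ent_coll = Counter(e for _, ents in docs.values() for e in ents)

q_terms, q_ents = ["apple", "stock"], ["Apple_Inc."]
term_only = {d: dirichlet_log_likelihood(q_terms, t, term_coll) for d, (t, _) in docs.items()}
combined = {d: interpolated_score(q_terms, q_ents, t, e, term_coll, ent_coll)
            for d, (t, e) in docs.items()}
print(term_only)
print(sorted(combined, key=combined.get, reverse=True))
```

In the toy data both documents contain "apple" and "stock" once and have the same length, so the bag-of-words scores tie; the entity annotation `Apple_Inc.` breaks the tie in favor of `d1`. Varying `lam` shifts weight between the two evidence sources, which loosely mirrors the abstract's point about the relative importance of terms and entities.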
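Point 2 builds on the relevance model RM3, but the abstract does not specify how RM-TAE and RM-TAE-TAE use entity information when weighting candidate terms. As a reference point only, here is a minimal sketch of standard RM3: first-pass query-likelihood scores of the top-k feedback documents are normalized into document weights (assuming a uniform document prior), RM1 estimates P(w|R) ∝ Σ_D P(w|D)·P(D|Q) from maximum-likelihood document models, the top expansion terms are kept and renormalized, and the result is interpolated with the original query model. The names `rm3_expand`, `num_terms` and `orig_weight` and the toy feedback documents are hypothetical; the entity-based variants would modify the candidate-weighting step, which is not reproduced here.

```python
import math
from collections import Counter, defaultdict


def rm3_expand(query_terms, feedback_docs, doc_log_scores, num_terms=10, orig_weight=0.5):
    """Return an expanded query model {term: weight} following standard RM3.

    feedback_docs: token lists of the top-k documents from a first-pass retrieval.
    doc_log_scores: log P(Q|D) for each feedback document, in the same order.
    """
    # Normalize P(Q|D) over the feedback set to get P(D|Q) (uniform prior assumed).
    max_s = max(doc_log_scores)
    weights = [math.exp(s - max_s) for s in doc_log_scores]
    total = sum(weights)
    weights = [w / total for w in weights]

    # RM1: P(w|R) is proportional to sum_D P(w|D) * P(D|Q), with ML estimates of P(w|D).
    rel = defaultdict(float)
    for doc, w_d in zip(feedback_docs, weights):
        for term, count in Counter(doc).items():
            rel[term] += w_d * count / len(doc)

    # Keep the top expansion terms and renormalize them into a distribution.
    top = sorted(rel.items(), key=lambda kv: kv[1], reverse=True)[:num_terms]
    top_total = sum(p for _, p in top)
    rel_model = {t: p / top_total for t, p in top}

    # Maximum-likelihood model of the original query.
    q_model = {t: c / len(query_terms) for t, c in Counter(query_terms).items()}

    # RM3: interpolate the original query model with the relevance model.
    expanded = defaultdict(float)
    for t, p in q_model.items():
        expanded[t] += orig_weight * p
    for t, p in rel_model.items():
        expanded[t] += (1 - orig_weight) * p
    return dict(expanded)


query = ["apple", "stock"]
feedback = [["apple", "stock", "shares", "rose", "shares"],
            ["apple", "stock", "market", "shares"]]
scores = [-5.0, -6.0]  # hypothetical first-pass log query-likelihood scores
expanded = rm3_expand(query, feedback, scores, num_terms=3)
print(sorted(expanded.items(), key=lambda kv: kv[1], reverse=True))
```

Because the expanded model mixes two probability distributions, its weights sum to 1 for any `orig_weight` in [0, 1]. In the toy run "shares" enters the query, while the lower-weighted "rose" and "market" are cut by `num_terms=3`. This term-statistics-only weighting is exactly what the abstract criticizes, which is why the entity-based variants change it.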
Keywords: information retrieval, entity, language model, pseudo-relevance feedback