
Research On Pseudo Relevance Feedback Query Expansion Technology Based On Latent Semantic Relation

Posted on: 2020-12-12
Degree: Doctor
Type: Dissertation
Country: China
Candidate: M Pan
Full Text: PDF
GTID: 1368330578976510
Subject: Management Science and Engineering
Abstract/Summary:
Pseudo relevance feedback (PRF) via query expansion can effectively improve the performance of information retrieval, and query expansion is a necessary step in the PRF process. Most traditional PRF methods rank and select terms by calculating their importance in the pseudo-relevant documents. However, when the importance of a candidate expansion term is calculated only from term frequency and inverse document frequency, it does not capture the latent semantic relationship between that term and the query terms. Based on an analysis of the current state of PRF research, we propose several methods for selecting candidate expansion terms with different semantic features, built on the classical PRF framework. The focus is on capturing and representing the semantics of expansion terms and on fusing and improving expansion term weights. These methods capture more information related to the query topic and improve retrieval effectiveness. The main research work and innovations are as follows:

(1) This dissertation proposes a PRF model named HRoc based on the hyperspace semantics of terms. The model uses an improved Hyperspace Analogue to Language (HAL) algorithm to measure the latent semantic relations between the query and candidate expansion terms. The resulting term weights are then added to the traditional Rocchio model for query expansion, so that the latent semantic relationship between each candidate term and the original query terms is taken into account during expansion. Furthermore, three normalization strategies based on the HRoc model are proposed to reconcile the candidate term weights generated by different features. Finally, we introduce an adaptive parameter to replace a fixed parameter value in the HRoc model, so that the window size is selected according to document length. Experimental results on the 2016 TREC Clinical Support medicine dataset show that the HRoc model outperforms other state-of-the-art models, such as PRoc2 and TF-PRF, on various evaluation metrics. HRoc improves both the precision and the recall of retrieval and returns more precise results than the other models. In addition, the HRoc variant with the adaptive parameter needs fewer hyperparameters than other models to achieve equivalent performance, which improves the efficiency and applicability of the model and helps users retrieve documents more efficiently.

(2) Pseudo relevance feedback is a well-studied query expansion technique in which the top-ranked documents of an initial retrieval run are assumed to be relevant and expansion terms are extracted from them. When selecting expansion terms, most traditional models do not simultaneously consider term frequency and the co-occurrence semantic relationships between candidate terms and query terms. However, a term that co-occurs more often with a query term is more likely to be related to the query topic. In this dissertation, we propose a kernel co-occurrence-based framework that enhances retrieval performance by integrating term co-occurrence semantics into the Rocchio model and into a relevance language model (RM3). Specifically, we propose a kernel co-occurrence semantics based Rocchio method (named KRoc) and a kernel co-occurrence semantics based RM3 method (named KRM3). In this framework, co-occurrence semantics is incorporated into both the term discrimination power factor and the within-document term weight factor. The experimental results show that the proposed methods significantly outperform the corresponding strong baselines on all datasets in terms of mean average precision (MAP) and on most datasets in terms of P@10. A direct comparison on standard Text Retrieval Conference (TREC) datasets indicates that the proposed methods are at least comparable to state-of-the-art approaches.

(3) A retrieval model that ignores the semantic information of the query sentence often has difficulty identifying the correct meaning of a polysemous word in a query topic. It may then misinterpret the user's true intent, leading to poor retrieval results. To identify the real intent behind the user's query and improve the semantic understanding ability of information retrieval systems, we explore a deep-learning-based method for sentence semantic similarity. The semantic similarity between the query and the sentences that contain a candidate term is used as the weight of that expansion term. These weights are introduced into the classical Rocchio model, yielding a pseudo relevance feedback model based on the latent semantics of the BERT model, named BRoc. Results on standard TREC datasets demonstrate that the BRoc model is feasible and that it effectively extracts the semantic characteristics of sentences shared between query and document. In particular, it is well able to disambiguate polysemous terms, and it enhances the performance of traditional PRF.
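
The idea common to the three models described above, fusing a frequency-based importance score with a semantic score inside a Rocchio-style update, can be illustrated with a minimal Python sketch. This is not the dissertation's formulation: the function names, the linear interpolation controlled by `lam`, the max-normalisation of the centroid, and the toy windowed co-occurrence score are all assumptions made for illustration. In HRoc, KRoc, or BRoc the semantic score would instead come from HAL, a kernel co-occurrence function, or BERT sentence similarity.

```python
from collections import Counter


def cooccurrence_score(term, query_terms, docs, window=5):
    """Toy semantic score (an assumption, not the dissertation's measure):
    the fraction of a term's occurrences that lie within `window` tokens
    of any query term in the feedback documents."""
    query = set(query_terms)
    near = total = 0
    for doc in docs:
        for i, tok in enumerate(doc):
            if tok != term:
                continue
            total += 1
            lo, hi = max(0, i - window), i + window + 1
            context = doc[lo:i] + doc[i + 1:hi]
            if any(t in query for t in context):
                near += 1
    return near / total if total else 0.0


def expand_query(query_terms, feedback_docs, alpha=1.0, beta=0.75,
                 lam=0.5, window=5, n_terms=3):
    """Rocchio-style expansion over tokenised pseudo-relevant documents.

    `lam` (hypothetical) interpolates between frequency-based importance
    and the semantic score. Returns a dict mapping term -> weight."""
    q_vec = Counter(query_terms)
    docs = [d for d in feedback_docs if d]  # ignore empty documents
    if not docs:
        return {t: alpha * c for t, c in q_vec.items()}

    # Centroid of the feedback documents (length-normalised term frequency).
    centroid = Counter()
    for doc in docs:
        for term, tf in Counter(doc).items():
            centroid[term] += tf / len(doc) / len(docs)
    max_c = max(centroid.values())

    # Fuse normalised importance with the semantic score for every term.
    fused = {
        term: (1 - lam) * (w / max_c)
        + lam * cooccurrence_score(term, query_terms, docs, window)
        for term, w in centroid.items()
    }

    # Keep the n_terms best non-query candidates (ties broken by term name).
    candidates = [(t, w) for t, w in fused.items() if t not in q_vec]
    candidates.sort(key=lambda tw: (-tw[1], tw[0]))
    keep = set(q_vec) | {t for t, _ in candidates[:n_terms]}

    # Rocchio update without a non-relevant component.
    return {t: alpha * q_vec.get(t, 0) + beta * fused.get(t, 0.0) for t in keep}


docs = ["heart attack aspirin dose is often given to patients before weather".split()]
print(expand_query(["heart", "attack"], docs, window=2, n_terms=2))
```

In this toy run every term has the same frequency, so the ranking of candidates is decided by the semantic score alone: terms close to the query terms are preferred over equally frequent terms that appear far from them.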
Keywords/Search Tags: Information retrieval, Pseudo relevance feedback, Query expansion, Latent semantics