Research On Processing Long Queries In Information Retrieval

Posted on: 2020-05-13
Degree: Master
Type: Thesis
Country: China
Candidate: Y R Wang
GTID: 2428330596496922
Subject: Software engineering
Abstract/Summary:
With the widespread adoption of the Internet and the rapid growth of networked data, users increasingly rely on search engines to find the information they need. To express their information needs fully, users sometimes submit long queries written in natural language. Long queries are a difficult but increasingly important segment for web search engines. On the one hand, long queries can express more complex information needs than short queries; on the other hand, short queries are easier for search engines to process. The main reason is that search engines have difficulty accurately distinguishing key concepts from complementary concepts in a long query, so the retrieval results fail to focus on the query topic. The main objective of this research is to design effective processing algorithms that improve the retrieval performance of long queries. The main work of this thesis is as follows:

(1) An algorithm for reformulating long queries based on word embedding is proposed. In this algorithm, a reformulation tree framework based on word embedding organizes multiple sequences of reformulated queries as a tree structure, where each path of the tree corresponds to one sequence of reformulated queries. Specifically, the algorithm generates an n-level reformulation tree using three query operations: query reduction, query substitution, and query expansion. Furthermore, a weight estimation approach based on word embedding assigns a weight to each node of the reformulation tree, taking into account the relationships between nodes in the word vector space. Finally, the top-ranked nodes are integrated into the original long query, which is then submitted to the search engine. Experiments on TREC collections showed that, compared with using the original long queries directly, the queries generated by this algorithm yield considerable improvements in both MAP and P@N.

(2) A supervised machine learning algorithm for key concept identification and weighting is proposed to process long queries. Multiple features of concept candidates (query-dependent, corpus-dependent, and corpus-independent features) are used as training data for automatically extracting key concepts from long queries, and weights are assigned to the concept candidates. Finally, a probabilistic model is proposed to incorporate the original long query and the key concept information into a single structured query for retrieval. Experimental results showed that this algorithm improved retrieval effectiveness for natural-language long queries derived from query topics on TREC's Web collections.
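The idea of weighting reformulated queries by their position in the word vector space can be illustrated with a minimal sketch. This is not the thesis's weight estimation method, whose details the abstract does not give. It assumes a simple stand-in: each query is represented by the centroid of its term vectors, and candidates are scored by cosine similarity to the original query. The toy vectors, the function names (`centroid`, `cosine`, `rank_reformulations`) and the `top_k` parameter are all hypothetical.

```python
import math

# Toy 3-dimensional word vectors (made up for illustration).
# A real system would load pretrained embeddings instead.
TOY_VECTORS = {
    "cheap": [0.9, 0.1, 0.0],
    "flights": [0.1, 0.9, 0.1],
    "airfare": [0.2, 0.8, 0.2],
    "weather": [0.0, 0.1, 0.9],
}


def centroid(terms, vectors):
    """Average the vectors of the terms that have an embedding."""
    known = [vectors[t] for t in terms if t in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_reformulations(original, candidates, vectors, top_k=2):
    """Score each candidate query against the original and keep the best top_k.

    Assumption: similarity to the original query's centroid stands in for
    the node weight; the thesis may weight nodes differently.
    """
    query_vec = centroid(original, vectors)
    scored = []
    for cand in candidates:
        cand_vec = centroid(cand, vectors)
        if query_vec is None or cand_vec is None:
            score = 0.0
        else:
            score = cosine(query_vec, cand_vec)
        scored.append((score, cand))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    original = ["cheap", "flights"]
    candidates = [
        ["flights"],            # query reduction
        ["cheap", "airfare"],   # query substitution
        ["weather"],            # unrelated candidate
    ]
    for score, cand in rank_reformulations(original, candidates, TOY_VECTORS):
        print(round(score, 3), cand)
```

In this sketch the selected candidates would then be merged into the original long query before retrieval, mirroring the final step described in (1). How the merged terms are weighted in the final query is left out here because the abstract does not specify it.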
Keywords/Search Tags: long queries, word embedding, query reformulation, concept identification, weight assignment