Research On Processing Long Queries In Information Retrieval

Posted on:2020-05-13

Degree:Master

Type:Thesis

Country:China

Candidate:Y R Wang

Full Text:PDF

GTID:2428330596496922

Subject:Software engineering

Abstract/Summary:

With the widespread use of the Internet and the rapid growth of networked data, users increasingly rely on search engines to find the information they need. To express their information needs fully, users sometimes submit long queries written in natural language. Long queries are a difficult but increasingly important segment for web search engines. On the one hand, long queries can express more complex information needs than short queries; on the other hand, they are harder for search engines to process. The main reason is that search engines have difficulty distinguishing accurately between key and complementary concepts in long queries, so the retrieval results fail to focus on the query topic. The main objective of this research is to design effective processing algorithms that improve retrieval performance for long queries. The main work of this thesis is as follows:

(1) An algorithm for reformulating long queries based on word embedding is proposed. In this algorithm, a reformulation tree framework based on word embedding organizes multiple sequences of reformulated queries as a tree structure, where each path of the tree corresponds to a sequence of reformulated queries. Specifically, the algorithm generates an n-level reformulation tree using three query operations: query reduction, query substitution, and query expansion. Furthermore, a weight estimation approach based on word embedding is proposed to assign a weight to each node of the reformulation tree, taking full account of the relationships between nodes in the word vector space. Finally, the top-ranked nodes are selected and integrated into the original long query, which is then submitted to the search engine. Experiments on TREC collections showed that, compared with using the original long queries directly, the queries generated by this algorithm achieve considerable improvement on both the MAP and P@N evaluation metrics.

(2) A supervised machine learning algorithm for key concept identification and weighting is proposed to process long queries. Multiple features of concept candidates (query-dependent, corpus-dependent, and corpus-independent) are used as training data for automatically extracting key concepts from long queries, and weights are assigned to the concept candidates. Finally, a probabilistic model is proposed to incorporate the original long query and the key concept information into a single structured query for retrieval. Experimental results showed that this algorithm improved retrieval effectiveness for natural-language long queries derived from query topics on TREC's Web collections.
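As a rough illustration of the three query operations named in part (1), the following Python sketch builds one level of reformulated queries and scores each one in a word vector space. It is not the thesis's algorithm: the toy 3-dimensional vectors, the helper names (`nearest`, `reformulations`, `rank`), and the scoring rule (cosine similarity between the centroid of a reformulated query and the centroid of the original query) are all assumptions made for this example.

```python
import math

# Toy 3-dimensional word vectors, invented for illustration only.
# A real system would load trained embeddings instead.
EMB = {
    "cheap": [0.9, 0.1, 0.0],
    "inexpensive": [0.85, 0.15, 0.05],
    "flights": [0.1, 0.9, 0.1],
    "airfare": [0.2, 0.85, 0.1],
    "paris": [0.0, 0.2, 0.9],
    "please": [0.3, 0.3, 0.3],
}


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid(terms):
    """Average the vectors of the terms that have an embedding."""
    vecs = [EMB[t] for t in terms if t in EMB]
    return [sum(column) / len(vecs) for column in zip(*vecs)]


def nearest(term, exclude):
    """Closest vocabulary word to `term` that is not already in the query."""
    candidates = [w for w in EMB if w != term and w not in exclude]
    return max(candidates, key=lambda w: cosine(EMB[term], EMB[w]))


def reformulations(query):
    """One tree level: reduce, substitute, and expand around each term."""
    out = []
    for i, term in enumerate(query):
        neighbor = nearest(term, exclude=query)
        out.append(("reduce", query[:i] + query[i + 1:]))
        out.append(("substitute", query[:i] + [neighbor] + query[i + 1:]))
        out.append(("expand", query + [neighbor]))
    return out


def rank(query, top_k=3):
    """Score each reformulation against the original query (assumed rule)."""
    q_vec = centroid(query)
    scored = [
        (cosine(centroid(terms), q_vec), op, terms)
        for op, terms in reformulations(query)
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    for weight, op, terms in rank(["cheap", "flights", "paris", "please"]):
        print(f"{weight:.3f}  {op:<10}  {' '.join(terms)}")
```

Applying `reformulations` again to each returned query would give the next level of a tree, so that each root-to-leaf path is a sequence of reformulated queries, as the abstract describes. How the thesis actually estimates node weights is not stated in the abstract, so the centroid-based score here is only a placeholder.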

Keywords/Search Tags:

long queries, word embedding, query reformulation, concept identification, weight assignment

Related items

1	Research Online Expansion Method Of Long Tail Queries For Search Advertisement
2	Modeling reformulation as query distributions
3	A classification approach to the automatic reformulation of Boolean queries in information retrieval
4	Automated Query Reformulation Approach For Document Search In Software Engineering
5	Research On Long Text Matching Based On Concept Interaction Graph
6	XML query reformulation over mixed and redundant storage
7	Research On The Representation Of Word Embedding Based On Knowledge Fusion
8	Research On Query Reformulation Based On Machine Learning
9	Research On Query Reformulation For Medical Data Search
10	Query Optimization Based On Word Embeddings Model