Query And Analysis In Information Retrieval

Posted on: 2011-01-18 | Degree: Master | Type: Thesis
Country: China | Candidate: C Zhang | Full Text: PDF
GTID: 2208360308966504 | Subject: Computational Mathematics
Abstract/Summary:
To convey more sophisticated information needs in information retrieval (IR), people often use long queries. However, most commercial and academic search engines cannot handle these long queries well. Query Segmentation is essential to query processing. It aims to group query words into several semantic segments and so help the search engine improve retrieval precision. In this thesis, we present a novel unsupervised learning approach to Query Segmentation based on the principal eigenspace similarity of a query word-frequency matrix derived from web statistics. Experimental results show that our approach improves F-measure by 35.8% and 17.7% over two baselines, the MI (Mutual Information) approach and the EM optimization approach, respectively.

However, Query Segmentation does not identify which segments are key-phrases, and it does not assign explicit weights to segments. We therefore develop and evaluate a novel unsupervised approach for key-phrase extraction from long natural language queries. We first employ a web statistics-based affinity matrix W to identify the relations between query words. We then cluster the query words into k clusters in the spectral space of W and select the higher-ranked clusters as query key-phrases. In particular, we propose a new method to determine the number of clusters automatically from the distribution of the eigenvalues of W. Our experiments demonstrate that our approach achieves significant improvements in key-phrase extraction compared with state-of-the-art methods, including noun phrase extraction, TFIDF weighting-based extraction, and enhanced k-means extraction.

We also describe the development of Query Segmentation in detail. We analyze the conditions under which the Forward Maximum Matching (FMM) algorithm applies to segmentation and integrate it into our applications. This combination not only speeds up Query Segmentation but also handles some special long queries well. Finally, we present applications of query key-phrases in Query Suggestion and Directory Search.
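
As a rough illustration of Forward Maximum Matching as it is generally understood (not the thesis's own implementation), the sketch below scans a query from left to right and, at each position, takes the longest phrase found in a dictionary. The function name `fmm_segment`, the `max_len` window, and the small `phrases` set are hypothetical and chosen only for this example.

```python
def fmm_segment(words, phrases, max_len=4):
    """Greedy left-to-right segmentation: at each position take the longest
    dictionary phrase that starts there, otherwise emit a single word."""
    segments = []
    i = 0
    while i < len(words):
        # Try the longest candidate window first, shrinking toward one word.
        for n in range(min(max_len, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            # A single word is always accepted, so the loop always advances.
            if n == 1 or candidate in phrases:
                segments.append(candidate)
                i += n
                break
    return segments


# Hypothetical phrase dictionary; a real system would build it from a corpus.
phrases = {"new york", "new york times", "subscription price"}
query = "new york times subscription price today".split()
print(fmm_segment(query, phrases))
# ['new york times', 'subscription price', 'today']
```

Because the match is greedy, "new york times" is preferred over the shorter "new york"; this makes the method fast but also dependent on the coverage of the dictionary.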
Keywords/Search Tags:Information Retrieval, Query Segmentation, Key-phrase Extraction