Query And Analysis In Information Retrieval

Posted on:2011-01-18

Degree:Master

Type:Thesis

Country:China

Candidate:C Zhang

Full Text:PDF

GTID:2208360308966504

Subject:Computational Mathematics

Abstract/Summary:

To express more sophisticated information needs in information retrieval (IR), people often use long queries. However, most commercial and academic search engines do not handle such long queries well. Query Segmentation is essential to query processing: it aims to divide the query words into several semantic segments and thereby help the search engine improve retrieval precision. In this paper, we present a novel unsupervised learning approach to Query Segmentation based on the principal eigenspace similarity of a query word-frequency matrix derived from web statistics. Experimental results show that our approach achieves gains of 35.8% and 17.7% in F-measure over two baselines, the MI (Mutual Information) approach and the EM optimization approach, respectively.

However, Query Segmentation neither identifies which segments are key-phrases nor assigns explicit weights to segments. We therefore develop and evaluate a novel unsupervised approach for key-phrase extraction from long natural language queries. We first employ a web statistics-based affinity matrix W to identify the relations between query words. We then cluster the query words into k clusters in the spectral space and select the higher-ranked clusters as query key-phrases. In particular, we propose a new method to automatically determine the number of clusters from the distribution of the eigenvalues of W. Our experiments demonstrate that our approach achieves significant improvements in key-phrase extraction compared with state-of-the-art methods, including noun phrase extraction, TF-IDF weighting-based extraction, and enhanced k-means extraction.

In addition, we describe the development of Query Segmentation in detail. We analyze the conditions under which the Forward Maximum Matching (FMM) algorithm applies to segmentation and incorporate it into our applications. This not only speeds up Query Segmentation but also handles some special long queries well. Finally, we present applications of query key-phrases in Query Suggestion and Directory Search.
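
For readers unfamiliar with Forward Maximum Matching, the general technique can be sketched as follows. This is a minimal illustration of textbook FMM over query words, not the thesis's implementation: the function name `fmm_segment`, the phrase `lexicon`, and the `max_len` window are assumptions introduced here, and the abstract does not say which dictionary is used or how FMM is combined with the spectral method.

```python
def fmm_segment(words, lexicon, max_len=4):
    """Greedy forward maximum matching over a list of query words.

    `lexicon` is a hypothetical set of known multi-word phrases;
    `max_len` is an assumed cap on segment length, in words.
    """
    segments = []
    i = 0
    while i < len(words):
        # Try the longest candidate first and shrink until the lexicon matches.
        for n in range(min(max_len, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            # A single word is always accepted so the scan can advance.
            if n == 1 or candidate in lexicon:
                segments.append(candidate)
                i += n
                break
    return segments


# Illustrative lexicon and query (made up for this sketch).
lexicon = {"new york", "new york times", "subscription price"}
query = "new york times subscription price".split()
print(fmm_segment(query, lexicon))
# -> ['new york times', 'subscription price']
```

Because the scan is a single left-to-right pass with a bounded window, it is fast, which is consistent with the speed benefit the abstract reports; the trade-off is that a greedy match can prefer a long early phrase over a better later one.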

Keywords/Search Tags:

Information Retrieval, Query Segmentation, Key-phrase Extraction

Related items

1	Spatiotemporal-Phrase Based Video Retrieval
2	Research Of Web Biological Information Retrieval And Extraction Technologies Based Ontology
3	Research On Translation Methods Of Query Items In Chinese-Mongolian Cross-Language Information Retrieval
4	Query Expansion Research
5	The Method Of Fine-Grained Topic Information Extraction And Text Clustering Based On Chinese Phrase
6	Algorithm Research For Text Information Retrieval Based On Web
7	The Mdb In The Multimedia Data Representation And Query
8	Fast Retrieval Method For Encyclopaedia Knowledge
9	Chinese Prepositional Phrase Recognition Based On Fine-grained Phrase Information
10	Information Retrieval System Based On Document Query