Research On Chinese Information Retrieval

Posted on: 2009-05-31
Degree: Master
Type: Thesis
Country: China
Candidate: L Li
GTID: 2178360245957964
Subject: Computer application technology

Abstract/Summary:
With the development of the Internet and information technology, web retrieval has come to play an indispensable role in people's lives and studies, and acquiring useful information efficiently has become its central problem. Information retrieval faces two main difficulties: how users can express their information needs accurately and interact effectively with the retrieval system, and how well the system itself ranks the documents. This thesis analyzes both aspects and combines them by optimizing both the query and the documents. The main studies in the thesis are:

(1) This thesis analyzes the effectiveness of several categories of query expansion, based either on adding terms or on re-weighting terms, and proposes a novel method that uses web-based resources to expand user queries. In this method, pages are downloaded from the Internet and analyzed, and groups of related terms are extracted from them to expand the user query. Compared with traditional query expansion methods that rely on pre-constructed static thesauri, this method builds its semantic resources automatically from web information, making it less constrained and more efficient.

(2) To take the context of the whole document collection into account, the thesis proposes a clustering-based document expansion method. First, documents are retrieved with a traditional method, giving a similarity between the query and each document; second, the top-n documents are clustered and the similarity between the query and each cluster is computed; finally, the two similarities are combined into the final similarity between the query and each document, and the result set is re-ranked.

(3) The thesis combines query expansion based on related term groups with document expansion based on clustering. Experiments on the NTCIR-5 and NTCIR-6 CLIR test sets show that the combined method achieves some improvement over the traditional method.
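The web-based expansion step in (1) can be illustrated with a minimal Python sketch. The abstract does not say how related terms are scored, so the choices here are assumptions: the pages are taken as already downloaded and word-segmented into whitespace-separated tokens (Chinese text would first need a segmenter), a candidate term is scored by the number of pages on which it co-occurs with a query term, and the hypothetical parameters `k` (size of the related term group) and `beta` (maximum expansion weight) control how many terms are added and how strongly. Original query terms keep weight 1.0, so the sketch covers both term addition and term re-weighting. All function names are hypothetical.

```python
from collections import Counter


def related_term_group(query_terms, pages, k=3, stopwords=frozenset()):
    """Return up to k (term, count) pairs for the terms that co-occur with the
    query on the most pages. Assumption: co-occurrence counting stands in for
    the thesis's unspecified term-extraction method."""
    query = set(query_terms)
    counts = Counter()
    for page in pages:
        tokens = set(page.split())  # assumes pre-segmented, whitespace-separated text
        # Only pages that mention at least one query term count as evidence.
        if tokens & query:
            for term in tokens - query - stopwords:
                counts[term] += 1
    # Highest count first; ties are broken alphabetically for determinism.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


def expand_query(query_terms, pages, k=3, beta=0.5, stopwords=frozenset()):
    """Build a weighted query: original terms keep weight 1.0, and each
    expansion term gets beta scaled by its count relative to the top term."""
    weights = {term: 1.0 for term in query_terms}
    group = related_term_group(query_terms, pages, k, stopwords)
    if group:
        top_count = group[0][1]
        for term, count in group:
            weights[term] = beta * count / top_count
    return weights
```

For example, with the pages `["retrieval query expansion thesaurus", "query expansion feedback", "weather report rain"]`, `expand_query(["query"], pages, k=2)` returns `{"query": 1.0, "expansion": 0.5, "feedback": 0.25}`: "expansion" co-occurs with "query" on two pages and "feedback" on one, while the unrelated third page contributes nothing.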
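The three-step re-ranking in (2) can be sketched as follows. The thesis does not name its clustering algorithm or its combination rule, so this sketch swaps in single-pass (leader) clustering with a similarity `threshold` and a linear interpolation with weight `lam`, i.e. final(q, d) = lam * sim(q, d) + (1 - lam) * sim(q, C(d)), where C(d) is the cluster containing d and each cluster is represented by its centroid. Vectors are sparse term-weight dicts, and plain cosine similarity stands in for the traditional retrieval score. All function and parameter names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Component-wise average of a non-empty list of sparse vectors."""
    total = {}
    for vec in vectors:
        for t, w in vec.items():
            total[t] = total.get(t, 0.0) + w
    return {t: w / len(vectors) for t, w in total.items()}


def single_pass_cluster(doc_ids, vectors, threshold):
    """Single-pass clustering (an assumed stand-in): each document joins the
    most similar existing cluster if the similarity to that cluster's centroid
    reaches threshold; otherwise it starts a new cluster."""
    clusters = []  # each cluster is a list of doc ids
    for d in doc_ids:
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = cosine(vectors[d], centroid([vectors[x] for x in cluster]))
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append([d])
        else:
            best.append(d)
    return clusters


def rerank(query, vectors, n=10, threshold=0.3, lam=0.7):
    """Re-rank the top-n documents; returns (doc_id, final_score) pairs."""
    # Step 1: initial query-document similarity for every document.
    initial = {d: cosine(query, v) for d, v in vectors.items()}
    top = sorted(initial, key=lambda d: (-initial[d], d))[:n]
    # Step 2: cluster the top-n documents and score each cluster's centroid.
    final = {}
    for cluster in single_pass_cluster(top, vectors, threshold):
        cluster_sim = cosine(query, centroid([vectors[d] for d in cluster]))
        # Step 3: interpolate document-level and cluster-level similarity.
        for d in cluster:
            final[d] = lam * initial[d] + (1 - lam) * cluster_sim
    return sorted(final.items(), key=lambda item: (-item[1], item[0]))
```

With a small `lam`, a document that matches the query poorly on its own can move up because it sits in a cluster whose centroid matches the query well; with `lam=1.0` the initial ranking is returned unchanged. The sketch simply drops documents below the top-n cut-off, whereas a full system would append them in their original order.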
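For the combination in (3), the weights of the expanded query have to carry through to the initial retrieval that the clustering step then re-ranks. Below is a minimal sketch of that hand-off, assuming a plain tf-idf weighting (tf * log(N/df)) with cosine similarity as the "traditional method" and whitespace-segmented documents; the thesis does not specify its retrieval model, and the function names are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def idf_table(docs):
    """Inverse document frequency log(N / df) for every term in the collection."""
    n = len(docs)
    df = Counter(t for text in docs.values() for t in set(text.split()))
    return {t: math.log(n / c) for t, c in df.items()}


def retrieve(weighted_query, docs):
    """Rank documents against a weighted query; returns (doc_id, score) pairs.

    Each query term's weight scales its idf, so a down-weighted expansion
    term contributes proportionally less than it would at full weight."""
    idf = idf_table(docs)
    # Terms absent from the collection get idf 0 and drop out of the match.
    qvec = {t: w * idf.get(t, 0.0) for t, w in weighted_query.items()}
    scores = {}
    for d, text in docs.items():
        dvec = {t: c * idf[t] for t, c in Counter(text.split()).items()}
        scores[d] = cosine(qvec, dvec)
    # Highest score first; ties broken by document id.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

Feeding the returned scores into the re-ranking step of (2) as its initial similarities gives one plausible reading of the expand-then-re-rank pipeline that (3) describes.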
Keywords/Search Tags: information retrieval, query expansion, document expansion, related term group