Research Of Chinese Information Retrieval System And Document Reranking

Posted on: 2011-12-07
Degree: Master
Type: Thesis
Country: China
Candidate: F Fang
GTID: 2178360308977212
Subject: Computer application technology
Abstract/Summary:
With improvements in computer performance, the rapid growth of information on the Internet, and the increasing informatization of enterprises, Chinese information resources are growing quickly. This growth meets people's information needs, but it also makes fast and accurate search harder. As a result, information retrieval technology has become a research hotspot.

Information retrieval usually refers to text information retrieval. It covers the storage, organization, representation, querying, and access of information, and its core is text indexing and retrieval. The main techniques in an information retrieval system include index processing, query expansion, retrieval models, and document reranking. For Chinese information retrieval, word segmentation is also very important.

The research on Chinese information retrieval in this thesis is divided into two parts. First, taking the NTCIR-7 Chinese IR4QA subtask as the experimental setting, we design and implement a Chinese information retrieval system. The indexing component segments the original documents into words and builds an inverted index with words as the index units. The retrieval component uses the classical vector space model. To address the word mismatch problem, a query expansion method based on local co-occurrence is applied after the initial search: it extracts additional useful keywords from the initial results and generates a new query. Experimental results show that this query expansion strategy improves system performance significantly, and evaluation with the official NTCIR-7 tool shows that our system performs relatively well. Second, we study document reranking for specific types of questions.
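A minimal Python sketch of the first part's pipeline may help: an inverted index over words, tf-idf cosine ranking in the vector space model, and one round of co-occurrence-based query expansion. The thesis does not state its term weighting or its exact co-occurrence formula, so the smoothed tf-idf weights, the document-level co-occurrence count over the top-k initial results, the function names (`build_inverted_index`, `search`, `expand_query`), and the toy documents are all hypothetical. Word segmentation is assumed to have been done already.

```python
import math
from collections import Counter, defaultdict

# Assumption: documents are already segmented into words (the segmenter used
# in the thesis is not specified), so each document is a list of word strings.

def build_inverted_index(docs):
    """Inverted index with word units: word -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, words in docs.items():
        for word, tf in Counter(words).items():
            index[word][doc_id] = tf
    return index

def idf(index, n_docs, word):
    """Smoothed inverse document frequency (always positive); an assumed weighting."""
    df = len(index.get(word, {}))
    return math.log((n_docs + 1) / (df + 1)) + 1.0

def doc_norms(index, n_docs):
    """Euclidean length of every document's tf-idf vector, for cosine scoring."""
    squares = defaultdict(float)
    for word, postings in index.items():
        w = idf(index, n_docs, word)
        for doc_id, tf in postings.items():
            squares[doc_id] += (tf * w) ** 2
    return {doc_id: math.sqrt(s) for doc_id, s in squares.items()}

def search(index, norms, n_docs, query_words):
    """Vector space model: rank documents by cosine similarity to the query."""
    q_vec = {w: tf * idf(index, n_docs, w) for w, tf in Counter(query_words).items()}
    q_norm = math.sqrt(sum(v * v for v in q_vec.values()))
    dots = defaultdict(float)
    for word, q_weight in q_vec.items():
        w = idf(index, n_docs, word)
        # Only documents in this word's posting list can gain score from it.
        for doc_id, tf in index.get(word, {}).items():
            dots[doc_id] += q_weight * tf * w
    ranked = [(doc_id, dot / (norms[doc_id] * q_norm)) for doc_id, dot in dots.items()]
    return sorted(ranked, key=lambda pair: (-pair[1], pair[0]))

def expand_query(docs, ranked, query_words, top_k=2, n_terms=1):
    """Simplified local co-occurrence expansion (hypothetical): score each
    non-query word by the number of query words it shares a top-k document
    with, summed over those documents, and append the best n_terms words."""
    query_set = set(query_words)
    cooc = Counter()
    for doc_id, _ in ranked[:top_k]:
        words = set(docs[doc_id])
        hits = len(words & query_set)  # query words present in this document
        if hits:
            for word in words - query_set:
                cooc[word] += hits
    best = sorted(cooc.items(), key=lambda pair: (-pair[1], pair[0]))[:n_terms]
    return list(query_words) + [word for word, _ in best]

# Toy collection of pre-segmented Chinese documents (hypothetical data).
docs = {
    "d1": ["信息", "检索", "倒排", "索引"],
    "d2": ["检索", "索引", "压缩"],
    "d3": ["中文", "分词", "系统"],
    "d4": ["倒排", "索引", "结构"],
}
index = build_inverted_index(docs)
norms = doc_norms(index, len(docs))
initial = search(index, norms, len(docs), ["检索"])
expanded_query = expand_query(docs, initial, ["检索"])
expanded = search(index, norms, len(docs), expanded_query)
print(initial)
print(expanded_query, expanded)
```

In the toy data, d4 contains 索引 but not 检索, so it is retrieved only after 索引 is added to the query; this is the kind of word mismatch that the expansion step is meant to address.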
When a retrieval system returns its results, users typically browse only the top N documents. Given this behavior, we aim to improve the precision of the top-ranked results through document reranking. Taking into account the characteristics of the open resource Wikipedia and of definition and biography questions, we use the Wikipedia pages related to each such question to rerank the retrieved documents. Experiments show that our method effectively improves the precision of the top results.
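One way to picture Wikipedia-based reranking is to mix each top-N document's retrieval score with its similarity to the Wikipedia page about the question's target. The abstract does not specify how the Wikipedia page is used, so the linear combination, the weight `alpha`, the cosine similarity over term-frequency vectors, the function name `rerank_with_wiki`, and the example data below are assumptions, not the thesis's method.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rerank_with_wiki(ranked, docs, wiki_words, alpha=0.5, top_n=10):
    """Hypothetical reranker: rescore the top_n results as a weighted mix of the
    max-normalized retrieval score and similarity to a related Wikipedia page."""
    head, tail = ranked[:top_n], ranked[top_n:]
    if not head:
        return list(ranked)
    max_score = max(score for _, score in head) or 1.0  # avoid division by zero
    wiki_tf = Counter(wiki_words)
    rescored = []
    for doc_id, score in head:
        sim = cosine(Counter(docs[doc_id]), wiki_tf)
        rescored.append((doc_id, (1 - alpha) * score / max_score + alpha * sim))
    rescored.sort(key=lambda pair: (-pair[1], pair[0]))
    return rescored + tail  # results below top_n keep their original order

# Hypothetical biography question "图灵是谁?" (Who is Turing?) with
# pre-segmented documents and a segmented Wikipedia page about Turing.
docs = {
    "d1": ["图灵", "奖", "颁奖", "典礼"],
    "d2": ["图灵", "英国", "数学家", "计算机", "科学"],
    "d3": ["计算机", "价格"],
}
wiki_page = ["图灵", "英国", "数学家", "计算机", "科学", "图灵", "机"]
initial = [("d1", 0.9), ("d2", 0.6), ("d3", 0.3)]
reranked = rerank_with_wiki(initial, docs, wiki_page)
print(reranked)
```

Here d1 matches the query word 图灵 but is about the Turing Award, while d2 reads like a biography; its higher similarity to the Wikipedia page moves it above d1. Restricting the rescoring to the top N follows the observation that users mostly look at the first few results.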
Keywords/Search Tags: Information Retrieval, Inverted Index, Vector Space Model, Query Expansion, Document Reranking