
A Study Of Some Issues In Chinese Text Information Retrieval

Posted on: 2007-09-10
Degree: Master
Type: Thesis
Country: China
Candidate: X H Tu
GTID: 2178360182988954
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of the Internet in China, more and more Chinese documents are available online. The Internet is an important and convenient repository of reference information, but finding relevant information on it is difficult. Information retrieval systems help people find the information they want. As an ideographic language, Chinese is very different from Indo-European languages, and many approaches that suit Indo-European languages may not suit Chinese. This thesis focuses on several important problems in Chinese information retrieval research. The main studies are:

(1) The thesis compares the retrieval performance of different word segmentation methods. In the experiments, manual word segmentation serves as the baseline for comparison. Unlike traditional comparison approaches, the query clauses are also segmented manually. The experimental results provide a baseline for further research on Chinese information retrieval.

(2) The thesis proposes a novel method to improve the performance of Chinese information retrieval systems by expanding queries with automatically acquired related term groups. Unlike traditional query expansion methods, the method combines related term groups extracted from web-based corpora with related terms extracted from the document set to make query expansion more effective. Experiments show that the method achieves, on average, a significant improvement over the traditional relevance feedback technique.

(3) The thesis proposes the design of a full-text retrieval system and describes its implementation, which supports multi-language retrieval and a memory index structure. The functions of three important modules, namely the store layer, the language analyzer, and the core layer, are discussed in detail. Finally, a framework for a distributed full-text retrieval system is also presented.

(4) All of the above studies and approaches are combined in an experimental text information retrieval system that has been designed and implemented. With this system we participated in NTCIR-5, the fifth round of a well-known international venue for the standardized testing and evaluation of text information retrieval. We ranked in the top five in the Chinese single-language retrieval track, which demonstrates the effectiveness and feasibility of the studies in this thesis.
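The query expansion idea in study (2) can be illustrated with a minimal sketch. The abstract does not say how the two sources of related terms are combined, so the weighted sum below is an assumption made for illustration only, not the method of the thesis. The function name `expand_query`, its parameters, and the sample term groups and association strengths are all hypothetical.

```python
from collections import Counter


def expand_query(query_terms, web_groups, doc_related,
                 max_new_terms=3, web_weight=0.5, doc_weight=0.5):
    """Return the original query terms plus the best-scoring related terms.

    web_groups and doc_related map a term to {related_term: strength}.
    The linear weighting of the two sources is an illustrative assumption.
    """
    scores = Counter()
    for term in query_terms:
        # Related term groups mined from a web-based corpus.
        for related, strength in web_groups.get(term, {}).items():
            scores[related] += web_weight * strength
        # Related terms mined from the document set being searched.
        for related, strength in doc_related.get(term, {}).items():
            scores[related] += doc_weight * strength

    # Never re-add a term that is already in the query.
    for term in query_terms:
        scores.pop(term, None)

    # Highest score first; ties broken by term for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return list(query_terms) + [term for term, _ in ranked[:max_new_terms]]


# Hypothetical association strengths, invented for this example.
web_groups = {"检索": {"搜索": 0.9, "查询": 0.6}}
doc_related = {"检索": {"查询": 0.8, "索引": 0.5}}

expanded = expand_query(["检索"], web_groups, doc_related, max_new_terms=2)
print(expanded)
```

In this sketch a term supported by both sources ("查询") outranks a term that is strong in only one of them ("搜索"), which is one plausible reason to use the two sources in combination.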
Keywords/Search Tags: Chinese query expansion, Chinese full-text retrieval, Chinese indexing technology, related term group