The Research On Indexing Strategies For Chinese Information Retrieval

Posted on: 2008-10-31
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Han
Full Text: PDF
GTID: 2178360215459838
Subject: Computer application technology
Abstract/Summary:
As the volume of available information grows, information retrieval (IR) systems have become an indispensable means of accessing it. Because Chinese text has no spaces between words, choosing an indexing strategy is a problem particular to Chinese IR.

Segmentation is an indispensable step in Chinese IR, so this thesis first studies segmentation. It analyzes segmentation ambiguity, reviews current solutions for segmentation ambiguity and data smoothing techniques, and puts forward a solution to the out-of-vocabulary problem, which is a key problem in Chinese IR.

The thesis implements an IR system for the research on Chinese indexing strategies. First, the organization, storage, search and compression of the index in the IR system are studied. Second, IR models are explored. Finally, a suitable index data structure is selected, and the BM25 formula of the 2-Poisson model, which achieved good results in previous experiments, is adopted.

The thesis then studies indexing strategies in depth. First, the performance of indexing strategies based on Chinese words, on unigrams and on bigrams is compared. Second, combinations of the different indexing strategies are discussed. Finally, an improved bigram indexing strategy is put forward. The effectiveness of the new approach is evaluated on the TREC Mandarin corpus using BM25 of the 2-Poisson model. Experimental results show that the improved bigram indexing strategy is relatively effective in terms of mean average precision and R-precision, and is better than or comparable to the best result in Recall Level Precision Averages and Document Level Averages.
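To make the unigram and bigram indexing strategies and the BM25 ranking mentioned above more concrete, the following is a minimal sketch and not the thesis's implementation. It assumes overlapping character n-grams taken over runs of CJK characters, a common BM25 variant with a non-negative IDF, and the conventional defaults k1 = 1.2 and b = 0.75. The function names `char_ngrams` and `bm25_scores` are hypothetical, and the improved bigram strategy itself is not reproduced because the abstract does not describe it.

```python
import math
from collections import Counter


def char_ngrams(text, n):
    """Return overlapping character n-grams from each run of CJK characters.

    Assumption: only the basic CJK Unified Ideographs block is indexed;
    any other character (punctuation, Latin, digits) ends the current run.
    """
    runs, current = [], []
    for ch in text:
        if "\u4e00" <= ch <= "\u9fff":
            current.append(ch)
        elif current:
            runs.append("".join(current))
            current = []
    if current:
        runs.append("".join(current))

    grams = []
    for run in runs:
        if len(run) < n:
            grams.append(run)  # a run shorter than n is kept whole
        else:
            grams.extend(run[i:i + n] for i in range(len(run) - n + 1))
    return grams


def bm25_scores(query, docs, n=2, k1=1.2, b=0.75):
    """Score every document against the query using n-gram index terms.

    Assumption: this is a widely used BM25 form; the exact formula and
    parameter values used in the thesis are not given in the abstract.
    """
    doc_terms = [Counter(char_ngrams(d, n)) for d in docs]
    doc_lens = [sum(c.values()) for c in doc_terms]
    num_docs = len(docs)
    avg_len = (sum(doc_lens) / num_docs) or 1.0  # guard against empty docs

    # Document frequency: number of documents containing each term.
    df = Counter()
    for terms in doc_terms:
        df.update(terms.keys())

    query_terms = set(char_ngrams(query, n))
    scores = []
    for terms, length in zip(doc_terms, doc_lens):
        score = 0.0
        for t in query_terms:
            tf = terms.get(t, 0)
            if tf == 0:
                continue
            idf = math.log(1 + (num_docs - df[t] + 0.5) / (df[t] + 0.5))
            norm = tf + k1 * (1 - b + b * length / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


if __name__ == "__main__":
    docs = ["中文信息检索系统", "索引压缩方法", "中文分词"]
    print(char_ngrams("信息检索", 2))
    print(bm25_scores("信息检索", docs, n=2))
    print(bm25_scores("信息检索", docs, n=1))
```

With bigrams, the query 信息检索 is indexed as 信息, 息检 and 检索, so only the first sample document matches. With unigrams, the second document also receives a nonzero score because it shares the single character 索, which illustrates why unigram indexing tends to match more loosely than bigram indexing.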
Keywords/Search Tags: Chinese information retrieval, indexing strategy, information retrieval model, segmentation