
Research Of Result Optimization Of Information Retrieval

Posted on: 2008-06-06
Degree: Master
Type: Thesis
Country: China
Candidate: T Xu
GTID: 2178360215456801
Subject: Computer software and theory
Abstract/Summary:
Information Retrieval (IR) is an important issue in Chinese information processing. It is closely related to term extraction, word sense disambiguation, syntactic analysis, and similar tasks, and it has been applied in many other areas of natural language processing, such as question answering, automatic text summarization, and statistical machine translation. IR also plays a key role in everyday life, as the widespread use of search engines shows. To meet users' requirements and to overcome the negative effects of the variety and fuzziness of the Chinese language, among other factors, IR optimization has become a focus and a driving force of the field.

IR optimization involves the retrieval model, word segmentation, query expansion, and document re-ranking. After comparing the effects of these optimization elements, this thesis focuses on document re-ranking and studies it in depth.

First, we analyze different optimization methods and compare several optimization elements. The results provide baseline data for further research on Chinese information retrieval.

Second, we propose a re-ranking strategy based on topic word pairs, which offers a workable scheme and some ideas for IR optimization. We analyze the characteristics of topic word pairs and extract them using probabilistic latent semantic analysis. The distribution of the word pairs is then used to re-rank documents.

Third, we design a system that implements this strategy.

Finally, we evaluate the method. On the NTCIR-5 document collection for SLIR, the results show improvements of 76.0% and 58.8% over the initial retrieval without any re-ranking or query expansion, which is better than pseudo-relevance feedback. We also find that the method for selecting topic word pairs is not unique: it does not depend on a specific algorithm.
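The abstract does not give the scoring formula used for re-ranking, so the following Python sketch is only an illustration of the general idea, not the thesis's method. It assumes the topic word pairs have already been selected (for example with probabilistic latent semantic analysis), and it assumes a simple linear interpolation between the normalized initial retrieval score and the fraction of topic word pairs that co-occur in a document. The names `pair_coverage`, `rerank`, and `weight`, as well as the toy data, are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]


def pair_coverage(doc_terms: Set[str], pairs: List[Pair]) -> float:
    """Fraction of topic word pairs whose two words both occur in the document."""
    if not pairs:
        return 0.0
    hits = sum(1 for a, b in pairs if a in doc_terms and b in doc_terms)
    return hits / len(pairs)


def rerank(
    initial: List[Tuple[str, float]],
    docs: Dict[str, Set[str]],
    pairs: List[Pair],
    weight: float = 0.5,
) -> List[Tuple[str, float]]:
    """Re-rank an initial result list using topic word pair coverage.

    `initial` holds (doc_id, score) with non-negative scores (assumption).
    `weight` is a hypothetical interpolation parameter: 0 keeps the
    initial ranking, 1 ranks by pair coverage alone.
    """
    if not initial:
        return []
    top = max(score for _, score in initial) or 1.0  # avoid dividing by zero
    rescored = []
    for doc_id, score in initial:
        combined = (1 - weight) * (score / top) + weight * pair_coverage(docs[doc_id], pairs)
        rescored.append((doc_id, combined))
    # sorted() is stable, so ties keep their initial order
    return sorted(rescored, key=lambda item: item[1], reverse=True)


# Toy data, invented for illustration only
docs = {
    "d1": {"search", "engine", "index"},
    "d2": {"topic", "model", "plsa", "search"},
    "d3": {"weather"},
}
initial = [("d1", 3.0), ("d2", 2.0), ("d3", 1.0)]
pairs = [("topic", "model"), ("plsa", "search")]

ranking = [doc_id for doc_id, _ in rerank(initial, docs, pairs)]
print(ranking)
```

In this toy run, `d2` contains both word pairs and moves ahead of `d1`, which had the highest initial score but contains neither pair. How the pairs are weighted and combined with the initial score in the actual system is not stated in the abstract.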
Keywords/Search Tags: Information Retrieval, Topic Word Pair, PLSI, Document Re-ranking