
The Application Of Cross-Language Information Retrieval Based On Latent Semantic Analysis

Posted on: 2009-03-15
Degree: Master
Type: Thesis
Country: China
Candidate: J T Bi
Full Text: PDF
GTID: 2178360245967555
Subject: Computer software and theory
Abstract/Summary:
With the growth of the Internet, more and more people face the problem of retrieving foreign-language information effectively. In the early days of the Internet, most web pages were in English, and most casual users came from developed countries such as the United States or England. Subsequently, the gradual increase in websites and users from non-English-speaking countries brought new problems for traditional English-only information retrieval systems. It is therefore necessary to study how users can query in their native language to obtain foreign-language information, and cross-language information retrieval has become a hot topic.

The goal of cross-language information retrieval is to retrieve foreign-language information using native-language queries. Because monolingual information retrieval is already quite effective, most researchers borrow monolingual retrieval techniques when studying cross-language information retrieval. However, machine translation performs poorly because of cultural differences between languages, and so far cross-language information retrieval technology cannot meet requirements at the semantic level.

In this thesis, we first introduce the main techniques of cross-language information retrieval and the related international evaluation standards, and then describe the principles and modeling of latent semantic analysis and its applications. We then propose a translation model based on latent semantic analysis combined with the theory of bi-directional translation. The experimental results show that its precision is better than that of the traditional vector space model. Next, to circumvent the defects of traditional cross-language query expansion, we propose a new method for cross-language query expansion based on k-means clustering and latent semantic analysis. The method can reduce the negative influence of wrong translations and of word ambiguity in translation. Finally, we update the weight of each word in the new query. The results show an improvement in average precision.
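
For readers unfamiliar with latent semantic analysis in a cross-language setting, the following is a minimal sketch of the general idea, not the translation model proposed in the thesis. It assumes the common dual-language setup in which each training document is a text concatenated with its translation, so that a truncated SVD places words from both languages in one latent space. The toy English/French corpus, the function names, and the choice of k=2 are all invented for illustration.

```python
import numpy as np

def build_vocab(docs):
    """Map every distinct whitespace-separated token to a row index."""
    vocab = sorted({w for d in docs for w in d.split()})
    return {w: i for i, w in enumerate(vocab)}

def term_doc_matrix(docs, vocab):
    """Raw term-frequency matrix: rows are terms, columns are documents."""
    m = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.split():
            if w in vocab:
                m[vocab[w], j] += 1.0
    return m

def fit_lsa(matrix, k):
    """Truncated SVD: keep the k largest singular values and term vectors."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k], s[:k]

def fold_in(text, vocab, u_k, s_k):
    """Project a monolingual text into the shared latent space."""
    vec = term_doc_matrix([text], vocab)[:, 0]
    return (vec @ u_k) / s_k

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

# Hypothetical parallel corpus: each document is an English text
# followed by its French translation.
parallel_docs = [
    "cat animal pet chat animal compagnie",
    "dog animal pet chien animal compagnie",
    "car road engine voiture route moteur",
    "truck road engine camion route moteur",
]
vocab = build_vocab(parallel_docs)
u_k, s_k = fit_lsa(term_doc_matrix(parallel_docs, vocab), k=2)

# An English query is compared with French-only documents.
query = fold_in("pet", vocab, u_k, s_k)
doc_animal = fold_in("chat chien", vocab, u_k, s_k)
doc_vehicle = fold_in("voiture camion", vocab, u_k, s_k)

sim_animal = cosine(query, doc_animal)
sim_vehicle = cosine(query, doc_vehicle)
print(sim_animal > sim_vehicle)
```

In this sketch the English query shares no surface words with either French document, so a plain vector space model would score both at zero; the latent space is what lets the query rank the animal document above the vehicle document.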
Keywords/Search Tags: Cross-Language Information Retrieval, Latent Semantic Analysis, Machine Translation, Query Expansion