
Research on Techniques of Query Translation for Cross-Language Information Retrieval

Posted on: 2011-09-10; Degree: Master; Type: Thesis
Country: China; Candidate: Y D Ge
GTID: 2178330332466093; Subject: Computer application technology
Abstract/Summary:
Cross-Language Information Retrieval (CLIR) enables users to retrieve documents in one language with a query written in another. Query translation is the most widely used matching strategy in CLIR, and it is usually performed with a dictionary-based method. However, out-of-vocabulary (OOV) terms in queries greatly degrade CLIR performance, so how to translate OOV terms is the key issue in query translation.

The Internet holds a large amount of bilingual resources that can be used to construct bilingual corpora. These abundant web resources solve the problems of scale, domain coverage, and timeliness that limit corpus-based query translation. In the corpus-based method, bilingual resources are crawled from the web and aligned using various features to build a corpus, from which translation knowledge is extracted for query translation. High-quality translation knowledge raises the OOV translation inclusion rate.

The search-engine-based method exploits the high OOV translation inclusion rate of search engines to mine translations of OOV terms for query translation. Cross-language expansion is used to collect high-quality snippets. Frequency-change measurement and adjacency information are introduced to extract translation candidates from the snippets; this extraction method improves the quality of the candidate set. A combination model that considers frequency-distance, surface-pattern matching, and phonetic features is proposed to pick the appropriate translation(s) out of the large candidate set. The OOV translations mined from the search engine substantially improve CLIR performance.

This thesis compares the performance of the dictionary-based, corpus-based, and search-engine-based methods and studies the factors that affect their performance. Several query translation methods are then combined, and the combined method further improves CLIR performance.
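To make the candidate-selection step more concrete, here is a minimal Python sketch of a linear combination of the three kinds of evidence the abstract names. It is not the thesis's model: every feature definition, weight, and function name below is a hypothetical assumption. `frequency_distance` sums 1/(1+d) over co-occurrences of the OOV term and a candidate at character gap d; `surface_pattern_hits` counts an assumed "term (candidate)" pattern with ASCII or full-width parentheses; `phonetic_sim` is a crude stand-in for a transliteration model that compares a caller-supplied romanization (e.g. pinyin) of the OOV term with the candidate using `difflib.SequenceMatcher`. The snippets are invented for illustration.

```python
import math
import re
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class ScoredCandidate:
    text: str
    score: float


def frequency_distance(term: str, cand: str, snippets: list[str]) -> float:
    """Hypothetical feature: sum 1/(1+d) over every (term, candidate)
    occurrence pair in each snippet, d being the character gap."""
    total = 0.0
    for s in snippets:
        for t in re.finditer(re.escape(term), s):
            for c in re.finditer(re.escape(cand), s):
                # Gap is 0 when the two spans touch or overlap.
                d = max(0, c.start() - t.end(), t.start() - c.end())
                total += 1.0 / (1.0 + d)
    return total


def surface_pattern_hits(term: str, cand: str, snippets: list[str]) -> int:
    """Hypothetical feature: count 'term (cand)' with ASCII or
    full-width parentheses, case-insensitively."""
    pat = re.compile(
        re.escape(term) + r"\s*[(（]\s*" + re.escape(cand) + r"\s*[)）]",
        re.IGNORECASE,
    )
    return sum(len(pat.findall(s)) for s in snippets)


def phonetic_sim(romanized_term: str, cand: str) -> float:
    """Crude stand-in for a phonetic model: string similarity between a
    romanized form of the OOV term and the candidate, in [0, 1]."""
    return SequenceMatcher(None, romanized_term.lower(), cand.lower()).ratio()


def rank_candidates(term, romanized_term, candidates, snippets,
                    weights=(1.0, 2.0, 1.0)):
    """Score each candidate by a weighted sum of the three (log-scaled
    count) features; return candidates sorted best first."""
    w_fd, w_sp, w_ph = weights
    scored = []
    for cand in candidates:
        fd = math.log1p(frequency_distance(term, cand, snippets))
        sp = math.log1p(surface_pattern_hits(term, cand, snippets))
        ph = phonetic_sim(romanized_term, cand)
        scored.append(ScoredCandidate(cand, w_fd * fd + w_sp * sp + w_ph * ph))
    return sorted(scored, key=lambda c: c.score, reverse=True)


# Invented snippets for the OOV term 奥巴马 (pinyin "aobama").
snippets = [
    "美国总统奥巴马(Obama)今天访问北京",
    "奥巴马 (Obama) said the economy is recovering",
    "奥巴马政府 White House 声明",
]
ranking = rank_candidates("奥巴马", "aobama", ["Obama", "White House", "said"], snippets)
for c in ranking:
    print(f"{c.text}: {c.score:.3f}")
```

Log-scaling the two count features keeps a candidate that merely co-occurs in many snippets from swamping the other evidence; the weights here are arbitrary and would in practice have to be tuned on held-out queries.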
Keywords/Search Tags: Cross-Language Information Retrieval, Query Translation, OOV, Translation Mining, Search Engine