Research On Cross Language Information Retrieval Based On Interlingua Semantic

Posted on: 2009-11-15
Degree: Master
Type: Thesis
Country: China
Candidate: G B Huang
Full Text: PDF
GTID: 2178360272480869
Subject: Computer system architecture
Abstract/Summary:
With the rapid development of the Internet, the information resources available online have grown steadily richer in type and quantity, while the languages they are written in have become more diverse and more unevenly distributed. At the same time, as the number and range of Internet users have grown sharply, the languages those users speak have also diversified. The mismatch between the languages of online resources and the languages of users inevitably creates a language barrier for people retrieving information over the Internet. For example, more than 65 percent of the information on the Internet is in English, but only about 30 percent of Internet users use English. This causes great inconvenience for users from non-English-speaking countries. The language barrier is not confined to the Internet: in every multilingual information system (such as digital libraries), it limits effective access to information and keeps multilingual information from realizing its full value.

Since the late 1990s, users have placed higher demands on information retrieval: they are no longer satisfied with monolingual retrieval and want the results to include relevant information in multiple languages. To overcome the language barrier in multilingual information systems, researchers proposed Cross-Language Information Retrieval (CLIR), a technology that lets a user issue a query in one language and retrieve the relevant information in every language the system holds.

The two most popular approaches to CLIR are based on dictionaries and on machine translation systems. The dictionary-based approach translates using a machine-readable dictionary. Its main problem is lexical ambiguity: a word may have multiple meanings, so the translation system must choose among candidate translations. A second problem is insufficient dictionary coverage: proper names such as person names, place names, and institution names appear and change every day, and most of them cannot be found in the dictionary at translation time. The machine translation approach is aimed mainly at translating documents, but document translation is inefficient to carry out and the translations are often imprecise.

To solve these problems, we propose a cross-language information retrieval method that uses interlingua semantics based on Partial Least-Squares (PLS) theory. The experimental results show that the method is effective.

The innovations in this paper are as follows. First, a cross-language information retrieval model based on interlingua semantics is proposed using Partial Least-Squares (PLS). Second, an English-Chinese parallel corpus is built, which lays a solid foundation for expanding the corpus in the future.
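The two weaknesses of the dictionary-based approach named in the abstract, lexical ambiguity and limited coverage of proper names, can be illustrated with a minimal sketch. It is not taken from the thesis: the toy dictionary `BILINGUAL_DICT`, the function `translate_query`, and the sample query are all hypothetical and chosen only for illustration.

```python
# Hypothetical toy English-to-Chinese dictionary; entries are illustrative only.
BILINGUAL_DICT = {
    "bank": ["银行", "河岸"],  # ambiguous: financial institution vs. river bank
    "river": ["河流"],
    "loan": ["贷款"],
}


def translate_query(query, dictionary):
    """Translate a query term by term, keeping every candidate translation.

    Returns the translated terms, the source terms that had more than one
    candidate (lexical ambiguity), and the source terms missing from the
    dictionary (coverage gap, typical for proper names).
    """
    translated = []
    ambiguous = []
    out_of_vocabulary = []
    for term in query.lower().split():
        candidates = dictionary.get(term)
        if candidates is None:
            # No entry: pass the term through untranslated.
            out_of_vocabulary.append(term)
            translated.append(term)
        else:
            if len(candidates) > 1:
                ambiguous.append(term)
            translated.extend(candidates)
    return translated, ambiguous, out_of_vocabulary


translated, ambiguous, oov = translate_query("bank loan Huang", BILINGUAL_DICT)
print(translated)  # every candidate for "bank" plus the untranslated name
print(ambiguous)   # terms with more than one candidate translation
print(oov)         # terms the dictionary does not cover
```

In this sketch both senses of "bank" and the untranslated name end up in the target-language query, which is the kind of noise that motivates looking for an alternative to direct dictionary lookup.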
Keywords/Search Tags: interlingua semantics, cross-language information retrieval, PLS, potential semantic pair