Research on Chinese-Mongolian Cross-Language Information Retrieval Based on a Language Model

Posted on: 2013-01-23
Degree: Master
Type: Thesis
Country: China
Candidate: W J Gong
Full Text: PDF
GTID: 2248330374470361
Subject: Computer application technology
Abstract/Summary:
With the development of the Internet and the globalization of information, demand for information in different languages keeps growing. Users who are not proficient in a foreign language find it difficult to obtain the information they need. Cross-language information retrieval (CLIR) was proposed to address this problem, and there are already many research results for language pairs such as Chinese-English. Related work on Mongolian retrieval, however, is still relatively scarce. Mongolian is one of the most important languages in the world, so research on Mongolian information retrieval is of considerable significance. Many factors affect the performance of a cross-language information retrieval system, but the information retrieval model is the main one. Its research content includes the representation of documents and queries, the strategy for matching documents against user queries by relevance, the method for ranking query results, and the relevance feedback mechanism for users. Because the quality of the query expansion method directly affects retrieval performance, we put forward a query expansion method based on related words and co-occurrence distance to handle retrieval from Chinese queries to Mongolian queries and then to Mongolian documents.
In this paper, we use a bilingual dictionary to expand the initial Chinese queries with related words, then re-expand the resulting Mongolian queries using the co-occurrence distance model. Experimental results show that the Mongolian stop-word table effectively reduces the size of the index, and that the stemming rules reduce the number of terms in the index by nearly half and effectively improve the precision and recall of retrieval. The query expansion method based on related words and co-occurrence distance presented in this paper yields some improvement in both recall and average accuracy, which improves the performance of Chinese-Mongolian cross-language information retrieval.
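The two-stage expansion described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the abstract does not give the co-occurrence distance formula, so the scoring below (each co-occurrence within a fixed window contributes 1/distance) is an assumption, as are the function names, the `window` and `top_k` parameters, and the toy data. The `m_*` tokens are placeholders standing in for Mongolian terms, not real Mongolian words.

```python
from collections import defaultdict


def translate_query(chinese_terms, bilingual_dict):
    """Stage 1: map each Chinese term to all of its dictionary translations."""
    translated = []
    for term in chinese_terms:
        for candidate in bilingual_dict.get(term, []):
            if candidate not in translated:
                translated.append(candidate)
    return translated


def cooccurrence_scores(query_terms, documents, window=3):
    """Score non-query terms by how closely they co-occur with query terms.

    Assumed scoring: every co-occurrence within `window` tokens of a
    query term adds 1/distance to the candidate's score.
    """
    query_set = set(query_terms)
    scores = defaultdict(float)
    for tokens in documents:
        for i, tok in enumerate(tokens):
            if tok not in query_set:
                continue
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                other = tokens[j]
                if j != i and other not in query_set:
                    scores[other] += 1.0 / abs(i - j)
    return dict(scores)


def expand_query(chinese_terms, bilingual_dict, documents, top_k=2, window=3):
    """Stage 2: append the top_k highest-scoring co-occurring terms."""
    base = translate_query(chinese_terms, bilingual_dict)
    scores = cooccurrence_scores(base, documents, window)
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return base + ranked[:top_k]


# Hypothetical toy data: placeholder tokens, not real dictionary entries.
toy_dict = {"马": ["m_horse"], "草原": ["m_steppe"]}
toy_docs = [
    ["m_horse", "m_run", "m_steppe", "m_wide"],
    ["m_sky", "m_blue"],
]
expanded = expand_query(["马", "草原"], toy_dict, toy_docs, top_k=1)
```

In a real system the documents would first be tokenized, filtered with the Mongolian stop-word table, and stemmed, so that the co-occurrence statistics are collected over the same terms that appear in the index.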
Keywords/Search Tags: Chinese-Mongolian CLIR, Optimization of query translation, Lexical Near and Far Model