
Study On The Methods In The Selection Of Retrieval Unit In Mongolian Information Retrieval System

Posted on: 2012-08-01  Degree: Master  Type: Thesis
Country: China  Candidate: J Y Yue  Full Text: PDF
GTID: 2178330335472223  Subject: Computer Science and Technology
Abstract/Summary:
Information retrieval for Chinese and English has reached a mature stage. Because of the distinctive characteristics of the Mongolian language, however, many key technical problems remain unresolved, and solving them is of great significance to the development of Mongolian information retrieval. The subject of this thesis is one of those key problems.

Mongolian, an ethnic language of the Inner Mongolia Autonomous Region, is an agglutinative language: words are formed by attaching affixes to a root. In light of this characteristic, this thesis investigates methods for selecting the index unit in Mongolian information retrieval under several specific retrieval models. The retrieval models are the TF-IDF model, the vector space model, and the language model; for the language model, the Good-Turing, Jelinek-Mercer (JM), and Katz methods are used for smoothing. The candidate index units are the whole word, the root, the root plus affixes, and the n-gram. The recall and precision of each unit are measured in four steps: index building, structured query construction, retrieval, and evaluation, so as to find the most suitable index unit.

The retrieval experiments use a collection of 29,510 documents (156 M in size) and are centered on 12 topics and their related details. The test platform is built on the Lemur system. From a series of experiments, the author concludes that the root plus two suffixes format and the n-gram (n=4) format provide the best performance.
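The following is a minimal sketch, not the thesis's implementation, of three ideas named in the abstract: splitting words into character n-grams (n=4) as index units, scoring a document with a Jelinek-Mercer smoothed query-likelihood language model, and computing precision and recall. All function names, the smoothing weight `lam=0.5`, and the Latin-script sample text are assumptions made for illustration; the thesis itself used the Lemur system, and root plus affix units would additionally need a Mongolian morphological analyzer that is not shown here.

```python
import math
from collections import Counter


def char_ngrams(word, n=4):
    """Split a word into overlapping character n-grams.

    Words no longer than n characters are kept whole (an assumption).
    """
    if len(word) <= n:
        return [word]
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def index_units(text, n=4):
    """Turn whitespace-separated text into a flat list of n-gram index units."""
    units = []
    for word in text.split():
        units.extend(char_ngrams(word, n))
    return units


def jm_score(query_units, doc_counts, coll_counts, coll_len, lam=0.5):
    """Log query likelihood with Jelinek-Mercer smoothing.

    P(u | d) = (1 - lam) * tf(u, d) / |d| + lam * cf(u) / |C|
    Units that never occur in the collection are skipped (an assumption).
    """
    doc_len = sum(doc_counts.values())
    score = 0.0
    for unit in query_units:
        p_doc = doc_counts[unit] / doc_len if doc_len else 0.0
        p_coll = coll_counts[unit] / coll_len if coll_len else 0.0
        p = (1 - lam) * p_doc + lam * p_coll
        if p > 0.0:
            score += math.log(p)
    return score


def precision_recall(retrieved, relevant):
    """Set-based precision and recall for one query."""
    hits = len(set(retrieved) & set(relevant))
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical toy collection in Latin transliteration, for illustration only.
docs = {"d1": "sain baina uu", "d2": "mongol hel bichig"}
doc_counts = {name: Counter(index_units(text)) for name, text in docs.items()}
coll_counts = Counter()
for counts in doc_counts.values():
    coll_counts.update(counts)
coll_len = sum(coll_counts.values())

query = index_units("mongol")
ranking = sorted(
    docs,
    key=lambda name: jm_score(query, doc_counts[name], coll_counts, coll_len),
    reverse=True,
)
```

Swapping `index_units` for a whole-word or root-based tokenizer while keeping the scoring and evaluation code fixed mirrors, in miniature, the kind of index-unit comparison the abstract describes.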
Keywords/Search Tags:Mongolian Information Retrieval, Retrieval Unit, Language Model, Structured Query