Study On The Methods In The Selection Of Retrieval Unit In Mongolian Information Retrieval System

Posted on:2012-08-01

Degree:Master

Type:Thesis

Country:China

Candidate:J Y Yue

Full Text:PDF

GTID:2178330335472223

Subject:Computer Science and Technology

Abstract/Summary:

Information retrieval for Chinese and English has reached a mature stage. For Mongolian, however, the distinctive features of the language leave many key technical problems unresolved, and solving them is of great significance to the development of Mongolian information retrieval. This thesis studies one of those problems.

Mongolian, the ethnic language of the principal nationality of the Inner Mongolia Autonomous Region, is an agglutinative language: words are formed by attaching affixes to a root. Taking this characteristic into account, the thesis investigates how to select the index unit in Mongolian information retrieval under several specific retrieval models. The models are the TF-IDF model, the vector space model, and the language model, with the Good-Turing, JM, and Katz methods used for smoothing. The candidate index units are the whole word, the root, the root plus affixes, and the n-gram. The recall and precision of each unit are measured in four steps (index building, structured query construction, retrieval, and evaluation) in order to find the most suitable index unit.

The retrieval experiments use a collection of 29,510 documents, 156 M in size, and center on 12 topics and their related details. The test platform is built on the Lemur system. From a series of experiments, the author concludes that the root plus two suffixes format and the n-gram (n=4) format give the best performance.
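As a rough illustration of two ideas in the abstract, the sketch below splits a word into overlapping n-gram units and computes precision and recall for one query. It is not the thesis's implementation: the abstract does not say whether its n-grams are character-level, so that is an assumption here, and the function names, the Latin-transliterated sample word, and the document IDs are all hypothetical.

```python
def char_ngrams(word, n=4):
    """Split a word into overlapping character n-grams.

    Assumption: n-grams are taken over characters; words shorter
    than n are kept whole as a single unit.
    """
    if len(word) <= n:
        return [word]
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    precision = relevant retrieved / all retrieved
    recall    = relevant retrieved / all relevant
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical data: a transliterated word and made-up document IDs.
units = char_ngrams("mongol", n=4)
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d3", "d5"])
```

Comparing index units then amounts to building one index per unit type, running the same queries against each, and comparing the resulting precision and recall figures.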

Keywords/Search Tags:

Mongolian Information Retrieval, Retrieval Unit, Language Model, Structured Query

Related items

1	Mongolian Query Expansion And Information Retrieval System Establishment
2	Research Of Mongolian Information Retrieval Model
3	Research On Mongolian-Chinese Cross-language Information Retrieval Model
4	Using Statistical Language Modeling For Ad Hoc Information Retrieval
5	Research Of Mongolian Retrieval Technology Based On The New Incremental Query Expansion
6	Research On Chinese-Mongolian Cross-Language Information Retrieval Based Language Model
7	Research On Translation Methods Of Query Items In Chinese-Mongolian Cross-Language Information Retrieval
8	Research On And Implementation Of Chinese Structured Information Retrieval
9	Research Of Mongolian Information Retrieval Method Based On The LDA And System Implementation
10	Research On Information Retrieval Based On Language Model And Reranking For Retrieval Results