Research Of Mongolian Information Retrieval Model

Posted on: 2010-01-23
Degree: Master
Type: Thesis
Country: China
Candidate: W Jin
Full Text: PDF
GTID: 2178360278467595
Subject: Computer application technology
Abstract/Summary:
The Web is becoming a universal repository of human knowledge and culture, allowing ideas and information to be shared on an unprecedented scale. Because languages differ from one another, however, research on minority languages is still lacking, and this seriously hinders their spread. Mongolian is one of the most important languages in the world, so research on Mongolian information retrieval is becoming increasingly important.

To build a search engine suited to Mongolian, we analyze the morphological and syntactic characteristics of Mongolian and design a scheme of indexing units for Mongolian IR, including rules for segmenting Mongolian terms and rules for Mongolian stemming. We use three methods to determine the Mongolian stop list. After analyzing existing information retrieval models, we select a model suited to Mongolian and, through experiments, compare the effects of the smoothing methods, Mongolian stemming, and query structuring.

We collected 27,345 Mongolian documents and built a test collection consisting of a Mongolian document set, 11 topics, and a set of relevance judgments. Using Indri, we ran this test collection on a model that combines the language modeling and inference network approaches to information retrieval. The experiments show that Mongolian stemming reduces the size of the index and improves precision, and that the EC Mongolian stop list performs best. The Mongolian stemming rules also eliminate a large number of terms and improve recall. Compared with the other information retrieval models, the model combining the language modeling and inference network approaches performs best. In determining the best smoothing parameters, we find that all three smoothing methods suit the model, but Jelinek-Mercer smoothing performs better than the others.
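As a rough illustration of the Jelinek-Mercer smoothing mentioned above, the following minimal Python sketch scores a document against a query under a smoothed unigram language model. It is not the thesis's implementation and does not use Indri. The function name, the toy tokens, and the value of the interpolation weight lam are assumptions made for illustration. The sketch uses the common convention P(w|D) = (1 - lam) * c(w, D) / |D| + lam * P(w|C), where C is the whole collection.

```python
import math
from collections import Counter


def jm_query_log_likelihood(query, doc, collection_tf, collection_len, lam=0.4):
    """Log-likelihood of `query` under a Jelinek-Mercer smoothed model of `doc`.

    `lam` is the weight given to the collection model (an assumed value,
    not a parameter reported in the abstract).
    """
    doc_tf = Counter(doc)
    doc_len = len(doc)
    score = 0.0
    for term in query:
        # Maximum-likelihood estimate from the document itself.
        p_doc = doc_tf[term] / doc_len if doc_len else 0.0
        # Background estimate from the whole collection.
        p_coll = collection_tf.get(term, 0) / collection_len
        # Linear interpolation of the two estimates.
        p = (1.0 - lam) * p_doc + lam * p_coll
        if p == 0.0:
            # Term unseen in the entire collection.
            return float("-inf")
        score += math.log(p)
    return score


# Toy collection of already tokenized (and, hypothetically, stemmed) documents.
docs = [["a", "b", "a"], ["b", "c"]]
collection_tf = Counter(t for d in docs for t in d)
collection_len = sum(collection_tf.values())

scores = [
    jm_query_log_likelihood(["a"], d, collection_tf, collection_len, lam=0.5)
    for d in docs
]
```

In this toy run the first document, which contains the query term, receives the higher score, while the second still gets a nonzero probability from the collection model. That fallback is the purpose of smoothing.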
Keywords/Search Tags: Mongolian IR, Inference Network, Language Model, Structured Queries, Mongolian Stemming