
The Study Of Comparison Between Mongolian Stop Words And English Stop Words

Posted on: 2012-08-22
Degree: Master
Type: Thesis
Country: China
Candidate: G W Guan
Full Text: PDF
GTID: 2178330335472279
Subject: Computer Science and Technology
Abstract/Summary:
With the spread of the Internet and the development of the information society, retrieval systems serve more and more users, and the languages and content of the documents they cover keep growing. Because of the distinctive characteristics of Mongolian, research on Mongolian information retrieval is less developed than research on English and Chinese retrieval, and its details, such as Mongolian stop words, have received especially little attention. In this paper we first obtain a preliminary Mongolian stoplist using the TF, DF, EC, and UE methods. Because of the limited content of the Mongolian document set used here, this preliminary stoplist contains nouns closely related to the themes of the documents, as well as Mongolian homonyms. To compare the English stoplist with the Mongolian stoplist in information retrieval, we take the intersection of the stoplists produced by the four methods and optimize it according to the characteristics of Mongolian. The optimization proceeds as follows: first, we remove the words closely related to the document themes and the Mongolian homonyms; second, we analyze the two kinds of stop words by part of speech; finally, we translate the English stop words into Mongolian, apply them to the Mongolian document set, and compare their retrieval results with those of the Mongolian stop words. We also translate the Mongolian stop words into English, apply them to the English document set, and compare their retrieval results with those of the English stop words.

Experiments on 25412 Mongolian documents show that the Mongolian stoplist obtained with the UE method is better than the one obtained with the EC method; the optimized Mongolian stoplist is better than the intersection of the four methods' stoplists; and the optimized Mongolian stoplist is better than the stoplist translated directly from English. However, the English stoplist is better than the stoplist translated into English from the Mongolian stoplist. Therefore, English stop words cannot simply be translated into Mongolian and used as Mongolian stop words; instead, the Mongolian stoplist should be derived from the characteristics of Mongolian and the related algorithms.
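The stoplist construction described above (rank terms with several statistics, intersect the candidate lists, then remove theme words and homonyms) can be illustrated with a minimal Python sketch. It makes several assumptions that do not come from the thesis: TF is taken to mean total term frequency over the collection and DF to mean document frequency; the EC and UE methods are left out because the abstract does not define them; the function names, the cutoff `k`, and the toy English tokens are hypothetical; and the `exclude` set stands in for the theme words and homonyms that the author removes during optimization.

```python
from collections import Counter


def tf_counts(docs):
    """Total number of occurrences of each term across all documents."""
    counts = Counter()
    for tokens in docs:
        counts.update(tokens)
    return counts


def df_counts(docs):
    """Number of documents in which each term appears at least once."""
    counts = Counter()
    for tokens in docs:
        counts.update(set(tokens))
    return counts


def top_k(counts, k):
    """The k highest-scoring terms; ties are broken alphabetically."""
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {term for term, _ in ranked[:k]}


def build_stoplist(docs, k, exclude=frozenset()):
    """Intersect the TF and DF candidate lists, then drop excluded words.

    `exclude` is a hypothetical stand-in for the manually identified
    theme words and homonyms mentioned in the abstract.
    """
    candidates = top_k(tf_counts(docs), k) & top_k(df_counts(docs), k)
    return candidates - set(exclude)


# Toy tokenized documents (placeholders, not data from the thesis).
docs = [
    ["the", "horse", "runs", "on", "the", "steppe"],
    ["the", "horse", "eats", "on", "the", "hill"],
    ["a", "horse", "and", "the", "rider"],
]

# "horse" is frequent only because it is the theme of this toy collection,
# so it is excluded in the same spirit as the optimization step.
stoplist = build_stoplist(docs, k=3, exclude={"horse"})
print(sorted(stoplist))
```

In this toy collection the theme noun ranks as highly as the function words under both statistics, which mirrors the limitation the abstract reports for a document set with narrow content.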
Keywords/Search Tags: Mongolian stop words, Mongolian information retrieval, English stop words, homonyms