The Impact Of Mongolian Stop-List And Stemming On Mongolian Text Categorization

Posted on: 2010-02-15  Degree: Master  Type: Thesis
Country: China  Candidate: Y N A
GTID: 2178360278967625  Subject: Computer software and theory
Abstract/Summary:
With the development of networks, text categorization has become an important research topic in information processing, and it is often used to handle and organize text data. In China, Mongolian is a minority language, and Mongolian information processing has developed slowly. Yet Mongolian plays a very important role in the development and heritage of national culture, so research on Mongolian text categorization technology is of great significance. The study in this thesis covers three aspects: (1) The quality of the corpus has a great effect on the performance of a categorization system. We collected a corpus based on the Mongolian International Standard Coding System and classified it manually. (2) Different stop-word selection methods affect a text categorization system differently, and so far there have been few studies of Mongolian stop words. This thesis analyzes several commonly used stop-word selection methods (mainly the TF method, the DF method, and the EC method) and determines the corresponding stop list for each. After analyzing Mongolian morphology and syntax, it proposes a translation-based stop-word selection method, and it compares the performance of the different methods on the Mongolian text categorization system. (3) Mongolian stemming is carried out using a Mongolian word suffix list, and the support vector machine (SVM) classification algorithm is chosen as the categorization algorithm to build the Mongolian text categorization system. We collected 850 Mongolian texts, divided into 9 categories, and used open-source SVM software to build the categorization system. According to the experiments, the EC stop list gives the best results, followed by the translation-based stop list; Mongolian stemming improves categorization efficiency, and it works better still when combined with stop-word removal.
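The two preprocessing steps named in the abstract, building a stop list from corpus statistics and stemming with a suffix list, can be illustrated with a minimal Python sketch. It is not the thesis's implementation. It assumes the DF (document frequency) criterion for the stop list, a simple longest-suffix-first stripping rule, and stop-word removal applied before stemming; the thesis does not specify these details here. The function names, the `top_k` and `min_stem` parameters, and the Latin-letter tokens and suffixes are all illustrative placeholders, not taken from the thesis's corpus or suffix list.

```python
from collections import Counter


def df_stop_list(docs, top_k):
    """Return the top_k tokens ranked by document frequency (DF)."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each token at most once per document
    # Sort by descending DF, then alphabetically so ties are deterministic.
    ranked = sorted(df.items(), key=lambda kv: (-kv[1], kv[0]))
    return {tok for tok, _ in ranked[:top_k]}


def strip_suffix(token, suffixes, min_stem=2):
    """Remove the longest matching suffix, keeping at least min_stem characters."""
    for suf in sorted(suffixes, key=len, reverse=True):
        if token.endswith(suf) and len(token) - len(suf) >= min_stem:
            return token[:-len(suf)]
    return token


def preprocess(tokens, stop_list, suffixes):
    """Drop stop words, then stem the remaining tokens (assumed order)."""
    return [strip_suffix(t, suffixes) for t in tokens if t not in stop_list]


# Placeholder data: three tiny tokenized "documents" and a toy suffix list.
docs = [
    ["ba", "nom", "unshsan"],
    ["ba", "gerees", "irsen"],
    ["ba", "nomiin", "san"],
]
SUFFIXES = ["iin", "ees", "san", "sen"]

stop_list = df_stop_list(docs, top_k=1)
features = [preprocess(d, stop_list, SUFFIXES) for d in docs]
```

Here `stop_list` contains only the token that occurs in every document, and `features` holds the stemmed token lists that would then be vectorized and passed to an SVM classifier. The `min_stem` guard keeps a token that consists solely of a suffix string from being reduced to nothing.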
Keywords/Search Tags: Mongolian, Text Categorization, Stop-Words, Support Vector Machine