The Impact Of Mongolian Stop-List And Stemming On Mongolian Text Categorization

Posted on:2010-02-15

Degree:Master

Type:Thesis

Country:China

Candidate:Y N A

Full Text:PDF

GTID:2178360278967625

Subject:Computer software and theory

Abstract/Summary:

With the development of networks, text categorization has become an important research topic in information processing, and it is often used to handle and organize text data. In China, Mongolian is a minority language, and Mongolian information processing has developed slowly. Yet Mongolian plays a very important role in the development and heritage of national culture, so research on Mongolian text categorization technology is of great significance.

The study in this thesis mainly covers the following three aspects:

(1) The quality of the corpus has a great effect on the capability of a categorization system. A corpus based on the Mongolian International Standard Coding System was collected and classified manually.

(2) Different stop-word selection methods have different effects on a text categorization system, and so far there have been few studies of Mongolian stop words. This thesis analyzes several commonly used stop-word selection methods (mainly the TF method, the DF method, and the EC method) and determines the corresponding stop lists. After an analysis of Mongolian morphology and syntax, it proposes a stop-word selection method based on translation. The performance of the different methods is compared on the Mongolian text categorization system.

(3) Mongolian stemming was implemented using a Mongolian word suffix list. The support vector machine (SVM) was chosen as the classification algorithm to construct the Mongolian text categorization system.

We collected 850 Mongolian texts, divided into 9 categories, and used open-source SVM software to construct the categorization system. According to the experiments, the EC stop list has the best effect, followed by the translation-based stop list. Mongolian stemming improves the efficiency of categorization and has a better effect when combined with stop-word removal.
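The preprocessing steps named in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names, the toy tokens, and the two-entry suffix list are hypothetical placeholders in Latin script, and only the frequency-based (TF and DF) selection and a simple longest-suffix stripping rule are shown. The EC method, the translation-based method, and the SVM classifier are left out.

```python
from collections import Counter


def _top_k(counts, k):
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [token for token, _ in ranked[:k]]


def tf_stop_list(docs, k):
    """Candidate stop list: the k tokens with the highest total term frequency."""
    counts = Counter(token for doc in docs for token in doc)
    return _top_k(counts, k)


def df_stop_list(docs, k):
    """Candidate stop list: the k tokens that occur in the most documents."""
    counts = Counter(token for doc in docs for token in set(doc))
    return _top_k(counts, k)


def strip_suffix(word, suffixes, min_stem=2):
    """Remove the longest matching suffix, keeping at least min_stem characters."""
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


def preprocess(doc, stop_words, suffixes):
    """Drop stop words, then stem the remaining tokens."""
    return [strip_suffix(t, suffixes) for t in doc if t not in stop_words]


# Toy, already-tokenized documents (placeholder strings, not real corpus data).
docs = [
    ["ba", "nomud", "sain"],
    ["ba", "nom", "unshih"],
    ["ba", "sain", "nomiin"],
]
suffixes = ["ud", "iin"]  # hypothetical suffix list for illustration only

stop_words = set(df_stop_list(docs, 1))
cleaned = [preprocess(doc, stop_words, suffixes) for doc in docs]
```

In this toy run the token that appears in every document is selected as the single stop word, and the two inflected placeholder forms reduce to the same stem as the bare form, which is the effect stemming is meant to have on the feature space before classification.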

Keywords/Search Tags:

Mongolian, Text Categorization, Stop-Words, Support Vector Machine

Related items

1	The Study Of Comparison Between Mongolian Stop Words And English Stop Words
2	Research And Application Of News Automatic Classification Technology Based On Support Vector Machines
3	Study On Text Categorization Method Based On Support Vector Machine
4	The Research On Text Categorization Algorithm Based On Support Vector Machine
5	Support Vector Machine Application In Text Categorization
6	Application For Web Text Categorization Based On Support Vector Machine
7	The Application Research Of Support Vector Machine Theory In Text Categorization
8	Research On Clustering And Text Categorization Based On Support Vector Machine
9	Research On Support Vector Machines Classification Algorithm In Text Categorization
10	A Study On Text Categorization Based On Machine Learning