
Automatic Categorization Of Text Genre In Chinese

Posted on: 2006-02-23
Degree: Master
Type: Thesis
Country: China
Candidate: Z F Fang
Full Text: PDF
GTID: 2168360152975718
Subject: Computer application technology
Abstract/Summary:
Since the end of the 20th century, genre has become one of the most active research topics in both computational linguistics and traditional linguistics. Many text classification researchers in computational linguistics have recognized the importance of classifying texts by form. Although some preliminary results have been achieved, research on genre classification is still at an early, exploratory stage. Genre is defined as a category assigned on the basis of external criteria; that is, it refers to the form of the text. It is closely connected with writing style and the analysis of sentence structure. Research on automatic genre classification has high theoretical value and considerable practical significance.

However, how to distinguish, describe and use "text genre" is a complex and challenging problem. First, its concept system is an abstract summary of human thought. The limited experience and knowledge of researchers, and its constant development, make it difficult to summarize and describe genre classification comprehensively, accurately and efficiently. Second, automatic genre classification, which lies at the intersection of computational linguistics and traditional Chinese rhetoric, requires a deep theoretical foundation in both linguistics and computational linguistics. Therefore, there are still obstacles to be overcome in this line of research.

The major contribution of this paper is an automatic system for Chinese text genre classification. The system is divided into three parts: corpus collection, feature item selection, and implementation of the classification algorithm. It has achieved results on a corpus of typical texts from five genres: scientific writing, political commentary, poetry, official documents, and news. Compared with related research abroad, we select fewer feature terms, yet classification precision and results are good enough. The modular design greatly facilitates adjusting and testing the system. One limitation is that the feature selection program lacks flexibility: it must be changed whenever the genre classification scheme changes. Although this paper has made some progress, it is only a preliminary attempt at research on automatic genre classification.
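
As an illustration of the feature item selection step, the following minimal sketch computes a few form-based features for a Chinese text. It is only an assumption about what such features might look like: the abstract does not list the actual feature items, and the function name `genre_features` and the character sets below are hypothetical.

```python
import re

# Hypothetical character sets; the thesis does not state its real feature items.
SENTENCE_END = "。！？"
PUNCTUATION = "，。！？；：、"
FUNCTION_CHARS = "的了着之乎者也"


def genre_features(text):
    """Return [average sentence length, punctuation ratio, function-character ratio]."""
    # Ignore whitespace so that line breaks do not distort the ratios.
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return [0.0, 0.0, 0.0]
    # Split on sentence-final punctuation and drop empty fragments.
    # Sentence length is counted in characters and includes inner commas.
    pieces = re.split("[" + SENTENCE_END + "]", "".join(chars))
    sentences = [s for s in pieces if s]
    if sentences:
        avg_sentence_len = sum(len(s) for s in sentences) / len(sentences)
    else:
        avg_sentence_len = 0.0
    punct_ratio = sum(c in PUNCTUATION for c in chars) / len(chars)
    function_ratio = sum(c in FUNCTION_CHARS for c in chars) / len(chars)
    return [avg_sentence_len, punct_ratio, function_ratio]
```

Vectors of this kind could then be passed to a classifier such as the SVM named in the keywords; how the thesis configures its classifier is not stated here.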
Keywords/Search Tags: Automatic text genre classification, Feature selection, Parametric distribution method, SVM